Design an SoC in Vivado that receives a YUV video feed (1080p, 30 FPS), writes it to memory, reads it back, applies two 7x7 fixed-point kernels using a custom cache and a custom SIMD processor, writes both outputs to memory, and streams them out as two YUV video streams.
I was the group leader. I designed the overall architecture of the SIMD processor and defined the structure of the set-associative cache and the way data is written to memory. I then designed the inner workings of the cores and coded the SIMD controller. I received an A+ grade for the module.
The other members refined my cache and core designs, implemented, debugged, and integrated them, and built the memory read and write pipelines.
The cache was loosely based on a set-associative cache. Its 8 lines stored the 8 most recently accessed rows of the 1080p image.
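The row-caching idea can be illustrated with a small software model. This is only a sketch under assumptions: the class name RowCache, the fetch_row callback, and the least-recently-used replacement policy are hypothetical choices made for illustration, not details taken from the actual hardware.

```python
from collections import OrderedDict


class RowCache:
    """Toy model of a cache whose lines each hold one full image row."""

    def __init__(self, fetch_row, num_lines=8):
        # fetch_row is a hypothetical callback: row index -> list of pixels.
        self.fetch_row = fetch_row
        self.num_lines = num_lines
        self.lines = OrderedDict()  # row index (tag) -> row data
        self.hits = 0
        self.misses = 0

    def read(self, row, col):
        if row in self.lines:
            self.hits += 1
            self.lines.move_to_end(row)  # mark row as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.num_lines:
                # Assumed policy: evict the least recently used row.
                self.lines.popitem(last=False)
            self.lines[row] = self.fetch_row(row)  # fill the line from memory
        return self.lines[row][col]
```

With 8 lines, a vertical window of up to 8 rows stays resident, so moving the window down by one row costs a single row fetch rather than eight.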
SIMD Core Design
Eight shift registers, each of depth 8, stored an 8x8 neighborhood of the image and passed it through an 8x8 bank of processing units, which performed convolution when the SIMD instructions called for it. Two such cores performed the two convolutions in parallel.
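A behavioral sketch of one such core is given below. It rests on several assumptions that are not from the original design: the names WindowCore, pad_kernel, and filter_rows are hypothetical, weights are assumed to be integers with 8 fractional bits, and a 7x7 kernel is assumed to be zero-padded to fill the 8x8 bank. The model applies the weights without flipping the kernel.

```python
from collections import deque

FRAC_BITS = 8  # assumed fixed-point format: weights are scaled by 2**8


def pad_kernel(kernel7):
    """Zero-pad a 7x7 kernel to 8x8 (assumed mapping onto the 8x8 bank)."""
    return [list(row) + [0] for row in kernel7] + [[0] * 8]


class WindowCore:
    """Model of one core: 8 shift registers feeding 8x8 multiply units."""

    def __init__(self, weights, size=8):
        self.size = size
        self.weights = weights  # size x size fixed-point integers
        # One shift register per image row, each `size` deep, reset to zero.
        self.regs = [deque([0] * size, maxlen=size) for _ in range(size)]

    def shift_in(self, column):
        # Push one new pixel per row; the oldest pixel in each row drops out.
        for reg, pixel in zip(self.regs, column):
            reg.append(pixel)

    def mac(self):
        # Every unit multiplies its pixel by its weight; products are summed.
        acc = 0
        for reg, weight_row in zip(self.regs, self.weights):
            for pixel, weight in zip(reg, weight_row):
                acc += pixel * weight
        return acc >> FRAC_BITS  # drop fractional bits (rounds toward -inf)


def filter_rows(rows, weights):
    """Slide the window along 8 rows, one column of pixels per step."""
    core = WindowCore(weights)
    out = []
    for x in range(len(rows[0])):
        core.shift_in([row[x] for row in rows])
        if x >= core.size - 1:  # window holds 8 valid columns
            out.append(core.mac())
    return out
```

Two convolutions in parallel correspond to two WindowCore instances with different weights receiving the same pixel column at each step.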
Shortcomings and Improvements
I designed the architecture in a few hours, without putting much thought into optimizing it for resources and power. Because the deadline was short, as team leader and architect I chose a design that was straightforward and easy for the other members to understand, so that the tasks could be finished and integration testing could start as soon as possible.
The SoC design experience I gained from this project helped tremendously in designing the highly optimized convolution accelerator for my next project: YOLOv2 on FPGA.