Abstract: Applications of convolution filtering range from image recognition to video surveillance. Two observations drive the design of a new buffering architecture for convolution filters. First, convolution is inherently local: every pixel of an output feature map is computed from neighboring pixels of the input feature maps. Although the operation is simple, convolution filtering is both computation-intensive and memory-intensive, and real-time applications require large amounts of on-chip memory to support massively parallel processing architectures. Second, to avoid accessing external memory directly, data already stored in on-chip memory should be reused as many times as possible. Based on these two observations, we show that, for a given throughput rate and off-chip memory bandwidth, a rotation-based data buffering architecture provides the optimum area-utilization results for a particular design point, one that is commonly used in recognition applications.
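The two observations (locality and on-chip data reuse) can be illustrated with a minimal software sketch. It is not the rotation-based architecture itself, whose details are not given here; it models a generic K-row line buffer instead. The function names, the `external_reads` counter, and the use of a Python `deque` as a stand-in for on-chip memory are all assumptions made for illustration.

```python
from collections import deque


def conv2d_direct(image, kernel):
    """Reference K x K filter (valid mode, correlation form).

    Every output pixel reads its K x K input neighborhood directly,
    so the same input pixel is fetched many times.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - k + 1):
        out_row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def conv2d_line_buffered(image, kernel):
    """Same filter, with a hypothetical K-row on-chip line buffer.

    Each input row is fetched from 'external memory' exactly once and
    then reused for every output row whose window covers it.
    Returns (output, external_reads).
    """
    k = len(kernel)
    w = len(image[0])
    line_buffer = deque(maxlen=k)  # the oldest row is evicted automatically
    external_reads = 0             # assumed cost model: one read per pixel
    out = []
    for row in image:
        line_buffer.append(list(row))  # the only external-memory access
        external_reads += w
        if len(line_buffer) < k:
            continue  # not enough rows buffered for a full window yet
        out_row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * line_buffer[i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out, external_reads
```

Under this assumed cost model, the buffered version reads each input pixel from external memory once, whereas the direct version touches K x K input pixels per output pixel. The price is K rows of on-chip storage, which is the kind of area versus bandwidth trade-off the abstract refers to.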