Journal: IEEE Transactions on Emerging Topics in Computing
Print ISSN: 2168-6750
Publication year: 2017
Volume: 5
Issue: 4
Pages: 551-562
DOI: 10.1109/TETC.2016.2520888
Publisher: IEEE Publishing
Abstract: Object detection applications often require their algorithms to execute on embedded processing platforms such as multiprocessor SoCs. One way these algorithms can search input images for objects of interest is by consulting a detection library that contains a list of features describing the objects. Processing large volumes of image data while consulting a library can degrade platform performance, because contention for cacheable resources leads to inconsistent data locality and reuse. Software-based techniques for this problem have been investigated in the literature with varying success. This paper addresses the issue directly with a novel hardware accelerator designed to overcome shared-resource contention while optimizing on-chip memory consumption. Detection libraries are compressed and stored on-chip within the accelerator, which decompresses the data and writes it to dedicated dual-port memories, ensuring optimal library data locality and reuse for all processors. Because the accelerator can also manipulate library data, application performance improves further as less computation is left to the processors. Our evaluation revealed that eliminating contention within caches drastically improved application performance without over-consuming on-chip resources or power.
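The decompress-and-distribute idea in the abstract can be illustrated with a minimal software model. This is only a sketch under stated assumptions. Run-length encoding stands in for the paper's compression scheme, which the abstract does not specify. The names `rle_compress`, `rle_decompress`, `DualPortMemory`, and `LibraryAccelerator` are hypothetical. The model captures only data placement: each processor ends up with its own complete copy of the library. It does not model timing, port arbitration, or cache behavior.

```python
from typing import List, Tuple

# Hypothetical stand-in: run-length encoding of library words as (value, count) pairs.
def rle_compress(words: List[int]) -> List[Tuple[int, int]]:
    out: List[Tuple[int, int]] = []
    for w in words:
        if out and out[-1][0] == w:
            out[-1] = (w, out[-1][1] + 1)  # extend the current run
        else:
            out.append((w, 1))             # start a new run
    return out

def rle_decompress(pairs: List[Tuple[int, int]]) -> List[int]:
    out: List[int] = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

class DualPortMemory:
    """Toy model of a dedicated memory: the accelerator writes, one processor reads."""
    def __init__(self, depth: int) -> None:
        self.cells = [0] * depth

    def write(self, addr: int, value: int) -> None:  # accelerator-side port
        self.cells[addr] = value

    def read(self, addr: int) -> int:                # processor-side port
        return self.cells[addr]

class LibraryAccelerator:
    """Hypothetical model: holds the compressed library and fills one memory per processor."""
    def __init__(self, compressed: List[Tuple[int, int]], num_processors: int) -> None:
        self.compressed = compressed
        depth = sum(count for _, count in compressed)  # decompressed library size
        self.memories = [DualPortMemory(depth) for _ in range(num_processors)]

    def load(self) -> None:
        words = rle_decompress(self.compressed)
        for mem in self.memories:          # each processor receives its own full copy
            for addr, w in enumerate(words):
                mem.write(addr, w)

library = [7, 7, 7, 3, 3, 9]
acc = LibraryAccelerator(rle_compress(library), num_processors=2)
acc.load()
```

In this model, only the compressed form needs to be stored centrally. Each processor reads features from its private memory instead of competing for a shared cache, which is the contention the abstract says the accelerator eliminates.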