|An Efficient Hardware Architecture for Sparse Convolution using Linear Feedback Shift Registers
|Qasaimeh, M., J. Zambreno, and P. Jones
|Proceedings of the International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Deep convolutional neural networks (CNNs) have shown remarkable success in many computer vision tasks. However, their intensive storage, bandwidth, and computational requirements limit their deployment on embedded platforms. Although several research efforts have shown that pruning redundant weights can significantly reduce storage and computation, working with sparse weights remains challenging. The irregular computation of sparse weights and the overhead of managing their representation limit the efficiency of the underlying hardware. To address these issues, we propose a hardware-friendly pruning algorithm that generates structured sparse weights. In this algorithm, the locations of non-zero weights are derived on-chip in real time using Linear Feedback Shift Registers (LFSRs), which eliminates the overhead of managing sparse weight representations. We also propose a hardware inference engine for sparse convolution on FPGAs. It uses LFSRs to locate non-zero weights within weight tensors and avoids copying sparse weight indices by generating them on-chip. Experimental results show that the proposed pruning method can reduce the size of the VGG16, ResNet50, and InceptionV3 models by 80%, 76%, and 65%, respectively, with less than 2% accuracy loss. Experiments also demonstrate that our accelerator can achieve 456-534 effective GOP/s for these modern CNNs on a Xilinx ZCU102, a 1.2-2.7× speedup over previous sparse CNN accelerators on FPGAs.
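
The idea of deriving non-zero weight locations from an LFSR instead of storing them can be illustrated with a minimal Python sketch. It is not the authors' pruning algorithm or hardware design: the 16-bit register width, the tap positions (16, 14, 13, 11, a commonly cited maximal-length configuration), the state-to-index mapping, and the names `lfsr16`, `nonzero_indices`, and `sparse_dot` are all assumptions chosen for illustration.

```python
from itertools import islice


def lfsr16(seed):
    """Yield successive states of a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 are an assumed, commonly cited maximal-length
    choice; the paper's actual LFSR configuration is not given here.
    """
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        # XOR the tapped bits to form the feedback bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        # Shift right and feed the new bit into the top position.
        state = (state >> 1) | (bit << 15)
        yield state


def nonzero_indices(seed, num_weights, num_kept):
    """Regenerate the positions of the kept weights from the seed alone.

    Hypothetical mapping: state s (1..65535) maps to index s - 1, and
    states that fall outside the tensor are skipped.
    """
    if not 0 < num_kept <= num_weights <= 0xFFFF:
        raise ValueError("need 0 < num_kept <= num_weights <= 65535")
    indices = []
    # One full period (65535 steps) visits every non-zero state once,
    # so the indices are distinct and the loop always terminates.
    for state in islice(lfsr16(seed), 0xFFFF):
        idx = state - 1
        if idx < num_weights:
            indices.append(idx)
            if len(indices) == num_kept:
                break
    return indices


def sparse_dot(values, indices, activations):
    """Dot product that touches only the kept weight positions."""
    return sum(v * activations[i] for v, i in zip(values, indices))
```

Under these assumptions, only the seed and the packed non-zero values need to be stored. For example, `nonzero_indices(0xACE1, 64, 16)` regenerates the same 16 positions on every call, so no index array has to be copied alongside the weights.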