|Title||A Configurable Architecture for Sparse LU Decomposition on Matrices with Arbitrary Patterns|
|Publication Type||Journal Articles|
|Authors||X. Wang, P. Jones and J. Zambreno|
|Journal||ACM Computer Architecture News (CAN)|
Sparse LU decomposition has been widely used to solve sparse linear systems of equations found in many scientific and engineering applications, such as circuit simulation, power system modeling and computer vision. However, it is considered a computationally expensive factorization tool. While parallel implementations have been explored to accelerate sparse LU decomposition, irregular sparsity patterns often limit their performance gains. Prior FPGA-based accelerators have been customized to domain-specific sparsity patterns of pre-ordered symmetric matrices. In this paper, we present an efficient architecture for sparse LU decomposition that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. The control structure of our architecture parallelizes computation and pivoting operations. In addition, on-chip resource utilization is configured based on properties of the matrices being processed. Our experimental results show a 1.6x to 14x speedup over an optimized software implementation for benchmarks containing a wide range of sparsity patterns.
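For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of sparse LU decomposition with partial pivoting. It is not the architecture described in the paper: the row-dictionary storage format and the function names `sparse_lu` and `solve` are assumptions chosen for illustration only. It shows the two steps the abstract refers to, pivot selection and row elimination, and how structural zeros let whole rows be skipped.

```python
def sparse_lu(rows, n):
    """Factor an n x n matrix given as a list of {column: value} dicts.

    Returns (perm, L, U) such that P*A = L*U, where perm[i] is the index of
    the original row placed at position i and L has an implicit unit diagonal.
    Illustrative only; the storage layout is an assumption of this sketch.
    """
    U = [dict(r) for r in rows]       # working copy, reduced in place to U
    L = [dict() for _ in range(n)]    # strictly lower-triangular part of L
    perm = list(range(n))
    for k in range(n):
        # Pivoting: pick the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(U[i].get(k, 0.0)))
        if U[p].get(k, 0.0) == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        pivot = U[k][k]
        # Elimination: update only rows with a nonzero in column k.
        for i in range(k + 1, n):
            a_ik = U[i].pop(k, 0.0)
            if a_ik == 0.0:
                continue              # structural zero, nothing to do
            f = a_ik / pivot
            L[i][k] = f
            for j, u_kj in U[k].items():
                if j > k:
                    # May create a new nonzero (fill-in) in row i.
                    U[i][j] = U[i].get(j, 0.0) - f * u_kj
    return perm, L, U


def solve(perm, L, U, b):
    """Solve A x = b using the factors returned by sparse_lu."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                # forward substitution with L
        y[i] = b[perm[i]] - sum(v * y[j] for j, v in L[i].items())
    x = [0.0] * n
    for i in reversed(range(n)):      # backward substitution with U
        s = sum(v * x[j] for j, v in U[i].items() if j > i)
        x[i] = (y[i] - s) / U[i][i]
    return x


# Usage: a small asymmetric matrix with an irregular sparsity pattern.
A = [{0: 4.0, 2: 1.0}, {1: 3.0}, {0: 2.0, 2: 5.0}]
perm, L, U = sparse_lu(A, 3)
x = solve(perm, L, U, [7.0, 6.0, 17.0])
```

In this sketch the pivot search and the row updates run one after the other for each column; the abstract states that the proposed architecture parallelizes these computation and pivoting operations.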