Optimizing dense linear algebra algorithms on heterogeneous machines

Jorge Barbosa


This paper addresses the execution of inherently sequential linear algebra algorithms, namely LU factorization, tridiagonal reduction, and the symmetric QR factorization algorithm used for eigenvector computation, which are significant building blocks for applications in our target domain of image processing and analysis. These algorithms are harder to optimize for processing time because the computational load of the data matrix columns increases with their index, which requires a finely tuned load assignment and distribution. We present an efficient methodology for determining the optimal number of processors to use in a computation, as well as a new static load distribution strategy that achieves better results than other algorithms developed for the same purpose.
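To make the load imbalance concrete, the following minimal sketch counts the multiply-add operations that each column accumulates in a right-looking LU factorization without pivoting, then assigns columns to processors of different speeds. It is an illustration only and is not the distribution strategy proposed in the paper: the function names `lu_column_work` and `greedy_static_assignment`, the relative `speeds` values, and the heaviest-first greedy rule are all assumptions made for this example.

```python
def lu_column_work(n):
    """Multiply-add count accumulated by each column of an n x n matrix
    in right-looking LU without pivoting (illustrative cost model)."""
    # Column j is updated at steps k = 0..j-1; the update at step k
    # touches rows k+1..n-1, i.e. n-1-k entries of that column.
    return [sum(n - 1 - k for k in range(j)) for j in range(n)]


def greedy_static_assignment(work, speeds):
    """Hypothetical static assignment: take columns heaviest first and give
    each to the processor whose projected finish time would be smallest."""
    loads = [0.0] * len(speeds)
    owner = [None] * len(work)
    for j in sorted(range(len(work)), key=lambda c: work[c], reverse=True):
        # Projected finish time if processor q also received column j.
        p = min(range(len(speeds)), key=lambda q: (loads[q] + work[j]) / speeds[q])
        owner[j] = p
        loads[p] += work[j]
    return owner, loads


work = lu_column_work(8)
# Assumed relative speeds of three heterogeneous processors.
owner, loads = greedy_static_assignment(work, speeds=[1.0, 2.0, 4.0])
print(work)   # per-column work grows with the column index
print(owner)  # processor index that owns each column
print(loads)  # total work assigned to each processor
```

This sketch balances only the total work per processor. Because the factorization proceeds step by step, a practical distribution also has to keep every processor busy at each step, which is the kind of fine tuning the abstract refers to.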


