Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system

T. Hauser, A. Dasu, A. Sudarsanam, S. Young

    Research output: Contribution to journal › Article › peer-review

    12 Scopus citations

    Abstract

    Lower/upper triangular (LU) factorization plays an important role in scientific and high-performance computing. This paper presents an implementation of the LU decomposition algorithm for double-precision complex numbers on a star-topology multi-FPGA platform. The out-of-core implementation moves data through multiple levels of a hierarchical memory system (hard disk, DDR SDRAM, and FPGA block RAMs) using fully pipelined data paths in all steps of the algorithm. Detailed performance numbers for all phases of the algorithm are presented and compared with a highly optimized implementation for a low-power microprocessor-based system. We also compare the performance per watt of the FPGA and microprocessor systems. Finally, we recommend improvements to the FPGA design that would increase the performance of the double-precision complex LU factorization on the FPGA-based system.
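
For readers unfamiliar with the factorization named in the abstract, the following is a minimal in-memory sketch of LU decomposition on complex matrices. It is not the paper's out-of-core, pipelined FPGA design: the function name `lu_decompose`, the use of partial pivoting, and the list-of-rows matrix layout are all assumptions made for illustration. Python's built-in `complex` type stores two double-precision floats, which matches the number format the abstract refers to.

```python
def lu_decompose(a):
    """Factor a square complex matrix so that P*A = L*U (hypothetical helper).

    `a` is a list of rows. Returns (perm, l, u), where perm[i] is the index
    of the original row that ends up in row i after pivoting.
    """
    n = len(a)
    u = [[complex(x) for x in row] for row in a]  # working copy, becomes U
    l = [[0j] * n for _ in range(n)]              # unit lower-triangular factor
    perm = list(range(n))

    for k in range(n):
        # Partial pivoting (an assumption here): pick the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1 + 0j
        # Eliminate the entries below the pivot and record the multipliers in L.
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return perm, l, u


# Small example matrix with complex entries (values chosen arbitrarily).
a = [[2 + 1j, 1, 0],
     [4, 3 - 1j, 1],
     [1j, 2, 5 + 2j]]
perm, l, u = lu_decompose(a)
```

Multiplying `l` by `u` reproduces the rows of `a` in the order given by `perm`, up to floating-point rounding. An out-of-core version, as described in the abstract, would apply the same elimination to blocks of the matrix streamed between disk, DDR memory, and on-chip RAM rather than to a matrix held entirely in memory.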

    Original language: English
    Pages (from-to): 373-385
    Number of pages: 13
    Journal: Scalable Computing
    Volume: 8
    Issue number: 4
    State: Published - 2007

    Keywords

    • Benchmarking
    • LU factorization
    • Multi-FPGA system
