High performance reconfigurable computing has become a crucial tool for designing application-specific processors/cores in a number of areas. Improvements in reconfigurable devices such as FPGAs (field programmable gate arrays) and their inclusion in current computing products have created new opportunities, as well as new challenges for high performance computing (HPC).
An FPGA is an integrated circuit that contains tens of thousands of building blocks, known as configurable logic blocks (CLBs), connected by programmable interconnect. FPGAs tend to be an excellent choice for algorithms that can benefit from the high parallelism offered by the fine-grained FPGA architecture. In particular, one of the most valuable features of FPGAs is their reconfigurability, i.e., the fact that they can be used for different purposes at different stages of a computation and can be, at least partially, reprogrammed at run time.
HPC applications built with reconfigurable computing (RC) have the potential to deliver enormous performance, so they are especially attractive when the main design goal is to obtain high performance at a reasonable cost. Furthermore, they are suitable for use in embedded systems, which is not the case for other alternatives such as grid computing.
The problem of accelerating HPC applications with RC can be compared to that of porting uniprocessor applications to massively parallel processors (MPPs). However, MPPs are better understood by most software developers than reconfigurable devices, and tools for porting codes to reconfigurable devices are not yet as mature as those for porting sequential code to parallel code. Nevertheless, in recent years considerable progress has been made in developing HPC applications with RC in such areas as signal processing, robotics, computer graphics, cryptography, bioinformatics, evolvable and biologically inspired hardware, network processors, real-time systems, rapid ASIC prototyping, interactive multimedia, machine vision, and embedded applications, to name a few. This special issue contains a sampling of the progress made in some of these areas.
In the first paper, Lieu My Chuong, Lam Siew Kei, and Thambillai Srikanthan propose a framework that can rapidly and accurately estimate the hardware area-time measures of implementing C applications on FPGAs. Their method predicts delays with an average accuracy of 97%, and the estimates themselves are computed in the order of milliseconds. This is an essential step toward rapid design exploration for FPGA implementations and significantly helps in implementing FPGA systems using high-level description languages.
In the second paper, T. Hausert, A. Dsu, A. Sudarsanam, and S. Young design an FPGA-based system to solve linear systems for scientific applications. They analyze FPGA performance per watt (MFLOPS/W) and compare it with that of microprocessor-based approaches. Finally, as the main outcome of this analysis, they propose helpful recommendations for speeding up FPGA computations while keeping power consumption low.
In the third paper, S. Mota, E. Ros, and F. de Toro describe a computing architecture that finely pipelines all the processing stages of a space-variant mapping strategy to reduce the distortion effect in a motion-detection-based vision system. As an example, they present results on correcting perspective distortion in a monitoring system for vehicle-overtaking maneuvers.
In the fourth paper, Sadaf R. Alam, Pratul K. Agarwal, Melissa C. Smith, and Jeffrey S. Vetter describe an FPGA acceleration of molecular dynamics using the Particle-Mesh Ewald method. Their results show that the time-to-solution of medium-scale biological system simulations is reduced by a factor of 3X, and they predict that future FPGA devices will reduce the time-to-solution by a factor greater than 15X for large-scale biological systems.
In the fifth paper, Nazar A. Saqib presents a space complexity analysis of two Karatsuba-Ofman multiplier variants. He studies the number of FPGA hardware resources employed by those two multipliers as a function of the operands' bitlength. He also provides a comparison table against the schoolbook (classical) multiplication method, showing that the Karatsuba-Ofman method is much more economical than the classical method for operand bitlengths greater than thirty-two bits. The complexity analysis presented in the paper is validated experimentally by implementing the multiplier designs on FPGA devices.
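To illustrate the idea behind the technique Saqib analyzes in hardware, the following Python sketch shows the Karatsuba trick on plain integers: one multiplication of n-bit operands is replaced by three (rather than four) multiplications of roughly n/2-bit operands, which is the source of the area savings. This is an illustrative software analogue only; the paper's designs are combinational FPGA circuits, and the 32-bit cutoff below merely echoes the crossover point mentioned above, it is not taken from the paper.

```python
def karatsuba(x, y):
    # Small operands: fall back to the schoolbook (built-in) multiply.
    if x < (1 << 32) or y < (1 << 32):
        return x * y
    # Split each operand into high and low halves around m bits:
    # x = (x_hi << m) + x_lo, and likewise for y.
    m = max(x.bit_length(), y.bit_length()) // 2
    x_hi, x_lo = x >> m, x & ((1 << m) - 1)
    y_hi, y_lo = y >> m, y & ((1 << m) - 1)
    # Three recursive half-size multiplications instead of four.
    a = karatsuba(x_hi, y_hi)                          # high * high
    b = karatsuba(x_lo, y_lo)                          # low * low
    c = karatsuba(x_hi + x_lo, y_hi + y_lo) - a - b    # cross terms
    return (a << (2 * m)) + (c << m) + b
```

The same three-multiplication recursion carries over to the polynomial multipliers over binary fields that are common in cryptographic FPGA designs, where each half-size product maps to a smaller block of logic resources.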
The help of the following reviewers, who ensured the quality of this issue, is gratefully acknowledged:
- Mancia Anguita, University of Granada, Spain,
- Beatriz Aparico, Andalucia Astrophysics Institute, CSIC, Spain,
- AbdSamad Benkrid, Queen's University, Northern Ireland,
- Eunjung Cho, Georgia State University, USA,
- Nareli Cruz-Cortés, CIC-IPN, Mexico,
- Sergio Cuenca, University of Alicante, Spain,
- Jean-Pierre Deschamps, University Rey Juan Carlos, Spain,
- Edgar Ferrer, University of Puerto Rico at Mayaguez,
- Luis Gerardo de la Fraga, CINVESTAV-IPN, Mexico,
- Antonio Garcia, University of Granada, Spain,
- Javier Garrigos, University of Cartagena, Spain,
- Miguel Angel León-Chávez, BUAP, Mexico,
- Adriano de Luca-Pennacchia, CINVESTAV-IPN, Mexico,
- Antonio Martinez, University of Alicante, Spain,
- Christian Morillas, University of Granada, Spain,
- Daniel Ortiz-Arroyo, Aalborg University, Denmark.
University of Puerto Rico at Mayaguez.
University of Granada, Spain.
Center for Research and Advanced Study,
National Polytechnic Institute, Mexico.