Over the last five years, the interest of the scientific computing community in accelerator devices, specifically Graphics Processing Units (GPUs), has grown rapidly. This interest stems from the massive computational power delivered by these devices, which were originally designed for image processing and rendering operations. Much research has been devoted to porting numerical codes to such devices, which has significantly contributed to the development of production-quality scientific libraries as well as of general-purpose programming models for GPUs (GPGPU). At the same time, the design of GPU cards and of other accelerator devices, such as the Intel Xeon Phi (MIC), quickly steered towards the needs of the scientific computing community; as a result, modern accelerators can execute double-precision floating-point arithmetic operations at rates that outperform general-purpose CPU chips, typically by a factor of 8. Several software libraries for dense linear algebra have been produced; the MAGMA project at the University of Tennessee, Knoxville (co-developed by one of this project's partner institutions) can be cited among the most successful. The most common dense linear algebra algorithms are extremely rich in computation and exhibit a very regular pattern of access to data, which makes them excellent candidates for execution on accelerators. The most common sparse linear algebra algorithms are methods for the solution of linear systems which, contrary to their dense counterparts, usually have irregular, indirect memory access patterns that interact adversely with typical accelerator throughput optimizations. These solution methods can be roughly classified into two families:
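To make the access-pattern contrast concrete, the following is an illustrative sketch (not taken from the project) of a sparse matrix-vector product in the standard Compressed Sparse Row (CSR) format. The indirect read `x[col_idx[j]]` depends on the matrix's sparsity pattern, which is unknown at compile time; it is this kind of data-dependent, scattered access that defeats the coalesced-memory optimizations accelerators rely on, whereas a dense kernel touches memory in predictable strides.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i inside
    the flat arrays col_idx (column indices) and vals (values).
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent access: the address of x[col_idx[j]]
            # cannot be predicted without inspecting the sparsity pattern.
            s += vals[j] * x[col_idx[j]]
        y[i] = s
    return y
```

For example, the 3x3 matrix [[4,0,1],[0,3,0],[1,0,2]] is stored as `row_ptr=[0,2,3,5]`, `col_idx=[0,2,1,0,2]`, `vals=[4,1,3,1,2]`.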
This project aims at studying and designing algorithms and parallel programming models for implementing direct methods for the solution of sparse linear systems on emerging computing platforms equipped with accelerators. Its ultimate aim is the implementation of a software package providing a solver based on sparse direct methods. Several attempts have been made to port these methods to such architectures; the proposed approaches are mostly based on simply offloading some computational tasks (the coarsest-grained ones) to the accelerators, and they require fine hand-tuning of the code and accurate performance modeling to achieve efficiency. This project proposes an innovative approach that relies on the efficiency and portability of runtime systems, such as the StarPU tool developed by the Runtime team (Bordeaux). Although the SOLHAR project will focus on heterogeneous computers equipped with GPUs, owing to their wide availability and affordable cost, the research accomplished on algorithms, methods and programming models will be readily applicable to other accelerator devices such as Intel MIC boards or Cell processors. The development of a production-quality sparse direct solver requires a considerable research effort along three distinct axes:
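The runtime-system approach mentioned above can be sketched as follows. This is a toy Python model, not StarPU's actual C API: tasks are submitted in program order together with the access mode (read or write) of each piece of data they touch; the runtime infers the dependency graph from those modes and is then free to dispatch any ready task to any CPU or GPU worker, relieving the application of hand-tuned offloading decisions. All class and function names here are hypothetical.

```python
READ, WRITE = "R", "W"

class TaskRuntime:
    """Toy sequential-task-flow runtime: dependencies between tasks are
    inferred automatically from the declared access modes of data handles."""

    def __init__(self):
        self.tasks = []        # list of (func, deps) in submission order
        self.last_writer = {}  # data handle -> index of the last writing task
        self.readers = {}      # data handle -> tasks reading it since last write

    def submit(self, func, accesses):
        """Register a task; accesses is a list of (handle, mode) pairs."""
        tid = len(self.tasks)
        deps = set()
        for handle, mode in accesses:
            if handle in self.last_writer:
                deps.add(self.last_writer[handle])         # read/write-after-write
            if mode == WRITE:
                deps.update(self.readers.get(handle, ()))  # write-after-read
                self.last_writer[handle] = tid
                self.readers[handle] = set()
            else:
                self.readers.setdefault(handle, set()).add(tid)
        self.tasks.append((func, deps))
        return tid

    def run(self):
        # A real runtime would dispatch ready tasks to CPU/GPU workers in
        # parallel; this sketch merely executes them in a valid order.
        done = set()
        while len(done) < len(self.tasks):
            for tid, (func, deps) in enumerate(self.tasks):
                if tid not in done and deps <= done:
                    func()
                    done.add(tid)
```

For instance, submitting a factorization task that writes handle `"A"` followed by a solve task that reads `"A"` and writes `"x"` makes the runtime record the solve's dependency on the factorization without the programmer stating it explicitly.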
Given the wide availability of computing platforms equipped with accelerators and the numerical robustness of direct solution methods for sparse linear systems, the outcome of this project can reasonably be expected to have a considerable impact on both academic and industrial scientific computing. The project will moreover provide a substantial contribution to the computational science and high-performance computing communities, as it will deliver an unprecedented example of a complex numerical code whose parallelization relies entirely on a runtime scheduling system and which is, therefore, highly portable, maintainable and evolvable towards future computing architectures. Finally, research on preconditioning methods for iterative solvers, as well as on hybrid, domain-decomposition solvers for heterogeneous computing platforms, will naturally benefit from the methods developed in this project.