Kickoff meeting, November 15th, 2013 @ LaBRI (Bordeaux)

9h-9h30 A. Guermouche Project presentation
9h30-10h S. Thibault Recent work on StarPU
10h-10h30 A. Buttari and F. Lopez Recent work on the qr_mumps solver on top of StarPU
10h30-11h Break
11h-11h30 B. Lizé Presentation of the EADS-IW application
11h30-12h O. Beaumont Dynamic scheduling algorithms
12h-12h30 M. Faverge Recent work on the PaStiX solver on top of StarPU and PaRSEC
12h30-14h Lunch break
14h-14h30 D. Goudin Presentation of the CEA-CESTA application
14h30-15h L. Marchal Scheduling algorithms under memory constraints

Telecon on solvers and runtime systems, March 12th, 2014

Task granularity issues were discussed: the solver side expressed its needs, and the developments on the runtime-system side were briefly summarized.

Focused meeting on the scheduling needs, April 10-11, 2014 @ Lyon

Program for Thursday April 10:

9h-9h30 Welcome & coffee
9h30-10h30 L. Marchal Scheduling task-graphs under memory constraints: a short state of the art
10h30-11h30 F. Lopez Scheduling and ordering issues in Sequential Task Flow parallel multifrontal methods
13h00-13h35 B. Simon Scheduling malleable task graphs with memory constraints
13h35-14h10 A. Hugo Tools for scheduling in StarPU
14h10-15h E. Agullo Exploiting clusters of hybrid nodes with a sequential task-based programming paradigm
15h00-15h30 B. Uçar Acyclic orientation and scheduling
15h30-17h30 A. Guermouche Concluding the talks and launching discussions

Friday, April 11: Discussion and small work groups.

This meeting gave all participants a broad overview of the scheduling issues currently under consideration, either at a theoretical level or at the runtime/solver level. The first outcome was a better understanding of the memory behavior of the various solvers, which allows a more accurate model of this behavior to be derived. Then, among the numerous scheduling issues that deserve to be studied, we have identified two directions on which we will concentrate first:

  • The first research direction concerns minimizing the peak memory of an application represented as a task tree. This problem has been studied both theoretically and in practice in qr_mumps. Both approaches have studied how to guarantee that a parallel tree traversal does not use more than the available memory. This work will continue by trying to implement the strategies studied in the theoretical framework, such as Deepest First (Critical Path), and by searching for memory-guaranteed strategies that do not rely on strong assumptions about the task tree (such as the reduction property) and that follow an activation pattern, as implemented in qr_mumps. Future research directions include modeling computations as malleable tasks.
  • The second research direction is to optimize the processing of a task tree made of malleable tasks on a hybrid platform with two or more resource types (CPU, GPU, Xeon Phi, …). The questions to be answered concern both the granularity of the tasks (the optimal task granularity depends on the resource type) and how to split the tree across the different resource types.
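To make the peak-memory question of the first direction concrete, here is a minimal Python sketch. It is not the strategy used in qr_mumps or in the theoretical work mentioned above, and it covers only the simpler sequential case. The memory model is an assumption made for illustration: each task keeps its output (`out`) in memory until its parent has run, and needs `work` extra memory while it executes. The names `Task` and `postorder_peak` are hypothetical. The sketch computes the peak memory of a postorder traversal, optionally visiting children by decreasing (subtree peak minus output size), a classical ordering rule for postorder traversals.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    out: int                      # size of the output kept until the parent runs
    work: int = 0                 # temporary memory needed while the task executes
    children: list = field(default_factory=list)


def postorder_peak(task, reorder=True):
    """Peak memory of a sequential postorder traversal of the subtree at `task`."""
    # For each child: (peak memory of its subtree, size of the output it leaves).
    peaks = [(postorder_peak(c, reorder), c.out) for c in task.children]
    if reorder:
        # Visit first the children whose subtree peak most exceeds their output.
        peaks.sort(key=lambda p: p[0] - p[1], reverse=True)
    held = 0   # outputs of already processed children, still in memory
    peak = 0
    for child_peak, child_out in peaks:
        peak = max(peak, held + child_peak)
        held += child_out
    # Running the task itself needs the children's outputs, its workspace and its output.
    return max(peak, held + task.work + task.out)


# Tiny example tree (made-up sizes): two leaves under one root.
leaf_a = Task(out=1, work=9)
leaf_b = Task(out=5)
root = Task(out=1, children=[leaf_b, leaf_a])
print(postorder_peak(root, reorder=False))  # 15: leaf_b's output is held while leaf_a runs
print(postorder_peak(root, reorder=True))   # 10: leaf_a runs first
```

In this model the order in which subtrees are visited changes the peak even though the total work is identical, which is why traversal order matters when memory is bounded.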

Plenary meeting, June 4th, 2014 @ Toulouse

9:30-10:00 L. Marchal Recent scheduling results for malleable task trees and memory constraints
10:00-10:30 S. Thibault Controlling memory consumption of dynamic execution - a runtime-application collaboration
10:30-11:00 break
11:00-11:30 A. Hugo A runtime approach to dynamic resource allocation for sparse direct solvers
11:30-12:00 E. Agullo Exploiting clusters of hybrid nodes with a sequential task-based programming paradigm
12:00-14:00 Lunch
14:00-14:30 M. Faverge 3D Cartesian Transport Sweep for Massively Parallel Architectures on top of PaRSEC
14:30-15:00 C. Augonnet Needs related to sparse direct solvers in the stealth codes of CEA CESTA
15:00-15:30 break
15:30-16:00 A. Buttari Progress report on qr_mumps
16:00-16:30 A. Guermouche SOLHAR: on the performance of dynamic schedulers

Plenary meeting, November 28th, 2014 @ Bordeaux

Plenary meeting, June 12th, 2015 @ Lyon

9:00-9:30 Luka Stanisic StarPU/SimGrid: How does it Really Work?
9:30-10:00 Terry Cojean Implementation and evaluation of moldable tasks using StarPU
10:00-10:30 break & coffee
10:30-11:00 Samuel Thibault StarPU recent advances
11:00-11:30 Mathieu Faverge Hierarchical DAG Scheduling for Hybrid Distributed Systems
11:30-13:00 lunch break
13:30-14:00 Florent Lopez Task-based multifrontal QR solver for GPU-accelerated multicore architectures
14:00-14:30 Florent Pruvost Towards a solver software stack on top of runtime systems
14:30-15:00 Grégoire Pichon Blocking strategy optimizations for sparse direct linear solver on heterogeneous architectures
15:00-15:30 break & coffee
15:30-16:00 Thomas Lambert Exact Partitioning of a Discrete Square
16:00-16:30 Lionel Eyraud-Dubois Heterogeneous Scheduling for Cholesky factorization

Details of the talks

Samuel Thibault: StarPU recent advances

I will discuss various recent StarPU work related to SOLHAR. Control over memory consumption has been improved to better cope with dynamic data sizes. The memory consumption of the MPI layer can also be controlled more tightly. Data eviction heuristics for out-of-core support have been improved to better anticipate transfers. CUDA multistream is now also fully supported.

Mathieu Faverge: Hierarchical DAG Scheduling for Hybrid Distributed Systems

Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle with mapping the resulting multi-dimensional heterogeneity and the expression of algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms have the capability to alleviate some of the programming challenges on distributed hybrid many-core architectures. In this paper we take this concept a step further by showing that the potential of task-based programming paradigms can be greatly increased with minimal modification of the underlying runtime combined with the right algorithmic changes. We propose two novel recursive algorithmic variants for one-sided factorizations and describe the changes to the PaRSEC task-scheduling runtime to build a framework where the task granularity is dynamically adjusted to adapt the degree of available parallelism and kernel efficiency according to runtime conditions. Based on an extensive set of results, we show that, with one-sided factorizations, i.e. Cholesky and QR, a carefully written algorithm, supported by an adaptive task-based runtime, is capable of reaching a degree of performance and scalability never achieved before in distributed hybrid environments.

Florent Pruvost: Towards a solver software stack on top of runtime systems.

Efficiently exploiting modern supercomputers requires the use of many advanced software libraries. Their choice, tuning and interactions are often platform-dependent and may induce very high software complexity. Designing such a software stack requires many experts to cooperate with each other. While it is almost impossible for a developer to tune all the components, a developer often needs to tune a few of them heavily. In this presentation, I will present the roadmap we have followed to design and maintain a solver stack. Its originality is that we want to let developers tune, as much as they want, the pieces they specialize in (be it a runtime system, a scheduler, or a numerical algorithm) in a way that is interoperable with the other components, which are automatically installed and tuned for the target platform.

Grégoire Pichon: Blocking strategy optimizations for sparse direct linear solver on heterogeneous architectures.

In the context of solving sparse linear systems, the nested dissection process partitions the matrix graph to minimize both the fill-in and the computational cost. We found that the classic Reverse Cuthill-McKee algorithm used to order unknowns in supernodes might be enhanced to reduce the number of off-diagonal blocks by increasing their sizes. This leaves the complexity of the factorization algorithm unchanged, but allows for more efficient BLAS kernels. On the other hand, one might want to split the larger supernodes to introduce more parallelism. The regular splitting strategy, when applied locally, significantly impacts the number of off-diagonal blocks and might have a negative effect on efficiency. In this talk, we present a new strategy to improve supernode ordering and a new splitting strategy, both of which enlarge the average off-diagonal block sizes without changing the computational cost of the factorization. Performance gains on the supernodal solver PaStiX are shown on multi-core and heterogeneous architectures.

Florent Lopez: Task-based multifrontal QR solver for GPU-accelerated multicore architectures.

Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this talk we first discuss the data partitioning using a method suited to heterogeneous platforms allowing task granularity to be sufficiently large to obtain a good acceleration factor on GPU but capable of generating enough parallelism in the task graph. Secondly we handle the task scheduling with a strategy capable of taking into account workload and architecture heterogeneity at a reduced cost. Finally we propose an original evaluation of the performance obtained in our solver on a test set of matrices.

Plenary meeting, January 25th, 2016 @ Bordeaux

Plenary meeting, December 2nd, 2016 @ Toulouse

09:30-10:00 Bertrand Simon Malleable task-graph scheduling with a practical speed-up model
10:00-10:30 Luka Stanisic Modeling and Simulation of Dynamic Task-Based Applications
11:00-11:30 Bora Uçar Matrix symmetrization and sparse direct solvers
11:30-12:00 Terry Cojean Exploiting two-level parallelism on manycore architectures
12:00-14:00 Lunch
14:00-14:30 Loris Marchal Dynamic memory-aware task-tree scheduling
14:30-15:00 Abdou Guermouche Programming with hierarchical tasks: Control the task flow
15:00-15:30 Hatem Ltaief High Performance Low Rank Cholesky Factorization for Weather Prediction Applications
15:30-16:00 Lucas Schnorr Composite Views for Performance Analysis of Dense/Sparse Task-based Codes
16:00-16:30 Marc Sergent Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System
16:30-17:00 Closing and discussions
meetings.txt · Last modified: 2017/02/01 22:03 (external edit)