UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO
———————————————-
Aluno: Laércio Lima Pilla
Orientador: Prof. Dr. Philippe Olivier Alexandre Navaux
Cotutela: Université de Grenoble, França
Título: Topology-Aware Load Balancing for Performance Portability over Parallel High Performance Systems.
Linha de Pesquisa: Processamento Paralelo e Distribuído
Data: 11/04/2014
Horário: 13:30h
Local: Prédio 43412 – sala 215 (Sala de Videoconferência), Instituto de Informática
Banca Examinadora:
Prof. Dr. Raymond Namyst (Université de Bordeaux 1)
Prof. Dr. Wagner Meira Júnior (UFMG)
Prof. Dr. Alfredo Goldman Vel Lejbman (USP)
Prof. Dr. Nicolas Bruno Maillard (UFRGS)
Resumo:
This thesis presents our research to provide performance portability and scalability to complex scientific applications running over hierarchical multicore parallel platforms. Performance portability is said to be attained when a low core idleness is achieved while mapping a given application to different platforms, and can be affected by performance problems such as load imbalance and costly communications, and overheads coming from the task mapping algorithm. Load imbalance is a result of irregular and dynamic load behaviors, where the amount of work to be processed varies depending on the task and the step of the simulation. Meanwhile, costly communications are caused by a task distribution that does not take into account the different communication times present in a hierarchical platform. This includes nonuniform and asymmetric communication costs at memory and network levels. Lastly, task mapping overheads come from the execution time of the task mapping algorithm trying to mitigate load imbalance and costly communications, and from the migration of tasks.
Our approach to achieve the goal of performance portability is based on the hypothesis that precise machine topology information can help task mapping algorithms in their decisions. In this context, we proposed a generic machine topology model of parallel platforms composed of one or more multicore compute nodes. It includes profiled latencies and bandwidths at memory and network levels, and highlights asymmetries and nonuniformity at both levels. This information is employed by our three proposed topology-aware load balancing algorithms, named NucoLB, HwTopoLB, and HierarchicalLB. Besides topology information, these algorithms also employ application information gathered during runtime. NucoLB focuses on the nonuniform aspects of parallel platforms, while HwTopoLB considers the whole hierarchy in its decisions, and HierarchicalLB combines these algorithms hierarchically to reduce its task mapping overhead. These algorithms seek to mitigate load imbalance and costly communications while averting task migration overheads.
Experimental results with the proposed load balancers over different platform composed of one or more multicore compute nodes showed performance improvements over state of the art load balancing algorithms: NucoLB presented improvements of up to 19% on one compute node; HwTopoLB experienced performance improvements of 19% on average; and HierarchicalLB outperformed HwTopoLB by 22% on average on parallel platforms with ten or more compute nodes. These results were achieved by equalizing work among the available resources, reducing the communication costs experienced by applications, and by keeping load balancing overheads low. In this sense, our load balancing algorithms provide performance portability to scientific applications while being independent from application and system architecture.
Palavras-chave: Computer architecture, Parallel programming, Profiling, Scheduling.
____________
Divulgação PPGC