I keep track of some seminars and thesis defenses taking place at the Informatics Institute of UFRGS.
26th October 2018, 11AM.
Reproducible Research: where do we stand?
Arnaud Legrand and Jean-Marc Vincent
Polaris INRIA Team, Laboratoire LIG
Room 109 of Building 43425.
- Reproducibility of experiments and analysis by others is one of the pillars of modern science. Yet, the description of experimental protocols, software, and analysis is often incomplete and rarely allows a third party to reproduce a study. Such inaccuracies have become more and more problematic and are probably the cause of the increasing number of article withdrawals, even in prestigious journals, and of the realization by both the scientific community and the general public that many research results and studies are actually flawed and misleading. Open science is the umbrella term for the movement that strives to make scientific research, data, and dissemination accessible to all levels of an inquiring society. Reproducible research encompasses the technical and social aspects of science that allow and promote better research practices. In this talk, I will give a broad overview of the challenges at stake and of emerging solutions. I will also discuss, in particular, the role computer science can play in this topic.
19th March, 2018, 10:15 AM.
Improved Static Analysis to Generate More Efficient Code for Execution of Loop Nests in GPUs.
Prof. J. Nelson Amaral (University of Alberta, Canada).
Lower Auditorium, Building 67.
- OpenMP is a high-level programming model in which parallel loops are best suited for translation into data-parallel code. The OpenMP 4.0 standard introduced support for offloading computing kernels to accelerator devices in an architecture-agnostic fashion. When the target is a modern GPU architecture, the program must conform to the data-parallel execution model. Thus, it is critical to understand the performance implications of transforming parallel loops into GPU kernels. Recognizing and manipulating memory access patterns is especially important on GPUs due to the massive impact of memory coalescing on performance. In this talk, we present a new static analysis framework that can be used to determine the memory coalescing characteristics of parallel loops destined for GPU offloading and to ascertain both the safety and the profitability of loop transformations aimed at improving their memory access characteristics. The talk discusses how this new analysis framework can be used to guide loop transformations for more efficient execution on GPUs, and demonstrates how target-architecture-aware compilers can reduce the burden of hand-tuning OpenMP loop code, improving code portability and reducing programmer effort.
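As a rough illustration of my own (not taken from the talk), the sketch below shows the kind of OpenMP loop nest that gets offloaded to a GPU with the 4.x target directives; the array names and sizes are made up, and the comment indicates why the iteration-to-thread mapping matters for memory coalescing.

```c
/* Illustrative sketch only: an OpenMP parallel loop nest offloaded to a
 * GPU.  Array names and sizes are made up for the example. */
#define N 4096

void scale_matrix(float *a, const float *b)
{
    /* On typical GPU offloading implementations, consecutive iterations
     * of the collapsed loop go to consecutive threads, so neighbouring
     * threads touch neighbouring addresses of a and b and the accesses
     * coalesce.  A stride-N (column-wise) access pattern would defeat
     * coalescing and hurt performance badly. */
    #pragma omp target teams distribute parallel for collapse(2) \
            map(to: b[0:N*N]) map(tofrom: a[0:N*N])
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = 2.0f * b[i * N + j];
}
```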
October 2017 (PPGC 15-hour short course - Laboratory).
Literate Programming and Statistics.
Prof. Jean-Marc Vincent (Université Grenoble Alpes) and myself.
October 30th, 2015, 12h (Brazilian Time) (Room 220 - Videoconference).
A Reproducible Research Methodology for Designing and Conducting Faithful Simulations of Dynamic Task-based Scientific Applications (Thesis Defense).
Luka Stanisic (PhD candidate at the Université Grenoble Alpes, France).
The evolution of High-Performance Computing systems has taken a sharp turn in the last decade. Due to the enormous energy consumption of modern platforms, miniaturization and frequency scaling of processors have reached a limit. These energy constraints have forced hardware manufacturers to develop alternative computer architectures in order to keep up with the ever-growing performance demands of scientists and society. However, efficiently programming such a diversity of platforms and fully exploiting the potential of the numerous different resources they offer is extremely challenging. The previously dominant approach to designing high-performance applications, based on large monolithic codes offering many optimization opportunities, has thus become harder and harder to follow, since implementing and maintaining such complex codes is very difficult. Therefore, application developers increasingly consider modular approaches and dynamic application executions. A popular approach is to implement the application at a high level, independently of the hardware architecture, as a Directed Acyclic Graph (DAG) of tasks, each task corresponding to carefully optimized computation kernels for each architecture. A runtime system can then be used to dynamically schedule those tasks on the different computing resources.
Developing such solutions and ensuring their good performance on a wide range of setups is, however, very challenging. Due to the high complexity of the hardware, the variability in the duration of the operations performed on a machine, and the dynamic scheduling of the tasks, application executions are non-deterministic, and the performance evaluation of such systems is extremely difficult. There is therefore a definite need for systematic and reproducible methods for conducting such research, as well as for reliable performance evaluation techniques for studying these complex systems.
In this thesis, we show that it is possible to perform a clean, coherent, reproducible study, using simulation, of dynamic HPC applications. We propose a unique workflow based on two well-known and widely used tools, Git and Org-mode, for conducting reproducible experimental research. This simple workflow allows issues such as provenance tracking and data analysis replication to be addressed pragmatically. Our contribution to the performance evaluation of dynamic HPC applications consists in the design and validation of a coarse-grain hybrid simulation/emulation of StarPU, a dynamic task-based runtime for hybrid architectures, on top of SimGrid, a versatile simulator for distributed systems. We show how this tool can achieve faithful performance predictions of native executions on a wide range of heterogeneous machines and for two different classes of programs, dense and sparse linear algebra applications, which are good representatives of real scientific applications.
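To make the DAG-of-tasks model described above concrete, here is a small sketch of my own that expresses a tiled computation as tasks with data dependencies, using OpenMP 4.x task dependencies; the kernel names are hypothetical, and this is not StarPU's API, which is what the thesis actually studies.

```c
/* Illustrative sketch only (hypothetical kernel names, not StarPU code):
 * a tiled computation expressed as a DAG of tasks.  The runtime tracks
 * the declared data dependencies and dynamically schedules ready tasks
 * on the available resources. */
void factor_tile(double *t);                      /* per-architecture optimized kernels */
void update_tile(const double *src, double *dst);

void tiled_sweep(double *tile[], int ntiles)
{
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < ntiles; k++) {
        #pragma omp task depend(inout: tile[k][0])
        factor_tile(tile[k]);

        for (int j = k + 1; j < ntiles; j++) {
            #pragma omp task depend(in: tile[k][0]) depend(inout: tile[j][0])
            update_tile(tile[k], tile[j]);
        }
    }
}
```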
October 29th, 2015, 12h45 (Auditorium 67).
How to Find a Needle in a Haystack? or On the Detection of Anomalies in Large Traces.
Prof. Jean-Marc Vincent (Université Grenoble Alpes, France).
- Large-scale high-performance applications involve an ever-increasing number of threads to exploit the extreme concurrency of today's systems. Performance analysis through visualization techniques usually suffers from severe semantic limitations due, on one hand, to the size of parallel applications and, on the other hand, to the challenges of visualizing large-scale traces. Most performance visualization tools therefore rely on data aggregation to work at scale. Although this abstraction technique is frequently used, to the best of our knowledge there has been no attempt to evaluate the quality of aggregated data for performance visualization. This presentation describes an approach that fills this gap. We propose to build optimized macroscopic visualizations using measures inherited from information theory, in particular the Kullback-Leibler divergence. These measures are used to estimate the complexity reduction and the information loss caused by the aggregation. We first illustrate the applicability of our approach by exploiting these two measures for the analysis of work-stealing traces using squarified treemaps. We then report the effective scalability of our approach by visualizing known anomalies in a synthetic trace file recording the behavior of one million processes. More recently, we have applied this approach to very long traces (more than one billion events) extracted from embedded systems. This approach has also been fruitfully applied in other scientific domains, such as multi-agent systems, geography, and media studies, but that is another story.
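For reference, the Kullback-Leibler divergence mentioned above is the standard information-theoretic quantity D(p||q) = Σ_i p_i log(p_i / q_i). The small function below is my own sketch (not the speaker's code) of how it can be computed between a microscopic distribution p and the distribution q reconstructed from an aggregated view; the larger the value, the more information the aggregation loses.

```c
#include <math.h>

/* Sketch: Kullback-Leibler divergence D(p || q) between a microscopic
 * distribution p and the distribution q reconstructed from an aggregated
 * view.  Both arrays are assumed to be normalized (they sum to 1). */
double kl_divergence(const double *p, const double *q, int n)
{
    double d = 0.0;
    for (int i = 0; i < n; i++) {
        if (p[i] == 0.0)
            continue;              /* 0 * log(0/q) contributes nothing    */
        if (q[i] == 0.0)
            return INFINITY;       /* the aggregation cannot explain p    */
        d += p[i] * log(p[i] / q[i]);
    }
    return d;                      /* in nats; divide by log(2) for bits  */
}
```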
October 27th, 2015, 14h (Adm 220).
Latest advances on TreeMatch, our process placement tool and algorithm.
Prof. Emmanuel Jeannot (INRIA Bordeaux, France).
- TreeMatch is a tool and an algorithm for performing process placement of parallel applications. It has been developed in our team for several years. After presenting the tool and its use cases (MPI process reordering, load balancing, I/O optimisation, etc.), we will present the latest advances on this topic: optimisation and parallelisation of the exact solution algorithm, and experiments on different affinity metrics.
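As a side note of my own (this is not TreeMatch's algorithm), the quantity such placement tools try to minimize can be written as the affinity between every pair of processes weighted by the topology distance between the cores they are mapped to; the sketch below evaluates that cost for a given placement, with hypothetical array layouts.

```c
/* Sketch: communication cost of a process placement.  comm[i*n+j] is an
 * affinity metric between processes i and j (e.g. bytes or messages
 * exchanged), dist[a*ncores+b] the topology distance between cores a and
 * b, and sigma[i] the core process i is placed on. */
double placement_cost(int n, int ncores, const double *comm,
                      const double *dist, const int *sigma)
{
    double cost = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cost += comm[i * n + j] * dist[sigma[i] * ncores + sigma[j]];
    return cost;
}
```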
August 11-20, 2015.
Scientific Methodology and Performance Evaluation for Experimental Computer Scientists.
Prof. Arnaud Legrand (CNRS, France).
- The aim of this course is to provide the fundamental basis for a sound scientific methodology for the performance evaluation of computer systems. Some of the topics involve some mathematics (in particular probability and statistics), but I will adopt a very pragmatic presentation, particularly suited to experimental computer scientists, and will re-explain any notion that is not clear. Every lecture will be backed up with practical sessions and worked-out examples.
- Tuesday 11: 08h30 - 12h10. Intro. to reproducible research and open science (43413-104)
- Thursday 13: 08h30 - 12h10. Data presentation, reporting results. (43413-104)
- Friday 14: 08h30 - 12h10. Basic notions of statistics (43413-106)
- Tuesday 18: 08h30 - 12h10. More advanced notions of statistics (43413-104)
- Thursday 20: 08h30 - 12h10. Design of Experiments (43413-104)
- Nowadays, lots of applications are available as free software. In order to manage them easily, various distributions have been created. One of them, Debian, focuses explicitly on free software and has several unique characteristics: Debian is a (very) old Linux distribution, it is not led by a company but by an open community, and it is very widespread, used either directly or through derived distributions such as Ubuntu. In this talk, I will present the Debian project, its organization, and how you can get involved and contribute. The Debian project is very broad, so I will focus on a few aspects of the project and present them in more detail.
April 30th, 2014, 13h30.
Parallel Architectures: hardware and software evolution.
Prof. Vincent Danjean (University of Grenoble I, France).
Room 220 - Building 43412.
- For a long time, processors became more powerful by increasing their speed (frequency) and by adding internal resources (more registers, more computational units, more cache, etc.). These improvements immediately benefited all applications: using a newer processor automatically made applications run faster. However, over the last few years, we have observed a great evolution in processor core architecture. Frequency no longer increases, and processors have become multi-core or even many-core. To be efficient on such processors, applications must be rewritten to explicitly exhibit parallelism that can be exposed to the processors (multi-threaded applications, etc.). In this talk, I will present an overview of the evolution of processor designs and explain why hardware manufacturers have taken this path. I will also present some middleware, languages, and tools that have been developed to help application programmers easily exploit this new hardware, such as, for example, OpenCL, CUDA, OpenMP, and Cilk.
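To illustrate the rewriting effort the talk refers to, here is a minimal example of my own (not from the talk): the same loop, first sequential, then annotated with an OpenMP directive so that every core of a multi-core processor takes a share of the iterations.

```c
#include <stddef.h>

/* Sequential version: only benefits from a faster core. */
void saxpy_seq(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Multi-threaded version: the iterations are split across the cores. */
void saxpy_omp(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```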