DOCTORAL THESIS DEFENSE
Student: Jean Luca Bez
Advisor: Prof. Dr. Philippe Olivier Alexandre Navaux
Co-advisor: Prof. Dr. Antonio Cortes Rosseló
Title: Dynamic Tuning and Reconfiguration of the I/O Forwarding Layer in HPC Platforms
Research Area: High-Performance Computing and Distributed Systems
This defense will exceptionally be held fully remotely. Those interested in attending can access the virtual room at: https://mconf.ufrgs.br/webconf/00259119
Examining Committee:
– Prof. Dr. Altigran Soares da Silva (UFAM)
– Prof. Dr. Luciano Paschoal Gaspary (INF-UFRGS)
– Prof. Dr. Wagner Meira Junior (UFMG)
Committee Chair: Prof. Dr. Philippe Olivier Alexandre Navaux
Abstract: Input and output (I/O) operations are a bottleneck for an increasing number of applications on High-Performance Computing (HPC) platforms. Furthermore, they have the potential to critically impact performance on the next generation of supercomputers. I/O optimization techniques can provide improvements for specific system configurations and application access patterns, but not for all of them. We use the term access pattern for the way an application performs its I/O operations. These techniques frequently rely on the precise tuning of parameters, a task that commonly falls to the users. Such large-scale systems run an ever-changing set of applications with distinct characteristics and demands. Hence, to improve performance successfully, it is essential to adapt the system dynamically to a changing workload. In this work, we seek to guide optimization and tuning strategies by identifying the application’s I/O access pattern. We evaluate three machine learning techniques to automatically detect such patterns at runtime: decision trees, random forests, and neural networks. Using the detected pattern, we propose a tuning strategy that uses a reinforcement learning technique (contextual bandits) so that the system learns, during its execution, the best parameter value for each observed access pattern. This eliminates the need for a complicated and time-consuming prior training phase. Finally, we argue in favor of a dynamic on-demand allocation of I/O nodes that considers the application’s I/O characteristics. We show that the allocation of the globally deployed forwarding layer, currently governed by a static policy based solely on application size, should instead be dynamic and consider the applications’ access patterns to improve global performance. We present a user-level I/O forwarding solution named GekkoFWD that does not require application modifications and allows a dynamic remapping of forwarding resources to compute nodes.
We propose a novel I/O forwarding allocation policy based on the Multiple-Choice Knapsack Problem (MCKP). We demonstrate the applicability of our dynamic MCKP policy to arbitrate I/O nodes through extensive evaluation and experimentation. We show that it can transparently improve global I/O bandwidth by up to 23x compared to the existing static policy.
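As an illustration of how a Multiple-Choice Knapsack formulation can arbitrate I/O nodes, the following is a minimal sketch, not the policy implemented in the thesis. It assumes each application comes with a hypothetical profile mapping a candidate number of I/O nodes to an estimated bandwidth, and it picks exactly one option per application so that the total bandwidth is maximized without exceeding the available I/O nodes. The function name, the profile format, and the numbers are assumptions made for this example.

```python
def allocate_io_nodes(profiles, total_io_nodes):
    """Solve a small Multiple-Choice Knapsack instance by dynamic programming.

    profiles: one dict per application, mapping a candidate number of
              I/O nodes to the estimated bandwidth with that many nodes
              (hypothetical input format).
    Returns (best_total_bandwidth, nodes_chosen_per_application), or
    (-inf, None) when no combination fits in total_io_nodes.
    """
    neg_inf = float("-inf")
    # best[cap] = (bandwidth, choices) for the applications seen so far,
    # using at most `cap` I/O nodes.
    best = [(0.0, [])] * (total_io_nodes + 1)
    for options in profiles:
        nxt = []
        for cap in range(total_io_nodes + 1):
            cand = (neg_inf, None)
            # Exactly one option must be picked for this application.
            for nodes, bandwidth in options.items():
                if nodes <= cap and best[cap - nodes][0] > neg_inf:
                    total = best[cap - nodes][0] + bandwidth
                    if total > cand[0]:
                        cand = (total, best[cap - nodes][1] + [nodes])
            nxt.append(cand)
        best = nxt
    return best[total_io_nodes]


# Hypothetical bandwidth estimates (MB/s) for three applications.
profiles = [
    {0: 100, 1: 300, 2: 350, 4: 360},
    {0: 200, 1: 220, 2: 600, 4: 900},
    {0: 50, 1: 60, 2: 65},
]
bandwidth, allocation = allocate_io_nodes(profiles, total_io_nodes=4)
print(bandwidth, allocation)
```

Because the choice depends on each application's bandwidth profile and not on its size, an application that gains little from forwarding may receive no I/O nodes, while one that scales well may receive most of them.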
Keywords: High Performance I/O. Parallel I/O. I/O Forwarding. I/O Scheduling. Dynamic Tuning. Dynamic Reconfiguration.