Published on: 3 July 2014

Master's Dissertation in Distributed Systems

UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO

———————————————————

MASTER'S DISSERTATION DEFENSE

Student: Ivan Carrera Izurieta

Advisor: Prof. Dr. Cláudio Fernando Resin Geyer

Title: Performance Modeling of MapReduce Applications for the Cloud

 

Research Area: Distributed Systems and Performance Evaluation

Date: 10 July 2014

Time: 10:30

Location: Building 43412 – Room 220 (Council room), Instituto de Informática

 

Examination Committee:

Prof. Dr. Luciano Paschoal Gaspary (UFRGS)

Prof. Dr. Philippe Olivier Alexandre Navaux (UFRGS)

Prof. Dr. Patricia Kayser Vargas Mangan (UNILASALLE)

 

Committee Chair: Prof. Dr. Cláudio Fernando Resin Geyer

 

Abstract:

In recent years, Cloud Computing has become a key technology that makes it possible to run applications without deploying a physical infrastructure, while lowering costs by charging the user only for the computational resources the application consumes. The challenge in deploying distributed applications in Cloud Computing environments is planning the virtual machine infrastructure so that it is both time- and cost-effective.
Also in recent years, the amount of data produced by applications has grown larger than ever. This data contains valuable information that has to be extracted using tools like MapReduce. MapReduce has been an important framework for analyzing large amounts of data since it was proposed by Google and made open source by Apache through its Hadoop implementation.
The goal of this work is to show that the execution time of a distributed application, namely a MapReduce application, in a Cloud Computing environment can be predicted using a mathematical model based on theoretical specifications. This prediction helps users of the Cloud Computing environment plan their deployments, i.e., determine the number of virtual machines and their characteristics so as to minimize cost and/or time. The application execution time was measured while varying the parameters stated in the mathematical model, and a linear regression technique was then applied to the measurements. The resulting model of the execution time was used to predict the execution time of MapReduce applications, with satisfactory results.
The experiments were conducted in several configurations, namely private and public clusters as well as commercial cloud infrastructures, running different MapReduce applications and varying both the number of nodes in the cluster and the workload given to the application. The experiments showed a clear relation with the theoretical model, indicating that the model is in fact able to predict the execution time of MapReduce applications. The developed model is generic, meaning that it uses theoretical abstractions for the computing capacity of the environment and the computing cost of the MapReduce application. Further work is encouraged on extending this approach to other types of distributed applications, and on including this mathematical model in Cloud services that offer MapReduce platforms to help users plan their deployments.
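
As a rough illustration of the approach described in the abstract (measure execution time, fit a linear regression, then predict and plan), the sketch below fits a one-predictor model. The model form T = a + b * (W / N), with W the workload and N the number of nodes, is an assumption made here for demonstration and is not taken from the dissertation. All function names and all numbers are hypothetical, and the data points are synthetic, not measurements.

```python
# Illustrative sketch only. The model form T = a + b * (W / N) and every
# number below are assumptions for demonstration, not the dissertation's
# model or its measured data.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form, one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_time(a, b, workload_gb, nodes):
    """Predicted execution time in seconds under the assumed model."""
    return a + b * (workload_gb / nodes)


def smallest_cluster(a, b, workload_gb, deadline_s, max_nodes=64):
    """Smallest node count whose predicted time meets the deadline, or None."""
    for nodes in range(1, max_nodes + 1):
        if predict_time(a, b, workload_gb, nodes) <= deadline_s:
            return nodes
    return None


# Synthetic (workload in GB, node count) runs. Times are generated from known
# coefficients (a = 30 s, b = 12 s per GB per node) instead of being measured,
# so the fit can be checked against them.
TRUE_A, TRUE_B = 30.0, 12.0
runs = [(10, 2), (10, 4), (20, 4), (40, 8), (40, 16), (80, 8)]
xs = [w / n for w, n in runs]
ys = [TRUE_A + TRUE_B * x for x in xs]

a, b = fit_line(xs, ys)
estimate = predict_time(a, b, workload_gb=60, nodes=12)
needed = smallest_cluster(a, b, workload_gb=60, deadline_s=100)
print(f"a={a:.2f}, b={b:.2f}, estimate={estimate:.1f}s, nodes needed={needed}")
```

With real measurements, `ys` would hold observed execution times, and the fitted coefficients would stand in for the environment's computing capacity and the application's computing cost only to the extent that the assumed model form holds.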

 

Keywords: Performance Evaluation, Cloud Computing, MapReduce, Capacity Planning.

 

_____________

PPGC Announcements