UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO
DEFESA DE PROPOSTA DE TESE
Aluno: Carlos Eduardo Benevides Bezerra
Orientador: Prof. Dr. Cláudio Fernando Resin Geyer
Título: Scalable State-Machine Replication
Linha de Pesquisa: Sistemas Distribuídos
Data: 29/07/2013
Horário: 16h
Local: Auditório José Mauro Volkmer de Castilho. Prédio 43424 – Instituto de Informática
Banca Examinadora:
Prof. Dr. Alberto Egon Schaeffer Filho (UFRGS)
Profa. Dra. Fabíola Gonçalves Pereira Greve (UFBA)
Profa. Dra. Taisy Silva Weber (UFRGS)
Presidente da Banca: Prof. Dr. Cláudio Fernando Resin Geyer
Resumo:
Redundancy provides fault-tolerance. A service can run on multiple servers that replicate each other, in order to provide service availability even in the case of crashes. A way to implement such a replicated service is by using techniques like state-machine-replication, which is able to provide fault tolerance, while being linearizable, that is, not allowing the clients to distinguish the behaviour of the replicated system to that of a single-site, unreplicated one. However, having a fully replicated, linearizable system comes at a cost, namely, scalability. By scalability we mean that, for some workloads, adding servers will always increase maximum system throughput. Even with a careful setup and using optimizations that avoid unnecessary redundant actions to be taken by servers, at some point the system throughput cannot be increased by adding servers; instead, adding replicas may even degrade performance. A way to achieve scalability is by partitioning the service and then allowing partitions to work independently. On the other hand, having a partitioned, yet linearizable and reasonably performant service is not trivial, and this is the topic of research proposed here. We attempt to reach this goal using three different techniques that can be used together: (i) a model for scalable state machine replication (S-SMR), (ii) a low-latency, quasi-genuine, optimistic atomic multicast algorithm on which S-SMR can build and (iii) a fine-granularity kd-tree based load balancing algorithm that attemps to keep partition servers under loads proportional to their capacities.
Palavra-Chave: partial replication, scalability, linearizability, multicast.