UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO
———————————————-
Student: Ronaldo Rodrigues Ferreira
Advisor: Prof. Dr. Luigi Carro
Co-advisor: Prof. Dr. Álvaro Freitas Moreira
Title: The Transactional HW/SW Stack for Fault Tolerant Embedded Computing
Research Area: Computer Engineering – Embedded Systems
Date: August 28, 2014
Venue: Building 43424 – Prof. Castilho Auditorium, Instituto de Informática
Examining Committee:
Prof. Dr. Flávio Rech Wagner (UFRGS)
Prof. Dr. Fernanda Gusmão de Lima Kastensmidt (UFRGS)
Prof. Dr. José Rodrigo Furlanetto de Azambuja (FURG)
Abstract:
Implementing fault tolerance in embedded systems is challenging because of the physical constraints on area occupation, power dissipation, and energy consumption that a typical embedded design has to meet. The need to optimize these three physical constraints while still meeting performance goals and real-time deadlines creates a conundrum that is hard to solve. Classical fault tolerance solutions such as triple and dual modular redundancy are not feasible: triple modular redundancy has a high power overhead, while dual modular redundancy detects errors but lacks efficient and deterministic error recovery. New proposals in the literature, although some of them reduce the power and area overhead, incur heavy performance penalties and often do not offer a feasible solution in terms of the fault model they assume. This work proposes the Transactional HW/SW Stack, or simply the Stack, to efficiently manage the conundrum of area, power, coverage, and performance. The Stack introduces a new compilation strategy that assembles programs into Transactional Basic Blocks, together with a novel microprocessor, the TransactOnal Basic Block Architecture, which provides fine-grained error detection and deterministic error rollback and elimination, using the Transactional Basic Blocks both as a container for errors and as a small unit of data checkpointing. The area, power, performance, and coverage of the Stack were evaluated using the hardware implementation model of the proposed architecture. The Stack attains an error coverage of 99.9% with a power overhead of 2× (comparable to dual modular redundancy) and an area overhead of 1.6× on average. The Stack also presents a considerably smaller performance overhead than competing techniques in the literature.
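The abstract describes Transactional Basic Blocks as both a container for errors and a small unit of data checkpointing. The Python sketch below illustrates that general idea on a toy register machine; it is not the TransactOnal Basic Block Architecture itself. It assumes, for illustration only, that errors are detected by executing each block twice and comparing the results at the block boundary; the abstract does not say how detection is actually done. All names (`run_block`, `execute_transactional`, `flip_bit`, and the `add`/`mul` instruction tuples) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Regs = Dict[str, int]
# Hypothetical instruction format: (op, dst, src1, src2)
Instr = Tuple[str, str, str, str]
Fault = Callable[[Regs], None]


def flip_bit(reg: str, bit: int) -> Fault:
    """Hypothetical fault injector: flips one bit of one register."""
    def inject(regs: Regs) -> None:
        regs[reg] ^= 1 << bit
    return inject


def run_block(block: List[Instr], regs: Regs, fault: Optional[Fault] = None) -> Regs:
    """Execute one basic block on a private copy of the register state."""
    out = dict(regs)  # the caller's checkpoint is never modified
    for op, dst, a, b in block:
        if op == "add":
            out[dst] = out[a] + out[b]
        elif op == "mul":
            out[dst] = out[a] * out[b]
        else:
            raise ValueError(f"unknown op {op!r}")
    if fault is not None:
        fault(out)  # simulate a transient upset corrupting this run's results
    return out


def execute_transactional(block: List[Instr], regs: Regs,
                          faults: Iterable[Fault] = (),
                          max_retries: int = 3) -> Tuple[Regs, int]:
    """Run a block as a transaction: commit only if two executions agree.

    Returns the committed register state and the number of rollbacks.
    """
    checkpoint = dict(regs)  # block-entry checkpoint (the unit of rollback)
    pending_faults = iter(faults)
    for attempt in range(max_retries + 1):
        first = run_block(block, checkpoint, next(pending_faults, None))
        second = run_block(block, checkpoint)
        if first == second:
            return first, attempt  # results agree: commit
        # Mismatch: the error is contained in this block, so both results are
        # discarded and execution restarts from the unchanged checkpoint.
    raise RuntimeError("block failed to commit within the retry budget")
```

For example, with `r0 = 2` and `r1 = 3`, the block `[("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]` commits `r3 = 25` after one rollback even when `flip_bit("r3", 0)` corrupts the first execution. Keeping the checkpoint at block granularity is what makes the rollback cheap and deterministic in this sketch: only the registers live at block entry need to be saved.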
Keywords: Compiler Design, Coverage, Error Detection, Error Recovery, Fault Injection, Hardening By Design, Latency, LLVM, Modular Redundancy, Power Wall, Redundancy, Register File, Rollback, Single Event Effects