Adaptive processor to dynamically balance fault tolerance, energy consumption, and performance

Energy optimization and fault tolerance techniques focusing on adaptive processors.

Traditional processor designs have focused mainly on performance. However, as technology has evolved, other axes have become extremely important. Reducing energy consumption is now mandatory: most embedded devices depend heavily on battery power, while general-purpose processors (GPPs) are constrained by the limits of thermal design power (TDP). In addition, fault tolerance is increasingly required in current processor designs, for both space and ground-level systems. As transistor feature sizes shrink, transistor reliability is also compromised, making circuits more susceptible to soft errors. Therefore, all three axes (performance, energy consumption, and fault tolerance) should be considered and balanced according to the given design constraints. Yet current processors are designed to focus on one or, at most, two of these axes. Achieving the ideal balance among them is challenging because of their conflicting nature. Consider fault tolerance, where replication techniques are widely used to detect or mask faults during execution: hardware-based techniques increase power dissipation, software-based ones increase execution time, and both increase total energy consumption. Likewise, reducing energy consumption will likely reduce performance, and improving performance will affect energy consumption and possibly reduce fault tolerance [*]. Examples of such techniques follow. The authors in [*] and [*] used software-based dual modular redundancy (DMR) based on checkpoints with rollback to detect and correct errors: whenever an error is detected, the last state in which execution was correct is recovered. Another common approach is to triplicate hardware components and use a majority voter to mask faults (triple modular redundancy, TMR), as implemented in [*] and [*]. 
In these works, only the functional units of a VLIW processor are triplicated, rather than the entire processor. DIVA [*] increases the reliability of a superscalar processor by augmenting the commit stage of the pipeline with a checker unit. The checker verifies the results and commits them if the computation is correct; in case of an error, it flushes the computation and restarts the processor. Only a few works consider all three axes (fault tolerance, energy consumption, and performance). TSH (tricriteria scheduling heuristics) [*] is an offline scheduling heuristic that produces a static multiprocessor schedule: to increase reliability, instructions are replicated, and to reduce energy consumption, DVFS (dynamic voltage and frequency scaling) is applied.
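The two replication schemes surveyed above can be illustrated in software. The following Python sketch is not the implementation used in any of the cited works; it assumes a processor state modeled as a dictionary of register values and program steps modeled as deterministic functions. `tmr_vote` is a bitwise majority voter over three replica outputs (TMR), and `dmr_with_rollback` executes each step twice, compares the two results, and on a mismatch restores the last checkpoint and re-executes (DMR with rollback). All names, including the `fault` injection hook and the `max_retries` limit, are hypothetical.

```python
import copy
from typing import Callable, Dict, List, Optional

State = Dict[str, int]                      # hypothetical model: register name -> value
Step = Callable[[State], State]             # one deterministic unit of work
FaultHook = Callable[[int, int, State], State]


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority voter: each output bit takes the value held by at least
    two of the three replicas, so an error in any single replica is masked."""
    return (a & b) | (a & c) | (b & c)


def dmr_with_rollback(state: State, steps: List[Step],
                      fault: Optional[FaultHook] = None,
                      max_retries: int = 3) -> State:
    """Execute every step twice and compare; on a mismatch, discard both results,
    restore the last checkpoint, and re-execute."""
    for i, step in enumerate(steps):
        checkpoint = copy.deepcopy(state)       # last state known to be correct
        for attempt in range(max_retries + 1):
            primary = step(copy.deepcopy(checkpoint))
            shadow = step(copy.deepcopy(checkpoint))
            if fault is not None:
                primary = fault(i, attempt, primary)   # simulated fault in one copy
            if primary == shadow:
                state = primary                 # copies agree: commit the result
                break
            # copies disagree: error detected, retry from the checkpoint
        else:
            raise RuntimeError(f"step {i}: error persisted after {max_retries} retries")
    return state


steps: List[Step] = [
    lambda s: {**s, "r1": s["r1"] + 5},
    lambda s: {**s, "r2": s["r1"] * 2},
]


def flip_once(i: int, attempt: int, s: State) -> State:
    # Transient fault: flip bit 0 of r1, only in the first attempt of step 0.
    return {**s, "r1": s["r1"] ^ 1} if (i, attempt) == (0, 0) else s


print(tmr_vote(0b1010, 0b1010, 0b0110))                               # → 10
print(dmr_with_rollback({"r1": 1, "r2": 0}, steps, fault=flip_once))  # → {'r1': 6, 'r2': 12}
```

The sketch reflects the cost asymmetry described in the text: TMR masks a single faulty replica in one pass at the price of triplicated resources, whereas DMR only detects the error and pays extra execution time to re-execute from the checkpoint.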

Considering this scenario, our main goal is to propose a new processor capable of transparently adapting the execution of the application at runtime, considering performance, fault tolerance, and energy consumption together, with the weight (priority) of each defined a priori by the designer. To this end, a DMR with rollback that exploits idle issue slots is used to improve fault tolerance, while a power gating mechanism turns off selected functional units of the datapath to reduce both static and dynamic power, which also reduces the sensitive area of the processor. A dynamic ILP controller was also developed so that issue slots are artificially freed by automatically moving operations to the next cycle, creating opportunities to maximize the power gating phases and to duplicate more instructions. A decision module evaluates the application's phases at runtime and decides which technique is most appropriate given the system requirements. We have explored the chosen fault tolerance mechanism at different granularities and evaluated their trade-offs in [*]. The energy consumption of these techniques and of the ILP control mechanism is assessed in [*], and a power gating mechanism is evaluated in [*]. Extending the approach to exploit multiple threads, rather than adapting execution only at the instruction level, is under development. In addition to the DMR technique, configurations of a Diverse TMR mechanism (a heterogeneous TMR built from processors with distinct issue widths) are being evaluated.
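To make the interplay between these mechanisms concrete, the following Python sketch models a VLIW core under strong simplifying assumptions: each cycle the workload supplies some number of mutually independent operations, operations deferred by the ILP controller carry over to the next cycle without dependency stalls, and every idle issue slot is either filled with a duplicate of an issued operation (DMR) or counted as power-gated. `schedule`, `choose_config`, `ilp_cap`, and the weighted scoring formula are hypothetical illustrations of a weight-driven decision module, not the processor's actual controller. In particular, the energy term is a crude proxy (the fraction of gated slot-cycles) that ignores the extra static energy of a longer execution.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CycleStats:
    cycles: int             # cycles needed to issue every operation
    issued: int             # original operations issued
    duplicated: int         # shadow copies placed in idle slots (DMR coverage)
    gated_slot_cycles: int  # idle slot-cycles left for power gating


def schedule(bundles: List[int], width: int, ilp_cap: int, duplicate: bool) -> CycleStats:
    """Simulate issue on a `width`-wide core (hypothetical workload model).

    bundles[i] is the number of independent operations that become ready at
    cycle i. At most `ilp_cap` operations issue per cycle; the rest are moved
    to the next cycle, artificially freeing issue slots.
    """
    if not 1 <= ilp_cap <= width:
        raise ValueError("ilp_cap must be in [1, width]")
    pending = cycles = issued = duplicated = gated = 0
    i = 0
    while i < len(bundles) or pending > 0:
        if i < len(bundles):
            pending += bundles[i]
            i += 1
        n = min(pending, ilp_cap)                # operations issued this cycle
        pending -= n
        free = width - n                         # idle issue slots
        dup = min(n, free) if duplicate else 0   # each issued op duplicated at most once
        gated += free - dup                      # remaining idle slots are power-gated
        cycles += 1
        issued += n
        duplicated += dup
    return CycleStats(cycles, issued, duplicated, gated)


def choose_config(bundles: List[int], width: int,
                  weights: Dict[str, float]) -> Tuple[float, int, bool]:
    """Return (score, ilp_cap, duplicate) maximizing a designer-weighted score."""
    if not bundles:
        raise ValueError("empty workload")
    base = schedule(bundles, width, width, duplicate=False)  # unconstrained reference run
    best: Optional[Tuple[float, int, bool]] = None
    for cap in range(1, width + 1):
        for dup in (False, True):
            s = schedule(bundles, width, cap, dup)
            perf = base.cycles / s.cycles                        # 1.0 means no slowdown
            ft = s.duplicated / s.issued if s.issued else 0.0    # fraction of ops duplicated
            energy = s.gated_slot_cycles / (s.cycles * width)    # crude power-gating proxy
            score = (weights["perf"] * perf + weights["ft"] * ft
                     + weights["energy"] * energy)
            if best is None or score > best[0]:
                best = (score, cap, dup)
    assert best is not None
    return best


# A fault-tolerance-only priority selects the most restrictive ILP cap with duplication.
print(choose_config([4, 1, 2], width=4, weights={"perf": 0.0, "ft": 1.0, "energy": 0.0}))
# → (1.0, 1, True)
```

Even in this toy model the three axes pull against each other: lowering `ilp_cap` frees slots for duplication and gating but lengthens execution, and every slot used for a duplicate is one that cannot be power-gated.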