GME - Education

Topics of my Current Research

· Modeling of SET Propagation in Integrated Circuits (ASICs and FPGAs)

· Soft Error Mitigation Techniques for Integrated Circuits:

o Built-in Current Sensor connected to the bulk, combined with high-level mitigation techniques

o Analog Majority Voter to implement combinational and sequential logic

o NoC Routers protected against SEU and crosstalk

· SEU and SET Mitigation Techniques for SRAM-based FPGAs

· NoC testing for transient and permanent faults

· Reconfigurable Architectures for Fault-Tolerant Systems

Fernanda Lima Kastensmidt

Professor at the Computer Science Department

Motivation

Integrated circuits (ICs) operating in space applications are susceptible to radiation, whose effects can be permanent or transient, as discussed by G. Srinivasan in 1996, by P. Shivakumar et al. in 2002 and by P. Dodd et al. in 2003. The radiation environment is composed of various particles generated by solar activity, as presented by J. Barth in 1997. The particles can be classified into two major types: (1) energetic particles such as electrons, protons and heavy ions, and (2) electromagnetic radiation (photons), which can be X-rays, gamma rays, or ultraviolet light. The main sources of energetic particles that contribute to radiation effects are protons and electrons trapped in the Van Allen belts, heavy ions trapped in the magnetosphere, galactic cosmic rays and solar flares. The charged particles interact with the silicon atoms, causing excitation and ionization of atomic electrons.

At ground level, neutrons are the most frequent cause of upsets, as shown by Normand in 1996. Neutrons are created by the interaction of cosmic ions with the oxygen and nitrogen in the upper atmosphere. The neutron flux is strongly dependent on key parameters such as altitude, latitude and longitude. High-energy neutrons interact with the material and generate free electron-hole pairs, while low-energy neutrons interact with certain boron isotopes present in the semiconductor material, creating other particles. Alpha particles are a secondary source of upsets. They are emitted by radioactive impurities present in the device itself or in the packaging materials, and they are the greatest concern. In principle, a very careful selection of materials can minimize alpha particle emission. However, this solution is very expensive and never eliminates the problem completely.

The Charge Collection Mechanism

If an energetic particle strikes a sensitive region in a semiconductor device, the resulting electron-hole pair generation can cause a transient current that may alter the logical state of the circuit, as shown in figure 1a. The charge deposition mechanism, as described in detail by Messenger in 1982, produces a transient pulse that lasts until the deposited charge is conducted away via open current paths to VDD or ground, returning the logic node to its original state. If the transient pulse amplitude is high enough and its duration is long enough compared to the gate delays, the pulse may propagate through circuit stages and change the results of a computation. Hence, not only the amount of deposited charge but also the transient pulse amplitude and duration are key parameters for evaluating circuit sensitivity to soft errors.

(a) Silicon substrate ionization

(b) Charge Collection Mechanism

Figure 1 – Soft Error Phenomena

The sensitive sites are the surroundings of the reverse-biased drain junctions of a transistor biased in the off state, as described by Dodd, for instance the drain junction of the p-channel transistor in figure 1b. As current flows through the pn-junction of the struck transistor, the transistor in the on state (the n-channel transistor in figure 1b) conducts a current that attempts to balance the current induced by the particle strike. If the current induced by the particle strike is high enough, the on-transistor cannot balance it and a voltage change occurs at the node. This voltage change lasts until the charge is conducted away by the current fed through the on-transistor. The width of the induced transient voltage pulse depends on the energy of the incident particle, the charge stored at the affected node and the charge collection efficiency of the affected junction.

At the electrical (SPICE) level, the charge deposition mechanism can be modeled by a double exponential current pulse at the particle strike site, as shown by Messenger:

$$ I(t) = I_0 \left( e^{-t/t_a} - e^{-t/t_b} \right), \qquad (1) $$

where I₀ is approximately the maximum charge collection current, t_a is the collection time constant of the junction and t_b is the time constant for initially establishing the ion track. In the circuit simulations and modeling, t_b is assumed to be much smaller than t_a, while t_a is used as a variable parameter, which is in line with experimental findings, as explained by Messenger. SPICE transient analysis is performed by injecting a double exponential current pulse as given by (1), with the values of I₀ and t_a used as the variable parameters to determine the minimum (critical) charge Q_C corresponding to a given t_a.

The maximum charge collection current I₀ depends on the linear energy transfer (LET) value of the energetic particle and on process parameters. Once the values of I₀, t_b, and t_a are determined for a given technology and the particles of interest, any circuit designed in that technology may be evaluated at the circuit level by modeling the charge deposition mechanism with (1).
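As a minimal sketch of the double exponential model described above, the following Python code evaluates the current pulse and the total charge it delivers, which follows from integrating the pulse over time as Q = I₀ · (t_a − t_b). The function names and the numeric values (1 mA, 200 ps, 50 ps) are illustrative assumptions, not parameters of any particular technology.

```python
import math

def set_current(t, i0, t_a, t_b):
    """Double exponential current pulse, in amperes, at time t (seconds)."""
    if t < 0:
        return 0.0
    return i0 * (math.exp(-t / t_a) - math.exp(-t / t_b))

def collected_charge(i0, t_a, t_b):
    """Integral of the pulse from 0 to infinity: Q = I0 * (t_a - t_b)."""
    return i0 * (t_a - t_b)

def peak_time(t_a, t_b):
    """Time at which the pulse reaches its maximum (where dI/dt = 0)."""
    return (t_a * t_b / (t_a - t_b)) * math.log(t_a / t_b)

# Hypothetical example values, chosen only for illustration.
I0, T_A, T_B = 1e-3, 200e-12, 50e-12
charge = collected_charge(I0, T_A, T_B)
peak = set_current(peak_time(T_A, T_B), I0, T_A, T_B)
print(f"charge = {charge:.3e} C, peak current = {peak:.3e} A")
```

In a search for the critical charge, one would sweep I₀ for a fixed t_a in the circuit simulator and record the smallest value of `collected_charge` that flips the node. That simulation step is outside this sketch.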

Single Event Upset (SEU) and Single Event Transient (SET) Effects

A single particle can hit either the combinational logic or the sequential logic in the silicon. When a memory cell holds a value, it has two transistors in the “on” state and two transistors in the “off” state; consequently, there are always two SEU-sensitive nodes in the cell. When a particle strikes one of these nodes, the energy transferred by the particle can cause a transistor to switch “on”. This event flips the value stored in the memory cell, producing a bit-flip. This is called a Single Event Upset (SEU). Figures 2(a) and 2(b) depict the SEU mechanism in a static memory cell.

When an energetic particle hits one of the sensitive sites of a combinational logic block, it also generates a transient current pulse. This phenomenon is called a Single Event Transient (SET). For combinational logic blocks with registered outputs, the SET may eventually appear at the input of the flip-flops placed at the combinational logic outputs, provided the induced transient pulse is neither logically nor electrically masked along the path, nor masked by the latching window of the latch or flip-flop. Logical masking occurs when the input stimuli hold controlling values on the logical path in such a way that the SET cannot propagate to the outputs. Figure 2(c) exemplifies logical masking: the output holds the value one independently of the SET value, because the NAND gate has one of its inputs at logical zero and, consequently, the NOR gate in the SET path has one of its inputs at logical one. Electrical masking occurs if the pulse is attenuated as it propagates through the logic chain and fades out before it reaches the registered output, as shown in figure 2(d). If a SET is neither logically nor electrically masked, it is interpreted as a valid signal at the register input and can be captured by the memory element according to the latching window (usually based on the setup time and hold time of the memory element), as shown in figure 2(e). Once a SET is captured, a wrong value is stored in the register, provoking a soft error.

(a) Memory cell

(b) Transient current pulse generating the bit-flip

(c) Logical Masking

(d) Electrical Masking

(e) Latching window masking

Figure 2 - Single Event Upset (SEU) and Single Event Transient (SET) in integrated circuits
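The logical masking and latching-window conditions described above can be sketched in Python. This is a simplified illustration under stated assumptions: gates are ideal two-input functions, and a SET is treated as capturable whenever it overlaps the setup/hold window at all, which is a conservative choice. The function names and the picosecond values in the usage lines are hypothetical.

```python
def nand(a, b):
    """Ideal two-input NAND gate on 0/1 values."""
    return 0 if (a and b) else 1

def nor(a, b):
    """Ideal two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1

def is_logically_masked(gate, side_input):
    """True if the gate output is the same for both values of the struck
    input, i.e. the other input holds a controlling value."""
    return gate(0, side_input) == gate(1, side_input)

def overlaps_latching_window(pulse_start, pulse_width, clock_edge,
                             t_setup, t_hold):
    """True if the pulse overlaps [clock_edge - t_setup, clock_edge + t_hold].
    All times share one unit (picoseconds here). Assumption: any overlap
    counts as a possible capture."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_start <= window_end and pulse_end >= window_start

# A NAND with its other input at 0 masks the SET; with it at 1, it does not.
print(is_logically_masked(nand, 0), is_logically_masked(nand, 1))
# A 100 ps pulse starting 50 ps before the clock edge at 1000 ps.
print(overlaps_latching_window(950, 100, 1000, 30, 20))
```

Electrical masking is not modeled here, since it depends on the analog attenuation of the pulse through each gate.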

Challenges for Future Technologies

In future technologies, soft errors are becoming more likely both to be generated at a node and to propagate until they are captured as errors. As process technology shrinks and the supply voltage decreases, the charge stored at logic circuit nodes is reduced roughly according to Q_node = C_node × V_dd, which increases the sensitivity of nodes to radiation-induced upsets. Further, since the inherent delay of MOS transistors is decreasing with rapid technology scaling, the frequencies at which circuits operate are continuously increasing. The rate at which SETs get latched as errors depends on the operating frequency and the logic structure of the circuit, so higher frequencies increase the probability of SETs being latched as errors. Additional reasons are the reduction in electrical, logical and timing masking. Electrical masking decreases with technology scaling because of the shorter gate delays and the reduced logic depth between pipeline registers. Reduced logic depth also decreases logical masking. The reduction in timing masking is a consequence of higher operating frequencies, which increase the probability of a SET pulse being latched. Thus, in Very Deep Sub-Micron (VDSM) technologies, soft errors in logic circuits are becoming a reliability problem, as pointed out by P. Shivakumar et al. in 2002.
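The two scaling trends above can be illustrated with a short Python sketch. It evaluates Q_node = C_node × V_dd for two hypothetical nodes and estimates the chance that a SET overlaps the latching window. The latching estimate assumes the SET arrives at a uniformly random time within the clock period and that any overlap with the window counts as a capture. All numeric values are invented for illustration and do not describe a real process.

```python
def node_charge(c_node, v_dd):
    """Charge stored at a node: Q_node = C_node * V_dd (coulombs)."""
    return c_node * v_dd

def latch_probability(pulse_width, window, clock_period):
    """Assumed first-order estimate of the probability that a SET with the
    given width overlaps a latching window (setup + hold) once per period.
    All three arguments use the same time unit."""
    return min(1.0, (pulse_width + window) / clock_period)

# Hypothetical older node versus hypothetical scaled node.
q_old = node_charge(10e-15, 3.3)   # 10 fF at 3.3 V
q_new = node_charge(1e-15, 1.0)    # 1 fF at 1.0 V
print(f"stored charge ratio old/new: {q_old / q_new:.1f}")

# Same 100 ps pulse and 20 ps window, clock period halved (times in ps).
print(latch_probability(100, 20, 1000), latch_probability(100, 20, 500))
```

Under these assumptions, halving the clock period doubles the estimated latching probability, which is the timing-masking trend described in the text.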

Fault tolerant techniques for integrated circuits can be applied at different stages of the circuit design flow. Some techniques are based on the process technology, such as using new materials to decrease the susceptibility of the substrate to ionization. Other techniques are applied in the electrical design phase, such as transistor sizing, transistor redundancy and the addition of electrical sensors. Still others are applied at the logic design step, such as adding hardware and time redundancy in the logic blocks and in the software application.

Figure 3 represents the sequence of events that may occur once an energetic particle hits the substrate and provokes ionization, as discussed previously. This ionization generates a set of electron-hole pairs that creates a transient current, which is injected into or extracted from the hit node. Depending on the amplitude and duration of this current pulse, a transient voltage pulse may appear at the hit node. This is characterized as the fault. There is a fault latency period that defines the time needed for the fault to become an error in the circuit. This only occurs if the transient voltage pulse changes the logic value of a storage element (flip-flop), generating a bit-flip. This bit-flip may generate an error if the content of the flip-flop is used in an operation. From the application point of view, however, the error does not necessarily manifest as a failure in the system. There is also an error latency that defines the time needed for the error to become a failure in the system. For each phase, a different fault tolerant technique can be used.

For example, at the ionization and transient current phase, sensors can be built into the silicon substrate to detect ionization currents. At the transient voltage pulse generation phase, time redundancy can be used to detect the transient pulse in time. To mitigate bit-flips, hardware redundancy and error correcting codes can be used to correct the data. To correct an error, it is possible to use self-checking blocks with recovery mechanisms or recomputation to restore the correct data. Finally, spare chips may be used to guarantee operation of the system if a failure occurs. In this paper, we present the bulk-BICS sensor, which is capable of detecting ionization in the circuit substrate, warning of a potential cause of a fault at a very early stage.

Figure 3 – Sequence of events from ionization to failure and a set of fault tolerant techniques applied at different times.
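The sequence of phases and the techniques associated with each one can be summarized as an ordered table. The Python sketch below is only an illustrative restatement of the text: the phase and technique labels follow the description above, while the data structure and the function name `remaining_defenses` are assumptions introduced here.

```python
# Ordered from the earliest phase (ionization) to the latest (failure).
PHASES = [
    ("ionization and transient current",
     ["built-in substrate current sensor (bulk-BICS)"]),
    ("transient voltage pulse (fault)", ["time redundancy"]),
    ("bit-flip", ["hardware redundancy", "error correcting codes"]),
    ("error", ["self-checking blocks with recovery", "recomputation"]),
    ("failure", ["spare chips"]),
]

def remaining_defenses(phase_name):
    """List the techniques that can still act once the event has reached
    phase_name, i.e. those of that phase and of every later phase."""
    names = [name for name, _ in PHASES]
    start = names.index(phase_name)  # raises ValueError for unknown phases
    result = []
    for _, techniques in PHASES[start:]:
        result.extend(techniques)
    return result

for technique in remaining_defenses("bit-flip"):
    print(technique)
```

The ordering makes the point of the figure explicit: the earlier a technique acts in the chain, the more later defenses remain available behind it.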