|
Motivation

Integrated
circuits (ICs) operating in space applications are susceptible to radiation,
whose effects can be permanent or
transient, as discussed by G. Srinivasan in 1996, by
P. Shivakumar et al. in 2002 and by P. Dodd et al. in
2003. The radiation environment is composed of various particles generated by
solar activity, as presented by J. Barth in
1997. The particles can be classified
into two major types: (1) energetic particles such as electrons, protons and
heavy ions, and (2) electromagnetic radiation (photons), which can be X-rays,
gamma rays, or ultraviolet light. The main sources of energetic particles that
contribute to radiation effects are protons and electrons trapped in the Van
Allen belts, heavy ions trapped in the magnetosphere, galactic cosmic rays
and solar flares. The charged particles interact with the silicon atoms
causing excitation and ionization of atomic electrons. At
ground level, neutrons are the most frequent cause of upsets, as shown by
Normand in 1996. Neutrons are created by cosmic ion
interactions with the oxygen and nitrogen in the upper atmosphere. The
neutron flux is strongly dependent on key parameters such as altitude,
latitude and longitude. High-energy neutrons interact with the
material, generating free electron-hole pairs, while low-energy
neutrons interact with certain boron isotopes present in semiconductor
material, creating other particles. Alpha particles are a secondary
source: they are emitted as a result of radioactive impurities present in
the device itself or in the packaging materials, and they are the greatest
concern. In principle, a very careful selection of materials can minimize
alpha particles. However, this solution is very expensive and never
eliminates the problem completely.

The Charge Collection Mechanism

If an
energetic particle strikes a sensitive region in a semiconductor device, the
resulting electron-hole pair generation can cause a transient current that
may alter the logical state of the circuit, as shown in figure 1a. The charge
deposition mechanism, as described in detail by Messenger in 1982, produces
a transient pulse that lasts until the deposited charge is conducted away via
open current paths to VDD or ground, returning the logic node to its original
state. If the transient pulse amplitude is high enough and its duration is long
enough compared to the gate delays, the pulse may propagate through circuit
stages and change the results of a computation. Hence, not only the amount of
deposited charge but also the transient pulse amplitude and duration are key
parameters for evaluation of circuit sensitivity to soft errors.
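
The link between deposited charge, pulse amplitude and pulse duration can be made concrete with a small numerical sketch. It assumes the double-exponential current shape that this text attributes to Messenger, I(t) = I0 (exp(-t/ta) - exp(-t/tb)). The function names and the example values of I0, ta and tb are illustrative assumptions, not values taken from the paper.

```python
import math


def double_exp_current(t, i0, ta, tb):
    """Current (A) at time t (s) for a double-exponential pulse.

    i0: scale current (A); ta: collection time constant (s);
    tb: ion-track establishment time constant (s), with tb < ta.
    """
    if t < 0:
        return 0.0
    return i0 * (math.exp(-t / ta) - math.exp(-t / tb))


def collected_charge(i0, ta, tb):
    """Total charge (C): closed-form integral of the pulse over t >= 0."""
    return i0 * (ta - tb)


def peak_time(ta, tb):
    """Time (s) of maximum current, obtained by setting dI/dt = 0."""
    return (ta * tb / (ta - tb)) * math.log(ta / tb)


# Hypothetical example values, chosen only for illustration.
I0, TA, TB = 1e-3, 200e-12, 50e-12

t_peak = peak_time(TA, TB)
print(f"peak at {t_peak * 1e12:.1f} ps, "
      f"I = {double_exp_current(t_peak, I0, TA, TB) * 1e3:.3f} mA")
print(f"collected charge = {collected_charge(I0, TA, TB) * 1e15:.1f} fC")
```

Under this model, when tb is much smaller than ta the peak current approaches I0 and the collected charge approaches I0 × ta, so sweeping I0 and ta is enough to explore both the amplitude and the duration of the pulse.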

Figure 1 – Soft Error Phenomena: (a) silicon substrate ionization; (b) charge collection mechanism.

The
sensitive sites are the surroundings of the reverse-biased drain junctions of
a transistor biased in the off state, as described by Dodd; an example is
the drain junction of the p-channel transistor in figure 1b. As current flows
through the pn-junction of the struck transistor,
the transistor in the on state (n-channel transistor in figure 1b) conducts a
current that attempts to balance the current induced by the particle strike.
If the current induced by the particle strike is high enough, the
on-transistor cannot balance it and a voltage change at the node
will occur. This voltage change lasts until the charge is conducted away by
the current fed through the on-transistor. The width of the induced
transient voltage pulse depends on the energy of the incident particle,
the charge stored at the affected node and the charge collection efficiency
of the affected junction.

At the electrical (SPICE) level, the charge deposition mechanism can be modeled by
a double exponential current pulse at the particle strike site, as shown by
Messenger:

$$I(t) = I_0 \left( e^{-t/t_a} - e^{-t/t_b} \right) \qquad (1)$$

where $I_0$ is approximately the maximum charge collection current,
$t_a$ is the collection time constant of the junction and $t_b$ is the time
constant for initially establishing the ion track. In the
circuit simulations and modeling, $t_b$ is assumed to be much smaller than
$t_a$, while $t_a$ is used as a variable parameter,
which is in line with experimental findings, as explained by Messenger. SPICE
transient analysis is performed by injecting a double exponential current pulse
as given by (1), with the values of $I_0$ and $t_a$ used as the variable
parameters to determine the minimum charge
$Q_C$ corresponding to a given $t_a$. The maximum charge collection
current $I_0$ depends on the linear energy transfer (LET) value of the
energetic particle and on process parameters. Once the values of $I_0$,
$t_b$ and $t_a$ are determined for a given technology and particles of interest, any
circuit designed in that technology may be evaluated at the circuit level
modeling the charge deposition mechanism by (1).

Single Event Upset (SEU) and Single Event Transient (SET) Effects

A single particle can hit either the combinational logic or the sequential
logic in the silicon.

When
a memory cell holds a value, it has two transistors in the “on” state
and two transistors in the “off” state; consequently there are always
two SEU-sensitive nodes in the cell. When a particle strikes one of these
nodes, the energy transferred by the particle can cause a transistor to
switch “on”. This event flips the value stored in the memory cell; this bit
flip is called a Single
Event Upset (SEU). Figures 2(a) and 2(b) depict the SEU mechanism in a static
memory cell.

When
an energetic particle hits one of the sensitive sites of the combinational
logic block, it also generates a transient current pulse. This phenomenon is
called a Single Event Transient (SET). For combinational logic blocks with
registered outputs, the SET may eventually appear at the input of the
flip-flops placed at the combinational logic outputs, if the induced
transient pulse is neither logically nor electrically masked, nor masked by
the latching window of the latch or flip-flop. Logical masking occurs when
the input stimuli hold controlling values on the logical path in such a
way that the SET cannot propagate to the outputs. Figure 2(c)
exemplifies logical masking: the output holds the value one
independently of the SET value, because the NAND
gate has one of its inputs at logical zero and, consequently, the NOR gate in the
SET path has one of its inputs at logical one. Electrical masking
occurs if the pulse is attenuated as it propagates through the logic chain
and fades out before it reaches the registered output, as shown in figure 2(d).
If a SET is neither logically nor electrically
masked, it is interpreted as a valid signal at the register input and it can
be captured by the memory element according to the latching window (usually
based on the setup time and hold time of the memory element), as shown in figure 2(e).
Once a SET is captured, a wrong value is stored in the register,
provoking a soft error.
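
The latching-window condition can be sketched in a few lines. The model below is a deliberate simplification and an assumption of this example, not the paper's method: a SET is counted as captured only if it is present during the entire window from `t_setup` before the clock edge to `t_hold` after it, and the pulse arrival time is taken as uniformly distributed over one clock period. All function and parameter names are hypothetical; times are in picoseconds.

```python
def is_latched(pulse_start, pulse_width, clock_edge, t_setup, t_hold):
    """True if the pulse covers the whole window [edge - setup, edge + hold]."""
    window_open = clock_edge - t_setup
    window_close = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_start <= window_open and pulse_end >= window_close


def latch_probability(pulse_width, t_setup, t_hold, clock_period):
    """Capture probability for a pulse arriving at a uniformly random time.

    Only the part of the pulse width that exceeds the setup + hold window
    leaves room for arrival times that still cover the window.
    """
    effective = pulse_width - (t_setup + t_hold)
    if effective <= 0:
        return 0.0
    return min(1.0, effective / clock_period)


# Illustrative values only (picoseconds).
print(is_latched(pulse_start=0, pulse_width=100, clock_edge=50,
                 t_setup=20, t_hold=10))
for period in (2000, 1000, 500):
    print(period, latch_probability(100, 20, 10, period))
```

In this simplified model the capture probability is inversely proportional to the clock period, so a shorter period (higher frequency) raises the chance that the same SET is stored as a wrong value.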

Figure 2 - Single Event Upset (SEU) and Single Event Transient (SET) in integrated circuits: (a) memory cell; (b) transient current pulse generating the bit-flip; (c) logical masking; (d) electrical masking; (e) latching window masking.

Challenges for Future Technologies

Soft errors are becoming more likely to be
generated at a node, and more likely to propagate and become an error, in
future technologies. As the process technology shrinks and the supply voltage
decreases, the charge stored at logic
circuit nodes reduces roughly according to $Q_{node} = C_{node} \times V_{dd}$,
which increases the sensitivity of nodes to radiation-induced upsets. Further,
since the inherent delay of MOS transistors is decreasing with rapid
technology scaling, the frequencies at which circuits are operated are continuously
increasing. The rate at which SETs get
latched as errors depends on the operating frequency and the logic
structure of the circuit, so higher frequencies increase the probability of
SETs being latched as errors. Additional reasons are the reduction in electrical and timing masking.
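
As a worked illustration of the stored-charge relation above (node charge equals node capacitance times supply voltage), the sketch below compares two made-up operating points. The capacitance and voltage values are hypothetical and are not taken from any specific process; `node_charge` is a name introduced only for this example.

```python
def node_charge(c_node, vdd):
    """Approximate stored charge (C) from Qnode = Cnode * Vdd."""
    return c_node * vdd


# Hypothetical (Cnode in farads, Vdd in volts) pairs, for illustration only.
examples = {
    "older process": (2e-15, 1.8),
    "newer process": (0.5e-15, 1.0),
}

for name, (c_node, vdd) in examples.items():
    print(f"{name}: Qnode = {node_charge(c_node, vdd) * 1e15:.2f} fC")
```

Because both factors shrink with scaling, the stored charge falls faster than either one alone, so a smaller collected charge is enough to disturb the node.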
Electrical masking decreases with technology scaling because of the
shorter gate delays and reduced logic depth between pipeline registers. Reduced
logic depth also decreases logical masking. The reduction in timing masking
is a consequence of higher operating frequencies, which increase the
probability of a SET pulse being latched. Thus, in Very Deep Sub-Micron
(VDSM) technologies, soft errors in logic circuits are becoming a reliability
problem, as pointed out by P. Shivakumar et
al. in 2002.

Fault-tolerant techniques for integrated
circuits can be applied at different stages of the circuit design flow. There
are techniques based on the process technology, such as using new materials
to decrease the susceptibility of the substrate to ionization. Other
techniques are applied in the electrical design phase, such as transistor
sizing, transistor redundancy and the addition of electrical sensors. Some
techniques can be added at the logic design step, such as adding hardware and
time redundancy in the logic blocks and in the software application.

Figure 3 represents
the sequence of events that may occur once an energetic particle hits the
substrate and provokes ionization, as discussed previously. This
ionization generates a set of electron-hole pairs that create a transient
current that is injected into or extracted from the struck node. Depending on
the amplitude and duration of
this current pulse, a transient voltage pulse may appear at the hit node. This
is characterized as the fault. There is a fault latency period that defines
the time needed for that fault to become an error in the circuit. This will only
occur if the transient voltage pulse changes the logic state of a storage element
(flip-flop), generating a bit-flip. This bit-flip may generate an error if
the content of this flip-flop is used for a certain operation. From the
application point of view, however, this error does not necessarily manifest
as a failure in the system. There is also an error latency that defines the
time needed for that error to become a failure in the system. For each phase a
different fault-tolerant
technique can be used. For example,
at the ionization and transient current phase, sensors can be built in the
silicon substrate to detect ionization currents. At the transient voltage
pulse generation phase, time redundancy can be used to detect the transient pulse
in time. To mitigate bit-flips, hardware redundancy and error-correcting
codes can be used to correct the data. To correct an error, it is possible to
use self-checking blocks with recovery mechanisms or recomputation
to restore the correct data. Finally,
spare chips may be used to guarantee operation of the system if a failure
occurs. In this paper, we present the bulk-BICS sensor, which is capable of
detecting ionization in the circuit substrate, warning of a potential cause of a
fault at a very early stage.
Figure 3 – Sequence of events from ionization
to failure and a set of fault-tolerant techniques applied at different times. |