Understanding Soft Errors
Introduction
This document describes the principles of soft errors, how to identify them, and the countermeasures against them.
Understanding Soft Errors
When high-energy subatomic particles pass through the silicon of a storage element in a very-large-scale integrated (VLSI) circuit, such as a flip-flop, register, or RAM cell, they generate free charge. This charge collects at a circuit node within a very short interval (about 15 ps). If the collected charge exceeds a certain threshold, the stored data changes, causing a system error. Because the circuit itself is not permanently damaged, this type of error is called a soft error.
Soft errors may be caused by cosmic rays, boron fission, alpha particles, system noise, or electromagnetic interference.
On CE12800 switches, the only soft errors that can occur are those caused by high-energy neutrons (> 1 MeV) in cosmic rays. A neutron strike frees charges that were previously stable in the silicon. These charges drift under the electric field in the depletion region, producing a transient current pulse. The pulse is short, typically less than 100 ps, but it can still change the logic state of the circuit, depending on the pulse strength and the noise margin of the circuit.
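The threshold behavior described above can be sketched with the double-exponential current pulse that is commonly used to model single-event upsets: the charge delivered to the node is integrated over time and compared with a critical charge (Qcrit). The pulse shape, the time constants, the deposited charge, and the Qcrit values below are illustrative assumptions, not CE12800 device parameters.

```python
import math

# Assumed units: charge in fC, time in ps, so current is in fC/ps (1 fC/ps = 1 mA).


def pulse_current(t, q_total, tau_fall, tau_rise):
    """Double-exponential current pulse injected at the struck node.

    Requires tau_fall > tau_rise; the pulse integrates to q_total over [0, inf).
    """
    if t < 0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def charge_collected(q_total, tau_fall, tau_rise, t_end, steps=10_000):
    """Charge delivered to the node between t = 0 and t = t_end (trapezoidal rule)."""
    dt = t_end / steps
    total = 0.0
    prev = pulse_current(0.0, q_total, tau_fall, tau_rise)
    for i in range(1, steps + 1):
        cur = pulse_current(i * dt, q_total, tau_fall, tau_rise)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


def is_upset(q_node, q_crit):
    """The stored bit flips only if the collected charge exceeds the critical charge."""
    return q_node > q_crit


# Hypothetical strike: 20 fC deposited, 20 ps fall and 5 ps rise constants, 100 ps window.
q = charge_collected(q_total=20.0, tau_fall=20.0, tau_rise=5.0, t_end=100.0)
print(f"collected {q:.2f} fC")
print("upset with Qcrit = 15 fC:", is_upset(q, 15.0))
print("upset with Qcrit = 25 fC:", is_upset(q, 25.0))
```

The same strike upsets a node with a small critical charge but not one with a larger margin, which is why the outcome depends on both the pulse strength and the noise margin of the circuit.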
An SRAM is built from latch cells, each holding a state of either 0 or 1. The current pulse can flip a latch in the SRAM, and this flip appears as a soft error.
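The following truth table shows how an SR latch responds to its set (S) and reset (R) inputs:

S | R | Q | ¬Q
---|---|---|---
0 | 0 | Latch (holds previous Q) | Latch (holds previous ¬Q)
0 | 1 | 0 | 1
1 | 0 | 1 | 0

S and R can be asserted only in turn; they must never both be set to 1 at the same time.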
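The bit flip can be illustrated with a minimal simulation of an SR latch built from two cross-coupled NOR gates, which follows the truth table above. Modeling the particle strike as a pulse that briefly forces Q to the opposite value, and the hypothetical `pulse_long_enough` flag that decides whether the gate driving ¬Q has time to respond, are simplifying assumptions; a real SRAM cell is a transistor-level circuit.

```python
def nor(a, b):
    """NOR gate on 0/1 values."""
    return int(not (a or b))


class SRLatch:
    """SR latch built from two cross-coupled NOR gates."""

    def __init__(self):
        self.q, self.q_bar = 0, 1  # start in the reset state

    def _settle(self, s, r, max_iter=10):
        # Re-evaluate both gates until the outputs stop changing.
        for _ in range(max_iter):
            new_q = nor(r, self.q_bar)
            new_q_bar = nor(s, new_q)
            if (new_q, new_q_bar) == (self.q, self.q_bar):
                return
            self.q, self.q_bar = new_q, new_q_bar
        raise RuntimeError("latch did not settle")

    def apply(self, s, r):
        """Drive the inputs according to the truth table and return Q."""
        if s and r:
            raise ValueError("S and R must not both be 1")
        self._settle(s, r)
        return self.q

    def strike(self, pulse_long_enough=True):
        """Model a particle-induced pulse that briefly forces Q to the opposite value.

        If the pulse lasts long enough for the NOR gate driving ¬Q to respond,
        the feedback loop locks in the wrong value; otherwise the latch recovers.
        Inputs are held at S = R = 0 (hold) throughout.
        """
        self.q = 1 - self.q
        if pulse_long_enough:
            self.q_bar = nor(0, self.q)
        self._settle(0, 0)
        return self.q


latch = SRLatch()
print(latch.apply(1, 0))   # set: Q = 1
print(latch.apply(0, 0))   # hold: Q stays 1
print(latch.strike())      # the strike flips Q to 0 and the latch keeps it
print(latch.apply(0, 0))   # hold: the wrong value persists until it is rewritten
```

Because the cross-coupled feedback holds whatever value the pulse leaves behind, the corrupted bit stays wrong until it is rewritten or the device is reset, even though no hardware is damaged.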
Identification Method and Countermeasures
A soft error can be identified by the following characteristics:
- The error disappears after a reset.
- Over long-term observation, errors are evenly distributed at random locations and rarely recur at the same location.
- The chip passes automatic test equipment (ATE) testing.
- The error rate depends on altitude and on the solar cycle.
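The first three characteristics can be turned into a simple triage check over an error log. This is a minimal sketch: the `ErrorEvent` record, the `looks_like_soft_error` function, the `max_repeats_per_location` threshold, and the location strings are all hypothetical, and the altitude and solar-cycle correlation is not modeled because it needs fleet-wide statistics rather than one device's log.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    """One logged memory error (hypothetical log record)."""
    timestamp: float  # seconds since monitoring started
    location: str     # memory instance and address, in any consistent format


def looks_like_soft_error(events, cleared_by_reset, passed_ate, max_repeats_per_location=1):
    """Check an error history against the soft-error characteristics.

    Returns (verdict, reasons). verdict is True when nothing points to a hard
    fault; reasons lists every characteristic that was violated.
    """
    reasons = []
    if not cleared_by_reset:
        reasons.append("error persists after reset")
    if not passed_ate:
        reasons.append("chip fails ATE testing")
    # Soft errors land at random locations, so repeated hits on one location are suspicious.
    counts = Counter(e.location for e in events)
    repeated = sorted(loc for loc, n in counts.items() if n > max_repeats_per_location)
    if repeated:
        reasons.append("errors recur at: " + ", ".join(repeated))
    return not reasons, reasons


log = [
    ErrorEvent(10.0, "tbl0:0x1a2"),
    ErrorEvent(86_400.0, "tbl3:0x07f"),
    ErrorEvent(200_000.0, "buf1:0x3c0"),
]
print(looks_like_soft_error(log, cleared_by_reset=True, passed_ate=True))
```

Returning the list of violated characteristics, rather than a bare boolean, keeps the check useful when an error looks only partly like a soft error.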
High-energy neutrons are highly penetrating, and the industry currently has no way to shield against them completely. Common countermeasures are as follows:
- Improve the manufacturing process and increase the noise margin.
- Add error detection mechanisms such as parity checks, ECC, and CRC.
- Have devices automatically fix detected soft errors. (A CE switch resets the card when it detects multi-bit errors.)
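Parity detection, ECC correction, and the reset on multi-bit errors can be seen together in a toy single-error-correcting, double-error-detecting (SECDED) code. This sketch uses an extended Hamming(8,4) code on 4-bit values; production memories use wider codes, and the `handle_read` policy is a hypothetical stand-in for the card reset described above, not switch firmware.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # Hamming(7,4) data bit positions; 1, 2, 4 hold parity


def encode(data):
    """Encode a 4-bit value as an 8-bit extended Hamming (SECDED) codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4) code.
    """
    if not 0 <= data < 16:
        raise ValueError("data must be a 4-bit value")
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity over the whole word
    return code


def decode(code):
    """Return (status, data); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set-bit positions is 0 for a valid codeword
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, at position `syndrome` (0 = the overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: a double-bit error, detected only.
        return "uncorrectable", None
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, data


def handle_read(code):
    """Hypothetical policy: fix single-bit errors, reset on multi-bit errors."""
    status, data = decode(code)
    if status == "uncorrectable":
        return "reset card", None
    return status, data


word = encode(0b1011)
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[3] ^= 1
two_flips[5] ^= 1
print(handle_read(word))       # ('ok', 11)
print(handle_read(one_flip))   # ('corrected', 11)
print(handle_read(two_flips))  # ('reset card', None)
```

A single flipped bit is repaired in place, while two flipped bits are detected but cannot be located, so the only safe recovery is to reload the state, which here is modeled as a card reset.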