The following error occurred during an ES3000 inspection:
Average EC: 209
Max bad block rate: 0.080%
Event log: 1 error(s)
# hio_info -d /dev/hioa
hioa Size(GB): 1204
Max size(GB): 1204
Serial number: 030PXS10D2000011
Driver version: 220.127.116.11
Bridge firmware version: 228
Controller firmware version: 228
Battery firmware version: 105
Battery status: Warning
Run time (sec.): 73346200
Total IO read: 4067017862
Total IO write: 4815334486
Total read(MB): 87612080
Total write(MB): 178392684
IO timeout: 0
R/W error: 0
Max bit flip: 8
# hio_log -d /dev/hioa
2014-07-20 03:57:38 <0x93> hioa controller 0: SEU fault
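When many cards are inspected, the two outputs above can be checked by a script. The following is a minimal bash sketch that assumes the "Field: value" layout shown above. `hio_field`, `seu_events`, and `check_hio` are hypothetical helpers, not part of the ES3000 tool set. Treating `Battery status` as the field that must read `OK` is also an assumption, based on it being the only status field in this output.

```shell
#!/usr/bin/env bash
# Sketch: flag the conditions seen above in saved hio_info / hio_log output.
# Assumes the "Field: value" layout shown above; this is not a Huawei-supplied tool.

# Print the value of one "Field: value" line read from hio_info output on stdin.
hio_field() {
  awk -F': ' -v f="$1" '$1 == f { print $2; exit }'
}

# Print every SEU fault entry read from hio_log output on stdin.
seu_events() {
  grep -F 'SEU fault' || true
}

# Report the battery status and any SEU entries; return 1 if either needs attention.
check_hio() {
  local info="$1" log="$2" rc=0 battery events
  battery="$(printf '%s\n' "$info" | hio_field 'Battery status')"
  if [ "$battery" != "OK" ]; then
    echo "Battery status: ${battery:-missing}"
    rc=1
  fi
  events="$(printf '%s\n' "$log" | seu_events)"
  if [ -n "$events" ]; then
    printf 'SEU fault entries:\n%s\n' "$events"
    rc=1
  fi
  return "$rc"
}
```

On a live system it would be called as `check_hio "$(hio_info -d /dev/hioa)" "$(hio_log -d /dev/hioa)"`. With the output shown above, it would report both the `Warning` status and the SEU fault entry.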
1. Power off the server, then power it on again so that the system starts normally.
2. Back up the data on the ES3000 if required. Skip this step if the data does not need to be kept.
3. Run the following command to erase the data on the SSD:
hio_cleardata -d /dev/hioa
4. Run the hio_clear command to clear the log information as follows:
1) cd /usr/local/hio
2) tar -xvf toolsd
3) /usr/local/hio/hio_clear -d /dev/hioa -il
// Note: the “-” in steps 3 and 4 must be the half-width (ASCII) hyphen. The log-deletion options are the lowercase letters i and l.
Result after clearing:
5. Reboot the system and check the status with hio_info. If the status shows OK, the recovery succeeded.
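Steps 3 to 5 can be wrapped in a small bash script that also guards against the full-width hyphen problem noted in step 4. This is a sketch under stated assumptions. `run`, `ascii_only`, `recover_seu`, and `verify_after_reboot` are hypothetical names. The tool is assumed to be already extracted into `/usr/local/hio`. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch of steps 3 to 5 above; steps 1 and 2 (power cycle, backup) stay manual.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
# Assumption: the hio_clear tool has already been extracted into /usr/local/hio.

DEV="${DEV:-/dev/hioa}"
HIO_DIR="${HIO_DIR:-/usr/local/hio}"
DRY_RUN="${DRY_RUN:-1}"

# Refuse arguments with non-ASCII characters, such as a full-width or en-dash
# "hyphen" pasted from a document in place of the ASCII "-".
ascii_only() {
  local arg
  for arg in "$@"; do
    if printf '%s' "$arg" | LC_ALL=C grep -q '[^ -~]'; then
      echo "non-ASCII character in argument: $arg" >&2
      return 1
    fi
  done
}

# Print or execute one command after checking its arguments.
run() {
  ascii_only "$@" || return 1
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 3 and 4: erase the SSD (destructive, back up first), then clear the event log.
recover_seu() {
  run hio_cleardata -d "$DEV" || return 1
  run "$HIO_DIR/hio_clear" -d "$DEV" -il
}

# Step 5: run after the reboot and confirm the status shows OK.
verify_after_reboot() {
  run hio_info -d "$DEV"
}
```

Printing by default makes it possible to review exactly what will run before anything is erased. Once the output looks right, `DRY_RUN=0 recover_seu` performs steps 3 and 4. After the reboot, `DRY_RUN=0 verify_after_reboot` performs step 5.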
This fault is caused by a soft error in the FPGA. Soft errors are a well-known phenomenon in RAM devices across the industry.
The causes of FPGA soft errors are as follows:
1. Soft errors can occur in all semiconductor devices, especially RAM. They cause a transient bit flip but no permanent damage.
2. The FPGA is SRAM-based, so it is susceptible to soft errors.
3. An FPGA soft error occurs when a cosmic-ray neutron strikes an SRAM cell and inverts the stored bit. The error is cleared once the configuration is reloaded.
4. The ES3000 performs SEU (single-event upset) checking to ensure the correctness and consistency of data. A dedicated internal engine scans all of the FPGA's memory space and reports an error as soon as one is detected.
5. The failure rate given by the vendor is one event per 65 years per chip. Our statistics on SEU events across all delivered units show that the current rate is within the target for the FPGA.
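The mechanism in points 1 to 4 and the rate in point 5 can be illustrated with a toy bash model. It is only an illustration under stated assumptions. The word values, the helper names, and the golden-copy comparison are hypothetical simplifications, not how the ES3000 engine works. The rate helpers assume SEU events are independent and occur at a constant rate.

```shell
#!/usr/bin/env bash
# Toy model of an SEU and its repair; it is not the ES3000's real scanning engine.
# Configuration memory is a few 32-bit words; GOLDEN stands in for the
# configuration image that is reloaded when the FPGA is reconfigured.

GOLDEN=(0x1A2B3C4D 0x00FF00FF 0x12345678 0xDEADBEEF)
CONFIG=("${GOLDEN[@]}")

# A particle strike inverts one bit of one word; nothing is physically damaged.
seu_flip() {
  local word="$1" bit="$2"
  CONFIG[word]=$(( CONFIG[word] ^ (1 << bit) ))
}

# Scan the whole configuration space and print the index of every word that no
# longer matches the reference (real SRAM FPGAs typically use per-frame ECC/CRC).
seu_scan() {
  local i
  for i in "${!CONFIG[@]}"; do
    if (( CONFIG[i] != GOLDEN[i] )); then
      echo "$i"
    fi
  done
}

# Reloading the configuration rewrites every word, which clears the flipped bit.
reload_config() {
  CONFIG=("${GOLDEN[@]}")
}

# Expected SEU events per year for N chips at the vendor figure of one event
# per 65 chip-years (assumes independent events at a constant rate).
fleet_seu_per_year() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 65 }'
}

# Probability that one chip sees at least one SEU in T years, under the same
# constant-rate (Poisson) assumption: 1 - exp(-T / 65).
chip_seu_probability() {
  awk -v t="$1" 'BEGIN { printf "%.4f\n", 1 - exp(-t / 65) }'
}
```

For example, `seu_flip 2 7` followed by `seu_scan` reports word 2, and after `reload_config` the scan is clean again. At the vendor figure, a fleet of 1,000 chips would expect about 15 SEU events per year (`fleet_seu_per_year 1000` prints `15.38`). `chip_seu_probability 5` gives about 0.074, roughly a 7% chance that a given chip sees one within five years.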
Reboot the server and clear the ES3000 log information.