Backups Fail Due to a VTL Engine Server Failure

Publication Date: 2012-07-18
Issue Description
The KVM displays no status information for the engine server, and the VTL3605 engine system cannot be reached over the network. Restarting the VTL engine restores the engine and the backup services.
Alarm Information
None
Handling Process
Replace the VTL3605 engine server mainboard.
Root Cause
After you restart the VTL engine, capture and analyze the VTL XRAY. The analysis shows the following:
a. The log records a large number of USB alarms and insufficient-capacity messages. The USB HID device is not detected. The VTL software and the system software were running properly before the failure.
b. The filter capacitor C1212 in the BMC power circuit on the VTL engine server mainboard is of poor quality. As a result, the ripple on the BMC's P3V3_STBY supply reaches 102 mV, far above the specified limit of 66 mV. The KVM therefore has no output, and the mouse and keyboard fail intermittently.
c. This problem causes a system breakdown on Linux and UNIX but not on Windows. The intermittent mouse failures make Linux repeatedly load and unload the USB driver in the kernel, and every load allocates and releases kernel resources.
d. The log shows that when the system cannot find the USB device, it unloads the USB HID driver. However, before the unload finishes, the system has to load the driver again. Kernel resources are silently wasted each time. If this happens frequently, kernel resources run out and the kernel appears to lock.
e. The kernel is not actually locked; it only appears to be. To the user, however, this apparent lock looks like a system breakdown.
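The repeated load/unload pattern described in c and d can be looked for in a kernel log by counting USB connect and disconnect messages per port. The following Python sketch is illustrative only and is not part of the XRAY tooling. The function names, the threshold, and the two message patterns are assumptions; kernel wording differs between versions, so adjust the patterns to match the log at hand.

```python
import re
from collections import Counter

# Assumed message shapes (hypothetical; verify against the actual log):
#   "usb 1-1: new low speed USB device using uhci_hcd and address 2"
#   "usb 1-1: USB disconnect, address 2"
CONNECT = re.compile(r"usb ([\d.-]+): new .*USB device")
DISCONNECT = re.compile(r"usb ([\d.-]+): USB disconnect")


def count_usb_events(lines):
    """Return (connects, disconnects) counters keyed by USB port ID."""
    connects, disconnects = Counter(), Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            disconnects[match.group(1)] += 1
            continue
        match = CONNECT.search(line)
        if match:
            connects[match.group(1)] += 1
    return connects, disconnects


def flapping_ports(lines, threshold=3):
    """List ports that were both connected and disconnected at least
    `threshold` times, which suggests an intermittent device."""
    connects, disconnects = count_usb_events(lines)
    return sorted(
        port for port in disconnects
        if min(connects[port], disconnects[port]) >= threshold
    )
```

To use it, pass the lines of a saved kernel log, for example `flapping_ports(open(path), threshold=10)`, where `path` is wherever the log was exported. A port that appears in the result is a candidate for the intermittent fault described in b.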
Suggestions
1. Issue pre-alarm notifications promptly.
2. If hardware from the same batch fails, locate and replace it promptly.
3. Follow the quality monitoring procedure and strengthen quality audits.
4. Optimize the procedure for installing software and ensure delivery quality.
5. Share and update information in time.
