2.29 TC-A2029 A Device Breaks Down Due to Overheating of the Cabinet

Publication Date: 2012-07-18
Issue Description
Product and version: T3500
Two T3500 devices break down while running in the customer's equipment room.
Alarm Information
None
Handling Process
Open the valve at the bottom of the cabinet, and follow the operating regulations to ensure normal heat dissipation.
Root Cause
1.         Check the IPMI alarm log and sensor information of the two devices. Overheating alarms are recorded in the IPMI alarm logs of both devices, as shown in Figure 2-25.
Figure 1-1  IPMI alarm log and sensor information

 

In earlier versions, you must run IPMI commands to collect the IPMI alarm log and sensor information. In the current version, you can collect this information directly on the ISM, which is much more convenient. For details, see 3.12 Viewing Alarms of the T3500 and 3.13 Checking the Sensor Status of the T3500.
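For the earlier versions that require IPMI commands, the collection step can be sketched from a management host with the open-source ipmitool CLI. This sketch assumes that the T3500 BMC accepts standard IPMI-over-LAN (`lanplus`) sessions, which this case does not confirm. `sel elist` prints the System Event Log (the IPMI alarm log) with sensor names resolved, and `sensor list` prints current readings and thresholds. The helper names and the text filter in `overheating_events` are hypothetical; the exact event wording depends on the BMC firmware.

```python
from __future__ import annotations

import subprocess

# Assumptions (not from the original case): the BMC is reachable over the
# network, ipmitool is installed, and host/user/password are your own values.


def ipmi_command(host: str, user: str, password: str, *args: str) -> list[str]:
    """Build an ipmitool command line for a remote BMC over IPMI v2.0 (lanplus)."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]


def run_ipmi(host: str, user: str, password: str, *args: str) -> str:
    """Run one ipmitool subcommand and return its standard output as text."""
    result = subprocess.run(
        ipmi_command(host, user, password, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ipmitool exits with an error
    )
    return result.stdout


def collect(host: str, user: str, password: str) -> tuple[str, str]:
    """Return the raw IPMI alarm log (SEL) and the sensor listing."""
    sel = run_ipmi(host, user, password, "sel", "elist")
    sensors = run_ipmi(host, user, password, "sensor", "list")
    return sel, sensors


def overheating_events(sel_text: str) -> list[str]:
    """Heuristic: keep SEL lines where a temperature sensor crossed an upper threshold."""
    return [
        line
        for line in sel_text.splitlines()
        if "Temperature" in line and "Upper" in line
    ]
```

A typical call would be `sel, sensors = collect("192.0.2.10", "admin", "secret")` followed by `overheating_events(sel)`. Passing the password with `-P` exposes it in the process list; on shared hosts, ipmitool's `-E` option, which reads the password from the `IPMI_PASSWORD` environment variable, is a safer choice.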
2.         If no fan-absent information is found in the IPMI alarm log or the sensor information, the fans are working normally. In that case, the chassis overheating alarm is likely caused by an overheated environment outside the chassis.

- In the IPMI alarm log, if the status of a fan is Absent, the fan is absent.
- In the sensor list, if the status of a fan is Critical or Disabled, the fan is absent.
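The two fan rules, together with the conclusion drawn in step 2, can be expressed as a small check. This is a hypothetical sketch: it assumes the fan statuses have already been read from the IPMI alarm log and the sensor list (for example, on the ISM) into dictionaries keyed by fan name. The function names, fan labels, and the "Present"/"Normal" status strings in the example are illustrative only.

```python
from __future__ import annotations

# Status strings that, per the rules in step 2, mean a fan is absent.
ABSENT_IN_ALARM_LOG = {"absent"}
ABSENT_IN_SENSOR_LIST = {"critical", "disabled"}


def absent_fans(alarm_log: dict[str, str], sensor_list: dict[str, str]) -> set[str]:
    """Return the names of fans that either source reports as absent."""
    missing = {
        fan for fan, status in alarm_log.items()
        if status.strip().lower() in ABSENT_IN_ALARM_LOG
    }
    missing |= {
        fan for fan, status in sensor_list.items()
        if status.strip().lower() in ABSENT_IN_SENSOR_LIST
    }
    return missing


def diagnose(alarm_log: dict[str, str], sensor_list: dict[str, str]) -> str:
    """Apply step 2: with no absent fan, suspect heat from outside the chassis."""
    missing = absent_fans(alarm_log, sensor_list)
    if missing:
        return "fan absent: " + ", ".join(sorted(missing))
    return "fans normal: check the environment outside the chassis"


print(diagnose({"FAN1": "Present"}, {"FAN1": "Normal", "FAN2": "Normal"}))
# prints: fans normal: check the environment outside the chassis
```

Checking both sources matters because a fan can look normal in one view and absent in the other; the union of the two sets covers either case.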
3.         Analyze why the environment outside the chassis overheats. The two devices are installed at the bottom of the cabinet, where heat dissipation is difficult. The valve used for heat dissipation is closed by misoperation, so the temperature in the cabinet keeps rising. Eventually, the devices break down because the CPU cores overheat.

- A non-critical alarm for CPU0/CPU1 core overheating in the IPMI alarm log does not cause a system breakdown.
- A critical alarm for CPU core overheating can cause the device to power off.
- If the memory overheats because the ambient temperature is too high, the system breaks down.
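The alarm-to-impact rules in the list above can be captured in a lookup function, which may help when triaging similar overheating alarms. This is an illustrative sketch only: the component labels ("cpu", "memory") and severity strings are hypothetical inputs, not values read from a real IPMI log, and any combination this case does not cover is reported as unknown.

```python
def expected_impact(component: str, severity: str = "") -> str:
    """Map an overheating alarm to the impact described in this case.

    component: "cpu" for a CPU0/CPU1 core alarm, "memory" for memory overheating.
    severity:  "non-critical" or "critical"; only used for CPU core alarms.
    """
    component = component.strip().lower()
    severity = severity.strip().lower()
    if component == "memory":
        # Memory overheating caused by a hot ambient environment crashes the system.
        return "system breakdown"
    if component == "cpu":
        if severity == "critical":
            return "device power-off"
        if severity == "non-critical":
            return "no breakdown"
    # Combinations not covered by this case.
    return "unknown"
```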
Suggestions
None
