RH1288 V3 Server cannot start

Publication Date: 2017-11-29
Issue Description
The customer cannot power on the server. Nothing happens when they try to power it on.
Alarm Information

"226","Normal","Button","The power button on the panel is pressed.","2017-10-24 13:01:44","Asserted","0x31000001","N/A"
"225","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:30:38","Deasserted","0x0300000E","N/A"
"224","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:10:21","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
"223","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 23:11:20","Deasserted","0x0300000E","N/A"
"222","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 19:50:01","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
"220","Normal","Button","The power button on the panel is pressed.","2017-10-15 21:52:17","Asserted","0x31000001","N/A"
"219","Normal","Button","The UID button on the panel is pressed.","2017-10-15 21:52:16","Asserted","0x31000003","N/A"
"213","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-15 18:57:26","Deasserted","0x0300000E","N/A"
"212","Major","System","The power failure results host power-on timed out.","2017-10-15 18:37:19","Asserted","0x2C00002B","1. Check whether the power supply meets server operation requirements.@#AB;2. Remove and reconnect power cables or remove and reinstall the board in the chassis.@#AB;3. Replace the mainboard. "
"209","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-15 18:37:14","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
"208","Major","System","The power failure results abnormal power-off.","2017-10-15 18:37:13","Asserted","0x2C000007","1. Check whether the power supply meets server operation requirements.@#AB;2.Check for power cables that were disconnected or improperly connected to the PSUs before the alarm was generated.@#AB;3. Remove and reconnect power cables or remove and reinstall the board in the chassis.@#AB;4. Replace the mainboard. "

Handling Process

1. Check the alarms. They contain records of lost input power:

"225","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:30:38","Deasserted","0x0300000E","N/A"
"224","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:10:21","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
"223","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 23:11:20","Deasserted","0x0300000E","N/A"
"222","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 19:50:01","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "

This alarm appears many times in the latest logs, and each occurrence is later deasserted. It records a loss of input power on PSU 2. From this we conclude that the power module itself is normal and that the problem is probably in the mainboard.
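The alarm check in step 1 can be sketched in Python as follows. This is only an illustration, not a vendor tool: the column names in FIELDS are assumptions (the exported alarm list has no header row), and treating "@#AB;" as the separator between suggestion steps is inferred from the entries above.

```python
import csv
import io
from collections import Counter

# Assumed column names; the exported alarm list carries no header row.
FIELDS = ["id", "severity", "subject", "description",
          "time", "status", "code", "suggestion"]

# Five entries copied from the alarm list above.
SAMPLE = '''\
"226","Normal","Button","The power button on the panel is pressed.","2017-10-24 13:01:44","Asserted","0x31000001","N/A"
"225","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:30:38","Deasserted","0x0300000E","N/A"
"224","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-19 18:10:21","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
"223","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 23:11:20","Deasserted","0x0300000E","N/A"
"222","Critical","PSU","The AC/DC input of PSU 2 is lost or out-of-range.","2017-10-18 19:50:01","Asserted","0x0300000D","1. Check whether the power cables are disconnected or loose.@#AB;2. Replace the power cables.@#AB;3. Replace the PSU. "
'''


def parse_alarms(text):
    """Parse quoted, comma-separated alarm lines into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return list(reader)


def count_by_status(alarms, subject):
    """Count Asserted/Deasserted entries for one subject, e.g. "PSU"."""
    return Counter(a["status"] for a in alarms if a["subject"] == subject)


def suggestion_steps(alarm):
    """Split the suggestion field on the "@#AB;" separator seen in the log."""
    return [s.strip() for s in alarm["suggestion"].split("@#AB;")]


alarms = parse_alarms(SAMPLE)
print(count_by_status(alarms, "PSU"))
for alarm in alarms:
    if alarm["subject"] == "PSU" and alarm["status"] == "Asserted":
        print(alarm["time"], suggestion_steps(alarm))
```

Counting the entries this way makes the repeated assert and deassert pattern on PSU 2 easy to see before moving on to the maintenance log.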

 

2. Check maintenance_log at \dump_info\LogDump\maintenance_log. It contains the following message, which indicates a VDDQ failure, that is, a mainboard failure:

Other messages of the same kind can also appear, such as:

pg_vddq_ab_fail_n asserted(1->0)
pg_vddq_cd_fail_n asserted(1->0)
pg_vddq_ef_fail_n asserted(1->0)
pg_vddq_gh_fail_n asserted(1->0)
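Searching maintenance_log for these messages can be sketched in Python as below. The regular expression is built only from the four messages listed above; that other VDDQ messages follow the same shape, and the name find_vddq_failures, are assumptions made for illustration.

```python
import re

# Matches messages shaped like the four listed above. The captured group is
# the suffix between "pg_vddq_" and "_fail_n" (for example "ab").
VDDQ_FAIL = re.compile(r"pg_vddq_([a-z]+)_fail_n asserted\(1->0\)")


def find_vddq_failures(log_text):
    """Return the sorted, de-duplicated suffixes of VDDQ fail messages."""
    return sorted(set(VDDQ_FAIL.findall(log_text)))


SAMPLE = """\
pg_vddq_ab_fail_n asserted(1->0)
pg_vddq_cd_fail_n asserted(1->0)
pg_vddq_ef_fail_n asserted(1->0)
pg_vddq_gh_fail_n asserted(1->0)
"""

print(find_vddq_failures(SAMPLE))  # prints ['ab', 'cd', 'ef', 'gh']
```

To run it against a collected log, read the file into a string first, for example with open(path, errors="replace").read(), where path points to the maintenance_log file in the dump.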

Root Cause

Mainboard hardware failure.

Solution
Replacing the mainboard solved the problem.
If replacing the mainboard does not solve the problem, a CPU and memory should also be sent to the customer site.
Suggestions

I have had three identical SRs in the last 2 months.
