IPMB Commands Time Out Due to NX110 Switch Board Problems

Publication Date: 2012-07-17
Issue Description
At a commercial deployment of the E6000, the user switching module (USM) cannot access the virtual KVM (Keyboard, Video, Mouse) through remote control, and the following information is returned when the serial port of an MM is accessed:
write_ipmb_channel: wait event interruptible timeout!
write_ipmb_channel: wait event interruptible timeout!
write_ipmb_channel: wait event interruptible timeout!
Alarm Information
None
Handling Process
An IPMB timeout can mean that the address of a component attached to the IPMB is unreachable or in conflict. To locate the component, remove the devices one at a time (boards, SMM, switch boards, and power supplies), and after each removal execute an IPMB command on the SMM to check whether it returns information or an error message. The faulty component is the one whose removal allows the command to execute successfully with no error message displayed.
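The one-at-a-time removal procedure can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names (find_suspects, simulated_probe) and the chassis inventory are hypothetical, and the probe stands in for manually running an IPMB command on the SMM. It does not call any real SMM interface.

```python
from typing import Callable, Iterable, List, Set


def find_suspects(components: Iterable[str],
                  probe: Callable[[Set[str]], bool]) -> List[str]:
    """Return the components whose removal alone makes the probe succeed."""
    all_components = list(components)
    installed = set(all_components)
    suspects = []
    for name in all_components:
        # Model pulling exactly one device, then rerunning the IPMB command.
        remaining = installed - {name}
        if probe(remaining):
            suspects.append(name)
    return suspects


# Hypothetical inventory; only the NX110 in slot A2 comes from this case.
CHASSIS = ["blade-1", "blade-2", "NX110 (slot A1)", "NX110 (slot A2)", "PSU-1"]


def simulated_probe(installed: Set[str]) -> bool:
    # Assumption: the command times out whenever the faulty board
    # is still attached to the bus, and succeeds otherwise.
    return "NX110 (slot A2)" not in installed


print(find_suspects(CHASSIS, simulated_probe))
```

With this simulated fault, only the board in slot A2 is reported. An empty result would suggest that more than one device is involved or that the fault lies elsewhere.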
In this case, the faulty component is the NX110 switch board in slot A2. Inspection of that board reveals a foil joint, as shown below.

The foil joint breaks the I2C channel, so the switch board cannot be reached by IPMB commands. Because no data is returned through the I2C channel, the SMM keeps resending IPMB commands to the unresponsive address. This keeps the bus busy for a long time and continuously occupies IPMC channel resources, and the commands eventually time out.
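The effect of repeated resends on bus occupation can be illustrated with simple arithmetic. The sketch below is a toy model, not the SMM firmware's actual behavior: the retry count, per-attempt wait, and command deadline are invented values chosen only to show how waiting on a silent address can exceed a command's deadline.

```python
def bus_busy_seconds(unresponsive: int, retries: int,
                     attempt_timeout_s: float) -> float:
    """Time the bus spends waiting on addresses that never answer."""
    return unresponsive * retries * attempt_timeout_s


def command_times_out(unresponsive: int, retries: int,
                      attempt_timeout_s: float, deadline_s: float) -> bool:
    """True if waiting on silent addresses exceeds the command deadline."""
    return bus_busy_seconds(unresponsive, retries, attempt_timeout_s) > deadline_s


# All three values are assumptions for illustration, not E6000 parameters.
RETRIES = 4
ATTEMPT_TIMEOUT_S = 0.25
DEADLINE_S = 0.5

for silent in (0, 1):
    busy = bus_busy_seconds(silent, RETRIES, ATTEMPT_TIMEOUT_S)
    late = command_times_out(silent, RETRIES, ATTEMPT_TIMEOUT_S, DEADLINE_S)
    print(f"silent addresses={silent} busy={busy}s timed out={late}")
```

Under these assumed values, a single silent address is enough to push a command past its deadline, which matches the qualitative behavior described above.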
Root Cause
The following figure shows the architecture of the intelligent platform management bus (IPMB).
As the figure shows, the BMC module is a processor module that operates independently of the board processor. The BMC module provides a system interface, storage interface, GPIO interface, serial port, and inter-integrated circuit (I2C) bus, hosts the IPMI protocol stack, and monitors boards over the connection between the IPMB (I2C) bus and the MM.
When an IPMB command executed on the SMM times out, there are two possible outcomes. If fewer than five boards time out, some information is still returned. If more than seven boards time out, no information is returned and the error message "write_ipmb_channel: wait event interruptible timeout!" is displayed.
In other words, the problem is most likely a timeout that occurs while a device address is being accessed over the IPMB bus. Specifically, either the IPMB address is unreachable or the IPMB address is in conflict.
Suggestions
Such problems are commonly misdiagnosed as an MM failure or a backplane failure.
