1. The CentOS release used on the RH2285 is incompatible with the network interface cards (NICs).
Based on onsite feedback, the onsite OS is 64-bit CentOS 5.4 with kernel version 2.6.18-164.el5.
Linux version 2.6.18-164.el5 (firstname.lastname@example.org)
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Thu Sep 3 03:28:30 EDT 2009
CentOS is a free distribution compiled from the RedHat OS source and is equivalent to the RedHat OS. After consulting the RedHat OS vendor and the NIC manufacturer, Huawei engineers learned that RedHat OS 5.3 and later versions have a compatibility bug in their support for the bnx2 driver of the Broadcom 5709. In certain scenarios, when the service data flow is very large (that is, the network port load is very high), the NICs occasionally fail, which interrupts services. For a description of the problem (shown in Figure 1), visit http://kbase.redhat.com/faq/docs/DOC-26837.
Figure 1 Incompatibility between RHEL5.3 and BCM5709
Based on the description in the preceding figure, the incompatibility problem affects servers that use Broadcom 5709 chips and run RedHat or CentOS with a kernel earlier than kernel-2.6.18-194.3.1.el5.
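The kernel threshold above can be checked mechanically. The following is a minimal Python sketch, assuming the release string has the form reported by `uname -r` (for example, 2.6.18-164.el5). The function names `release_numbers` and `is_affected_kernel` are hypothetical and introduced only for illustration. The check compares numeric components only and does not confirm that a Broadcom 5709 NIC is present.

```python
import re

# Threshold taken from the text: kernels earlier than this one are affected.
FIXED_RELEASE = "2.6.18-194.3.1.el5"


def release_numbers(release):
    """Extract the leading numeric components of a kernel release string.

    "2.6.18-164.el5" becomes (2, 6, 18, 164); the ".el5" suffix is ignored.
    """
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def is_affected_kernel(release, fixed=FIXED_RELEASE):
    """Return True if `release` is earlier than the fixed kernel release."""
    return release_numbers(release) < release_numbers(fixed)


if __name__ == "__main__":
    # The onsite kernel from this case.
    print(is_affected_kernel("2.6.18-164.el5"))
```

In this sketch, a shorter release such as 2.6.18-194.el5 is treated as earlier than 2.6.18-194.3.1.el5, because Python compares tuples element by element and a prefix sorts first.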
Message signaled interrupts (MSI) is an interrupt mechanism used by peripheral component interconnect (PCI) devices; it applies to NICs rather than to a multi-core system. MSI-X is the enhanced version of MSI. Enabling MSI-X for a NIC driver improves network performance but increases the OS load. If the OS cannot keep up with the NIC running status or process the data (especially under extremely heavy traffic), the OS becomes abnormal. After MSI-X is disabled, the NIC works in IO-APIC-level mode, in which the OS load is light and the OS runs normally.
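To see which interrupt mode a NIC is actually using, the contents of /proc/interrupts can be inspected. The following is a minimal Python sketch, assuming the older layout in which each row lists the IRQ number, per-CPU counters, an interrupt type token (such as PCI-MSI-X or IO-APIC-level), and the device names. The function name `interrupt_mode` and the sample text are hypothetical, and the exact layout varies between kernel versions.

```python
def interrupt_mode(proc_interrupts_text, interface):
    """Return the interrupt type token listed for `interface`, or None.

    Assumes rows of the form: "<irq>: <count per CPU>... <type> <devices>".
    """
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header row and anything that is not an IRQ row.
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        # Devices sharing an IRQ are comma-separated at the end of the row.
        names = [field.strip(",") for field in fields[1:]]
        if interface not in names:
            continue
        for token in fields[1:]:
            if "MSI" in token or "IO-APIC" in token:
                return token
    return None


# Illustrative sample only; not captured from the system in this case.
SAMPLE = """\
           CPU0       CPU1
 90:       1200          0   PCI-MSI-X  eth0
169:        300          0   IO-APIC-level  eth1, uhci_hcd:usb1
"""

if __name__ == "__main__":
    print(interrupt_mode(SAMPLE, "eth0"))
    print(interrupt_mode(SAMPLE, "eth1"))
```

On a live system, the text would come from reading /proc/interrupts, and the interface name (eth0 here) would be replaced with the port backed by the Broadcom 5709.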
"Message signaled interrupts (MSI) is an optional feature that enables PCI devices to request service by writing a system-specified message to a system-specified address (PCI DWORD memory write transaction). The transaction address specifies the message destination while the transaction data specifies the message. System software is expected to initialize the message destination and message during device configuration, allocating one or more non-shared messages to each MSI capable function."
2. The solution provided on the RedHat website includes "Disable C-state in BIOS" (shown in the red rectangular boxes in Figure 1). In addition, the incompatibility problem also occurs when 64-bit RedHat 5.4 is installed with Broadcom 5709 drivers on the IBM X3652M2 and Dell R710. RedHat engineers replied that the driver of the Broadcom NetXtreme II BCM5709 for RHEL 5.3/5.4 has a bug in handling Advanced Configuration and Power Interface (ACPI) power management. As a result, while the NIC is working properly, ACPI mistakenly determines that the NIC is idle and disables it.
For details about the IBM and Dell fault information, visit
3. Disable the ACPI power management system, that is, disable the C state. Disabling the P state ensures that the power management module disables Broadcom NetXtreme II BCM5709 correctly.