A customer submitted multiple service requests for hardware problems detected on RH8100 servers right after their installation.
The problems reported were as follows:
1. PCIe 10G card does not operate in hotswap slot (on 2 servers).
The cards were not detected by the OS, and there was no power indication on the hotswap tray.
2. CPU CAT alarms were detected by iBMC for 8 CPUs (on 1 server).
3. Server power state incorrectly detected by iBMC. It was visually confirmed that the server was powered on and operational, with the OS loaded and responsive, but iBMC showed the server state as powered off.
The probability of hardware faults in so many components on different servers within a single shipment is negligibly small.
The suggestion was to completely remove power from the affected servers and then reseat the relevant modules: for the first case, remove and reinstall the Rear IO modules and hotswap PCIe trays; for the second case, remove and reinstall all 8 CPU boards and all 16 Memory boards; for the third case, remove and reinstall both BMC management boards. After that, power on the servers again.
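The reseat procedure for the three cases can be summarized as a small lookup, shown here as a minimal Python sketch. The symptom keys and the function name `reseat_steps` are hypothetical labels chosen for illustration; they are not iBMC alarm codes or part of any vendor tool.

```python
from __future__ import annotations

# Hypothetical symptom labels (not real iBMC alarm identifiers), mapped to the
# modules that were reseated for each of the three reported cases.
RESEAT_PLAN = {
    "pcie_card_not_detected": ["Rear IO modules", "hotswap PCIe trays"],
    "cpu_cat_alarm": ["all 8 CPU boards", "all 16 Memory boards"],
    "wrong_power_state_in_ibmc": ["both BMC management boards"],
}


def reseat_steps(symptom: str) -> list[str]:
    """Return the ordered reseat steps for one of the known symptoms."""
    modules = RESEAT_PLAN.get(symptom)
    if modules is None:
        raise KeyError(f"No reseat plan recorded for symptom: {symptom}")
    # Power must be completely removed before any module is pulled.
    steps = ["Completely remove power from the server"]
    steps += [f"Remove and reinstall {module}" for module in modules]
    steps.append("Power on the server again")
    return steps


if __name__ == "__main__":
    for step in reseat_steps("cpu_cat_alarm"):
        print(step)
```

The table only encodes the three cases described here; any other symptom raises an error rather than guessing which modules to reseat.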
In all 3 cases, reseating the modules resolved the issue, and no problems have been reported since. The conclusion is that high vibration during shipping degraded the contact between server modules. Simply reseating the modules resolves the problem.
As the cases above show, high vibration during delivery to the customer can cause hardware problems. Because these servers have a modular structure, the contact between modules may degrade under such shipping conditions. When you power on a new RH8100 server and see a hardware problem with one or several modules, do not rush to replace them; reseat them first, as this may solve the problem.
To avoid such problems, when you install a new RH8100, reseat all main server modules (CPU boards, Memory boards, Front and Rear IO modules, PCIe hotswap trays) to ensure they are firmly seated in their slots before powering on the server.