Checking the Server
Check the server in the sequence shown in Figure 5-34. Use either the iBMC WebUI or the iBMC CLI for the check, depending on which is available on site.
For details about CLI commands, see the Atlas 800 AI Training Server iBMC (V3.01.00.00 or Later) User Guide (Model 9000).
Procedure
- Check indicator status.
Ensure that hardware devices are working correctly.
For details, see Front Panel Indicators and Buttons and Rear Panel Indicators (Full Configuration of NPUs).
- Check the server.
- Check the server using the iBMC WebUI.
- Log in to the iBMC WebUI. For details, see Logging In to the iBMC WebUI.
You are advised to change the default password when logging in to the iBMC for the first time. For details, see Changing Initial Passwords.
- Choose iBMC Settings > Firmware Upgrade, and view the version information, as shown in Figure 5-35.
Check that the server versions meet site requirements.
- The server health status is displayed on the menu bar, as shown in Figure 5-36.
No. | Health Status | Description
1 | Health indicator status | Displays the number of critical, major, and minor alarms.
2 | Power status | Displays the server power status. You can click the icon on the right of the indicator to power the server on or off.
3 | UID indicator status | Pinpoints the location of the server in a chassis. You can click the icon on the right of the indicator to control the state of the UID indicator.
- Clear any alarms if present. For details, see the Atlas 800 AI Training Server iBMC Alarm Handling (Model 3000 and 9000).
- Check the server using the iBMC CLI.
- Set an IP address for the PC. This IP address must be on the same network segment as the iBMC management network port.
- Connect a network cable from the PC to the iBMC management network port of the server.
- Start a Secure Shell (SSH) tool, such as PuTTY, on the PC and log in using the IP address of the iBMC management network port and the iBMC user name and password.
The SSH service is enabled by default. If the SSH service is disabled, enable it by choosing Services > Port Services on the iBMC WebUI.
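If the SSH login fails, one thing worth ruling out is the network-segment requirement from the first step. The following Python sketch checks whether two IPv4 addresses fall in the same segment for a given subnet mask. The function name same_segment and all addresses shown are illustrative placeholders, not values from this guide; substitute the addresses actually configured on site.

```python
import ipaddress


def same_segment(pc_ip, ibmc_ip, netmask):
    """Return True if both addresses belong to the same IPv4 network.

    All three arguments are dotted-decimal strings. The name and the
    sample values below are hypothetical, chosen for illustration only.
    """
    pc_net = ipaddress.ip_interface(f"{pc_ip}/{netmask}").network
    ibmc_net = ipaddress.ip_interface(f"{ibmc_ip}/{netmask}").network
    return pc_net == ibmc_net


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges (RFC 5737).
    print(same_segment("192.0.2.10", "192.0.2.100", "255.255.255.0"))
    print(same_segment("198.51.100.10", "192.0.2.100", "255.255.255.0"))
```

A False result means that the PC address or mask needs to be changed before the SSH tool can reach the iBMC management network port.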
- Run the ipmcget -d version command to query the server version information. Ensure that the versions meet the site requirements.
------------------- iBMC INFO -------------------
IPMC CPU: Hi1711
IPMI Version: 2.0
CPLD Version: (U151)15.13
Active iBMC Version: (U68)6.53
Active iBMC Build: 002
Active iBMC Built: 21:38:24 Feb 12 2020
Backup iBMC Version: 6.53
Available iBMC Version: 6.51
Available iBMC Build: 001
SDK Version: 5.0.70.5
SDK Built: 16:15:25 Feb 7 2020
Active Uboot Version: 5.0.70.5 (16:25:33 Feb 07 2020)
Backup Uboot Version: 5.0.70.5 (16:25:33 Feb 07 2020)
Active Secure Bootloader Version: 5.0.70.5 (16:25:32 Feb 07 2020)
Backup Secure Bootloader Version: 5.0.70.5 (16:25:32 Feb 07 2020)
Active Secure Firmware Version: 5.0.70.5 (16:25:33 Feb 07 2020)
Backup Secure Firmware Version: 5.0.70.5 (16:25:33 Feb 07 2020)
----------------- Product INFO -----------------
Product ID: 0x0002
Product Name: Atlas 800 (Model 9000)
Active BIOS Version: (U249)1.15
Backup BIOS Version: 1.15
-------------- Mother Board INFO ---------------
Mainboard BoardID: 0x0092
Mainboard PCB: .A
------------------- NIC INFO -------------------
FLEX IO B3 BoardID: 0x00b8
FLEX IO B3 PCB: .A
vsprintf_s content failed!(0)
--------------- Riser Card INFO ----------------
Riser1 BoardName: IT21R31A
Riser1 BoardID: 0x003d
Riser1 PCB: .A
Riser2 BoardName: IT21R31A
Riser2 BoardID: 0x003d
Riser2 PCB: .A
-------------- HDD Backplane INFO --------------
Disk BP1 BoardName: IT21BP8A
Disk BP1 BoardID: 0x00c1
Disk BP1 PCB: .A
Disk BP1 CPLD Version: (U1014)0.11
-------------------- PSU INFO -------------------
PSU1 Version: DC:113 PFC:113
PSU2 Version: DC:113 PFC:113
------------- Security Module INFO -------------
Specification Type: TPM
Specification Version: 2.0
Manufacturer Name: IFX
Manufacturer Version: 7.62
---------------- PCIe Card INFO ----------------
PCIe1 ProductName: IT21SHSC
PCIe1 BoardID: 0x005e
PCIe1 PCB: .A
-------------- NPU/GPU Board INFO --------------
NPUBoard1 BoardName: IT21SD4A
NPUBoard1 BoardID: 0x0093
NPUBoard1 PCB: .A
NPUBoard1 CPLD Version: (U1152)0.11
NPUBoard2 BoardName: IT21SD4A
NPUBoard2 BoardID: 0x0093
NPUBoard2 PCB: .A
NPUBoard2 CPLD Version: (U1152)0.11
- CPLD Version: complex programmable logic device (CPLD) version of the server
- BIOS Version: BIOS version of the server
- Active iBMC Version: active iBMC version of the server
- Backup iBMC Version: backup iBMC version of the server
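When several servers have to be checked, comparing the version fields by eye is error-prone. The following Python sketch shows one possible way to parse saved ipmcget -d version output and compare selected fields with site requirements. It assumes the output has been captured as text with one field per line, as the CLI prints it. The function names and the REQUIRED values are hypothetical; they are not part of the iBMC tooling, and the required versions must come from the site's own plan.

```python
import re

# Short excerpt of the command output, used here only as sample input.
SAMPLE = """\
------------------- iBMC INFO -------------------
CPLD Version: (U151)15.13
Active iBMC Version: (U68)6.53
Active iBMC Built: 21:38:24 Feb 12 2020
Backup iBMC Version: 6.53
----------------- Product INFO -----------------
Active BIOS Version: (U249)1.15
Backup BIOS Version: 1.15
-------------------- PSU INFO -------------------
PSU1 Version: DC:113 PFC:113
"""


def parse_version_output(text):
    """Group 'key: value' lines under their '--- NAME INFO ---' header."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-") and line.endswith("-"):
            current = line.strip("- ")
            sections[current] = {}
            continue
        if current is None or ":" not in line:
            continue  # ignore lines that are not 'key: value' pairs
        key, value = line.split(":", 1)  # split on the first colon only
        sections[current][key.strip()] = value.strip()
    return sections


def bare_version(value):
    """Drop a leading component designator such as '(U68)'."""
    return re.sub(r"^\(U\d+\)", "", value)


def find_mismatches(sections, required):
    """Return (field, found, wanted) for each field that differs."""
    flat = {k: v for fields in sections.values() for k, v in fields.items()}
    mismatches = []
    for key, wanted in required.items():
        found = bare_version(flat.get(key, ""))
        if found != wanted:
            mismatches.append((key, found, wanted))
    return mismatches


# Hypothetical site requirements, for illustration only.
REQUIRED = {"Active iBMC Version": "6.53", "Active BIOS Version": "1.15"}

if __name__ == "__main__":
    print(find_mismatches(parse_version_output(SAMPLE), REQUIRED))
```

An empty list means that every listed field matches. The comparison is a plain string equality, so a site that accepts "this version or later" would need a different rule.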
- Query the health status of the server.
iBMC:/->ipmcget -d health
System in health state
- If "System in health state" is displayed, no further action is required.
- If alarm information is displayed, go to the next step.
- Query any generated alarms.
iBMC / # ipmcget -d healthevents
Event Num | Event Time | Alarm Level | Event Code | Event Description
1 | 2019-02-10 00:52:23 | Minor | 0x12000021 | get description failed.
2 | 2019-02-10 01:37:42 | Minor | 0x12000013 | Failed to obtain data of the air inlet temperature.
3 | 2019-02-10 00:52:23 | Minor | 0x12000019 | Right mounting ear is not present.
4 | 2019-02-10 00:52:19 | Major | 0x28000001 | The SAS or PCIe cable to front disk backplane is incorrectly connected.
- Clear alarms. For details, see the Atlas 800 AI Training Server iBMC Alarm Handling (Model 3000 and 9000).
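To decide which alarms to handle first, the ipmcget -d healthevents output can be sorted by severity. The following Python sketch assumes the output has been saved as text in the pipe-separated layout shown above. The function names and the SEVERITY ranking are illustrative assumptions. The ranking uses the critical, major, and minor levels named in this section.

```python
# Lower rank means more severe. Unknown levels sort last.
SEVERITY = {"Critical": 0, "Major": 1, "Minor": 2}

# Sample input copied from the command output in this section.
SAMPLE = """\
Event Num | Event Time | Alarm Level | Event Code | Event Description
1 | 2019-02-10 00:52:23 | Minor | 0x12000021 | get description failed.
2 | 2019-02-10 01:37:42 | Minor | 0x12000013 | Failed to obtain data of the air inlet temperature.
3 | 2019-02-10 00:52:23 | Minor | 0x12000019 | Right mounting ear is not present.
4 | 2019-02-10 00:52:19 | Major | 0x28000001 | The SAS or PCIe cable to front disk backplane is incorrectly connected.
"""


def parse_health_events(text):
    """Turn each numbered event row into a dict; skip all other lines."""
    events = []
    for line in text.splitlines():
        # At most four splits, so a '|' inside the description is kept.
        parts = [part.strip() for part in line.split("|", 4)]
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # header row, prompt line, or blank line
        num, time, level, code, description = parts
        events.append({
            "num": int(num),
            "time": time,
            "level": level,
            "code": code,
            "description": description,
        })
    return events


def by_severity(events):
    """Most severe first; ties keep the original event-number order."""
    return sorted(
        events,
        key=lambda e: (SEVERITY.get(e["level"], len(SEVERITY)), e["num"]),
    )


if __name__ == "__main__":
    for event in by_severity(parse_health_events(SAMPLE)):
        print(event["level"], event["code"], event["description"])
```

The event codes in the sorted list can then be looked up in the alarm handling guide referenced in the last step.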