Checking the Server
Check the server in the sequence shown in Figure 5-37. Use the iBMC WebUI or the iBMC CLI, depending on site conditions.
For details about CLI commands, see Atlas 800 AI Training Server iBMC (V3.01.00.00 or Later) User Guide (Model 9010).
Procedure
- Check indicator status.
Ensure that hardware devices are working correctly.
For details, see Front Panel Indicators and Buttons and Rear Panel Indicators (Full Configuration of NPUs).
- Check the server.
- Check the server using the iBMC WebUI.
- Log in to the iBMC WebUI. For details, see Logging In to the iBMC WebUI.
You are advised to change the default password when logging in to the iBMC for the first time. For details, see Changing Initial Passwords.
- Choose iBMC Settings > Firmware Upgrade, and view the version information, as shown in Figure 5-38.
Check that the server versions meet site requirements.
- Check the server health status displayed on the menu bar, as shown in Figure 5-39.
No. | Health Status | Description
1 | Health indicator status | Displays the number of critical, major, and minor alarms.
2 | Power status | Displays the server power status. You can click the icon on the right of the indicator to power the server on or off.
3 | UID indicator status | Pinpoints the location of the server in a chassis. You can click the icon on the right of the indicator to control the state of the UID indicator.
- Clear any alarms if present. For details, see Atlas 800 AI Training Server iBMC (V3.01.00.00 or Later) Alarm Handling (Model 9010).
- Check the server using the iBMC CLI.
- Set an IP address for the PC. This IP address must be on the same network segment as the iBMC management network port.
- Connect a network cable from the PC to the iBMC management network port of the server.
- Start a Secure Shell (SSH) tool, such as PuTTY, on the PC and log in using the IP address of the iBMC management network port and the iBMC user name and password.
The SSH service is enabled by default. If the SSH service is disabled, enable it by choosing Services > Port Services on the iBMC WebUI.
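The network-segment requirement in the first step above can be verified before cabling. The following is a minimal sketch using Python's standard `ipaddress` module. The addresses and the netmask are hypothetical examples, not values taken from this guide; substitute the values configured at your site.

```python
import ipaddress


def same_segment(pc_ip: str, ibmc_ip: str, netmask: str) -> bool:
    """Return True if both IPv4 addresses fall in the same network."""
    pc_net = ipaddress.ip_interface(f"{pc_ip}/{netmask}").network
    ibmc_net = ipaddress.ip_interface(f"{ibmc_ip}/{netmask}").network
    return pc_net == ibmc_net


# Hypothetical addresses, for illustration only.
print(same_segment("192.168.2.10", "192.168.2.100", "255.255.255.0"))  # True
print(same_segment("192.168.3.10", "192.168.2.100", "255.255.255.0"))  # False
```

If the function returns False, the PC cannot reach the iBMC management network port directly, and the PC address should be changed before the SSH login is attempted.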
- Run the ipmcget -d version command to query the server version information. Ensure that the versions meet the site requirements.
------------------- iBMC INFO -------------------
IPMC CPU: Hi1711
IPMI Version: 2.0
CPLD Version: (U4451)1.03
Active iBMC Version: (U4433)3.01.05.01
Active iBMC Build: 001
Active iBMC Built: 12:18:11 Jun 1 2020
Backup iBMC Version: 3.01.05.01
Available iBMC Version: 3.01.05.01
Available iBMC Build: 001
SDK Version: 8.0.30.3
SDK Built: 17:14:59 May 26 2020
Active Uboot Version: 8.0.30.3 (17:35:42 May 26 2020)
Backup Uboot Version: 8.0.30.3 (17:35:42 May 26 2020)
Active Secure Bootloader Version: 8.0.30.3 (17:35:41 May 26 2020)
Backup Secure Bootloader Version: 8.0.30.3 (17:35:41 May 26 2020)
Active Secure Firmware Version: 8.0.30.3 (17:35:41 May 26 2020)
Backup Secure Firmware Version: 8.0.30.3 (17:35:41 May 26 2020)
----------------- Product INFO -----------------
Product ID: 0x0002
Product Name: Atlas 800 (Model 9010)
Active BIOS Version: (U47)5.38
Backup BIOS Version: 5.38
-------------- Mother Board INFO ---------------
Mainboard BoardID: 0x0052
Mainboard PCB: .A
--------------- Riser Card INFO ----------------
Riser1 BoardName: IT21R11A
Riser1 BoardID: 0x003e
Riser1 PCB: .A
Riser2 BoardName: IT21R11A
Riser2 BoardID: 0x003e
Riser2 PCB: .A
-------------------- PSU INFO -------------------
PSU1 Version: DC:115 PFC:115
PSU2 Version: DC:115 PFC:115
PSU3 Version: DC:115 PFC:115
PSU4 Version: DC:113 PFC:113
-------------- NPU/GPU Board INFO --------------
NPUBoard1 BoardName: IT21SD4A
NPUBoard1 BoardID: 0x0093
NPUBoard1 PCB: .C
NPUBoard1 CPLD Version: (U1152)1.02
NPUBoard2 BoardName: IT21SD4A
NPUBoard2 BoardID: 0x0093
NPUBoard2 PCB: .C
NPUBoard2 CPLD Version: (U1152)1.02
- CPLD Version: complex programmable logic device (CPLD) version of the server.
- BIOS Version: BIOS version of the server.
- Active iBMC Version: active iBMC version of the server.
- Backup iBMC Version: backup iBMC version of the server.
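When the version check is repeated across many servers, the saved command output can be compared against site requirements with a small script. The sketch below assumes the output has been captured as text with one "Name: value" field per line. The `REQUIRED` dictionary is a hypothetical set of site requirements, and the parenthesized prefix on some values, such as (U4433), is simply stripped before comparison because this guide does not explain its meaning.

```python
import re

# Abridged sample based on the command output shown above.
SAMPLE = """\
------------------- iBMC INFO -------------------
IPMC CPU: Hi1711
CPLD Version: (U4451)1.03
Active iBMC Version: (U4433)3.01.05.01
Backup iBMC Version: 3.01.05.01
----------------- Product INFO -----------------
Product Name: Atlas 800 (Model 9010)
Active BIOS Version: (U47)5.38
Backup BIOS Version: 5.38
"""

# Hypothetical site requirements, for illustration only.
REQUIRED = {
    "CPLD Version": "1.03",
    "Active iBMC Version": "3.01.05.01",
    "Active BIOS Version": "5.38",
}


def parse_version_output(text: str) -> dict:
    """Map each 'Name: value' line to a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and section banners
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def strip_prefix(value: str) -> str:
    """Drop a leading parenthesized prefix such as '(U4433)'."""
    return re.sub(r"^\([^)]*\)", "", value)


def find_mismatches(fields: dict, required: dict) -> dict:
    """Return {name: (found, required)} for every field that differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = strip_prefix(fields.get(name, ""))
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


fields = parse_version_output(SAMPLE)
print(find_mismatches(fields, REQUIRED))  # {}
```

An empty result means every listed version matches. Splitting at the first colon only keeps values such as "DC:115 PFC:115" and build timestamps intact.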
- Query the health status of the server.
iBMC:/->ipmcget -d health
System in health state
- If "System in health state" is displayed, no further action is required.
- If alarm information is displayed, go to the next step.
- Query any generated alarms.
iBMC / # ipmcget -d healthevents
Event Num | Event Time | Alarm Level | Event Code | Event Description
1 | 2019-02-10 00:52:23 | Minor | 0x12000021 | get description failed.
2 | 2019-02-10 01:37:42 | Minor | 0x12000013 | Failed to obtain data of the air inlet temperature.
3 | 2019-02-10 00:52:23 | Minor | 0x12000019 | Right mounting ear is not present.
4 | 2019-02-10 00:52:19 | Major | 0x28000001 | The SAS or PCIe cable to front disk backplane is incorrectly connected.
- Clear alarms. For details, see Atlas 800 AI Training Server iBMC (V3.01.00.00 or Later) Alarm Handling (Model 9010).
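To decide which alarms to handle first, the alarm query output can be parsed and ordered by severity. The following is a rough sketch that assumes the output has been saved as text in the pipe-separated layout shown in the alarm query step. The `HealthEvent` type and the severity ordering (critical, then major, then minor) are illustrative choices, not part of the iBMC CLI.

```python
from typing import NamedTuple

# Sample taken from the alarm query output shown above.
SAMPLE = """\
Event Num | Event Time | Alarm Level | Event Code | Event Description
1 | 2019-02-10 00:52:23 | Minor | 0x12000021 | get description failed.
2 | 2019-02-10 01:37:42 | Minor | 0x12000013 | Failed to obtain data of the air inlet temperature.
3 | 2019-02-10 00:52:23 | Minor | 0x12000019 | Right mounting ear is not present.
4 | 2019-02-10 00:52:19 | Major | 0x28000001 | The SAS or PCIe cable to front disk backplane is incorrectly connected.
"""

# Lower number means higher priority (assumed ordering).
SEVERITY = {"Critical": 0, "Major": 1, "Minor": 2}


class HealthEvent(NamedTuple):
    num: int
    time: str
    level: str
    code: str
    description: str


def parse_health_events(text: str) -> list:
    """Parse pipe-separated event rows, skipping the header and other lines."""
    events = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|", 4)]
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # not an event row
        events.append(HealthEvent(int(parts[0]), *parts[1:]))
    return events


events = parse_health_events(SAMPLE)
# sorted() is stable, so events of equal severity keep their original order.
by_severity = sorted(events, key=lambda e: SEVERITY.get(e.level, len(SEVERITY)))
print([e.num for e in by_severity])  # [4, 1, 2, 3]
```

With the sample data, the major cable alarm (event 4) is listed first, followed by the three minor alarms.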