Replacing a Controller (Including the Coin Battery)
This section describes how to replace a controller.
Impact on the System
- During the replacement of a controller, if there is more than one controller, all services on the controller to be replaced are switched over to the peer controller. The heavier the service load, the longer the switchover takes.
- During the replacement of a controller, system performance and reliability are compromised. Therefore, you are advised to replace a controller during off-peak hours.
Prerequisite
- A replacement controller is ready.
- The controller that you want to replace has been located.
You can locate a component in either of the following ways:
- Locating a component based on alarms in DeviceManager: On the Alarms and Events page of DeviceManager, view the help information and determine the ID of the component to be replaced. Then locate the component at your site based on its ID.
- Locating a component based on the status of its Running/Alarm indicator: For details about the indicators on different components, see Indicator Introduction of each component in the Product Description specific to your product model and version.
- Each controller is equipped with three fan modules. Before replacing a controller, ensure that the fan modules of the other controllers work properly.
- If the public/private key pair of ibc_os_hs has been manually replaced, initialize the public/private key pairs of all other working controllers before replacing the controller.
To initialize the public/private key pairs of the controllers, log in to the controller enclosure through its serial port as user _super_admin (default password: Admin@revive) and run the initibckey command.
- If the storage system version allows SSH port numbers to be configured, the SSH port number of each controller in the storage system must be 22 (the default).
Run show system server_port server_name=SSH to check whether the SSH port number is 22. If it is not, run change system server_port server_name=SSH port_num=22 to change it to 22.
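The port check above reduces to a single comparison. The Python sketch below is illustrative only: the function name ssh_port_fix_command is hypothetical, and it assumes you have already read the current port number from the output of show system server_port server_name=SSH (that output format is not described here, so the sketch does not parse it). It returns the change command quoted above, or None when no change is needed.

```python
from typing import Optional

REQUIRED_SSH_PORT = 22  # SSH port number this procedure requires.


def ssh_port_fix_command(current_port: int) -> Optional[str]:
    """Return the CLI command that resets SSH to port 22, or None if already 22.

    Hypothetical helper: current_port is assumed to have been read manually
    from the output of `show system server_port server_name=SSH`.
    """
    if not 1 <= current_port <= 65535:
        raise ValueError(f"invalid TCP port number: {current_port}")
    if current_port == REQUIRED_SSH_PORT:
        return None
    # Command text copied from the procedure; you still run it on the storage CLI.
    return f"change system server_port server_name=SSH port_num={REQUIRED_SSH_PORT}"


if __name__ == "__main__":
    for port in (22, 2222):
        print(port, "->", ssh_port_fix_command(port) or "no change needed")
```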
- If the controller to be replaced is in minisystem state (the Storage: minisystem> prompt is displayed after you log in to the controller CLI), contact Huawei technical support engineers for assistance.
Precaution
- Apply even force when removing or inserting a controller. Excessive force may damage the appearance or connectors of the controller.
- Remove only one controller at a time.
- Complete the replacement of a controller within ten minutes. Otherwise, system heat dissipation will be impaired.
- In a controller enclosure, controllers and assistant cooling units (ACUs) look identical. Distinguish them by the silkscreen in the lower left corner of the front panel: CTM indicates a controller and ACU indicates an assistant cooling unit.
- If a single controller fails and the other controller is working properly, use SmartKit to replace the faulty controller. If the entire controller enclosure has been powered off, power on the functioning controller before replacing the faulty one.
- If both controllers of a controller enclosure fail, contact Huawei technical support engineers for assistance.
- If the FRU tool fails to identify the controller fault, skip step 1 and perform steps 2 to 8.
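Together with the minisystem prerequisite, the precautions above amount to a small decision table. The following Python sketch encodes it for illustration only; plan_replacement and its parameters are hypothetical names, the step numbers are the ones used in this section, and the sketch is not a SmartKit feature.

```python
from typing import List


def plan_replacement(
    failed_in_enclosure: int,
    enclosure_powered_off: bool,
    fru_tool_identifies_fault: bool,
    in_minisystem: bool = False,
) -> List[str]:
    """Return the actions the precautions prescribe (hypothetical helper)."""
    if failed_in_enclosure < 1:
        raise ValueError("no failed controller to replace")
    # Both controllers failed, or the controller is in minisystem state: escalate.
    if failed_in_enclosure >= 2 or in_minisystem:
        return ["Contact Huawei technical support engineers"]
    actions: List[str] = []
    if enclosure_powered_off:
        # The functioning controller must be up before the faulty one is replaced.
        actions.append("Power on the functioning controller")
    if fru_tool_identifies_fault:
        actions.append("Replace the controller with SmartKit, starting at step 1")
    else:
        actions.append("Skip step 1 and perform steps 2 to 8")
    return actions


if __name__ == "__main__":
    print(plan_replacement(1, enclosure_powered_off=True, fru_tool_identifies_fault=False))
```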
Tools and Materials
- ESD wrist strap
- ESD bag
- Labels
Procedure
- Check status before replacement.
- Check system status.
Start SmartKit and choose Routine Maintenance > Health Check. On the Health Check page, perform the inspection as instructed. For details, see Checking System Status.
If any items fail the inspection, rectify the faults by performing the recommended actions in the inspection reports. Ensure that all parts other than those to be replaced are working properly.
- Evaluate the replacement.
Start SmartKit and choose Home > Storage > Parts Replacement > Parts Replacement Evaluation. If any check item fails, rectify the fault as instructed.
- Check the status of the parts to be replaced.
Start SmartKit and choose Parts Replacement > Parts Replacement. On the Parts Replacement page, click FRU Replacement. Then complete the check before the replacement as prompted. For details, see Replacing an FRU.
You can proceed to the next steps only when all items pass the pre-replacement check and the replacement page is displayed. If any item fails, rectify the fault as prompted.
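The gating rule in this step (proceed only when every pre-replacement check passes) can be expressed as a simple filter. In this Python sketch the check names mirror the three SmartKit checks described above, but the functions and the idea of collecting results into a dictionary are assumptions for illustration; SmartKit itself reports the results.

```python
from typing import Dict, List


def failed_checks(results: Dict[str, bool]) -> List[str]:
    """Return the names of checks that did not pass, in input order."""
    return [name for name, passed in results.items() if not passed]


def may_proceed(results: Dict[str, bool]) -> bool:
    """Proceed only if at least one check was run and all of them passed."""
    return bool(results) and not failed_checks(results)


if __name__ == "__main__":
    results = {
        "Health Check": True,
        "Parts Replacement Evaluation": True,
        "FRU pre-replacement check": False,
    }
    print(failed_checks(results))  # rectify these as prompted, then re-run
    print(may_proceed(results))
```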
- Wear an ESD wrist strap.
- Press the latch on the controller to release the handles and pull out the controller, as shown in Figure 5-12.
- Take the spare part out of its ESD bag.
- Remove the three fan modules from the original controller and install them on the replacement controller.
- Press the latch on the panel, open the panel, and pull out the three fan modules one by one, as shown in Figure 5-13.
- Insert the three fan modules into the replacement controller and close the panel, as shown in Figure 5-14.
- Place the removed controller into an ESD bag.
- Fully open the handles of the replacement controller, insert the controller into the empty slot until it clicks into place, and close the handles, as shown in Figure 5-15.
- Wait about five minutes and check the status of the Power indicator on the controller to determine whether the controller has been powered on, as shown in Figure 5-16.
- If the Power indicator is steady green, the controller has been powered on.
- If the Power indicator is blinking green and the Alarm indicator is blinking red, the controller is being located.
- If the Power indicator is blinking green (0.5 Hz), the controller is powered on and in the BIOS boot process.
- If the Power indicator is blinking green (2 Hz), the controller is in the operating system boot process or power-off process.
- If the Power indicator is off, the controller is absent or powered off.
Figure 5-16 callouts: 1, Power indicator of the controller; 2, Alarm indicator of the controller.
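The Power indicator states listed above can be captured in a lookup table, which makes the precedence explicit: the locating pattern (Power blinking green with Alarm blinking red) is checked before the blink frequency. This Python sketch is illustrative; the state labels such as "blinking_green_2hz" and the function name are hypothetical names for what you observe on the panel.

```python
from typing import Optional

# Hypothetical labels for observed Power indicator states, mapped to the
# meanings given in this procedure.
POWER_MEANINGS = {
    "steady_green": "Powered on",
    "blinking_green_0.5hz": "Powered on, in the BIOS boot process",
    "blinking_green_2hz": "In the operating system boot or power-off process",
    "off": "Absent or powered off",
}


def power_indicator_meaning(power: str, alarm: Optional[str] = None) -> str:
    """Interpret the Power indicator, taking the Alarm indicator into account."""
    # Locating: Power blinking green (any rate) together with Alarm blinking red.
    if power.startswith("blinking_green") and alarm == "blinking_red":
        return "Being located"
    if power not in POWER_MEANINGS:
        raise ValueError(f"unrecognized Power indicator state: {power!r}")
    return POWER_MEANINGS[power]
```

Because the locating pattern is tested first, a blinking Power indicator seen while the controller is being located is not misread as a boot or power-off process.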
- Confirm the replacement.
- Perform a post-replacement inspection.
- If the FRU tool can identify the faulty controller before the replacement:
After the parts replacement, return to the SmartKit page and click Replaced. Then complete the parts check after the replacement as prompted.
- If the FRU tool fails to identify the faulty controller before the replacement:
Check the status of the controller Alarm indicator to determine whether the controller is working properly. Figure 5-16 shows the location of the indicators.
- If the Alarm indicator is steady red, an alarm is generated on the controller.
- If the Alarm indicator is blinking red and the Power indicator is blinking green, the controller is being located.
- If the Alarm indicator is off, the controller is working properly.
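When the FRU tool cannot identify the faulty controller, the Alarm indicator is the pass/fail signal. The Python sketch below (hypothetical function name and state labels) returns whether the indicator confirms that the controller is working properly, together with the reason.

```python
from typing import Tuple


def alarm_confirms_ok(alarm: str, power_blinking_green: bool = False) -> Tuple[bool, str]:
    """Interpret the Alarm indicator after a replacement (hypothetical helper)."""
    if alarm == "off":
        return True, "Working properly"
    if alarm == "steady_red":
        return False, "Alarm generated on the controller"
    if alarm == "blinking_red" and power_blinking_green:
        # Locating is an identification state, not a fault; re-check once it stops.
        return False, "Being located"
    raise ValueError(f"state not covered by this procedure: {alarm!r}")
```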
If the version of the newly installed controller differs from that of the faulty controller, the new controller synchronizes its version after the replacement. This typically takes about 30 minutes and at most 60 minutes. If it takes longer than 60 minutes, contact Huawei technical support. During the synchronization, do not remove the controller or power off the storage system.
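The synchronization timing above translates into a simple threshold check. In this Python sketch the 30-minute and 60-minute values come from this section; the function name and the messages are hypothetical.

```python
TYPICAL_SYNC_MINUTES = 30  # Typical version synchronization time.
MAX_SYNC_MINUTES = 60      # Beyond this, contact Huawei technical support.

WAIT = "Keep waiting; do not remove the controller or power off the storage system"


def sync_advice(elapsed_minutes: float) -> str:
    """Advise what to do while the new controller synchronizes its version."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed time cannot be negative")
    if elapsed_minutes > MAX_SYNC_MINUTES:
        return "Contact Huawei technical support"
    # Between the typical and maximum durations, waiting is still correct.
    note = " (longer than typical)" if elapsed_minutes > TYPICAL_SYNC_MINUTES else ""
    return WAIT + note
```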
- Check the system status after the replacement.
On the Parts Replacement page, click Inspection to check the system status again. If any item fails the inspection, rectify the fault based on the suggestions in the inspection report.
- Check services on a host for storage-related errors.
Follow-up Procedure
After a controller is replaced, label it to facilitate subsequent operations.