Replacing a Controller (for OceanStor 18500 V5, 18500F V5, 18800 V5, or 18800F V5)
This section describes how to replace a controller.
Impact on the System
During the replacement of a controller, system performance and reliability may be compromised. Therefore, replace a controller during off-peak hours.
Prerequisites
Before replacement, verify that the spare part is intact, with no physical damage such as crushing or deformation. If the spare part is damaged or deformed, contact Huawei technical support to check whether it can still be used for replacement.
- The power modules in the controller enclosure that houses the controller you want to replace are working properly (Running/Alarm indicator is steady green). If any power module is abnormal, replace the power module first.
- If the controller that you want to replace is processing services, ensure that its CPU usage does not exceed 40% before replacing it.
- Start SmartKit and choose Home > Storage > Parts Replacement > Parts Replacement Evaluation. If any check item fails, rectify the fault as instructed.
- The spare part is on hand.
- The controller that you want to replace has been located.
You can locate a component in either of the following ways:
- Locating a component based on alarms in DeviceManager: On the Alarms and Events page of DeviceManager, view the help information and determine the ID of the component to be replaced. Then locate the component at your site based on its ID.
- Locating a component based on the status of its Running/Alarm indicator: For details about the indicators on different components, see "Indicator Introduction" of each component in the Product Description specific to your product model and version.
- The cable connection positions of the controller that you want to replace are labeled on the cables.
- If a user has manually replaced the public/private key pair of ibc_os_hs, the public/private key pairs of all other functional controllers have been initialized before the controller replacement.
To check whether ibc_os_hs public/private key pairs are the default ones, see the check message of the FRU tool. To initialize the public/private key pairs, log in to the serial port of the controller enclosure using user account _super_admin and run the initibckey command.
- For storage systems earlier than V500R007C60SPC300, if SSH port IDs can be configured, the SSH port ID of each controller in the storage system must be the default value 22. For V500R007C60SPC300 and later versions, no modification is required.
Run show system server_port server_name=SSH to check whether the SSH port ID is 22. If the port ID is not 22, run change system server_port server_name=SSH port_num=22 to change it to 22.
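The SSH port prerequisite above can be written as a small decision helper. The following Python sketch is illustrative only: the function names, the assumed version-string pattern (V...R...C... with an optional SPC suffix), and the numeric ordering of versions are assumptions made for this example, not part of the product. It does not connect to the storage system; the port value is expected to come from the output of show system server_port server_name=SSH.

```python
import re

# Assumption: version strings look like "V500R007C60SPC300" (SPC part optional).
_VERSION_RE = re.compile(r"^V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?$")

DEFAULT_SSH_PORT = 22


def parse_version(version):
    """Split a version string into a tuple of integers for comparison."""
    match = _VERSION_RE.match(version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    v, r, c, spc = match.groups()
    # A missing SPC suffix is treated as SPC 0 (assumption).
    return (int(v), int(r), int(c), int(spc or 0))


# Versions from this one onward need no SSH port modification.
NO_CHANGE_FROM = parse_version("V500R007C60SPC300")


def ssh_port_change_needed(version, ssh_port):
    """Return True if the SSH port must be set back to 22 before replacement."""
    if parse_version(version) >= NO_CHANGE_FROM:
        return False
    return ssh_port != DEFAULT_SSH_PORT


if __name__ == "__main__":
    # Hypothetical example values.
    if ssh_port_change_needed("V500R007C30SPC100", 2222):
        print("Run: change system server_port server_name=SSH port_num=22")
```

The helper only decides whether the change command is needed. Run the command itself on each controller as described above.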
Precautions
- In scenarios where three controllers in a controller enclosure are faulty and need to be replaced:
- If the three faulty controllers are controllers A, B, and C or controllers A, C, and D, replace controller A or C first.
- If the three faulty controllers are controllers A, B, and D or controllers B, C, and D, replace controller B or D first.
- The four controllers in a controller enclosure are controllers A, C, B, and D from top to bottom.
- If a controller is faulty, you are advised to replace it in a timely manner to avoid scenarios where multiple faulty controllers exist in the same controller enclosure.
- The exterior of a controller is similar to that of an assistant cooling module. They can be distinguished according to the silkscreens on their levers. CTM indicates a controller while ACM indicates an assistant cooling module.
- Remove and insert a controller with even force. Excessive force may damage the appearance of the controller or cause faults.
- Remove only one controller at a time to ensure service continuity.
- A controller must be replaced within ten minutes. Otherwise, the system heat dissipation is compromised.
- If a single controller fails, use SmartKit to replace it while the other controller is working properly. If the entire controller enclosure has been powered off, power on the functioning controller before replacing the faulty one.
- If multiple controllers of a controller enclosure fail, contact Huawei technical support engineers for assistance.
- If the FRU tool fails to identify a controller fault, skip step 1 and perform steps 3 to 9.
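The replacement order for the three-faulty-controller case in the precautions above can be captured as a lookup table. This Python sketch is illustrative only; the function and variable names are hypothetical, and the table contains just the four combinations listed in this section.

```python
# Which controllers to replace first, keyed by the set of three faulty controllers.
# The entries are taken directly from the precautions in this section.
FIRST_TO_REPLACE = {
    frozenset("ABC"): ("A", "C"),
    frozenset("ACD"): ("A", "C"),
    frozenset("ABD"): ("B", "D"),
    frozenset("BCD"): ("B", "D"),
}

VALID_CONTROLLERS = frozenset("ABCD")


def first_replacement_candidates(faulty):
    """Return the controllers that may be replaced first (either one is acceptable)."""
    key = frozenset(name.upper() for name in faulty)
    if not key <= VALID_CONTROLLERS:
        raise ValueError(f"unknown controller name in {sorted(key)}")
    if len(key) != 3:
        raise ValueError("this table covers only the case of exactly three faulty controllers")
    return FIRST_TO_REPLACE[key]


if __name__ == "__main__":
    print(first_replacement_candidates(["A", "B", "D"]))
```

For any other number of faulty controllers the helper raises an error instead of guessing, because this section gives an order only for the three-controller case.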
Tools and Materials
- ESD wrist strap
- ESD bag
- Labels
Procedure
- Check status before replacement.
- Check system status.
Start SmartKit and choose Home > Storage > Routine Maintenance > Health Check. On the Health Check page, perform inspection as instructed. For details, see Checking System Status.
If any item fails the inspection, rectify the fault by performing the recommended actions in the inspection report. Ensure that all parts other than those to be replaced are working properly.
- Evaluate the replacement.
Start SmartKit and choose Home > Storage > Parts Replacement > Parts Replacement Evaluation. If any check item fails, rectify the fault as instructed.
- Check the status of the parts to be replaced.
Start SmartKit and choose Home > Storage > Parts Replacement > Parts Replacement. On the Parts Replacement page, click FRU Replacement. Then complete the check before the replacement as prompted. For details, see Replacing an FRU.
You can proceed to the next steps only when all items pass the pre-replacement check and the replacement page is displayed. If any item fails, rectify the fault as prompted.
- (Optional) If you need to clear the data of the faulty controller before it is replaced, perform the operations described in "How Can I Clear Data of a To-Be-Replaced Controller?"
- Wear an ESD wrist strap.
- Open the latches on the ejector levers on both sides of the controller, and hold the ejector levers to pull out the controller, as shown in Figure 8-62.
- Put the removed controller into an ESD bag.
- Take the spare part out of its ESD bag.
- Open the ejector levers of the spare part, insert the spare part into the empty slot, and push it in as far as it will go, as shown in Figure 8-63.
- Press and hold the ejector levers and push the spare part inwards until the spare part is completely inserted into the slot, as shown in Figure 8-64.
Hold the ejector levers on both sides and push the controller into the slot with even force to prevent poor contact between the controller and the backplane.
- Wait about five minutes and check the status of the Power indicator on the controller to determine whether the controller has been powered on, as shown in Figure 8-65.
- If the Power indicator is steady green, the controller has been powered on.
- If the Power indicator is blinking green and the Alarm indicator is blinking yellow, the controller is being located.
- If the Power indicator is blinking green (0.5 Hz), the controller is powered on and in the BIOS boot process.
- If the Power indicator is blinking green (2 Hz), the controller is in the operating system boot process or power-off process.
- If the Power indicator is off, the controller is absent or powered off.
1: Power indicator of the controller
2: Alarm indicator of the controller
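The Power indicator states listed in this step can be summarized as a decision function. The sketch below is illustrative only: the state labels ("off", "steady_green", "blinking_green"), the parameter names, and the returned strings are assumptions made for this example, and the blink frequency is expected to be judged by the operator.

```python
from typing import Optional


def describe_power_indicator(power: str,
                             blink_hz: Optional[float] = None,
                             alarm_blinking_yellow: bool = False) -> str:
    """Map an observed Power indicator state to the controller status in this step."""
    if power == "off":
        return "absent or powered off"
    if power == "steady_green":
        return "powered on"
    if power == "blinking_green":
        # Blinking green together with a blinking yellow Alarm indicator
        # means the controller is being located.
        if alarm_blinking_yellow:
            return "being located"
        if blink_hz == 0.5:
            return "powered on, BIOS boot in progress"
        if blink_hz == 2:
            return "operating system boot or power-off in progress"
    raise ValueError(f"state not covered by this step: {power!r}, {blink_hz!r}")


if __name__ == "__main__":
    print(describe_power_indicator("blinking_green", blink_hz=2))
```

Any combination that this step does not list raises an error, so an unexpected indicator pattern is not silently mapped to a status.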
- Confirm the replacement.
- Perform a post-replacement inspection.
- If the FRU tool can identify the faulty controller before the replacement:
After the parts replacement, return to the SmartKit page and click Replaced. Then complete the check after the replacement as prompted.
- If the FRU tool fails to identify the faulty controller before the replacement:
Check the status of the controller Alarm indicator to determine whether the controller works properly. Figure 8-65 shows the location of the indicators.
- If the Alarm indicator is steady yellow, an alarm is generated on the controller.
- If the Alarm indicator is blinking yellow and the Power indicator is blinking green, the controller is being located.
- If the Alarm indicator is off, the controller is working properly.
If the version of the newly installed controller is inconsistent with that of the faulty controller, the controller will synchronize the version after the replacement. This process takes about 30 minutes and the maximum duration is 60 minutes. If the duration exceeds 60 minutes, contact Huawei technical support. During the synchronization, do not remove the controller or power off the storage system.
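The timing rule for version synchronization can be expressed as a simple check. This Python sketch is illustrative only; the function name, parameters, and returned messages are hypothetical, and the elapsed time is assumed to be tracked by the operator from the moment the controller is installed.

```python
TYPICAL_SYNC_MINUTES = 30  # usual duration stated in this section
MAX_SYNC_MINUTES = 60      # beyond this, contact technical support

WAIT = "wait; do not remove the controller or power off the storage system"
ESCALATE = "contact Huawei technical support"
DONE = "synchronization complete"


def sync_action(elapsed_minutes: float, finished: bool) -> str:
    """Return the recommended action during version synchronization."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must not be negative")
    if finished:
        return DONE
    if elapsed_minutes > MAX_SYNC_MINUTES:
        return ESCALATE
    return WAIT


if __name__ == "__main__":
    print(sync_action(45, finished=False))
```

A run that exceeds the typical 30 minutes but stays within 60 minutes is still treated as normal, in line with the maximum duration stated above.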
- Check system status.
On the Parts Replacement page, click Inspection to check the system status again. If any item fails inspection, rectify the fault based on the suggestions in the inspection report.
- After the preceding procedure is complete, check services on the host for storage-related errors.
Follow-up Procedure
After the controller is replaced, label it to facilitate subsequent operations.