FusionStorage V100R006C10 Block Storage Service Parts Replacement 06

Replacing SSD Cache Devices Online

Scenarios

This section guides maintenance engineers through replacing SSD cache devices that are not faulty but are nearing the end of their service life or have defects. Do not stop system services during the replacement.

NOTE:
If the to-be-replaced SSD device is used by multiple storage pools, do not replace the device online.

Impact on the System

If the system is under heavy service load, this operation increases the service response time.

Perform this operation during off-peak hours. To check the disk load, run the iostat -xm 1 command as user root on a properly running storage node. You are advised to perform this operation only when the %util value of every disk is below 60%.
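
The 60% check above can be automated by parsing captured iostat output. The following is a minimal sketch, not part of the product tooling: the function name busy_disks and the sample text are hypothetical, and the parser assumes the extended report has a header row that starts with "Device" and contains a %util column, which can vary between sysstat versions.

```python
def busy_disks(iostat_text, threshold=60.0):
    """Return {device: %util} for disks at or above threshold in the last report."""
    header = None
    latest = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields
            latest = {}  # a new report starts; keep only the most recent one
            continue
        # Device rows have exactly as many fields as the header row.
        if header and "%util" in header and len(fields) == len(header):
            latest[fields[0]] = float(fields[header.index("%util")])
    return {dev: util for dev, util in latest.items() if util >= threshold}


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Device:  rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 1.00 2.00 3.00 0.10 0.20 8.00 0.01 1.00 0.50 12.50
sdb 0.00 1.00 2.00 3.00 0.10 0.20 8.00 0.90 9.00 0.50 73.10
"""

print(busy_disks(SAMPLE))
```

An empty result means every disk in the most recent report is below the threshold.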

Prerequisites

Conditions

  • You have obtained a spare part whose specifications are not lower than those of the to-be-replaced device.
  • The location of the target device has been confirmed. A replacement label has been attached to the front panel of the device.
  • You have installed the SSD device driver on the server accommodating the to-be-replaced SSD devices.
  • The FusionStorage storage pool is in the normal state and no reconstruction task is running before disk replacement.

    You can check the pool state and reconstruction tasks as follows: On the FusionStorage Block self-service maintenance platform, choose Resource Pool > Storage Pools and view the status of desired storage pools and reconstruction tasks. For example, Figure 2-12 shows that the storage pool is in the normal state and one reconstruction task is running with a progress of 57%.

    Figure 2-12  Checking the pool state and reconstruction tasks

Data

| Category | Name | Description | Example |
|---|---|---|---|
| FusionStorage system information | Login address | Used for operations such as querying the storage pool ID and the server management IP address, and placing the server into or removing it from maintenance mode. | https://192.168.40.10/fsportal |
| | Administrator username and password | | admin/IaaS@PORTAL-CLOUD8! |

Procedure

    Start the storage pool data pre-flush.

    Before you start the data pre-flush in the storage pool where the to-be-replaced SSD cache device belongs, check that the storage pool is running properly. If the pool is not running properly, rectify the fault and then perform this step.

    NOTE:
    A data flush writes the cached data to the main storage during the replacement of the target SSD device.

    A data pre-flush writes part of the cached data to the main storage before the SSD cache device is replaced, which shortens the data flush required during the replacement.

    1. Use PuTTY to log in to the host accommodating the active FSM node as user dsware.

      If the public and private keys are used to authenticate the login, perform the operations based on Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    2. Run the following command to start the data pre-flush:

      NOTE:
      • Since the system has been hardened, you need to enter the username and password for login authentication after running the dswareTool command of FusionStorage Block. The default username is cmdadmin, and its default password is IaaS@PORTAL-CLOUD9!.

      • The system supports authentication using environment variables so that you do not need to repeatedly enter the username and password for authentication each time you run the dswareTool command. For details, see Authentication Using Environment Variables.

      • On FusionStorage Block Self-Maintenance Platform, choose Resource Pool > Storage Pools to query the storage pool ID.

      /opt/dsware/client/bin/dswareTool.sh --op startPreflush -id Storage pool ID

      Example: /opt/dsware/client/bin/dswareTool.sh --op startPreflush -id 0

      This operation is risky. Enter y to continue with the operation.

      Enter username cmdadmin and its password as prompted. The default password is IaaS@PORTAL-CLOUD9!.
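
If the command is assembled by a script, validating the storage pool ID first helps avoid typing mistakes. The following is a minimal sketch that only builds the argument list shown in this procedure and does not run it. The helper name preflush_command is hypothetical, and the interactive confirmation and login prompts are not handled.

```python
DSWARE_TOOL = "/opt/dsware/client/bin/dswareTool.sh"  # path as given in this procedure


def preflush_command(action, pool_id):
    """Build the dswareTool argument list that starts or stops the pool pre-flush."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    if not isinstance(pool_id, int) or pool_id < 0:
        raise ValueError("pool_id must be a non-negative integer")
    op = "startPreflush" if action == "start" else "stopPreflush"
    return [DSWARE_TOOL, "--op", op, "-id", str(pool_id)]


print(" ".join(preflush_command("start", 0)))
```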

    3. Run the following command to check the data pre-flush progress and whether the data pre-flush has completed on the server accommodating the target SSD device:

      /opt/dsware/tools/ops_tool/replace_ssd/query_preflush_process.sh

      Enter username cmdadmin and its password as prompted. The default password is IaaS@PORTAL-CLOUD9!.

      Information similar to the following is displayed:
      preflush ssd process[##########][3/3]:100.00%
      pool preflush completed.
      
      If the preflush ssd process value is 100%, the data pre-flush completes.

      After the data pre-flush completes, go to step 1 under "Replace the SSD cache device".
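
When the query is repeated from a script, the progress line can be parsed instead of read by eye. The following is a minimal sketch based only on the sample output above. The function names are hypothetical, and the two numbers in the second pair of brackets are returned uninterpreted because this document does not define them.

```python
import re

# Matches e.g. "preflush ssd process[##########][3/3]:100.00%"
PREFLUSH_RE = re.compile(
    r"preflush ssd process\[[^\]]*\]\[(\d+)/(\d+)\]:(\d+(?:\.\d+)?)%"
)


def preflush_progress(output):
    """Return (first, second, percent) from the last progress line, or None."""
    matches = PREFLUSH_RE.findall(output)
    if not matches:
        return None
    first, second, percent = matches[-1]
    return int(first), int(second), float(percent)


def preflush_complete(output):
    """True when the last reported pre-flush percentage is 100."""
    progress = preflush_progress(output)
    return progress is not None and progress[2] >= 100.0


sample = "preflush ssd process[##########][3/3]:100.00%\npool preflush completed."
print(preflush_progress(sample), preflush_complete(sample))
```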

    Replace the SSD cache device.

    1. Log in to the active FSM node as user dsware and run the following command to pre-process the server accommodating the to-be-replaced SSD device:

      /opt/dsware/tools/ops_tool/replace_ssd/preprocessing_replace_ssd_cache.sh -p Storage Pool ID -a Server management IP address

      Example: /opt/dsware/tools/ops_tool/replace_ssd/preprocessing_replace_ssd_cache.sh -p 0 -a 192.168.2.11

      This operation is risky. Enter y to continue with the operation.

      Enter username cmdadmin and its password as prompted. The default password is IaaS@PORTAL-CLOUD9!.

      Information similar to the following is displayed:
      set server 192.168.2.11 maintain mode success.
      start flush ssd cache on server 192.168.2.11 success.
      flush ssd process[##########]:100.00%
      stop osd service on server on server 192.168.2.11 success.
      preprocessing for replace ssd cache on server 192.168.2.11 completed!
      
      If the flush ssd process value is 100%, the pre-processing completes.

      After the pre-processing completes, go to 2 to replace the device.

      NOTE:
      On FusionStorage Block Self-Maintenance Platform, choose Hardware > Servers. On the displayed server list, query Management IP Address.

      After the pre-processing completes, the server is in maintenance mode. Do not run the command again on the same server.
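
Because the pre-processing command must not be run twice on the same server, it helps to confirm from a saved copy of the output that every stage was reached. The following is a minimal sketch. The marker strings are taken from the sample output above, the function name is hypothetical, and the actual wording may differ between versions.

```python
# Stage markers in the order they appear in the sample output of this procedure.
EXPECTED_STAGES = [
    "maintain mode success",
    "start flush ssd cache",
    "flush ssd process",
    "stop osd service",
    "preprocessing for replace ssd cache",
]


def first_missing_stage(output):
    """Return the first stage marker not found in order, or None if all are present."""
    pos = 0
    for marker in EXPECTED_STAGES:
        idx = output.find(marker, pos)
        if idx < 0:
            return marker
        pos = idx + len(marker)
    return None


sample = """\
set server 192.168.2.11 maintain mode success.
start flush ssd cache on server 192.168.2.11 success.
flush ssd process[##########]:100.00%
stop osd service on server on server 192.168.2.11 success.
preprocessing for replace ssd cache on server 192.168.2.11 completed!
"""
print(first_missing_stage(sample))
```

A return value of None means all five stages were logged in order. Any other value names the first stage that is missing.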

    2. Perform the required operation to replace the target SSD device with a new one based on the actual condition.

      • If the target device to be replaced is an SSD, replace the device without powering off the server.
      • If the target device to be replaced is an NVMe SSD, you do not need to power off the server. However, you must logically power off the NVMe SSD before replacing it:
        1. On FusionStorage Block Self-Maintenance Platform, choose Hardware > Disks.
        2. Locate the row that contains the NVMe SSD to be replaced and click Power Off, as shown in Figure 2-13.
          Figure 2-13  Logical power-off
        3. Click Yes.
          NOTE:
          If the logical power-off fails, power off the server and then replace the faulty device.
      • If the target device can be replaced only when the server is powered off, power off the server through the server BMC and then replace the device.
      NOTE:

      Placing a server into maintenance mode extends the timeout after which the server is removed from the storage pool by 45 minutes, which gives you 75 minutes to replace the parts. If the server stays out of the storage pool for more than 75 minutes, the storage pool reconstructs data.
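
The replacement window can be tracked with simple arithmetic. The following is a minimal sketch: the 30-minute base timeout is inferred from the figures above (75 minutes minus the 45-minute extension), and the start time is assumed to be the moment the pre-processing completed. The function names are hypothetical.

```python
from datetime import datetime, timedelta

BASE_TIMEOUT = timedelta(minutes=30)  # inferred: 75 min total minus the 45 min extension
MAINTENANCE_EXTENSION = timedelta(minutes=45)


def reconstruction_deadline(maintenance_start):
    """Time after which the storage pool is expected to start reconstructing data."""
    return maintenance_start + BASE_TIMEOUT + MAINTENANCE_EXTENSION


def minutes_left(maintenance_start, now):
    """Minutes remaining in the replacement window, never below zero."""
    remaining = reconstruction_deadline(maintenance_start) - now
    return max(0.0, remaining.total_seconds() / 60)


start = datetime(2019, 2, 1, 22, 0)
print(reconstruction_deadline(start))
print(minutes_left(start, datetime(2019, 2, 1, 22, 50)))
```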

    3. After the new SSD device is installed, power on the server.

      If the server was not powered off, skip this step. If a logical power-off was performed on an NVMe SSD device, the server will be automatically powered on.

    4. Check whether the replaced device is an NVMe device and whether hardware DIF was enabled on it before the replacement.

      • If yes, go to 5.
      • If no, go to 6.

    5. Enable hardware DIF.

      If the replaced device is an NVMe device and hardware DIF was enabled before the replacement, enable hardware DIF again. For details, see the FusionStorage Block Storage Service Hardware DIF Configuration Guide.

    6. On FusionStorage Block Self-Maintenance Platform, choose Resource Pool > Storage Pools > Disk Topology.

      The Disk Topology page is displayed.

    7. Click the removed SSD device and choose Replace SSD Cache > Yes.

      The SSD device selection dialog box is displayed.

    8. Select an SSD device and click OK to complete the online replacement of the SSD device.

      NOTE:
      If multiple SSD devices on the server are to be replaced, repeat 7 to 8 to replace the other SSD devices.

    9. Log in to the active FSM node and run the following command to post-process the server accommodating the replaced SSD devices.

      NOTE:

      The switchover of the owning MDC takes a certain period of time. Therefore, if a server of the owning MDC is powered off and the execution of this command fails, you need to wait several minutes before you can run this command again.

      /opt/dsware/tools/ops_tool/replace_ssd/postprocessing_replace_ssd_cache.sh -p Storage pool ID -a Server management IP address

      Example: /opt/dsware/tools/ops_tool/replace_ssd/postprocessing_replace_ssd_cache.sh -p 0 -a 192.168.2.11

      This operation is risky. Enter y to continue with the operation.

      Enter username cmdadmin and its password as prompted. The default password is IaaS@PORTAL-CLOUD9!.

      Information similar to the following is displayed:
      start osd service on server on server 192.168.2.11 success.
      cancel server 192.168.2.11 maintain mode success.
      pool 0 crb process[##########]:100.00%
      pool 0 crb finished.
      postprocessing for replace ssd cache on server 192.168.2.11 completed.
      
      If completed is displayed, the post-processing completes.

      After the post-processing completes, go to 10.
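
The post-processing output can be summarized in the same way before moving on to the next server. The following is a minimal sketch based only on the sample output above. The function name is hypothetical, and the crb progress value is reported as printed without further interpretation.

```python
import re

# Matches e.g. "pool 0 crb process[##########]:100.00%"
CRB_RE = re.compile(r"pool (\d+) crb process\[[^\]]*\]:(\d+(?:\.\d+)?)%")


def postprocessing_summary(output):
    """Return the last crb percentage per pool and whether 'completed' was logged."""
    crb = {int(pool): float(pct) for pool, pct in CRB_RE.findall(output)}
    completed = any(
        line.strip().startswith("postprocessing for replace ssd cache")
        and "completed" in line
        for line in output.splitlines()
    )
    return {"crb_percent": crb, "completed": completed}


sample = """\
start osd service on server on server 192.168.2.11 success.
cancel server 192.168.2.11 maintain mode success.
pool 0 crb process[##########]:100.00%
pool 0 crb finished.
postprocessing for replace ssd cache on server 192.168.2.11 completed.
"""
print(postprocessing_summary(sample))
```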

    10. Repeat 1 to 9 to replace the SSD device on another server within the same storage pool.

    Stop the storage pool data pre-flush.

    After all the to-be-replaced SSD devices within the storage pool are replaced, stop the storage pool data pre-flush.

    1. Log in to the active FSM node and run the following command to stop the storage pool data pre-flush:

      /opt/dsware/client/bin/dswareTool.sh --op stopPreflush -id Storage pool ID

      Example: /opt/dsware/client/bin/dswareTool.sh --op stopPreflush -id 0

      This operation is risky. Enter y to continue with the operation.

      Enter username cmdadmin and its password as prompted. The default password is IaaS@PORTAL-CLOUD9!.

      If information similar to the following is displayed, the storage pool data pre-flush is successfully stopped:
      ...
      Operation finish successfully. Result Code:0
      ...
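
A script that wraps the stop command can check the result code rather than the message wording. The following is a minimal sketch that assumes the "Result Code:" text appears as in the sample above. The function name is hypothetical.

```python
import re


def result_code(output):
    """Return the integer after 'Result Code:' in the output, or None if absent."""
    match = re.search(r"Result Code:\s*(-?\d+)", output)
    return int(match.group(1)) if match else None


sample = "...\nOperation finish successfully. Result Code:0\n..."
print(result_code(sample) == 0)
```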

Updated: 2019-02-01

Document ID: EDOC1000175242
