FusionStorage OBS 7.0 Parts Replacement 05

Replacing Both System Disk Modules

System disk modules are used to boot operating systems.

Impact on the System

If both system disk modules of a system fail, the system will break down.

Prerequisites

  • Spare disk modules are ready.
  • The faulty system disk modules have been located.
NOTE:

For details about the slot numbers of system disk modules, see Slot Numbers.

Precautions

  • To prevent damaging disk modules or connectors, remove or install disk modules with even force.
  • When removing a disk module, first remove it from its connector. Wait at least 30 seconds and then remove the disk module completely from the chassis.
  • To prevent disk module damage, wait at least one minute between removal and insertion actions.
  • To avoid system failures, do not reuse the removed disk modules.
  • If a hot patch has been installed on a node, install the patch on the node again after replacing the two system disk modules on the node.

Tools and Materials

  • ESD gloves
  • ESD wrist straps
  • ESD bags
  • Labels

Replacement Process

Replace two system disk modules by following the process shown in Figure 4-1.

Figure 4-1 Process of replacing two system disk modules

Procedure

  1. Optional: Power off the faulty node.

    Log in to DeviceManager and choose Cluster > Hardware. In the faulty node area, click the corresponding icon and select Power Off.
    NOTE:

    If you cannot access the faulty node through DeviceManager, use the KVM to log in to the faulty node as user root and run the poweroff command to power off the node.
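
    For reference, a minimal console sketch of this fallback (the prompt and host name are illustrative):

    [root@node ~]# poweroff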

  2. Perform scale-in for the service layer.

    1. Log in to the management node as user dfvmanager and run the su - root command to switch to user root.
    2. Run the following command to log in to the CLI of the service layer:

      sudo -u oam service_cli_start -u admin

    3. Run the following command to perform scale-in:

      delete service_cluster_info x.x.x.x

      x.x.x.x indicates the floating IP address of the storage plane of the faulty node.

    4. Run the following command to check all services on the node:

      show service_cluster_info x.x.x.x

      x.x.x.x indicates the floating IP address of the storage plane of the faulty node.
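
    For illustration only, if the hypothetical storage-plane floating IP address of the faulty node were 192.168.10.20, the scale-in and verification commands in this step would be:

      delete service_cluster_info 192.168.10.20
      show service_cluster_info 192.168.10.20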

  3. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  4. Configure the system disk RAID, install the operating system, and configure networks by referring to the FusionStorage OBS Software Installation Guide.
  5. Optional: If the faulty node is a management node, reinstall the DeployManager tool.

    1. Change the passwords of the root and dfvmanager accounts for logging in to the faulty node to be the same as those for logging in to the remote management node.
    2. Log in to the remote management node as user dfvmanager and go to the software directory of DeployManager.
    3. Run the sh action/install.sh remote command and enter the passwords of accounts root and dfvmanager as prompted. The system automatically installs DeployManager on the faulty node.
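
    As a sketch of 5.a (assuming the standard Linux passwd command is used; your site password policy still applies), the passwords can be aligned on the faulty node as follows:

      [root@node ~]# passwd root
      [root@node ~]# passwd dfvmanager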

  6. Log in to DeployManager on the management node. On the Host Management page, locate the row where the faulty node resides and click the corresponding icon. Then select Reinstall Host and Reinstall Microservice to reinstall the host and microservices, respectively.
  7. If certificate replacement has been performed on the faulty node, replace the certificate of the node with the one used before the node became faulty by referring to Management of Security Certificates and Keys in the FusionStorage OBS Security Maintenance.
  8. Add a dsware client.

    Log in to the CLI of the primary management node as user dfvmanager, switch to the /opt/dfv/oam/oam-p/client/bin/ directory, and run the following command to add a client. Running this command requires the CLI super administrator account admin and the corresponding password.

    ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0

    x.x.x.x indicates the floating IP address of the storage plane of the faulty node.

  9. Restore storage resources.

    1. Obtain the storage pool ID. Log in to the CLI of the primary management node as user dfvmanager and run the following command. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op queryStoragePool

      Information about all storage pools is displayed in the command output. poolId in the leftmost column displays IDs of all storage pools.

    2. Log in to the CLI of the primary management node as user dfvmanager and run the following command to check whether the node has been removed from the storage pool. If the node has been faulty for a certain period (usually seven days), it is automatically removed from the storage pool. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId

      Check whether the node is included in the command output. If it is, the node has not been removed from the storage pool; in this case, perform 9.c to 9.e. If it is not, the node has been removed from the storage pool; in this case, perform 9.f to 9.j.

    3. Check whether the node is a control node. Log in to the CLI of the primary management node, run the su - root command to switch to user root, and run the following command to check the role of the faulty node:

      sh /opt/dfv/oam/oam-p/tools/ops_tool/emergency/fsa/recover_conf/get_server_role.sh x.x.x.x

      x.x.x.x indicates the management IP address of the faulty node. The command is successfully executed if the following information is displayed:

      ZK:1,MDC:1,VBS:1,VFS:1,OSD:1
      MDC_ID:1,MDC_PORT:10530,STORAGE_IP1:192.168.10.2,STORAGE_IP2:192.168.10.2
      VFS_ID:2,VFS_PORT:11901,VFS_DEV:Bond0  POOL_ID:0

      In the first line of the preceding output, 1 indicates that the node has the corresponding role and 0 indicates that it does not. If the MDC value is 1, the node is a control node; if the MDC value is 0, it is not a control node.

    4. If the node is a control node, log in to the CLI of the primary management node as user dfvmanager and run the following command to recover the faulty node. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op restoreControlNode -ip x.x.x.x

      x.x.x.x indicates the management IP address of the faulty node.

    5. Regardless of whether the node is a control node, log in to the CLI of the primary management node as user dfvmanager and run the following command to restore storage resources. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId

      x.x.x.x indicates the floating IP address of the storage plane of the faulty node, and poolId indicates the ID of the storage pool to which the node belongs. If the node belongs to multiple storage pools, run this command once for each storage pool ID to restore all its storage resources.
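
      For example, if the faulty node's storage-plane floating IP address were 192.168.10.20 and the node belonged to storage pools 0 and 1 (all values hypothetical), you would run:

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op restoreStorageNode -ip 192.168.10.20 -p 0
      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op restoreStorageNode -ip 192.168.10.20 -p 1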

    6. Log in to the CLI of the primary management node as user dfvmanager and run the following command to add a host. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op addHost -manageIp x.x.x.x -serverType Server type -rackId rackId

      x.x.x.x indicates the management IP address of the faulty node, Server type indicates the server model of the node, and rackId indicates the cabinet ID.

    7. Run the following command to scan for available disks. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op scanMediaForPool -ip x.x.x.x

      x.x.x.x indicates the management IP address of the faulty node.

    8. Modify the generated dsware.xml file based on the number of nodes and disks to be added to the storage pool.

      vim /opt/dfv/oam/oam-p/client/conf/dsware.xml

      Locate the corresponding node and modify the dsware.xml file. For example, find the node whose management IP address is 10.40.246.28.

      <server>
      <ip needModify="false">10.40.246.28</ip>
      <mainStorageStartSlot needModify="When main storage is sas or sata disk">0</mainStorageStartSlot>
      <mainStorageEndSlot needModify="When main storage is sas or sata disk">4</mainStorageEndSlot>
      <media>
      <devName needModify="false">hioa</devName>
      <phySlot needModify="false">99</phySlot>
      <phyEsn needModify="false"/>
      <mediaType needModify="false">ssd_card</mediaType>
      <mediaSize needModify="false">4</mediaSize>
      <mediaRole enum="main_storage;osd_cache;no_use" needModify="true">osd_cache</mediaRole>
      <usedCacheSize needModify="false">0</usedCacheSize>
      </media>
      </server>
      NOTE:
      • If HDDs (SATA or SAS disks) or SSDs are used as main storage, you need to enter the start and end slot numbers.

        mainStorageStartSlot: indicates the start slot number for adding disks.

        mainStorageEndSlot: indicates the end slot number for adding disks.

      • SSD card usage: When SSD cards are used as main storage, set mediaRole to main_storage; in this case you do not need to enter the slot numbers (a sketch of this case follows this note). When HDDs are used as main storage and SSD cards are used as the cache, set mediaRole to osd_cache, as in the preceding example.
      • SSD usage: When SSDs are used as main storage, set mediaRole to no_use. When SSDs are used as the cache, set mediaRole to osd_cache.
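
      The following is an illustrative sketch only (derived from the note above, not an additional generated file) of a server entry in which the SSD card itself serves as main storage: the slot range elements are omitted and mediaRole is set to main_storage.

      <server>
      <ip needModify="false">10.40.246.28</ip>
      <media>
      <devName needModify="false">hioa</devName>
      <phySlot needModify="false">99</phySlot>
      <phyEsn needModify="false"/>
      <mediaType needModify="false">ssd_card</mediaType>
      <mediaSize needModify="false">4</mediaSize>
      <mediaRole enum="main_storage;osd_cache;no_use" needModify="true">main_storage</mediaRole>
      <usedCacheSize needModify="false">0</usedCacheSize>
      </media>
      </server>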
    9. Run the following command to expand the capacity of the storage pool. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op addStorageNode -poolId poolId -mediaType Disk type of main storage -cacheType Cache type -f /opt/dfv/oam/oam-p/client/conf/dsware.xml

      In the preceding command, poolId indicates the ID of the storage pool to be expanded, Disk type of main storage specifies the disk type of the main storage (sata_disk, sas_disk, ssd_disk, or ssd_card), and Cache type specifies the disk type of the cache (ssd_disk or ssd_card). Cache type is mandatory only when the storage pool uses a cache.

      After the capacity expansion command is executed, a task ID is displayed in the command output. You can use the task ID to query the progress of the capacity expansion task.
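
      For illustration, expanding a hypothetical storage pool 0 that uses SAS disks as main storage and SSD cards as the cache would look like the following:

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op addStorageNode -poolId 0 -mediaType sas_disk -cacheType ssd_card -f /opt/dfv/oam/oam-p/client/conf/dsware.xml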

    10. Query the progress of the capacity expansion task. Running this command requires the CLI super administrator account admin and the corresponding password.

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op querytaskloginfo -taskid Task ID

      Task ID is the task ID displayed after you run the capacity expansion command in 9.i.
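
      For example, if the capacity expansion command in 9.i returned a hypothetical task ID of 12345, the progress query would be:

      sh /opt/dfv/oam/oam-p/client/bin/dswareTool.sh --op querytaskloginfo -taskid 12345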

  10. Check whether the shardserver component at the index layer is recovered successfully.

    1. Log in to any node where the indexclient component is deployed, and run the su - root command to switch to user root. Then run the following command:

      sh /opt/dfv/index_layer/client/27010/action/mongoshell client

    2. After entering the mongoshell CLI, run the following commands:

      use config

      db.shards.find()

      MongoDB shell version v3.4.0-release
      connecting to: mongodb://127.0.0.1:27010
      MongoDB server version: 3.4.0-release
      Server has startup warnings:
      [2017-12-06 09:31:26.262+0800][INFO][CONTROL ][main][81909,13][]
      [2017-12-06 09:31:26.263+0800][INFO][CONTROL ][main][81909,13][** WARNING: Access control is not enabled for the database.]
      [2017-12-06 09:31:26.263+0800][INFO][CONTROL ][main][81909,13][**          Read and write access to data and configuration is unrestricted.]
      [2017-12-06 09:31:26.263+0800][INFO][CONTROL ][main][81909,13][]
      mongos> use config
      switched to db config
      mongos> db.shards.find()
      { "_id" : "shard00000000000000000009", "host" : "shard00000000000000000009/192.168.183.14:27021", "state" : 1, "extendIPs" : "10.40.183.14", "processIdentity" : "95347_1512570785008_63744198" }
      { "_id" : "shard00000000000000000010", "host" : "shard00000000000000000010/192.128.183.20:27021", "state" : 1, "extendIPs" : "10.40.183.20", "processIdentity" : "4938_1512570876995_80642166" }

      If the IP address of the faulty node is displayed in the extendIPs column and the value of state is 1, the shardserver component is successfully recovered. Otherwise, contact Huawei technical support.

  11. Check whether the configserver component at the index layer is recovered successfully.

    1. Log in to any node where the configserver component is deployed, and run the su - root command to switch to user root. Then run the following command:

      sh /opt/dfv/infrastructure/zookeeper4sl/bin/zkCli.sh -server x.x.x.x:10000

      x.x.x.x indicates the control plane IP address of any other node.

      [root@dfv ~]# sh /opt/dfv/infrastructure/zookeeper4sl/bin/zkCli.sh -server 192.168.230.104:10000
      Connecting to 192.168.230.104:10000
      WATCHER::
      WatchedEvent state:SyncConnected type:None path:null
      Welcome to ZooKeeper!
      JLine support is enabled
      [zk: 192.168.230.104:10000(CONNECTED) 0]
    2. Run the following commands:

      ls /DFV/Index_layer/election

      ls /DFV/Index_layer/configserverlist

      [zk: 192.168.230.104:10000(CONNECTED) 0] ls /DFV/Index_layer/election
      [192.168.230.1060000000021, 192.168.230.760000000020, 192.168.230.1040000000019]
      [zk: 192.168.230.104:10000(CONNECTED) 1] ls /DFV/Index_layer/configserverlist
      [192.168.230.104:27015;10.10.230.104:27016,10.10.185.202:7443;az1, 192.168.230.76:27015;10.10.230.76:27016,10.10.185.202:7443;az1, 192.168.230.106:27015;10.10.230.106:27016,10.10.185.202:7443;az1]
      [zk: 192.168.230.104:10000(CONNECTED) 2]

      Check whether the IP address of the faulty node exists in the command output. If the IP address exists, the recovery is successful. Otherwise, contact Huawei technical support.

  12. Check whether the indexclient component at the index layer is recovered successfully.

    Log in to the faulty node and run the su - root command to switch to user root. Then run the following command. If the mongoshell CLI window is displayed, the indexclient component is recovered successfully.

    sh /opt/dfv/index_layer/client/27010/action/mongoshell client

    MongoDB shell version v3.4.0-release
    connecting to: mongodb://127.0.0.1:27010
    MongoDB server version: 3.4.0-release
    Welcome to the MongoDB shell.
    For interactive help, type "help".
    For more comprehensive documentation, see
          http://docs.mongodb.org/
    Questions? Try the support group
          http://groups.google.com/group/mongodb-user
    Server has startup warnings:
    [2017-12-21 01:52:03.636-0500][INFO][CONTROL ][main][5132,4][]
    [2017-12-21 01:52:03.637-0500][INFO][CONTROL ][main][5132,4][** WARNING: Access control is not enabled for the database.]
    [2017-12-21 01:52:03.637-0500][INFO][CONTROL ][main][5132,4][**          Read and write access to data and configuration is unrestricted.]
    [2017-12-21 01:52:03.637-0500][INFO][CONTROL ][main][5132,4][]
    mongos> 

  13. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage OBS Administrator Guide.

Follow-up Procedure

Label the replaced system disk modules to facilitate subsequent operations.
