FusionStorage 8.0.0 Block Storage Parts Replacement 03

Replacing Both System Disk Modules

System disk modules are used to boot operating systems.

Impact on the System

If both system disk modules of a system fail, the system will break down.

Prerequisites

  • Spare disk modules are ready.
  • The faulty system disk modules have been located.
NOTE:

For details about the slot numbers of system disk modules, see Slot Numbers.

Precautions

  • To prevent damaging disk modules or connectors, remove or install disk modules with even force.
  • When removing a disk module, first remove it from its connector. Wait at least 30 seconds and then remove the disk module completely from the chassis.
  • To prevent disk module damage, wait at least one minute between removal and insertion actions.
  • To avoid system failures, do not reuse disk modules.
  • If a hot patch has been installed on a node, install the patch on the node again after replacing the two system disk modules on the node.

Tools and Materials

  • ESD gloves
  • ESD wrist straps
  • ESD bags
  • Labels

Context

Log in to the node as user dsware and run the following command to check whether the node to be restored is the management node:

Run the su - dmdbadmin -c "/opt/fusionstorage/deploymanager/gaussdb/app/bin/gsql -p 7018 -d cmdb" command, enter the password (default: Huawei12#$) when prompted, and then run the select role from BLOCK_NODE_ROLE,BLOCK_NODE_DETAIL where BLOCK_NODE_ROLE.NODE_ID=BLOCK_NODE_DETAIL.ID and BLOCK_NODE_DETAIL.MANAGEMENT_INTERNAL_IP='X.X.X.X'; command in the gsql session.

In the preceding command, X.X.X.X indicates the management IP address of the node on which faulty system disks are to be replaced.
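The following is a minimal sketch of this check; the paths, port, and default password are taken from this section, and X.X.X.X is a placeholder for the management IP address of the node whose system disks are to be replaced.

  # Open a gsql session to the cmdb database as user dmdbadmin
  # (enter the password when prompted; default: Huawei12#$).
  su - dmdbadmin -c "/opt/fusionstorage/deploymanager/gaussdb/app/bin/gsql -p 7018 -d cmdb"

  # Inside the gsql session, query the role of the node:
  select role from BLOCK_NODE_ROLE,BLOCK_NODE_DETAIL where BLOCK_NODE_ROLE.NODE_ID=BLOCK_NODE_DETAIL.ID and BLOCK_NODE_DETAIL.MANAGEMENT_INTERNAL_IP='X.X.X.X';

  # Exit the gsql session.
  \q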

Replacing System Disks on Management Nodes (Deployed Independently on Physical Nodes)

  1. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  2. Configure RAID for system disks, install the OS, and configure networks. For details, see FusionStorage Block Storage Product Documentation.

    1. Install the OS and NVMe SSD driver. If no NVMe SSDs or cards are used, you do not need to install the NVMe driver. For details, see "Installing a Server Operating System" in FusionStorage Block Storage Product Documentation.
    2. Configure the node IP address. For details, see "Configuring a Server IP Address" in FusionStorage Block Storage Software Installation Guide.
    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  3. If the certificate used by the faulty node is to be replaced, see "Security Configuration Guide" in FusionStorage Block Storage Product Documentation.
  4. Restore the services on the management node.

    1. Download the product package in the current storage system and upload the product package to the faulty management node. If the product package has been upgraded, use the upgraded version.
    2. On the primary management node, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --disabledsync --name=upgrade command to disable file synchronization.
    3. Run the tar -xvzf FusionStorage_8.0.0.*.tar.gz command to decompress the product package, and run the cd FusionStorage_8.0.0.*/deploymanager/preinstall command to go to the preinstall directory.
    4. View the /home/HAInfoFromInstallTool.properties file on a normal management node and fill in the same file for the faulty node based on the detailed deployment information. The values must be consistent with the detailed deployment information table. Then copy the completed HAInfoFromInstallTool.properties file to the /home directory of the faulty node. Table 4-1 describes the detailed deployment information, and a hedged example file is sketched after the table.
      1. If the values of ha_role and init_role on the normal management node are primary, fill in the deployment information of the faulty node by referring to the normal node, change ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to standby, and set pkg_path to the absolute path of the uploaded product package.
      2. If the values of ha_role and init_role on the normal management node are standby, fill in the deployment information of the faulty node by referring to the normal node, change ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to primary, and set pkg_path to the absolute path of the uploaded product package.
        Table 4-1 Detailed deployment information

        • ha_role: Indicates the HA role, which works with a normal node in primary/secondary mode. Example: primary
        • init_role: Indicates the initial role, which is generally the same as ha_role. Example: primary
        • local_sn: Indicates the SN of the primary node. You can use the default value. The value must be the same as that of a normal node.
        • remote_sn: Indicates the SN of the secondary node. You can use the default value. The value must be the same as that of a normal node.
        • local_cabinet: Indicates the cabinet number of the primary node, which must be the same as that of a normal node.
        • remote_cabinet: Indicates the cabinet number of the secondary node, which must be the same as that of a normal node.
        • ha_mode: Indicates the HA mode. Possible values are single and double. The value must be the same as that of a normal node. Example: double
        • net_mode: Indicates the network mode. Example: single
        • service_float_ip: Indicates the floating IP address of the external management network, which must be the same as that of a normal node. Example: 100.2.10.76
        • service_gateway: Indicates the gateway of the external management network, which must be the same as that of a normal node. Example: 100.2.0.1
        • service_mask: Indicates the subnet mask of the external management network, which must be the same as that of a normal node. Example: 255.255.0.0
        • service_local_ip: Indicates the IP address of the external primary management node, which must be the same as that of a normal node. Example: 100.2.5.94
        • service_local_port: Indicates the network port of the external primary management node, which must be the same as that of a normal node. Example: bond0
        • service_remote_ip: Indicates the IP address of the external secondary management node, which must be the same as that of a normal node. Example: 100.2.5.62
        • service_remote_port: Indicates the network port of the external secondary management node, which must be the same as that of a normal node. Example: bond0
        • manager_float_ip: Indicates the floating IP address of the internal management network, which must be the same as that of a normal node. Example: 100.2.10.76
        • manager_gateway: Indicates the gateway of the internal management network, which must be the same as that of a normal node. Example: 100.2.0.1
        • manager_mask: Indicates the subnet mask of the internal management network, which must be the same as that of a normal node. Example: 255.255.0.0
        • manager_local_ip: Indicates the IP address of the internal primary management node, which must be the same as that of a normal node. Example: 100.2.5.94
        • manager_local_port: Indicates the network port of the internal primary management node, which must be the same as that of a normal node. Example: bond0
        • manager_remote_ip: Indicates the IP address of the internal secondary management node, which must be the same as that of a normal node. Example: 100.2.5.62
        • manager_remote_port: Indicates the network port of the internal secondary management node, which must be the same as that of a normal node. Example: bond0
        • local_host_name: Indicates the host name of the primary node, which must be the same as that of a normal node. Example: FSM01
        • remote_host_name: Indicates the host name of the secondary node, which must be the same as that of a normal node. Example: FSM02
        • active_ip: Indicates the IP address of the primary HA node, which must be the same as the value of manager_local_ip. Example: 100.2.5.94
        • standby_ip: Indicates the IP address of the secondary HA node, which must be the same as the value of manager_remote_ip. Example: 100.2.5.62
        • float_ip_for_ha: Indicates the HA floating IP address, which must be the same as that of a normal node. Example: 100.2.10.76
        • install_ha: Indicates whether HA is installed. Example: true
        • service_ip_list: Indicates the list of the internal management IP addresses to be installed. Use commas (,) to separate the IP addresses. Example: 100.2.5.94,100.2.5.62
        • internal_ethname: Indicates the internal NIC name, which must be the same as that on a normal node. Example: bond0
        • external_ethname: Indicates the external NIC name, which must be the same as that on a normal node.
        • external_service_gateway: Indicates the external gateway, which must be the same as that of a normal node.
        • external_service_mask: Indicates the external subnet mask, which must be the same as that of a normal node.
        • external_service_float_ip: Indicates the floating IP address of the external management network, which must be the same as that of a normal node. Example: 100.2.10.76
        • pkg_path: Indicates the absolute path of the product package. Example: /home/FusionStorage_8.0.0.20190714165359.tar.gz
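        The following is a hedged example of a completed HAInfoFromInstallTool.properties file for a faulty node whose peer (the normal node) is primary, so the faulty node is set to standby. Standard key=value properties syntax is assumed here; take the authoritative field list and format from the /home/HAInfoFromInstallTool.properties file on the normal node. The values reuse the examples from Table 4-1 and are placeholders only.

          # Example HAInfoFromInstallTool.properties (faulty node becomes standby).
          # Every value must match the normal node and the detailed deployment information table.
          ha_role=standby
          init_role=standby
          local_sn=<same as on the normal node>
          remote_sn=<same as on the normal node>
          local_cabinet=<same as on the normal node>
          remote_cabinet=<same as on the normal node>
          ha_mode=double
          net_mode=single
          service_float_ip=100.2.10.76
          service_gateway=100.2.0.1
          service_mask=255.255.0.0
          service_local_ip=100.2.5.94
          service_local_port=bond0
          service_remote_ip=100.2.5.62
          service_remote_port=bond0
          manager_float_ip=100.2.10.76
          manager_gateway=100.2.0.1
          manager_mask=255.255.0.0
          manager_local_ip=100.2.5.94
          manager_local_port=bond0
          manager_remote_ip=100.2.5.62
          manager_remote_port=bond0
          local_host_name=FSM01
          remote_host_name=FSM02
          active_ip=100.2.5.94
          standby_ip=100.2.5.62
          float_ip_for_ha=100.2.10.76
          install_ha=true
          service_ip_list=100.2.5.94,100.2.5.62
          internal_ethname=bond0
          external_ethname=<same as on the normal node>
          external_service_gateway=<same as on the normal node>
          external_service_mask=<same as on the normal node>
          external_service_float_ip=100.2.10.76
          pkg_path=/home/FusionStorage_8.0.0.20190714165359.tar.gz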

  5. Go to the deploymanager/preinstall directory where the product package is decompressed and run the sh install.sh manage /home/HAInfoFromInstallTool.properties command to reinstall the management node software.
  6. Run the rm -rf /home/.preinstall command to clear the pre-installation flag and delete the product package and decompressed package uploaded during the restoration.
  7. After the installation is complete, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --enabledsync --name=upgrade command on the primary management node to resume file synchronization.
  8. Run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --syncallfile command on the primary management node to trigger file synchronization. Then run the following command to check the synchronization status:

    /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfilestatus

    Alternatively, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfileprocess command to check the synchronization result.

    After data synchronization is complete, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --cancelforbidswitch --name=oam_u command to enable the HA function. A consolidated sketch of the commands in steps 7 and 8 follows.
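    A minimal sketch of the synchronization sequence in steps 7 and 8, run on the primary management node, is shown below. It only consolidates the commands listed above; re-run the status commands until synchronization is complete.

      # Resume file synchronization after the installation completes (step 7).
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --enabledsync --name=upgrade

      # Trigger a full file synchronization (step 8).
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --syncallfile

      # Check the synchronization status and result.
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfilestatus
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfileprocess

      # After synchronization is complete, re-enable HA switchover.
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --cancelforbidswitch --name=oam_u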

  9. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Replacing System Disks on a Non-Management Node

  1. Log in to the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 1 command to switch to the maintenance mode. In the command, x.x.x.x indicates the IP address of the node on which faulty system disks are to be replaced. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  2. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  3. Configure the system disk RAID, install the operating system, and configure networks by referring to FusionStorage Block Storage Software Installation Guide.

    1. Install the OS and NVMe SSD driver. If no NVMe SSDs or cards are used, you do not need to install the NVMe driver. For details, see "Installing a Server Operating System" in FusionStorage Block Storage Product Documentation.
    2. Configure the node IP address and bond ports. For details, see "Configuring a Server IP Address" in FusionStorage Block Storage Software Installation Guide.

      You can obtain the node IP address in the following ways:

      1. Log in to the primary management node as user root and run the /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9! command to connect to the database. In the command, 21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password.
      2. Run the set search_path=dsware_1; command to set the query path.
      3. Run the following commands to query the database:

        Select STORAGEIP, CLUSTERIP from TBL_SERVER_INFO where MANAGEIP='x.x.x.x';

        In the command output, STORAGEIP is the back-end storage IP address and CLUSTERIP is the front-end storage IP address. In the command, x.x.x.x indicates the management IP address of the node whose system disks are being replaced. A consolidated sketch of this query is shown below.
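        A minimal sketch of this query, using the default port and password stated above, is shown below; x.x.x.x is a placeholder for the management IP address of the node whose system disks are being replaced.

          # On the primary management node, as user root, connect to the database
          # (21600 is the default port; IaaS@DATABASE-PublicCLOUD9! is the default password).
          /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

          # Inside the gsql session, query the back-end (STORAGEIP) and front-end (CLUSTERIP) addresses.
          set search_path=dsware_1;
          Select STORAGEIP, CLUSTERIP from TBL_SERVER_INFO where MANAGEIP='x.x.x.x';
          \q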

    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  4. If certificate replacement has been performed on the faulty node, replace the certificate used by the node with the one used before the node became faulty by referring to Security Configuration in the FusionStorage Block Storage Security Configuration Guide.
  5. Log in to the primary management node as an administrator and run the ismcli -u admin command to log in to the CLI. To run the command, you need to enter the password of the CLI super administrator admin.
  6. Run the following commands to restore the software on the faulty node:

    restore node software node_ip=x.x.x.x user_name=xxxx

    password:**********

    root_password:**********

    In the preceding commands, node_ip indicates the internal management IP address of the faulty node, user_name indicates a user name for logging in to the faulty node, password indicates the password of the user name, and root_password indicates the password of user root on the faulty node. After the commands are executed successfully, you can obtain the task type and task ID.

  7. Run the show node deploy_task task_id=* command to query the software restoration progress of the faulty node. In the preceding command, task_id indicates the ID of the task to be queried. You can proceed with the next step only after the task is successfully executed.
  8. Restore basic services such as SNM and FDSA.

    Log in to the CLI on the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreBasicServices -ip x.x.x.x command to restore the basic services. In the command, x.x.x.x indicates the management IP address of the storage node. To run the preceding command, you need to enter the super administrator account admin and its password as prompted.

  9. Restore the management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the host to be restored has the control node role:

      Select * from tbl_mdc_server_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); In the command, x.x.x.x indicates the management IP address of the node to be restored.

      If the number of records returned is not 0, the node to be restored has the control node role. In this case, go to 9.d. Otherwise, run the following command:

      Select * from TBL_ZK_SERVER_INFO where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, the node to be restored has a control node role.

    4. Run the \q command to exit the database. If the node has a control node role in 9.c, go to 9.e. Otherwise, go to 10 to restore the dsware client.
    5. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreControlNode -ip x.x.x.x -formatZkDiskFlag false command to restore the control node. In the command, x.x.x.x indicates the management IP address of the node to be restored. To run the preceding command, you need to enter the super administrator account admin and its password as prompted. A consolidated sketch of the checks and restoration in this step follows.
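    A minimal sketch of the checks and the restoration command in this step is shown below. It only consolidates the commands listed above; x.x.x.x is a placeholder for the management IP address of the node to be restored.

      # On the primary management node, as user root, connect to the database
      # (21600 is the default port; IaaS@DATABASE-PublicCLOUD9! is the default password).
      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

      # Inside the gsql session, check whether the node holds a control node role.
      set search_path=dsware_1;
      Select * from tbl_mdc_server_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');
      Select * from TBL_ZK_SERVER_INFO where serverid = (select id from tbl_server_info where manageip='x.x.x.x');
      \q

      # If either query returned records, restore the control node as user dsware
      # (enter the admin account and password when prompted).
      cd /opt/dsware/client/bin
      ./dswareTool.sh --op restoreControlNode -ip x.x.x.x -formatZkDiskFlag false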

  10. Restore the dsware client.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the Select * from tbl_vbs_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); command on the primary management node to check whether the dsware client needs to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, the dsware client needs to be restored. Run the \q command to exit the database and go to 10.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0 command to add a client. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored. A consolidated sketch of this step follows.
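    A minimal sketch of this check and the client creation command is shown below. It only consolidates the commands from this step; x.x.x.x is a placeholder for the management IP address of the node to be restored.

      # Connect to the database on the primary management node as user root (defaults as above).
      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

      # Inside the gsql session, check whether a dsware client is recorded for the node.
      set search_path=dsware_1;
      Select * from tbl_vbs_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');
      \q

      # If records were returned, create the dsware client as user dsware
      # (enter the admin account and password when prompted).
      cd /opt/dsware/client/bin
      ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0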

  11. Restore storage resources.

    1. Log in to the primary management node as user root, and run the following command to connect to the database (21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password).

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the Select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x') and processtype='EBSCtrl' and servicetype='eds'; command on the primary management node to check whether the storage resources need to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, run the \q command to exit the database and go to 11.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStoragePool command to obtain the storage pool ID. To run this command, enter the name and password of CLI super administrator account admin as prompted. Information about all storage pools is displayed in the command output. Pool ID in the leftmost column displays IDs of all storage pools.
    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId command to check whether the node has been removed from the storage pool. (If the node has been faulty for a certain period, usually 7 days, it is removed from the storage pool.) To run this command, enter the name and password of CLI super administrator account admin as prompted. Check whether the command output contains the node. If yes, the node has not been removed from the storage pool; in this case, go to 11.f. If no, the node has been removed from the storage pool; in this case, proceed with 11.g and 11.h. A consolidated sketch of 11.d to 11.g is provided after this list.
    6. Log in to the CLI of the primary management node as user dsware (this applies to both control and non-control nodes) and run the sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId command to restore storage resources. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the floating IP address of the storage plane on the faulty node, and poolId indicates the ID of the storage pool to which the node belongs. If the node is added to multiple storage pools, run the command multiple times using the IDs of the storage pools to restore storage resources.
    7. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op deleteStorageNode -ip x.x.x.x -id poolId command to delete the residual node information from the storage pool. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored, and poolId indicates the ID of the storage pool to which the node belongs.

      If the node is added to multiple storage pools, run this command multiple times using the IDs of the storage pools.

    8. On DeviceManager, re-add the node to the storage pool.
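    A minimal sketch of the decision flow in 11.d to 11.g is shown below. It only consolidates the commands from this step; poolId and the IP addresses are placeholders, and the branch you take depends on whether the node still appears in the queryStorageNodeInfo output.

      # As user dsware on the primary management node, list the storage pools and note the pool IDs.
      sh /opt/dsware/client/bin/dswareTool.sh --op queryStoragePool

      # Check whether the faulty node is still a member of the storage pool.
      sh /opt/dsware/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId

      # If the node is still in the pool: restore its storage resources
      # (x.x.x.x is the floating IP address of the storage plane on the faulty node).
      sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId

      # If the node has been removed from the pool: delete the residual node information
      # (x.x.x.x is the management IP address of the node), then re-add the node on DeviceManager.
      sh /opt/dsware/client/bin/dswareTool.sh --op deleteStorageNode -ip x.x.x.x -id poolId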

  12. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 0 command to switch to the normal mode. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  13. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Replacing System Disks in Scenarios Where Management Nodes (Deployed on Storage or Compute Nodes) Are Deployed in Converged Mode

  1. Log in to the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 1 command to switch to the maintenance mode. In the command, x.x.x.x indicates the IP address of the node on which faulty system disks are to be replaced. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  2. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  3. Configure the system disk RAID, install the operating system, and configure networks by referring to FusionStorage Block Storage Software Installation Guide.

    1. Install the OS and NVMe SSD driver. If no NVMe SSDs or cards are used, you do not need to install the NVMe driver. For details, see "Installing a Server Operating System" in FusionStorage Block Storage Product Documentation.
    2. Configure the node IP address and the bond mode. For details, see section "Configuring a Server IP Address" in FusionStorage Block Storage Software Installation Guide. The node IP address and bond mode must be the same as those of a normal node. In the same storage system, all nodes must use the same front-end bond mode and the same back-end bond mode, so you can obtain the front-end and back-end bond modes from a node that is running properly. For example, log in to a normal node, run the /opt/fusionstorage/agent/script/dsware_agent_handle.sh inquiry_disk get_storage_ip_and_type command to obtain the back-end IP address, run the /opt/fusionstorage/agent/script/dsware_agent_handle.sh inquiry_disk get_cluster_ip_and_type command to obtain the front-end IP address, and run the ip addr | grep storage_ip command to obtain the network port. In the preceding commands, storage_ip indicates the front-end or back-end IP address of the storage system. You can then view the network port configuration file to obtain the bond mode. A consolidated sketch of these checks is provided after the following list.

      You can obtain the node IP address in the following ways:

      1. Log in to the primary management node as user root and run the /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9! command to connect to the database. In the command, 21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password.
      2. Run the set search_path=dsware_1; command to set the query path.
      3. Run the following commands to query the database:

        Select STORAGEIP, CLUSTERIP from TBL_SERVER_INFO where MANAGEIP='x.x.x.x';

        In the command output, values of STORAGEIP and CLUSTERIP indicate the back-end storage IP address and front-end storage IP address, respectively. x.x.x.x indicates the management IP address of the system disks to be replaced.
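        A minimal sketch of the checks described in this sub-step is shown below. The first commands are run on a normal node to find the existing front-end and back-end IP addresses and the network port that carries them; the gsql query is run on the primary management node. All addresses are placeholders.

          # On a normal node: obtain the back-end and front-end storage IP addresses.
          /opt/fusionstorage/agent/script/dsware_agent_handle.sh inquiry_disk get_storage_ip_and_type
          /opt/fusionstorage/agent/script/dsware_agent_handle.sh inquiry_disk get_cluster_ip_and_type

          # Find the network port that carries a given storage IP address (storage_ip is a placeholder),
          # then view that port's configuration file to determine the bond mode.
          ip addr | grep storage_ip

          # On the primary management node, as user root: query the IP addresses recorded for the node
          # whose system disks are being replaced (x.x.x.x is its management IP address).
          /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!
          set search_path=dsware_1;
          Select STORAGEIP, CLUSTERIP from TBL_SERVER_INFO where MANAGEIP='x.x.x.x';
          \q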

    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  4. If certificate replacement has been performed on the faulty node, replace the certificate used by the node with the one used before the node became faulty by referring to Security Configuration in the FusionStorage Block Storage Security Configuration Guide.
  5. Restore the services on the management node.

    1. Download the product package used in the current storage system and upload the package to the faulty management node. If the storage system has been upgraded, use the product package of the upgrade target version.
    2. Run the sh /home/FusionStorage_deploymanager_*/action/change_hasync.sh upgrade command on the primary management node to disable file synchronization.
    3. View the /home/HAInfoFromInstallTool.properties file information on the normal management node. Set the HAInfoFromInstallTool.properties file of the faulty node based on the detailed deployment information. The information must be consistent with that in the detailed deployment information table. Then, copy the HAInfoFromInstallTool.properties file to the /home directory on the faulty node. Table 4-1 lists the detailed deployment information.
      • If the values of ha_role and init_role on the normal management node are primary, fill in the deployment information of the faulty node by referring to the normal node, change ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to standby, and set pkg_path to the absolute path of the uploaded product package.
      • If the values of ha_role and init_role on the normal management node are standby, fill in the deployment information of the faulty node by referring to the normal node, change ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to primary, and set pkg_path to the absolute path of the uploaded product package.
    4. Run the tar -xvzf FusionStorage_8.0.0.*.tar.gz command to decompress the product package, and run the cd FusionStorage_8.0.0.*/deploymanager/preinstall command to go to the preinstall directory.
    5. Run the sh install.sh manage /home/HAInfoFromInstallTool.properties command to reinstall the software on the management node.
    6. Run the rm -rf /home/.preinstall command to clear the pre-installation flag and delete the product package and decompressed package uploaded during the restoration.
    7. After the installation is complete, run the sh /home/FusionStorage_deploymanager_*/action/change_hasync.sh recovery command on the primary management node to resume file synchronization.
    8. On the primary management node, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --syncallfile command to trigger file synchronization. Run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfilestatus or /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfileprocess command to check the synchronization result. After data synchronization is complete, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --cancelforbidswitch --name=oam_u command to enable the HA function. A consolidated sketch of sub-steps 7 and 8 follows.
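    A minimal sketch of the post-installation synchronization sequence in sub-steps 7 and 8, run on the primary management node, is shown below. It only consolidates the commands listed above; re-run the status commands until synchronization is complete.

      # Resume file synchronization after the installation completes (sub-step 7).
      sh /home/FusionStorage_deploymanager_*/action/change_hasync.sh recovery

      # Trigger a full file synchronization and check its status and result (sub-step 8).
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --syncallfile
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfilestatus
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfileprocess

      # After synchronization is complete, re-enable HA switchover.
      /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --cancelforbidswitch --name=oam_u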

  6. Log in to the primary management node as an administrator and run the ismcli -u admin command to log in to the CLI. To run the command, you need to enter the password of the CLI super administrator admin. If security hardening has been performed, you must switch to user root and use the absolute path to log in to the CLI.
  7. Run the following commands to restore the software on the faulty node:

    restore node software node_ip=x.x.x.x user_name=xxxx

    password:**********

    root_password:**********

    In the preceding commands, node_ip indicates the internal management IP address of the faulty node, user_name indicates a user name for logging in to the faulty node (default: fsadmin), password indicates the password of that user (default: IaaS@OS-CLOUD9!), and root_password indicates the password of user root on the faulty node. After the commands are executed successfully, you can obtain the task type and task ID.

  8. Run the show node deploy_task task_id=X command to query the software restoration progress of the faulty node. In the preceding command, task_id indicates the ID of the task to be queried. You can proceed with the next step only after the task is successfully executed.
  9. Restore basic services such as SNM and FDSA.

    Log in to the CLI on the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreBasicServices -ip x.x.x.x command to restore the basic services. In the command, x.x.x.x indicates the management IP address of the storage node. To run the preceding command, you need to enter the super administrator account admin and its password as prompted.

  10. Restore the management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the host to be restored has the control node role:

      Select * from tbl_mdc_server_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); In the command, x.x.x.x indicates the management IP address of the node to be restored.

      If the number of records returned is not 0, the node to be restored has the control node role. In this case, go to 10.d. Otherwise, run the following command:

      Select * from TBL_ZK_SERVER_INFO where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, the node to be restored has a control node role.

    4. Run the \q command to exit the database. If the node has a control node role in 10.c, go to 10.e. Otherwise, go to 11 to restore the dsware client.
    5. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreControlNode -ip x.x.x.x -formatZkDiskFlag false command to restore the control node. In the command, x.x.x.x indicates the management IP address of the node to be restored. To run the preceding command, you need to enter the super administrator account admin and its password as prompted.

  11. Restore the dsware client.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the Select * from tbl_vbs_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x'); command on the primary management node to check whether the dsware client needs to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, the dsware client needs to be restored. Run the \q command to exit the database and go to 11.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0 command to add a client. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored.

  12. Restore storage resources.

    1. Log in to the primary management node as user root, and run the following command to connect to the database (21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password).

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the Select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x') and processtype='EBSCtrl' and servicetype='eds'; command on the primary management node to check whether the storage resources need to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the number of records returned is not 0, run the \q command to exit the database and go to 12.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStoragePool command to obtain the storage pool ID. To run this command, enter the name and password of CLI super administrator account admin as prompted. Information about all storage pools is displayed in the command output. Pool ID in the leftmost column displays IDs of all storage pools.
    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId command to check whether the node has been removed from the storage pool. (If the node has been faulty for a certain period (usually 7 days), the node will be removed from the storage pool.) To run this command, enter the name and password of CLI super administrator account admin as prompted. Check whether the command output contains the node. If yes, the node has not been removed from the storage pool. In this case, go to 12.f. If no, the node has been removed from the storage pool. In this case, proceed with 12.g and 12.h.
    6. Log in to the CLI of the primary management node as user dsware (this applies to both control and non-control nodes) and run the sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId command to restore storage resources. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the floating IP address of the storage plane on the faulty node, and poolId indicates the ID of the storage pool to which the node belongs. If the node is added to multiple storage pools, run the command multiple times using the IDs of the storage pools to restore storage resources.
    7. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op deleteStorageNode -ip x.x.x.x -id poolId command to delete the residual node information from the storage pool. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored, and poolId indicates the ID of the storage pool to which the node belongs.

      If the node is added to multiple storage pools, run this command multiple times using the IDs of the storage pools.

    8. On DeviceManager, re-add the node to the storage pool.

  13. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 0 command to switch to the normal mode. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  14. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Follow-up Procedure

Label the replaced system disk modules to facilitate subsequent operations.

Updated: 2019-08-19

Document ID: EDOC1100081420