
FusionStorage 8.0.0 Block Storage Parts Replacement 05

Replacing Both System Disk Modules

System disk modules are used to boot operating systems.

Impact on the System

If both system disk modules of a system fail, the system will break down.

Prerequisites

  • Spare disk modules are ready.
  • The faulty system disk modules have been located.
NOTE:

For details about the slot numbers of system disk modules, see Slot Numbers.

Precautions

  • To prevent damaging disk modules or connectors, remove or install disk modules with even force.
  • When removing a disk module, first remove it from its connector. Wait at least 30 seconds and then remove the disk module completely from the chassis.
  • To prevent disk module damage, wait at least one minute between removal and insertion actions.
  • To avoid system failures, do not reuse disk modules.
  • If a hot patch has been installed on a node, install the patch on the node again after replacing the two system disk modules on the node.

Tools and Materials

  • ESD gloves
  • ESD wrist straps
  • ESD bags
  • Labels

Context

Log in to the node as user dsware and check whether the node to be restored is the management node:

  1. Run the su - dmdbadmin -c "/opt/fusionstorage/deploymanager/gaussdb/app/bin/gsql -p 7018 -d cmdb" command and enter the password (default password: Huawei12#$).
  2. Run the following query:

    select role from BLOCK_NODE_ROLE,BLOCK_NODE_DETAIL where BLOCK_NODE_ROLE.NODE_ID=BLOCK_NODE_DETAIL.ID and BLOCK_NODE_DETAIL.MANAGEMENT_INTERNAL_IP='X.X.X.X';

In the preceding query, X.X.X.X indicates the management IP address of the node on which the faulty system disks are to be replaced.

Replacing System Disks on Management Nodes (Deployed Independently on Physical Nodes)

  1. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  2. Configure RAID for system disks, install the OS, and configure networks. For details, see the product documentation of the desired version.

    1. Install the OS and driver file required for system running by following "Installing a Server Operating System" in the product documentation of the desired version.
    2. Configure the node IP address. For details, see "Configuring a Server IP Address" in the software installation guide of the desired version.
    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  3. If certificate replacement has been performed on the faulty node before, replace the certificate by following "Security Configuration" in the security configuration guide of the desired version.
  4. Restore the services on the management node.

    1. Download the product package in the current storage system and upload the product package to the faulty management node. If the product package has been upgraded, use the upgraded version.
    2. On the primary management node, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --disabledsync --name=upgrade command to disable file synchronization.
    3. Run the tar -xvzf FusionStorage_8.0.0.*.tar.gz command to decompress the product package, and run the cd FusionStorage_8.0.0.*/deploymanager/preinstall command to go to the preinstall directory.
    4. View the /home/HAInfoFromInstallTool.properties file information on a normal node, and set the HAInfoFromInstallTool.properties file information based on the detailed deployment information. The information must be consistent with that in the detailed deployment information table. Then, copy the HAInfoFromInstallTool.properties file to the /home directory. Table 4-1 describes the detailed deployment information.
      1. If the values of ha_role and init_role on the normal management node are primary, fill in the deployment information of the faulty node by referring to related information of the normal node. Modify the values of ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to standby, and set path pkg_path as the absolute path for product package upload.
      2. If the values of ha_role and init_role on the normal management node are standby, fill in the deployment information of the faulty node by referring to related information of the normal node. Modify the values of ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to primary, and set path pkg_path as the absolute path for product package upload.
        Table 4-1 Detailed deployment information

        Field                     | Description                                                                                              | Example
        --------------------------|----------------------------------------------------------------------------------------------------------|------------------------------------------------
        ha_role                   | HA role, which works with a normal node in primary/secondary mode.                                       | primary
        init_role                 | Initial role, generally the same as ha_role.                                                             | primary
        local_sn                  | SN of the primary node. You can use the default value, which must be the same as that of a normal node.  | -
        remote_sn                 | SN of the secondary node. You can use the default value, which must be the same as that of a normal node.| -
        local_cabinet             | Cabinet number of the primary node, which must be the same as that of a normal node.                     | -
        remote_cabinet            | Cabinet number of the secondary node, which must be the same as that of a normal node.                   | -
        ha_mode                   | HA mode. Possible values are single and double. The value must be the same as that of a normal node.     | double
        net_mode                  | Network mode.                                                                                            | single
        service_float_ip          | Floating IP address of the external management network, which must be the same as that of a normal node. | 100.2.10.76
        service_gateway           | Gateway of the external management network, which must be the same as that of a normal node.             | 100.2.0.1
        service_mask              | Subnet mask of the external management network, which must be the same as that of a normal node.         | 255.255.0.0
        service_local_ip          | IP address of the external primary management node, which must be the same as that of a normal node.     | 100.2.5.94
        service_local_port        | Network port of the external primary management node, which must be the same as that of a normal node.   | bond0
        service_remote_ip         | IP address of the external secondary management node, which must be the same as that of a normal node.   | 100.2.5.62
        service_remote_port       | Network port of the external secondary management node, which must be the same as that of a normal node. | bond0
        manager_float_ip          | Floating IP address of the internal management network, which must be the same as that of a normal node. | 100.2.10.76
        manager_gateway           | Gateway of the internal management network, which must be the same as that of a normal node.             | 100.2.0.1
        manager_mask              | Subnet mask of the internal management network, which must be the same as that of a normal node.         | 255.255.0.0
        manager_local_ip          | IP address of the internal primary management node, which must be the same as that of a normal node.     | 100.2.5.94
        manager_local_port        | Network port of the internal primary management node, which must be the same as that of a normal node.   | bond0
        manager_remote_ip         | IP address of the internal secondary management node, which must be the same as that of a normal node.   | 100.2.5.62
        manager_remote_port       | Network port of the internal secondary management node, which must be the same as that of a normal node. | bond0
        local_host_name           | Host name of the primary node, which must be the same as that of a normal node.                          | FSM01
        remote_host_name          | Host name of the secondary node, which must be the same as that of a normal node.                        | FSM02
        active_ip                 | IP address of the primary HA node, which must be the same as the value of manager_local_ip.              | 100.2.5.94
        standby_ip                | IP address of the secondary HA node, which must be the same as the value of manager_remote_ip.           | 100.2.5.62
        float_ip_for_ha           | HA floating IP address, which must be the same as that of a normal node.                                 | 100.2.10.76
        install_ha                | Whether HA is installed.                                                                                 | true
        service_ip_list           | List of IP addresses to be installed, consisting of manager_local_ip and manager_remote_ip, separated by commas (,). | 100.2.5.94,100.2.5.62
        internal_ethname          | Internal NIC name, which must be the same as that on a normal node.                                      | bond0
        external_ethname          | External NIC name, which must be the same as that on a normal node.                                      | -
        external_service_gateway  | External gateway, which must be the same as that of a normal node.                                       | -
        external_service_mask     | External subnet mask, which must be the same as that of a normal node.                                   | -
        external_service_float_ip | Floating IP address of the external management network, which must be the same as that of a normal node. | 100.2.10.76
        pkg_path                  | Absolute path of the product package.                                                                    | /home/FusionStorage_8.0.0.20190714165359.tar.gz

  5. Go to the deploymanager/preinstall directory where the product package is decompressed and run the sh install.sh manage /home/HAInfoFromInstallTool.properties command to reinstall the management node software.
  6. Run the rm -rf /home/.preinstall command to clear the pre-installation flag and delete the product package and decompressed package uploaded during the restoration.
  7. After the installation is complete, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --enabledsync --name=upgrade command on the primary management node to resume file synchronization.
  8. Run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --syncallfile command on the primary management node to trigger file synchronization.

    Run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfilestatus command to check the synchronization status, and run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --getsyncfileprocess command to check the synchronization result.

    After data synchronization is complete, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --cancelforbidswitch --name=oam_u command to enable the HA function.

  9. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Replacing System Disks on a Non-Management Node

  1. Log in to the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 1 command to switch to the maintenance mode. In the command, x.x.x.x indicates the IP address of the node on which faulty system disks are to be replaced. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  2. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  3. Configure RAID for system disks, install the OS, and configure networks by following the software installation guide of the desired version.

    1. Install the OS and driver file required for system running by following "Installing a Server Operating System" in the product documentation of the desired version.
    2. Configure the node IP address and bond ports. For details, see "Configuring a Server IP Address" in the software installation guide of the desired version.

      You can obtain the node IP address in the following ways:

      1. Log in to the primary management node as user root and run the /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9! command to connect to the database. In the command, 21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password.
      2. Run the set search_path=dsware_1; command to set the query path.
      3. Run the following command to query the database:

        select STORAGEIP, CLUSTERIP from TBL_SERVER_INFO where MANAGEIP='x.x.x.x';

        The command output contains the back-end storage IP address (STORAGEIP) and the front-end storage IP address (CLUSTERIP).

    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  4. If certificate replacement has been performed on the faulty node before, replace the certificate by following "Security Configuration" in the security configuration guide of the desired version.
  5. Log in to the primary management node as an administrator and run the ismcli -u admin command to log in to the CLI. To run the command, you need to enter the password of the CLI super administrator admin.
  6. Run the following commands to restore the software on the faulty node:

    restore node software node_ip=x.x.x.x user_name=xxxx

    password:**********

    root_password:**********

    In the preceding commands, node_ip indicates the internal management IP address of the faulty node, user_name indicates a user name for logging in to the faulty node, password indicates the password of the user name, and root_password indicates the password of user root on the faulty node. After the commands are executed successfully, you can obtain the task type and task ID.

  7. Run the show node deploy_task task_id=* command to query the software restoration progress of the faulty node. In the preceding command, task_id indicates the ID of the task to be queried. You can proceed with the next step only after the task is successfully executed.
  8. Restore basic services such as SNM and FDSA.

    Log in to the CLI on the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreBasicServices -ip x.x.x.x command to restore the basic services. In the command, x.x.x.x indicates the management IP address of the storage node. To run the preceding command, enter the name and password of super administrator account admin as prompted.

  9. Restore network configuration information.

    1. Log in to a normal node, switch to the /opt/fusionstorage/agent/conf/ directory, and check the network.cfg network configuration file of the normal node.
    2. Copy the network.cfg file on a normal node and use the file to overwrite the network.cfg network configuration file in the same directory on the node to be restored.
    3. View the SystemConfiguration.xml file in /opt/omm/oms/workspace/webapps/dsware/WEB-INF/ on the FSM node and obtain the network segment values of the cluster_network_mark and storage_network_mark tags. In the network.cfg configuration file on the node to be restored, assign the value of the cluster_network_mark tag to the cluster_network_mark and storage_frontend fields, and assign the value of the storage_network_mark tag to the storage_network_mark and storage_backend fields.
    4. Run the ps -ef | grep dsware_agent command on the node to be restored to obtain the agent process ID, and then run the kill -9 <agent process ID> command to restart the agent process on the node to be restored.
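The field assignments in step 9.c can be sketched as follows. This is a minimal illustration on a throwaway file: the key=value layout of network.cfg and the example segment values are assumptions for demonstration, not the actual file format on a production node.

```shell
# Segment values as read from the cluster_network_mark and storage_network_mark
# tags in SystemConfiguration.xml (example values, not real ones).
CLUSTER_MARK="100.2.5.0/24"
STORAGE_MARK="100.2.6.0/24"
CFG=/tmp/network.cfg   # stand-in for /opt/fusionstorage/agent/conf/network.cfg

# Example file copied from a normal node; the four fields are to be overwritten.
cat > "$CFG" <<'EOF'
cluster_network_mark=0.0.0.0/0
storage_frontend=0.0.0.0/0
storage_network_mark=0.0.0.0/0
storage_backend=0.0.0.0/0
EOF

# Assign the cluster segment to cluster_network_mark and storage_frontend,
# and the storage segment to storage_network_mark and storage_backend.
sed -i \
  -e "s#^cluster_network_mark=.*#cluster_network_mark=${CLUSTER_MARK}#" \
  -e "s#^storage_frontend=.*#storage_frontend=${CLUSTER_MARK}#" \
  -e "s#^storage_network_mark=.*#storage_network_mark=${STORAGE_MARK}#" \
  -e "s#^storage_backend=.*#storage_backend=${STORAGE_MARK}#" \
  "$CFG"

grep = "$CFG"   # show the resulting key=value lines
```

Overwriting the whole file with a copy from a normal node first (step 9.b) and then correcting only these four fields keeps all other settings consistent with the rest of the cluster.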

  10. Restore the management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the host to be restored has the control node role:

      Select * from tbl_mdc_server_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x');

      In the command, x.x.x.x indicates the management IP address of the node to be restored.

      If the command output is not 0, the node to be restored has the control node role. In this case, go to 10.d. Otherwise, run the following command:

      Select * from TBL_ZK_SERVER_INFO where serverid = (select id from tbl_server_info where manageip='x.x.x.x');

      In the command, x.x.x.x indicates the management IP address of the node to be restored. If the command output is not 0, the node to be restored has the control node role.

    4. Run the \q command to exit the database. If the node has a control node role in 10.c, go to 10.e. Otherwise, go to 11 to restore the dsware client.
    5. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreControlNode -ip x.x.x.x -formatZkDiskFlag false command to restore the control node. In the command, x.x.x.x indicates the management IP address of the node to be restored. To run the preceding command, enter the name and password of super administrator account admin as prompted.
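The branching in steps 10.c through 10.e reduces to: the node has the control node role if either query (tbl_mdc_server_info or TBL_ZK_SERVER_INFO) returns rows. A minimal sketch, where has_control_role is a hypothetical helper taking the two row counts captured from the SELECT statements:

```shell
# Hypothetical helper: the node has the control node role if either query
# returned a non-zero row count.
has_control_role() {
  local mdc_rows="$1" zk_rows="$2"
  [ "$mdc_rows" -gt 0 ] || [ "$zk_rows" -gt 0 ]
}

# Example: one row in tbl_mdc_server_info, none in TBL_ZK_SERVER_INFO.
if has_control_role 1 0; then
  echo "control node role: run restoreControlNode (step 10.e)"
else
  echo "no control node role: restore the dsware client (step 11)"
fi
```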

  11. Restore the dsware client.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the following command on the active management node:

      select * from tbl_vbs_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');

      Check whether the dsware client needs to be restored on the faulty node. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the command output is not 0, the dsware client needs to be restored. Run the \q command to exit the database and go to 11.d. Otherwise, run the \q command to exit the database.

    4. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0 command to add a client. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored.

  12. Restore storage resources.

    1. Log in to the primary management node as user root, and run the following command to connect to the database (21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password).

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the Select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x') and processtype='EBSCtrl' and servicetype='eds'; command on the primary management node to check whether the storage service on the faulty node needs to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the command output is not 0, run the \q command to exit the database and go to 12.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStoragePool command to obtain the storage pool ID. To run this command, enter the name and password of CLI super administrator account admin as prompted. Information about all storage pools is displayed in the command output. poolId in the leftmost column displays the storage pool ID.
    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId command to check whether the node has been removed from the storage pool. (If the node has been faulty for a certain period (usually 7 days), the node will be removed from the storage pool.) To run this command, enter the name and password of CLI super administrator account admin as prompted. Check whether the command output contains the node. If yes, the node has not been removed from the storage pool. In this case, go to 12.f. If no, the node has been removed from the storage pool. In this case, proceed with 12.g and 12.h.
    6. Log in to the CLI of the primary management node as user dsware (this step applies to both control and non-control nodes) and run the sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId command to restore storage resources. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the faulty node, and poolId indicates the ID of the storage pool to which the node belongs. If the node has been added to multiple storage pools, repeat this step for each pool. If the storage resources are restored successfully, skip 12.g and 12.h.
    7. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op deleteStorageNode -ip x.x.x.x -id poolId command to delete the node from the storage pool. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored, and poolId indicates the ID of the storage pool to which the node belongs.

      If the node is added to multiple storage pools, run this command multiple times using the IDs of the storage pools.

    8. On DeviceManager, re-add the node to the storage pool.
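The decision in step 12.e can be summarized as: if the node's IP address still appears in the queryStorageNodeInfo output, restore it in place (12.f); otherwise delete the stale record and re-add the node (12.g and 12.h). A sketch under that assumption, where choose_restore_action is a hypothetical helper and the pool listing stands in for real dswareTool.sh output:

```shell
# Hypothetical helper: pick the follow-up action from the pool listing.
choose_restore_action() {
  local node_ip="$1" pool_listing="$2"
  if printf '%s\n' "$pool_listing" | grep -qw "$node_ip"; then
    echo restoreStorageNode    # node still in the pool: go to step 12.f
  else
    echo deleteStorageNode     # node already removed: go to steps 12.g and 12.h
  fi
}

# Example listing (stand-in for queryStorageNodeInfo output).
listing="100.2.5.94
100.2.5.62"

choose_restore_action 100.2.5.94 "$listing"   # prints restoreStorageNode
choose_restore_action 100.2.7.10 "$listing"   # prints deleteStorageNode
```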

  13. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 0 command to switch to the normal mode. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  14. Restore the replication management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the node to be restored has the control node role:

      select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x') and processtype = 'cm' and servicetype = 'dr';

      In this command, x.x.x.x indicates the management IP address of the node to be restored. If the command output is not 0, the node to be restored has the control node role. In this case, go to 14.d. If the command output is 0, no further action is required.

    4. Run the following command to obtain the ID of the replication control cluster. Then, go to 14.e.

      select CLUSTERID from TBL_SERVICE_CLUSTER_INFO where servicetype = 'dr';

    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op drCmd -subOp resumeControlCluster -controlClusterId clusterId -resumeType type -nodeIps x.x.x.x command to restore the replication control cluster. To run this command, enter the name and password of super administrator account admin as prompted. In this command, clusterId indicates the ID of the replication cluster, and type indicates the restoration type. If the OS is to be reinstalled or DSware volumes are used as metadata disks, set the value of type to 1. If nodes or metadata disks need to be replaced, set the value of type to 0. x.x.x.x indicates the management IP address of the node to be restored.
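The -resumeType choice in step 14.e maps the replacement scenario to a numeric value. A sketch of that mapping, where resume_type is a hypothetical helper and the scenario names are labels invented here for illustration:

```shell
# Hypothetical helper mapping the scenario to the -resumeType value:
#   1 - the OS is to be reinstalled, or DSware volumes are used as metadata disks
#   0 - nodes or metadata disks need to be replaced
resume_type() {
  case "$1" in
    os_reinstall|dsware_volume_metadata) echo 1 ;;
    node_replace|metadata_disk_replace)  echo 0 ;;
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

resume_type os_reinstall   # prints 1
resume_type node_replace   # prints 0
```

After both system disks are replaced and the OS reinstalled (the case in this procedure), the first branch applies, so -resumeType 1 is the expected value.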

  15. Restore the replication service cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the node to be restored has the service node role:

      select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x') and processtype = 'dms' and servicetype = 'dr';

      In this command, x.x.x.x indicates the management IP address of the node to be restored. If the command output is not 0, the node to be restored has the service node role. In this case, go to 15.d. If the command output is 0, no further action is required.

    4. Run the following command to obtain the ID of the replication control cluster. Then, go to 15.e.

      select CLUSTERID from TBL_SERVICE_CLUSTER_INFO where servicetype = 'dr';

    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op drCmd -subOp resumeReplicateCluster -controlClusterId clusterId -nodeIps x.x.x.x command to restore the replication service cluster. To run this command, enter the name and password of super administrator account admin as prompted. In the command, clusterId indicates the ID of the replication cluster, and x.x.x.x indicates the management IP address of the node to be restored.

  16. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Replacing System Disks in Scenarios Where Management Nodes (Deployed on Storage or Compute Nodes) Are Deployed in Converged Mode

  1. Log in to the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -mode 1 command to switch to the maintenance mode. In the command, x.x.x.x indicates the IP address of the node on which faulty system disks are to be replaced. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  2. Replace the two system disk modules and power on the faulty node. For details, see Replacing a System Disk Module.
  3. Configure RAID for system disks, install the OS, and configure networks. For details, see the product documentation of the desired version.

    1. Install the OS and driver file required for system running by following "Installing a Server Operating System" in the product documentation of the desired version.
    2. Configure the node IP address. For details, see "Configuring a Server IP Address" in the software installation guide of the desired version.
    3. Run the following commands to perform security hardening for the system:

      sh /opt/reinforce_os/reinforce_os.sh install

      sh /opt/reinforce_os/reinforce_os.sh check

  4. If certificate replacement has been performed on the faulty node before, replace the certificate by following "Security Configuration" in the security configuration guide of the desired version.
  5. Restore the services on the management node.

    1. Download the product package in the current storage system and upload the product package to the faulty management node. If the product package has been upgraded, use the upgraded version.
    2. On the primary management node, run the /opt/dfv/oam/oam-u/ha/ha/module/hacom/tools/ha_client_tool --disabledsync --name=upgrade command to disable file synchronization.
    3. Run the tar -xvzf FusionStorage_8.0.0.*.tar.gz command to decompress the product package, and run the cd FusionStorage_8.0.0.*/deploymanager/preinstall command to go to the preinstall directory.
    4. View the /home/HAInfoFromInstallTool.properties file information on a normal node, and set the HAInfoFromInstallTool.properties file information based on the detailed deployment information. The information must be consistent with that in the detailed deployment information table. Then, copy the HAInfoFromInstallTool.properties file to the /home directory. Table 4-2 describes the detailed deployment information.
      1. If the values of ha_role and init_role on the normal management node are primary, fill in the deployment information of the faulty node by referring to related information of the normal node. Modify the values of ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to standby, and set path pkg_path as the absolute path for product package upload.
      2. If the values of ha_role and init_role on the normal management node are standby, fill in the deployment information of the faulty node by referring to related information of the normal node. Modify the values of ha_role and init_role in the HAInfoFromInstallTool.properties file of the faulty node to primary, and set path pkg_path as the absolute path for product package upload.
        Table 4-2 Detailed deployment information

        | Field | Description | Example |
        | --- | --- | --- |
        | ha_role | Indicates the HA role, which works with a normal node in primary/secondary mode. | primary |
        | init_role | Indicates the initial role, which is generally the same as ha_role. | primary |
        | local_sn | Indicates the SN of the primary node. You can use the default value. The value must be the same as that of a normal node. | - |
        | remote_sn | Indicates the SN of the secondary node. You can use the default value. The value must be the same as that of a normal node. | - |
        | local_cabinet | Indicates the cabinet number of the primary node, which must be the same as that of a normal node. | - |
        | remote_cabinet | Indicates the cabinet number of the secondary node, which must be the same as that of a normal node. | - |
        | ha_mode | Indicates the HA mode. Possible values are single and double. The value must be the same as that of a normal node. | double |
        | net_mode | Indicates the network mode. | single |
        | service_float_ip | Indicates the floating IP address of the external management network, which must be the same as that of a normal node. | 100.2.10.76 |
        | service_gateway | Indicates the gateway of the external management network, which must be the same as that of a normal node. | 100.2.0.1 |
        | service_mask | Indicates the subnet mask of the external management network, which must be the same as that of a normal node. | 255.255.0.0 |
        | service_local_ip | Indicates the IP address of the external primary management node, which must be the same as that of a normal node. | 100.2.5.94 |
        | service_local_port | Indicates the network port of the external primary management node, which must be the same as that of a normal node. | bond0 |
        | service_remote_ip | Indicates the IP address of the external secondary management node, which must be the same as that of a normal node. | 100.2.5.62 |
        | service_remote_port | Indicates the network port of the external secondary management node, which must be the same as that of a normal node. | bond0 |
        | manager_float_ip | Indicates the floating IP address of the internal management network, which must be the same as that of a normal node. | 100.2.10.76 |
        | manager_gateway | Indicates the gateway of the internal management network, which must be the same as that of a normal node. | 100.2.0.1 |
        | manager_mask | Indicates the subnet mask of the internal management network, which must be the same as that of a normal node. | 255.255.0.0 |
        | manager_local_ip | Indicates the IP address of the internal primary management node, which must be the same as that of a normal node. | 100.2.5.94 |
        | manager_local_port | Indicates the network port of the internal primary management node, which must be the same as that of a normal node. | bond0 |
        | manager_remote_ip | Indicates the IP address of the internal secondary management node, which must be the same as that of a normal node. | 100.2.5.62 |
        | manager_remote_port | Indicates the network port of the internal secondary management node, which must be the same as that of a normal node. | bond0 |
        | local_host_name | Indicates the host name of the primary node, which must be the same as that of a normal node. | FSM01 |
        | remote_host_name | Indicates the host name of the secondary node, which must be the same as that of a normal node. | FSM02 |
        | active_ip | Indicates the IP address of the primary HA node, which must be the same as the value of manager_local_ip. | 100.2.5.94 |
        | standby_ip | Indicates the IP address of the secondary HA node, which must be the same as the value of manager_remote_ip. | 100.2.5.62 |
        | float_ip_for_ha | Indicates the HA floating IP address, which must be the same as that of a normal node. | 100.2.10.76 |
        | install_ha | Indicates whether HA is installed. | true |
        | service_ip_list | Indicates the list of IP addresses to be installed, consisting of manager_local_ip and manager_remote_ip separated by commas (,). | 100.2.5.94,100.2.5.62 |
        | internal_ethname | Indicates the internal NIC name, which must be the same as that on a normal node. | bond0 |
        | external_ethname | Indicates the external NIC name, which must be the same as that on a normal node. | - |
        | external_service_gateway | Indicates the external gateway, which must be the same as that of a normal node. | - |
        | external_service_mask | Indicates the external subnet mask, which must be the same as that of a normal node. | - |
        | external_service_float_ip | Indicates the floating IP address of the external management network, which must be the same as that of a normal node. | 100.2.10.76 |
        | pkg_path | Indicates the absolute path of the product package. | /home/FusionStorage_8.0.0.20190714165359.tar.gz |
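Assembled from the example values in Table 4-2, a hypothetical HAInfoFromInstallTool.properties fragment for the faulty node (scenario 4.a, where the normal node is primary) might look as follows. All values are illustrative and must be replaced with the deployment information of your environment; the standard key=value properties syntax is an assumption.

```text
# Hypothetical fragment -- replace every value with your site's deployment data.
ha_role=standby
init_role=standby
ha_mode=double
net_mode=single
service_float_ip=100.2.10.76
service_gateway=100.2.0.1
service_mask=255.255.0.0
service_local_ip=100.2.5.94
service_local_port=bond0
service_remote_ip=100.2.5.62
service_remote_port=bond0
manager_float_ip=100.2.10.76
manager_local_ip=100.2.5.94
manager_remote_ip=100.2.5.62
local_host_name=FSM01
remote_host_name=FSM02
active_ip=100.2.5.94
standby_ip=100.2.5.62
float_ip_for_ha=100.2.10.76
install_ha=true
service_ip_list=100.2.5.94,100.2.5.62
internal_ethname=bond0
external_ethname=bond0
pkg_path=/home/FusionStorage_8.0.0.20190714165359.tar.gz
```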

  6. Log in to the primary management node as an administrator and run the ismcli -u admin command to log in to the CLI. To run the command, you need to enter the password of the CLI super administrator admin. If security hardening has been performed, you must switch to user root and use the absolute path to log in to the CLI.
  7. Run the following commands to restore the software on the faulty node:

    restore node software node_ip=x.x.x.x user_name=xxxx

    password:**********

    root_password:**********

    In the preceding commands, node_ip indicates the internal management IP address of the faulty node, user_name indicates the user name for logging in to the faulty node (default: fsadmin), password indicates the password of that user (default: IaaS@OS-CLOUD9!), and root_password indicates the password of user root on the faulty node. After the commands are executed successfully, the task type and task ID are returned.

  8. Run the show node deploy_task task_id=X command to query the software restoration progress of the faulty node. In the preceding command, task_id indicates the ID of the task to be queried. You can proceed with the next step only after the task is successfully executed.
  9. Restore basic services such as SNM and FDSA.

    Log in to the CLI on the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreBasicServices -ip x.x.x.x command to restore the basic services. In the command, x.x.x.x indicates the management IP address of the storage node. To run the preceding command, enter the super administrator account admin and its password as prompted.

  10. Restore network configuration information.

    1. Log in to a normal node, switch to the /opt/fusionstorage/agent/conf/ directory, and check the network.cfg network configuration file of the normal node.
    2. Copy the network.cfg file on a normal node and use the file to overwrite the network.cfg network configuration file in the same directory on the node to be restored.
    3. View the SystemConfiguration.xml file in /opt/omm/oms/workspace/webapps/dsware/WEB-INF/ on the FSM node, obtain the network segment values of the cluster_network_mark and storage_network_mark tags in the configuration file, assign the network segment value of the cluster_network_mark tag to the cluster_network_mark and storage_frontend fields in the network.cfg configuration file on the node to be restored, and assign the network segment value of the storage_network_mark tag to the storage_network_mark and storage_backend fields in the network.cfg configuration file on the node to be restored.
    4. Run the ps -ef | grep dsware_agent command on the node to be restored to obtain the agent process ID. Then run the kill -9 pid command to restart the agent process on the node to be restored. In the command, pid indicates the obtained agent process ID.
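The PID lookup in step 10.4 can be scripted. A minimal sketch, assuming a typical ps -ef output line (the sample line below is illustrative, not taken from a live node):

```shell
# Extract the dsware_agent PID from a sample ps -ef line (step 10.4).
# The sample line is an assumption for illustration only.
sample='root      4321     1  0 10:00 ?        00:00:01 /opt/fusionstorage/agent/bin/dsware_agent'
pid=$(echo "$sample" | grep dsware_agent | grep -v grep | awk '{print $2}')
echo "$pid"
# On a real node you would then run: kill -9 "$pid"
# (the monitored agent process restarts automatically)
```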

  11. Restore the management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the node to be restored has the control node role:

      select * from tbl_mdc_server_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');

      In the command, x.x.x.x indicates the management IP address of the node to be restored.

      If the query returns one or more records, the node to be restored has the control node role. In this case, go to 11.d. Otherwise, run the following command:

      select * from tbl_zk_server_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');

      In the command, x.x.x.x indicates the management IP address of the node to be restored. If the query returns one or more records, the node to be restored has the control node role.

    4. Run the \q command to exit the database. If the node has a control node role in 11.c, go to 11.e. Otherwise, go to 12 to restore the dsware client.
    5. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op restoreControlNode -ip x.x.x.x -formatZkDiskFlag false command to restore the control node. In the command, x.x.x.x indicates the management IP address of the node to be restored. To run the preceding command, enter the super administrator account admin and its password as prompted.
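The two role checks in 11.c can also be run as counts so that each result is a single number. This is a hedged sketch: the table and column names are as documented above, but the count(*) form is an illustrative variant, not taken from the product documentation.

```sql
-- Count-based variant of the step 11.c control-node checks.
-- x.x.x.x is the management IP address of the node to be restored.
set search_path=dsware_1;
select count(*) from tbl_mdc_server_info
  where serverid = (select id from tbl_server_info where manageip='x.x.x.x');
select count(*) from tbl_zk_server_info
  where serverid = (select id from tbl_server_info where manageip='x.x.x.x');
-- A non-zero count from either query means the node has the control node role.
```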

  12. Restore the dsware client.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the following command on the primary management node to check whether the dsware client needs to be restored on the faulty node:

      select * from tbl_vbs_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x');

      In the command, x.x.x.x indicates the management IP address of the node to be restored. If the query returns one or more records, the dsware client needs to be restored. In this case, run the \q command to exit the database and go to 12.d. Otherwise, run the \q command to exit the database.

    4. Log in to the CLI of the primary management node as user dsware, switch to the /opt/dsware/client/bin directory, and run the ./dswareTool.sh --op createDSwareClient -ip x.x.x.x -nodetype 0 command to add a client. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored.

  13. Restore storage resources.

    1. Log in to the primary management node as user root, and run the following command to connect to the database (21600 indicates the default port number and IaaS@DATABASE-PublicCLOUD9! indicates the default password).

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the set search_path=dsware_1; command to set the query path.
    3. Run the select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip='x.x.x.x') and processtype='EBSCtrl' and servicetype='eds'; command on the primary management node to check whether the storage service on the faulty node needs to be restored. In the command, x.x.x.x indicates the management IP address of the node to be restored. If the query returns one or more records, run the \q command to exit the database and go to 13.d. Otherwise, run the \q command to exit the database.
    4. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStoragePool command to obtain the storage pool ID. To run this command, enter the name and password of CLI super administrator account admin as prompted. Information about all storage pools is displayed in the command output. poolId in the leftmost column displays the storage pool ID.
    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op queryStorageNodeInfo -id poolId command to check whether the node has been removed from the storage pool. (If the node has been faulty for a certain period (usually 7 days), the node will be removed from the storage pool.) To run this command, enter the name and password of CLI super administrator account admin as prompted. Check whether the command output contains the node. If yes, the node has not been removed from the storage pool. In this case, go to 13.f. If no, the node has been removed from the storage pool. In this case, proceed with 13.g and 13.h.
    6. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip x.x.x.x -p poolId command to restore storage resources. This step applies to both control and non-control nodes. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the faulty node, and poolId indicates the ID of the storage pool to which the node belongs. If the node has been added to multiple storage pools, repeat this step for each pool. If the storage resources are restored successfully, skip 13.g and 13.h.
    7. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op deleteStorageNode -ip x.x.x.x -id poolId command to delete the faulty node from the storage pool. To run this command, enter the name and password of super administrator account admin as prompted. In the command, x.x.x.x indicates the management IP address of the node to be restored, and poolId indicates the ID of the storage pool to which the node belongs.

      If the node is added to multiple storage pools, run this command multiple times using the IDs of the storage pools.

    8. On DeviceManager, re-add the node to the storage pool.
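When the node belongs to several storage pools, step 13.f must be repeated once per pool ID. A hedged sketch of that loop (the pool IDs and IP address are example values; echo is used instead of executing dswareTool.sh so the loop can be shown outside a storage node):

```shell
# Sketch of step 13.f for a node that belongs to several storage pools:
# repeat restoreStorageNode once per pool ID.
# NODE_IP and POOL_IDS are example values (assumptions).
NODE_IP=100.2.5.62
POOL_IDS="0 1"
count=0
for poolId in $POOL_IDS; do
  # echo instead of running, for illustration only
  echo sh /opt/dsware/client/bin/dswareTool.sh --op restoreStorageNode -ip "$NODE_IP" -p "$poolId"
  count=$((count+1))
done
```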

  14. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op setServerStorageMode -ip x.x.x.x -nodetype 0 command to switch to the normal mode. To run this command, enter the name and password of CLI super administrator account admin as prompted.
  15. Restore the replication management cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the node to be restored has the control node role:

      select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x') and processtype = 'cm' and servicetype = 'dr';

      In this command, x.x.x.x indicates the management IP address of the node to be restored. If the query returns one or more records, the node to be restored has the control node role. In this case, go to 15.d. If the query returns no records, no further action is required.

    4. Run the following command to obtain the ID of the replication control cluster. Then, go to 15.e.

      select CLUSTERID from TBL_SERVICE_CLUSTER_INFO where servicetype = 'dr';

    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op drCmd -subOp resumeControlCluster -controlClusterId clusterId -resumeType type -nodeIps x.x.x.x command to restore the replication control cluster. To run this command, enter the name and password of super administrator account admin as prompted. In this command, clusterId indicates the ID of the replication cluster, and type indicates the restoration type. If the OS is to be reinstalled or DSware volumes are used as metadata disks, set the value of type to 1. If nodes or metadata disks need to be replaced, set the value of type to 0. x.x.x.x indicates the management IP address of the node to be restored.
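The choice of resumeType in step 15.e can be expressed as a small shell check. The os_reinstalled flag below is a hypothetical placeholder for the actual condition, used only to make the mapping explicit:

```shell
# resumeType selection per step 15.e:
#   1 = the OS is to be reinstalled, or DSware volumes are used as metadata disks
#   0 = nodes or metadata disks need to be replaced
os_reinstalled=true   # hypothetical condition flag (assumption)
if [ "$os_reinstalled" = true ]; then
  type=1
else
  type=0
fi
echo "resumeType=$type"
```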

  16. Restore the replication service cluster.

    1. Run the following command on the primary management node as user root to connect to the database (21600 is the default port number and IaaS@DATABASE-PublicCLOUD9! is the default password):

      /opt/dfv/oam/oam-u/oam-gaussdb/app/bin/gsql -h 127.0.0.1 -p 21600 -U omm -W IaaS@DATABASE-PublicCLOUD9!

    2. Run the following command to set the query path:

      set search_path=dsware_1;

    3. Run the following command to check whether the node to be restored has the service node role:

      select * from tbl_service_process_info where serverid = (select id from tbl_server_info where manageip= 'x.x.x.x') and processtype = 'dms' and servicetype = 'dr';

      In this command, x.x.x.x indicates the management IP address of the node to be restored. If the query returns one or more records, the node to be restored has the service node role. In this case, go to 16.d. If the query returns no records, no further action is required.

    4. Run the following command to obtain the ID of the replication control cluster. Then, go to 16.e.

      select CLUSTERID from TBL_SERVICE_CLUSTER_INFO where servicetype = 'dr';

    5. Log in to the CLI of the primary management node as user dsware and run the sh /opt/dsware/client/bin/dswareTool.sh --op drCmd -subOp resumeReplicateCluster -controlClusterId clusterId -nodeIps x.x.x.x command to restore the replication service cluster. To run this command, enter the name and password of super administrator account admin as prompted. In the command, clusterId indicates the ID of the replication cluster, and x.x.x.x indicates the management IP address of the node to be restored.

  17. Check the system status.

    On SmartKit, choose Home > Storage > Routine Maintenance > More > Inspection and check the system status.
    • If all inspection items pass the inspection, the inspection is successful.
    • If some inspection items fail, the inspection fails. Rectify the faults by taking recommended actions in the inspection reports. Perform inspection again after fault rectification. If the inspection still fails, contact Huawei technical support.

    For details, see the FusionStorage Block Storage Administrator Guide.

Follow-up Procedure

Label the replaced system disk modules to facilitate subsequent operations.

Updated: 2019-10-18

Document ID: EDOC1100081420
