FusionStorage V100R006C10 Block Storage Service Troubleshooting Guide 04

A Single FSM VM Fails (Deployed on a FusionCompute, ESXi, or Microsoft Hyper-V Host)


Symptom

When two FusionStorage Manager (FSM) VMs are deployed in active/standby mode and one FSM VM becomes faulty and cannot be restored after a restart, perform the operations provided in this section to rectify faults and quickly restore services.

This section applies to the FusionCompute or Server SAN scenarios in which FSM VMs are deployed on FusionCompute, ESXi, or Microsoft Hyper-V hosts.

Possible Causes

The file system on the FSM VM is damaged.

Fault Diagnosis

None

Procedure

    Log in to the faulty VM using Virtual Network Computing (VNC).

    1. Log in to FusionCompute and locate the faulty FSM VM based on the VM name.

      • If the faulty VM is deployed in the VMware hypervisor, use vCenter to log in to the ESXi management host and locate the faulty FSM VM based on the VM name.
      • If the faulty VM is deployed in the Microsoft Hyper-V hypervisor, use Hyper-V Manager to log in to the Microsoft Hyper-V management host and locate the faulty FSM VM based on the VM name.

    2. Log in to the faulty FSM VM using VNC as user dsware.

      The default password of user dsware is IaaS@OS-CLOUD9!.

    3. Check whether the VNC login is successful.

      • If yes, go to step 1 of "Enable the high availability (HA) service."
      • If no, the host of the VM is faulty. Go to step 1 of "Query the HA status."

    Enable the high availability (HA) service.

    1. Run the following command and enter the password of user root to switch to user root:

      su - root

    2. Run the following command to check the HA status of the FSM VM:

      sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh

      If the command fails to execute, the HA service is faulty and the FSM VM needs to be reinstalled. Go to step 1 of "Query the HA status."

      The VM is successfully restored if information similar to the following is displayed (the node whose HA_active value is active is the active node):

      Ha mode
      double 
      
      NodeName                                 HostName                                 HaVersion                StartTime                HA_active            HA allResOK          HARunPhase          
      DSM01                                    FSM01                                    V100R001C01              2015-10-30 16:02:04      active               normal               Actived             
      DSM02                                    FSM02                                    V100R001C01              2015-10-30 16:02:04      standby              normal               Deactived           
      
      NodeName                                 ResName                                  ResStatus                ResHAStatus              ResType             
      DSM01                                    DSMExternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMInternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMLocalExternalMgrIp                    Normal                   Normal                   Double_active       
      DSM01                                    DSMLocalInternalMgrIp                    Normal                   Normal                   Double_active       
      ......

    3. Check whether the HA allResOK values are normal.
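
      The check in this step can be sketched as a small shell helper that reads the checkFSMStatus.sh output from standard input. The function names ha_node_summary and ha_all_normal are hypothetical and are not part of FusionStorage. The sketch assumes the column layout shown in the sample output above and assumes that a healthy active/standby pair reports exactly two node rows.

```shell
# Hypothetical helpers, not shipped with FusionStorage.
# Node rows have 8 whitespace-separated fields because StartTime holds a date
# and a time; the sixth field is HA_active and the seventh is HA allResOK.

# Print "<NodeName> <HA_active> <HA allResOK>" for every node row on stdin.
ha_node_summary() {
    awk 'NF == 8 && ($6 == "active" || $6 == "standby") { print $1, $6, $7 }'
}

# Succeed only if exactly two node rows are present and both report "normal".
# (Assumption: double HA mode, as in the sample output.)
ha_all_normal() {
    ha_node_summary |
        awk '$3 == "normal" { ok++ } END { exit !(NR == 2 && ok == 2) }'
}

# Example use on an FSM VM:
#   sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh | ha_all_normal \
#       && echo "HA allResOK is normal on both nodes"
```

      A row that shows -- in place of values has fewer fields, so it is not counted and the check fails, which matches a node that has not rejoined the HA pair.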

    Query the HA status.

    1. Log in to the other FSM VM using VNC as user dsware.

      The default password of user dsware is IaaS@OS-CLOUD9!.

      If the login fails, contact technical support.

    2. Run the following command to switch to user root:

      su - root

      The default password of user root is IaaS@OS-CLOUD8!.

    3. Run the following command to check the HA status of the FSM VM:

      sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh

      Information similar to the following is displayed:

      Ha mode
      double 
      
      NodeName                                 HostName                                 HaVersion                StartTime                HA_active            HA allResOK          HARunPhase          
      DSM01                                    FSM01                                    V100R001C01              2015-10-30 16:02:04      active               normal               Actived             
      DSM02                                    --                                       --                       --                       --                   --                   --                  
      
      NodeName                                 ResName                                  ResStatus                ResHAStatus              ResType             
      DSM01                                    DSMExternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMInternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMLocalExternalMgrIp                    Normal                   Normal                   Double_active       
      DSM01                                    DSMLocalInternalMgrIp                    Normal                   Normal                   Double_active       
      ......

    4. Determine the active and standby FSM nodes based on the host name.

      If node DSM01 is in the active state, this node is the original active node, and the faulty node is the original standby node. Otherwise, the current node is the original standby node, and the faulty node is the original active node.
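
      The rule above can be expressed as a short shell sketch. The function name faulty_node_original_role is hypothetical, and the sketch assumes the same column layout as the sample output. It reads the checkFSMStatus.sh output of the surviving FSM VM from standard input.

```shell
# Hypothetical helper, not shipped with FusionStorage.
# Prints the original role of the FAULTY node:
#   "standby" if DSM01 is reported as active on the surviving VM,
#   "active"  otherwise,
#   "unknown" (exit status 1) if no node row could be parsed at all.
faulty_node_original_role() {
    awk 'NF == 8 && ($6 == "active" || $6 == "standby") {
             rows++
             if ($1 == "DSM01" && $6 == "active") dsm01_active = 1
         }
         END {
             if (!rows) { print "unknown"; exit 1 }
             print (dsm01_active ? "standby" : "active")
         }'
}

# Example use on the surviving FSM VM:
#   sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh | faulty_node_original_role
```

      Record the result, because the same role must be entered when the reinstalled FSM VM is configured later in this procedure.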

    Complete pre-processing before data rebuilding.

    1. Run the following command to switch to the /opt/omm/oms/ directory:

      cd /opt/omm/oms/

    2. Run the following command to query the workspace and workspace_install soft links:

      ll

    3. Check whether the workspace and workspace_install soft links are correct.

      • If yes, go to step 1 of "Reinstall an OS for the faulty FSM VM."
      • If no, go to step 4 of this section.

      The soft links are correct if the workspace soft link points to workspace0 and the workspace_install soft link points to workspace1.

      If information similar to the following is displayed, the workspace soft link is correct:

      ...
      lrwxrwxrwx  1 omm    omm    28 Apr 12 10:30 tools -> /opt/omm/oms/workspace/tools
      lrwxrwxrwx  1 root   root   10 Apr 13 16:00 workspace -> workspace0
      drwxr-x--- 16 omm    omm  4096 Apr 13 00:00 workspace0
      drwxr-x--- 15 omm    omm  4096 Apr 12 10:31 workspace1
      lrwxrwxrwx  1 root   root   10 Apr 13 16:00 workspace_install -> workspace1
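
      Instead of reading the ll output by eye, the two soft links can be checked with readlink. The following is a minimal sketch. The function names link_points_to and check_oms_links are hypothetical, and the directory defaults to /opt/omm/oms as used in this procedure.

```shell
# Hypothetical helpers, not shipped with FusionStorage.

# Succeed if the symbolic link $1 points to a target named $2
# (either as a relative name or as the last component of a path).
link_points_to() {
    target=$(readlink "$1") || return 1
    case $target in
        "$2" | */"$2") return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if workspace -> workspace0 and workspace_install -> workspace1.
check_oms_links() {
    dir=${1:-/opt/omm/oms}
    link_points_to "$dir/workspace" workspace0 &&
        link_points_to "$dir/workspace_install" workspace1
}

# Example use:
#   check_oms_links && echo "soft links are correct" || echo "soft links need repair"
```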

    4. Run the following command to switch to the /opt/dsware/manager/setup directory:

      cd /opt/dsware/manager/setup

    5. Run the following command to stop the HA service:

      ./MonitorTool.sh stop

      NOTE:
      You can run the ps -ef | grep ha.bin command to check whether the HA process has been stopped.
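
      The grep command in the note also matches its own process line, which can make a stopped HA service look as if it were still running. A minimal sketch that filters that line out is shown below. The function name count_ha_processes is hypothetical.

```shell
# Hypothetical helper, not shipped with FusionStorage.
# Reads `ps -ef` output on stdin and prints how many lines mention ha.bin,
# ignoring any line that belongs to a grep process.
count_ha_processes() {
    grep -v 'grep' | grep -c 'ha\.bin'
}

# Example use:
#   ps -ef | count_ha_processes     # prints 0 once the HA process has stopped
```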

    6. Run the following command to switch to the /opt/omm/oms/ directory:

      cd /opt/omm/oms/

    7. Run the following commands to switch the OMS workspace:

      mv workspace1 workspace_work

      mv workspace0 workspace1

      mv workspace_work workspace0

      rm workspace

      ln -s workspace0 workspace

      rm workspace_install

      ln -s workspace1 workspace_install

    8. Run the following command to query the workspace soft link:

      ll

      Ensure that the workspace soft link points to workspace0 and the workspace_install soft link points to workspace1.
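
      The rename and relink sequence in step 7 can be wrapped in a function that refuses to start when the directory is not in the expected state and stops at the first failed command. This is only a sketch under those assumptions. The function name swap_oms_workspaces is hypothetical, and the function does not stop or start the HA service, so the HA service must still be stopped beforehand as described above.

```shell
# Hypothetical helper, not shipped with FusionStorage.
# Usage: swap_oms_workspaces /opt/omm/oms
swap_oms_workspaces() {
    [ -n "$1" ] || return 1
    (
        # Work in a subshell so the caller's current directory is unchanged.
        cd "$1" || exit 1

        # Both workspaces must exist and the temporary name must be free.
        [ -d workspace0 ] && [ -d workspace1 ] && [ ! -e workspace_work ] || exit 1

        # Same commands as in step 7, chained so that a failure stops the sequence.
        mv workspace1 workspace_work &&
        mv workspace0 workspace1 &&
        mv workspace_work workspace0 &&
        rm -f workspace workspace_install &&
        ln -s workspace0 workspace &&
        ln -s workspace1 workspace_install
    )
}
```

      If the function returns a non-zero status, inspect the directory with ll before retrying, because a partially completed sequence may have left workspace_work in place.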

    9. Run the following command to switch to the /opt/dsware/manager/setup directory:

      cd /opt/dsware/manager/setup

    10. Run the following command to start the HA service:

      ./MonitorTool.sh start

      NOTE:
      You can run the ps -ef | grep ha.bin command to check whether the HA process has been started.

    11. Run the following command to check the HA status of the FSM VM:

      sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh

      Information similar to the following is displayed:

      Ha mode
      double 
      
      NodeName                                 HostName                                 HaVersion                StartTime                HA_active            HA allResOK          HARunPhase          
      DSM01                                    FSM01                                    V100R001C01              2015-10-30 16:02:04      active               normal               Actived             
      DSM02                                    --                                       --                       --                       --                   --                   --                  
      
      NodeName                                 ResName                                  ResStatus                ResHAStatus              ResType             
      DSM01                                    DSMExternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMInternalMgrFloatIp                    Normal                   Normal                   Single_active       
      DSM01                                    DSMLocalExternalMgrIp                    Normal                   Normal                   Double_active       
      DSM01                                    DSMLocalInternalMgrIp                    Normal                   Normal                   Double_active       
      ......

    Reinstall an OS for the faulty FSM VM.

    1. On the FusionCompute web client, log in to the faulty FSM VM using VNC.
    2. On the menu bar of the VNC client, click and select Mount Local CD/DVD-ROM Drive.
    3. Select File(*.iso), click Browse, select the FusionStorage Block V100R006C10SPC300.iso file stored on the PC, select Restart the VM now to install the OS, and click Confirm.

      The VM restarts. During the restart, the host is disconnected from the VNC client. Click OK to retain the login status, or log in to the VM again using VNC.

    4. When a screen shown in Figure 3-1 is displayed, select Install and press Enter to start the OS installation.

      NOTE:
      The installation process takes about 20 minutes.
      Figure 3-1  Install selection page

      The host OS restarts automatically after the OS is installed on the VM.

    5. After the OS is installed, log in to the OS as user root.

      The default password of user root is IaaS@OS-CLOUD8!.

    6. After the installation is complete, run the following commands as user root to configure a temporary IP address and default route for the FSM node:

      ifconfig <management plane NIC> <temporary IP address> up

      route add default gw <default gateway>

      This temporary IP address will be used to connect to the FSM node when you use a tool to configure the FSM network information.

      For example, run the following commands:

      ifconfig eth0 192.168.40.15 up

      route add default gw 192.168.40.1

    7. Click on the menu bar of the VNC client and select Unmount CD/DVD-ROM Drive.

    Configure the active/standby role of the faulty FSM VM.

    The configured role must be the same as the FSM VM role before the fault occurs.

    1. In the VNC window of the reinstalled FSM VM, enter username root and password IaaS@OS-CLOUD8! to log in to the OS.
    2. Run the following command to switch to the /opt/dsware/manager/setup/forCommonServer/ directory:

      cd /opt/dsware/manager/setup/forCommonServer/

    3. Run the following command to configure the active/standby role:

      sh ConfigHAForCommonServer.sh

    4. Configure the active/standby role as instructed. The following provides a configuration example:

      The configured role must be the same as the FSM VM role before the fault occurs.

      • Please enter manager HA mode (double/single): [double](?)
        Specify the FSM deployment mode and press Enter. For example, enter d to configure the active/standby deployment mode.
      • Please enter network plane mode (double/single): [single](?)
        Specify the network plane mode and press Enter. For example, enter s to select the single plane.
      • Please enter hostname: [FSM](?)

        Enter the host name of the faulty FSM VM and press Enter.

        The configured host name must be the same as the FSM host name before the fault occurs.

      • Please enter manager gateway ip : [192.168.40.1](?)
        Enter the management plane gateway and press Enter.
      • Please enter local manager ip : [192.168.40.15](?)
        Configure the management IP address planned for the local FSM node and press Enter.
      • Please enter local manager port : [eth0](?)
        Press Enter to use the default NIC eth0 as the management plane NIC.
      • Please enter manager mask : [255.255.254.0](?)
        Enter the subnet mask for the management plane network segment and press Enter.
      • Please enter remote manager ip : [192.168.40.16](?)
        Enter the management IP address of the other FSM node and press Enter.
      • Please enter remote manager port : [eth0](?)
        Press Enter to use the default NIC eth0 as the management plane NIC.
      • Please enter manager float ip : [192.168.40.10](?)
        Enter the planned floating IP address for the FSM node and press Enter.
      • Please enter local role (primary/standby) : [primary](?)
        Set this parameter based on the original active/standby roles determined in step 4 of "Query the HA status." If NodeName of the properly running FSM VM is DSM01, the original role of the faulty node is standby. If NodeName of the properly running FSM VM is DSM02, the original role of the faulty node is active (primary).
      Confirm the configuration on the information summary page. If the configuration is correct, enter y. If the configuration is incorrect, enter n to reconfigure the information. This process takes about 2 minutes. If information similar to the following is displayed, the configuration is successful:
      Congratulations, config the OMS successfully.
      start HA successfully.
      Warning: HA monitor has been running already.
      [Tue Oct 22 11:28:31 CST 2013][line:506]Start to set monitor task.
      [Tue Oct 22 11:28:31 CST 2013][line:518]inittab add id:7
      [Tue Oct 22 11:28:31 CST 2013][line:524]Finish setting monitor task.
      [Tue Oct 22 11:28:31 CST 2013][line:526]End to config omm
      [Tue Oct 22 11:28:31 CST 2013][line:528]Config FSM successfully
      
      NOTE:

      If the active/standby configuration is incorrect, you can run the corresponding command to reconfigure the active/standby settings.

      The following exceptions may occur after the reconfiguration:
      • Alarm Heartbeat Communication Between the Active and Standby FSM Nodes Interrupted is generated on the FusionStorage Block Self-Maintenance Platform. You can manually clear the alarm.
      • An incorrect NIC name has been entered during the HA configuration. Perform the following operations to delete the network:
        • Switch to the /etc/sysconfig/network directory and delete the configuration file of the incorrect NIC, for example, ifcfg-eth5.
        • Run the service network restart command to restart the network service.
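
      Before answering the prompts in this step, it can help to confirm that the planned local IP address, remote IP address, floating IP address, and gateway all fall in the same subnet for the chosen mask. The following sketch does that arithmetic in plain shell. The function names ip_to_int and same_subnet are hypothetical, and the addresses in the usage comment are the prompt defaults shown above, used only as sample values. The helpers do not validate their input.

```shell
# Hypothetical helpers, not shipped with FusionStorage.

# Print a dotted-quad IPv4 address as a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split A.B.C.D into $1 $2 $3 $4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Usage: same_subnet ADDR1 ADDR2 MASK
# Succeeds if both addresses have the same network part under MASK.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example use with the sample values from the prompts:
#   same_subnet 192.168.40.15 192.168.40.1  255.255.254.0 && echo "gateway reachable"
#   same_subnet 192.168.40.15 192.168.40.16 255.255.254.0 && echo "peer in same subnet"
```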

    5. Run the following command to check the active/standby configuration:

      sh /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh

      The active node is correctly configured if information similar to the following is displayed:
      Ha mode
      double
      
      NodeName        HostName        HaVersion        StartTime            HA_active        HA allResOK        HARunPhase
      DSM01           linux-cannGZ    V100R001C01      2013-10-22 16:45:48  active           normal             Actived
      DSM02           --               --              --                   --               --                 --
      
      The standby node is properly configured if information similar to the following is displayed:
      Ha mode
      double
      
      NodeName        HostName        HaVersion        StartTime            HA_active        HA allResOK        HARunPhase
      DSM01           linux-cannGZ    V100R001C01      2013-10-22 16:45:48  active           normal             Actived
      DSM02           linux-KokCsO    V100R001C01      2013-10-22 16:45:31  standby          normal             Deactived
      
      NOTE:
      You can use the space bar to view more information. You can also press Shift+Page Up or Shift+Page Down to view complete output information.

    6. Run the following command to check the database status:

      /opt/omm/oms/tools/status_gaussdb

      • If the following information is displayed, the database is restored, and no further action is required.
        Status of GaussDB is OK.
      • If the following information is displayed, the database is being rebuilt. Wait for the rebuild to complete.
        GaussDB is repairing, please wait a minute.
      NOTE:
      During the rebuild, you can run the su - ommdba -c "gs_ctl querybuild" command to query the rebuild progress.
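
      The two messages above can be told apart in a script, for example to wait until the rebuild finishes. The following is a minimal sketch. The function name gaussdb_state is hypothetical, the matched strings are the two messages quoted in this step, and any other output is reported as unknown.

```shell
# Hypothetical helper, not shipped with FusionStorage.
# Reads the output of /opt/omm/oms/tools/status_gaussdb on stdin and prints
# "ok", "rebuilding", or "unknown" (exit status 1 for unknown).
gaussdb_state() {
    while IFS= read -r line; do
        case $line in
            *"Status of GaussDB is OK."*) echo ok; return 0 ;;
            *"GaussDB is repairing"*)     echo rebuilding; return 0 ;;
        esac
    done
    echo unknown
    return 1
}

# Example use (the 30-second interval is an arbitrary choice):
#   while [ "$(/opt/omm/oms/tools/status_gaussdb | gaussdb_state)" = rebuilding ]; do
#       sleep 30
#   done
```

      The example loop ends as soon as the state is no longer rebuilding, so check the final state afterwards. An unknown state should be investigated rather than treated as success.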

    7. After the rebuild is complete, run the following command to query the database status:

      /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh | grep gaussDB

      If information similar to the following is displayed, the database is restored.
      DSM02         gaussDB         Active_normal          Normal          Active_standby
      DSM01         gaussDB         Standby_normal         Normal          Active_standby
      NOTE:

      During database reconstruction, run the preceding command to query the database status. In the command output, if the value of ResStatus is Stopped and the value of ResHAStatus is Exception, the database is being restored.

      If the database still fails, contact technical support.
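
      The interpretation of the gaussDB rows described in this step can be sketched as follows. The function name gaussdb_ha_state is hypothetical, and the sketch assumes the five-column layout shown above (NodeName, ResName, ResStatus, ResHAStatus, ResType).

```shell
# Hypothetical helper, not shipped with FusionStorage.
# Reads checkFSMStatus.sh output on stdin and prints:
#   "restored"  if one node is Active_normal and the other Standby_normal,
#               both with ResHAStatus Normal
#   "restoring" if any gaussDB row shows Stopped / Exception
#   "unknown"   otherwise
gaussdb_ha_state() {
    awk '$2 == "gaussDB" {
             if ($3 == "Active_normal"  && $4 == "Normal")    active++
             if ($3 == "Standby_normal" && $4 == "Normal")    standby++
             if ($3 == "Stopped"        && $4 == "Exception") restoring++
         }
         END {
             if (active == 1 && standby == 1) print "restored"
             else if (restoring > 0)          print "restoring"
             else                             print "unknown"
         }'
}

# Example use:
#   /opt/dsware/manager/setup/forCommonServer/checkFSMStatus.sh | gaussdb_ha_state
```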

Related Information

None

Updated: 2019-02-01

Document ID: EDOC1000175245
