eSight V300R010C00 Maintenance Guide 07

Fault Troubleshooting

This section describes common faults in the OMMHA HA system and troubleshooting methods.

Abnormal Server Restart or Shutdown

Symptom

eSight is available. However, the standby server information cannot be obtained on the active server, and you cannot log in to the standby server.

HAMode                                                                            
double                                   
                                         
NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase          
ha1       eSightServer40  V100R001C01     2017-09-16   active          normal      Actived                     
                                         
NodeName  ResName         ResStatus       ResHAStatus  ResType        
ha1       Database        Active_normal   Normal       Active_standby 
ha1       NMSServer       Normal          Normal       Single_active  
ha1       RMFloatIp       Normal          Normal       Single_active  
Possible Causes
  • A server in the HA system is manually restarted or shut down.
  • A server is restarted or shut down due to a system abnormality.
Procedure
  1. Ask technical support personnel to check the status of the VM that hosts the standby eSight server.

    • If the standby server has been started, query the status of the two servers on the active server. After the active and standby servers are connected, the system will automatically recover. No manual operation is required.
      HAMode                                                                            
      double                                   
                                               
      NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase          
      ha1       eSightServer40  V100R001C01     2017-09-16   active          normal      Actived             
      ha2       eSightServer46  V100R001C01     2017-09-16   standby         normal      Deactived           
                                               
      NodeName  ResName         ResStatus       ResHAStatus  ResType        
      ha1       Database        Active_normal   Normal       Active_standby 
      ha1       NMSServer       Normal          Normal       Single_active  
      ha1       RMFloatIp       Normal          Normal       Single_active  
      ha2       Database        Standby_normal  Normal       Active_standby 
      ha2       NMSServer       Stopped         Unknown      Single_active  
      ha2       RMFloatIp       Stopped         Normal       Single_active  
    • If the standby server is shut down, manually start up the standby server. After the server is started, the system will automatically recover. No manual operation is required.
    • If the VM is abnormal, contact the technical support personnel for fault locating.
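
The recovery check in this procedure can be partly scripted. The following is a minimal sketch, assuming the status query output has been saved to a text file in the layout shown above. The function names `list_ha_nodes` and `both_nodes_visible` are hypothetical and are not part of OMMHA.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with OMMHA). They read a saved copy of
# the status query output in the layout shown in this section.

# Prints "NodeName:HAActive" for each row of the node table, that is, the
# table whose header line contains the HAActive column.
list_ha_nodes() {
    awk '
        $1 == "NodeName" { in_nodes = ($0 ~ /HAActive/); next }
        in_nodes && NF >= 7 { print $1 ":" $5 }
    ' "$1"
}

# Returns 0 only when the output reports both an active and a standby node.
both_nodes_visible() {
    nodes=$(list_ha_nodes "$1")
    echo "$nodes" | grep -q ':active$' && echo "$nodes" | grep -q ':standby$'
}
```

If `both_nodes_visible` returns a nonzero status for the saved output, the active server still cannot see its peer, which matches the symptom described above.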

Peer End Information Cannot Be Obtained During Status Query

Symptom

eSight is available. However, the standby server information cannot be obtained on the active server, and the active server information cannot be obtained on the standby server.

HAMode                                                                            
double                                   
                                         
NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase          
ha1       eSightServer40  V100R001C01     2017-09-16   active          normal      Actived                     
                                         
NodeName  ResName         ResStatus       ResHAStatus  ResType        
ha1       Database        Active_normal   Normal       Active_standby 
ha1       NMSServer       Normal          Normal       Single_active  
ha1       RMFloatIp       Normal          Normal       Single_active  
Possible Causes
  • Heartbeat communications between the active and standby servers are interrupted.
  • The trust relationship is invalid.
Procedure
  1. Collect heartbeat IP addresses of the two eSight servers in the installation plan.
  2. Run the following command on one server to check whether the heartbeat IP address of the remote server can be pinged.

    # ping Heartbeat IP address of the remote server

    • If the ping fails, heartbeat communications are interrupted. Contact the FusionSphere administrator to locate the network fault. No further action is required.
    • If the command output shown in Figure 3-10 is displayed, the heartbeat communications are normal. Go to Step 3.
      Figure 3-10 Communications connection

  3. Rectify the fault based on "ALM-316010197 OMMHA Two-Node Cluster File Synchronization" and re-establish the trust relationship.
  4. Query the status of the two-node cluster again.

    • If the status is normal, no further action is required.
    • If the fault persists, contact Huawei technical support.
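
The reachability check in Step 2 can be wrapped in a small function. This is a minimal sketch, assuming a Linux `ping` that accepts `-c` (packet count) and `-W` (reply timeout in seconds). The function name `check_heartbeat` and the `PING_BIN` override are hypothetical and exist only so that the check can be rehearsed without network access.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMMHA). PING_BIN is an assumed override
# that defaults to the system ping command.
check_heartbeat() {
    peer_ip=$1
    # -c 3: send three echo requests; -W 2: wait up to 2 seconds per reply.
    if "${PING_BIN:-ping}" -c 3 -W 2 "$peer_ip" >/dev/null 2>&1; then
        echo "heartbeat reachable: $peer_ip"
        return 0
    fi
    echo "heartbeat unreachable: $peer_ip"
    return 1
}

# Example call (replace the address with the heartbeat IP address of the
# remote server from the installation plan):
#   check_heartbeat 10.9.0.17
```

A nonzero return status corresponds to the interrupted-communications branch of Step 2. A zero status means that the trust relationship should be checked next, as described in Step 3.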

Damaged Server OS or Hard Disk

Symptom

eSight is available. However, the status of the standby server cannot be obtained on the active server, and you cannot log in to the standby server. After the standby server is restarted, a message indicating a disk read error or an operating system exception is displayed.

HAMode
double

NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase
ha1       eSightServer40  V100R001C01     2017-09-16   active          normal      Actived

NodeName  ResName         ResStatus       ResHAStatus  ResType
ha1       Database        Active_normal   Normal       Active_standby
ha1       NMSServer       Normal          Normal       Single_active
ha1       RMFloatIp       Normal          Normal       Single_active
Possible Causes

The OS or hard disk of one server is damaged.

Procedure
  1. Log in to the active server as the root user.
  2. Stop the file synchronization task on the active server.

    # crontab -r -u ossuser

  3. Stop the OMMHA software on the active server.

    1. Switch to the ossuser user.

      # su ossuser

    2. Run the following commands to stop the OMMHA software:

      > cd /opt/ommha/ha/bin

      > ./stop.sh

      > exit

  4. Reinstall the standby server. Install the baseline version that matches the eSight version in use before the fault occurred.

    • Image installation

      For details about the installation, see the installation guide for different scenarios in "Installation and Commissioning" in the eSight Product Documentation.

      The active and standby servers do not need to be connected.

    • Full installation
      For details about the installation, see the installation guide for the relevant scenario in "Installation and Commissioning" in the eSight Product Documentation. Stop after the installation software has been uploaded, because the subsequent operations differ from those of a normal eSight installation. The following uses the local two-node cluster as an example:
      1. Log in to the standby server as the root user.
      2. Run the following commands to install eSight:

        # cd /opt/install

        # mv install.sh single.sh

        # chmod u+x single.sh

        # ./single.sh

      3. Select the two-node cluster type as prompted, for example, local two-node cluster (1).
        ##################################################
        Welcome to eSight installation & configuration Wizard
        ##################################################
        Please select HA type, 1(local-HA)  2(remote-HA):
        >1
      4. Enter the system IP address, heartbeat IP address, and floating IP address of the local server.
        Please input local system ip address:
        >10.137.97.16
        Please input local heartbeat ip address:
        >10.9.0.16
        Please input float ip address:
        >10.137.97.15
        Enter 'y' to apply the setting of South IP Address or 'n' to ignore (y/n):
        >n
      5. Enter the system IP address and heartbeat IP address of the remote server.
        Please input remote system ip address:
        >10.137.97.17
        Please input remote heartbeat ip address:
        >10.9.0.17
      6. Confirm the configurations.
        Please confirm the following configurations...
        ****************************************
                       local system ip
                                 10.137.97.16
                       local heartbeat ip
                                 10.9.0.16
                       float ip
                                 10.137.97.15
                       remote system ip
                                 10.137.97.17
                       remote heartbeat ip
                                 10.9.0.17
        Enter 'y' to apply these values and proceed to the next step, or 'n' to return to make any changes (y/n):y
      7. Enter the root user password of the remote server as prompted.
        Please input remote root password: 
      8. If the following information is displayed, eSight is installed successfully.
        begin to check local parameters... 
        check parameters finish 
        begin to install ha... 
        enter force-installation mode... 
        install ha successfully. 
        begin to install eSight... 
        eSight install finish. 
        begin to config local Database... 
        config Database finish. 
      9. Perform the post-installation operations. For details, see "Installing eSight (New Installation)" in the installation and commissioning guide for the local HA system scenario in the eSight Product Documentation.
        NOTE:

        If the virtual resource management component has been installed, follow instructions in "Operation Guide > Virtual Resources Management > Resource Adding > Adding a Single Virtual Resource > Configuring the Mapping Between FusionSphere OpenStack Domain Names and IP Addresses" to perform follow-up operations.

  5. On the active server, connect the active and standby servers.

    For details about how to connect the active and standby servers in the local HA system or remote HA system, see the section about connecting the active and standby servers for the relevant scenario in the installation and commissioning guide in the product documentation.

  6. Restore the file synchronization task.

    Run the following commands on the active server as the root user:

    # echo "*/2 * * * * /bin/sh /opt/eSight/mttools/ha/filecopy/doSync.sh filesync_with_pd.sh EXCLUDE" > /tmp/cro.ossuser

    # crontab -u ossuser /tmp/cro.ossuser

    # rm /tmp/cro.ossuser

  7. Apply for and import the license again.

    The new VM has a different UUID, so the ESN information in the original license no longer matches the server. Perform operations in "Reference > eSight Alarm Reference > ALM-999999995 Invalid License" in the eSight Product Documentation to solve the problem.
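
The file synchronization task restored in step 6 is written through a fixed file name under /tmp. The following is a minimal sketch of the same operation that uses `mktemp` for the temporary file instead. The cron entry is copied from step 6. The variable `SYNC_ENTRY` and the function `write_sync_crontab` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Cron entry copied from step 6 of this procedure.
SYNC_ENTRY='*/2 * * * * /bin/sh /opt/eSight/mttools/ha/filecopy/doSync.sh filesync_with_pd.sh EXCLUDE'

# Hypothetical helper: writes the entry to a newly created temporary file
# and prints the name of that file.
write_sync_crontab() {
    cron_file=$(mktemp) || return 1
    printf '%s\n' "$SYNC_ENTRY" > "$cron_file"
    echo "$cron_file"
}

# Usage on the active server as the root user (not executed by this sketch):
#   f=$(write_sync_crontab) && crontab -u ossuser "$f" && rm -f "$f"
#   crontab -l -u ossuser    # list the installed entries to confirm
```

Listing the crontab afterwards is an optional confirmation that the synchronization entry removed in step 2 is back in place.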

Damaged Two-Node Cluster OS or Hard Disk

Symptom

eSight is unavailable, and you cannot log in to either server. After the servers are restarted, a message indicating a disk read error or an OS exception is displayed.

Possible Causes

The OSs or hard disks of both servers are damaged.

Procedure

After the fault occurs, reinstall the VMs and restore the data from the scheduled remote backups that were created before the fault occurred.

  1. For details about the remote backup, see "Operation and Maintenance > Maintenance Guide > Backup and Restore > Backing Up and Restoring the Database and Configuration Files" in the eSight Product Documentation.
  2. Reinstall the two servers. Install the baseline version that matches the eSight version in use before the fault occurred.

    For details about the installation methods in different scenarios, see "Installation and Commissioning > Local High Availability System Software Installation Guide (SUSE Linux + MySQL + OMMHA)" in the eSight Product Documentation.

  3. If an eSight patch was installed before the fault, follow the instructions in the patch installation guide to install it again so that the eSight patch version is the same as that before the fault.
  4. Stop the OMMHA service of the two-node cluster.

    For details, see "Installation and Commissioning > Local High Availability System Software Installation Guide (SUSE Linux + MySQL + OMMHA) > Appendix > Common eSight Operations > Stopping eSight" in the eSight Product Documentation.

  5. Restore data on the active server.

    For details, see "Operation and Maintenance > Maintenance Guide > Backup and Restore > Backing Up and Restoring the Database and Configuration Files" in the eSight Product Documentation.

  6. Connect the active and standby servers.

    For details about the connection, see "Installation and Commissioning > Local High Availability System Software Installation Guide (SUSE Linux + MySQL + OMMHA) > Installing the eSight System (Image Installation) > Connecting the Active and Standby Servers" in the eSight Product Documentation.

  7. In a full installation scenario, perform other operations after the software is installed.

    Skip this step for an image installation scenario.

  8. Apply for and import the license again.

    The new VM has a different UUID, so the ESN information in the original license no longer matches the server. Perform operations in "Reference > eSight Alarm Reference > ALM-999999995 Invalid License" in the eSight Product Documentation to solve the problem.

Abnormal Resource Status on the Standby Server

Symptom

eSight is available. However, when you query the standby server resources from the active server, some resources on the standby server are in the Exception state.

HAMode                                                                           
double                                   
                                         
NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase          
ha1       eSightServer40  V100R001C01     2017-09-16   active          normal      Actived             
ha2       eSightServer46  V100R001C01     2017-09-16   standby         normal      Deactived           

NodeName  ResName         ResStatus       ResHAStatus  ResType        
ha1       Database        Active_normal   Normal       Active_standby 
ha1       NMSServer       Normal          Normal       Single_active  
ha1       RMFloatIp       Normal          Normal       Single_active  
ha2       Database        Standby_normal  Normal       Active_standby 
ha2       NMSServer       Stopped         Exception    Single_active  
ha2       RMFloatIp       Stopped         Normal       Single_active 
Possible Causes
  • An abnormality exists on the standby server.
  • After the active server becomes abnormal, a switchover occurs.
Procedure
The following uses the local two-node cluster where southbound and northbound services are not isolated as an example. There are three types of resources: Database, NMSServer, and RMFloatIp.
  • If RMFloatIp is abnormal, manual intervention is not required. The system tries to automatically rectify the fault.
  • If Database resources are abnormal, the system tries to automatically rectify the fault. If the fault cannot be automatically rectified, rectify the fault based on ALM-316010198 Data Replication Failure in the Two-Node Cluster.
  • If NMSServer is abnormal, perform the following operations to manually rectify the fault and verify the result:
    1. Log in to the standby server as the ossuser user.
    2. Run the following commands to rectify the fault:

      > cd /opt/ommha/ha/bin

      > sh clearrmfault.sh

    3. Check whether eSight can run properly.
      1. Log in to the active server as the ossuser user.
      2. Run the following commands to enable the switchover:

        > cd /opt/ommha/ha/bin

        > sh forbiden_switch.sh cancel

      3. Run the following command to perform the switchover:

        > sh switchover.sh

        After the active-standby switchover is successful, the resource status is as follows:

        HAMode                                
        double                                   
                                                 
        NodeName  HostName        HAVersion       StartTime    HAActive        HAAllResOK  HARunPhase          
        ha1       eSightServer40  V100R001C01     2017-09-16   standby         normal      Deactived            
        ha2       eSightServer46  V100R001C01     2017-09-16   active          normal      Actived      
                                                 
        NodeName  ResName         ResStatus       ResHAStatus  ResType        
        ha1       Database        Standby_normal  Normal       Active_standby 
        ha1       NMSServer       Stopped         Normal       Single_active  
        ha1       RMFloatIp       Stopped         Normal       Single_active  
        ha2       Database        Active_normal   Normal       Active_standby
        ha2       NMSServer       Normal          Normal       Single_active   
        ha2       RMFloatIp       Normal          Normal       Single_active 

        If the switchover fails, contact Huawei technical support engineers.

    4. Observe the system for 10 minutes to check whether the system runs stably and whether the automatic switchover has been disabled.

      If the original standby server becomes the active one and runs stably, no more operations are required.
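
To decide which branch of this procedure applies, it helps to list the resources whose ResHAStatus is Exception. The following is a minimal sketch, assuming the status query output has been saved to a text file in the layout shown in the symptom. The function name `list_exception_resources` is hypothetical and is not part of OMMHA.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OMMHA). Prints "NodeName ResName"
# for every row of the resource table whose ResHAStatus column is Exception.
list_exception_resources() {
    awk '
        $1 == "NodeName" { in_res = ($0 ~ /ResHAStatus/); next }
        in_res && NF >= 5 && $4 == "Exception" { print $1, $2 }
    ' "$1"
}
```

Empty output means that no resource is in the Exception state. A line that names NMSServer points to the manual steps above, whereas RMFloatIp or Database entries are first handled by the system automatically.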

eSight Server Unavailability Due to FusionSphere Host Damage

Symptom

The VM corresponding to eSight cannot be started. A check shows that the corresponding cloud storage is normal.

Possible Causes

The FusionSphere host where the VM corresponding to eSight is located is damaged or abnormal.

Solution
  1. Communicate with the FusionSphere administrator to check whether the platform-level recovery operation has been completed and whether single-VM recovery needs to be performed.
  2. Use the rebuild capability of FusionSphere to recover the VM corresponding to eSight.

    For example, you can follow instructions in the host operating system fault section in the HUAWEI CLOUD Stack NFVI Solution Documentation.

Updated: 2019-06-30

Document ID: EDOC1100044373
