FusionAccess V100R006C20 Alarm Handling 05 (FusionSphere 6.3.1)

1004001 Database Server Abnormal

Description

The database server sends a heartbeat message to the IT adapter (ITA) every 2 minutes. The heartbeat message contains the CPU usage and memory usage. This alarm is generated when the ITA fails to receive the database heartbeat message three consecutive times.

In the active/standby database deployment scenario, the active database checks the standby database status every 2 minutes. This alarm is generated when the active database detects that the standby database status is abnormal three consecutive times.

This alarm is cleared when the ITA receives the database heartbeat message again or the active database detects that the standby database recovers.
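
The generation and clearing rule described above can be sketched as a small shell function. This is only an illustrative model, not ITA code: the function name alarm_state and the ok/miss encoding of each 2-minute heartbeat interval are hypothetical.

```shell
# Hypothetical model of the alarm rule (not part of FusionAccess).
# Arguments: heartbeat results in time order, each "ok" or "miss".
# Prints the alarm state after the last result: "raised" or "cleared".
alarm_state() {
    misses=0
    state=cleared
    for result in "$@"; do
        if [ "$result" = "ok" ]; then
            # A received heartbeat resets the counter and clears the alarm.
            misses=0
            state=cleared
        else
            misses=$((misses + 1))
            # Three consecutive missed heartbeats raise the alarm.
            if [ "$misses" -ge 3 ]; then
                state=raised
            fi
        fi
    done
    echo "$state"
}
```

For example, alarm_state miss miss miss prints raised, and appending one more ok prints cleared. With a 2-minute interval, three consecutive misses correspond to roughly 6 minutes without a heartbeat.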

Attribute

Alarm ID        Alarm Severity        Auto Clear
1004001         Critical              Yes

Parameters

Name

Meaning

Alarm ID

Identifies an alarm. Each alarm is uniquely identified by an alarm ID and an alarm name.

Alarm Severity

Indicates the severity of an alarm. Value:

  • Critical: indicates that a fault affecting services provided by the system occurs. You need to rectify the fault immediately. If a device or resource is faulty, rectify it immediately even if the fault occurs during non-working hours.
  • Major: indicates that a fault affecting the service quality of the system occurs. You need to rectify the fault immediately. If the service quality of a device or resource is degraded, rectify it immediately during working hours.
  • Minor: indicates a fault that does not affect service quality. To prevent more serious faults, this type of alarm needs to be observed or handled if necessary.
  • Warning: indicates a fault that may affect service quality. This type of alarm must be handled based on the error type.

Alarm Name

Identifies an alarm. Each alarm is uniquely identified by an alarm ID and an alarm name.

Object Type

Specifies the type of the object for which the alarm is generated.

Alarm Object Name

Specifies the name of the object for which the alarm is generated.

Component Type

(This parameter exists only in FusionManager.)

Specifies the type of the component for which the alarm is generated.

Generation Time

Specifies the time when the alarm is generated.

Clear Time

Specifies the time when the alarm is cleared.

Clear Mode

Specifies whether the alarm is manually or automatically cleared.

Operation

Specifies the operation that can be performed on the alarm.

Value: Manually Clear Alarm

Impact on the System

A Database Server Abnormal alarm can have serious consequences. For example, the standby database service may become unavailable, and the data in the active and standby databases may become inconsistent. The database server must remain in the running state. If this alarm is generated, handle it on the same day.

Possible Causes

  • IP address conflict.
  • The IP address has been changed. Alarms generated for the original IP address must be manually cleared.
  • The database service is not running properly.
  • The active and standby database nodes are disconnected.
  • The network is faulty.
  • The HA service of the database server that generated the alarm is abnormal.
  • The HA services of the active and standby database servers are abnormal.

Procedure

  1. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether alarm 1000033 HA Active/Standby Heartbeat Fault exists and whether the peer IP address displayed in Detailed Alarm Information is the same as that of the abnormal database server.

  2. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 3.
    • If no, no further operation is required.

  3. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether alarm 1004001 Database Server Abnormal exists and whether the IP address displayed in Detailed Alarm Information is the same as that of the peer database server.

  4. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 5.
    • If no, no further operation is required.

  5. Log in to the server where the alarm is generated using an administrator account and run the arping -c 3 -f -D -I eth0 IP address of the server where the alarm is generated command to check whether an IP conflict occurs.

    • If yes, go to Step 6.
    • If no, go to Step 8.

      If information similar to the following is displayed, no IP conflict occurs:

      ARPING 192.168.162.11 from 0.0.0.0 eth0 
      Sent 3 probes (3 broadcast(s)) 
      Received 0 response(s) 
      (Note: The IP addresses are only examples. Use the actual IP addresses.)     

      If information similar to the following is displayed, an IP conflict occurs:

      ARPING 192.168.162.11 from 0.0.0.0 eth0 
      Unicast reply from 192.168.162.11 [12:6E:D4:AB:CD:EF]  1.022ms 
      Sent 1 probes (1 broadcast(s)) 
      Received 1 response(s) 
      (Note: The preceding IP addresses and MAC addresses are only examples. Use the actual IP addresses and MAC addresses.)     
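
The two sample outputs above differ in the number of responses received, so the check can be sketched as a filter over the command output. This is a minimal sketch under stated assumptions: the function name arping_conflict is hypothetical, and it relies only on the "Received N response(s)" line shown in the samples.

```shell
# Hypothetical helper: reads arping output on standard input and prints
# "conflict", "no-conflict", or "unknown" (no "Received" line found).
arping_conflict() {
    # Extract N from the line "Received N response(s)".
    received=$(sed -n 's/^[[:space:]]*Received \([0-9][0-9]*\) response.*/\1/p' | head -n 1)
    if [ -z "$received" ]; then
        echo unknown
    elif [ "$received" -eq 0 ]; then
        # No host answered the probes for this address.
        echo no-conflict
    else
        # At least one other host answered for this address.
        echo conflict
    fi
}

# Example usage (the IP address is only an example):
#   arping -c 3 -f -D -I eth0 192.168.162.11 | arping_conflict
```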

  6. Log in to the server that causes the IP conflict, and shut down the server or change its IP address. Then, on the server where the alarm is generated, run the arping -c 3 -f -D -I eth0 IP address of the server where the alarm is generated command again to check whether the IP conflict persists.

    • If yes, contact Huawei technical support.
    • If no, go to Step 7.

  7. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 8.
    • If no, no further operation is required.

  8. Log in to the ITA server as user gandalf and run ping -c 3 IP address of the database server for which the alarm is generated to check whether communication with the database server is normal.

    • If yes, go to Step 10.
    • If no, go to Step 9.

      The communication is normal if the command output is as follows:

      PING 192.168.190.2 (192.168.190.2) 56(84) bytes of data. 
      64 bytes from 192.168.190.2: icmp_seq=1 ttl=64 time=0.047 ms 
      64 bytes from 192.168.190.2: icmp_seq=2 ttl=64 time=0.057 ms 
      64 bytes from 192.168.190.2: icmp_seq=3 ttl=64 time=0.058 ms 
      (Note: The IP addresses are only examples. Use the actual IP addresses.)     
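
The sample output above shows one reply line per probe. A minimal sketch of that check follows. The function name ping_status is hypothetical, and treating anything fewer than the expected number of replies as abnormal is an assumption of this sketch, not a rule stated in this document.

```shell
# Hypothetical helper: reads ping output on standard input.
# $1 is the number of probes sent (3 for "ping -c 3").
# Prints "normal" if at least that many reply lines are present,
# otherwise "abnormal".
ping_status() {
    expected=$1
    # Count reply lines such as "64 bytes from 192.168.190.2: icmp_seq=1 ...".
    # grep -c prints 0 and returns non-zero when nothing matches.
    replies=$(grep -c 'bytes from' || true)
    if [ "$replies" -ge "$expected" ]; then
        echo normal
    else
        echo abnormal
    fi
}

# Example usage (the IP address is only an example):
#   ping -c 3 192.168.190.2 | ping_status 3
```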

  9. Locate and rectify the network fault based on the actual situation on site.
  10. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 11.
    • If no, no further operation is required.

  11. Log in to the DB server for which the alarm is generated using a database administrator account, and run the shell command gs_ctl status -P database administrator password to check whether the database service is normal.

    • If yes, go to Step 14.
    • If no, run the shell command gs_ctl restart to restart the database service.

      If information similar to the following is displayed, the database service is normal:

      gs_ctl: server is running     
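
The status check above can be sketched as a filter that looks for the "server is running" text shown in the sample. The function name db_service_state is hypothetical. Because this document shows only the output for a running server, the sketch treats any other output as abnormal, which is an assumption.

```shell
# Hypothetical helper: reads "gs_ctl status" output on standard input.
# Prints "normal" if the "server is running" text is present,
# otherwise "abnormal".
db_service_state() {
    if grep -q 'server is running'; then
        echo normal
    else
        echo abnormal
    fi
}

# Example usage; DB_ADMIN_PASSWORD is a hypothetical variable that holds
# the database administrator password:
#   gs_ctl status -P "$DB_ADMIN_PASSWORD" | db_service_state
```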

  12. Repeat Step 11 to check whether the database service is normal.

    • If yes, go to Step 13.
    • If no, contact Huawei technical support.

  13. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 14.
    • If no, no further action is required.

  14. Log in to the database server where the alarm is generated as user root and run the shell command sh /opt/HA/module/hacom/script/config_ha.sh -a to check the IP address of the standby server.

    Information similar to the following is displayed. In this example, the IP address of the standby server is 192.168.6.27, the peer address on the right side of the HaArbLk and HaSyncLk entries.

     
    HaMode:       double 
     
    HaLocalName:  HA192220626(active) 
    HaPeerName: HA192220627(standby) 
     
    HaArbLk:      192.168.6.26:1234  --  192.168.6.27:1234 
                  192.168.6.26:1236  --  192.168.6.27:1236 
     
    HaSyncLk:     192.168.6.26:1235  --  192.168.6.27:1235 
                  192.168.6.26:1237  --  192.168.6.27:1237 
     
    HaRpcLk:      127.0.0.1:61806 
     
    HaArpLk:      192.168.6.31 
     
    HaGwLk:       192.168.6.1
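
Reading the peer address from this output can be sketched as follows. The function name ha_peer_ip is hypothetical. The sketch assumes that the first HaArbLk line has the form local-address:port -- peer-address:port, as in the sample above, and that the command is run on the active server so that the peer is the standby server.

```shell
# Hypothetical helper: reads "config_ha.sh -a" output on standard input
# and prints the address on the right-hand side of the first HaArbLk
# entry. Prints nothing if no HaArbLk line is found.
ha_peer_ip() {
    sed -n 's/^[[:space:]]*HaArbLk:[[:space:]]*[0-9.]*:[0-9]*[[:space:]]*--[[:space:]]*\([0-9.]*\):[0-9]*.*/\1/p' | head -n 1
}

# Example usage:
#   sh /opt/HA/module/hacom/script/config_ha.sh -a | ha_peer_ip
```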

  15. Log in to the standby database server using the database administrator account and run the shell command gs_ctl status -P database administrator password to check whether the database service is normal.

    • If yes, go to Step 18.
    • If no, run the shell command gs_ctl restart to restart the database service.

      If information similar to the following is displayed, the database service is normal:

      gs_ctl: server is running     

  16. Repeat Step 15 to check whether the database service is normal.

    • If yes, go to Step 17.
    • If no, contact Huawei technical support.

  17. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to Step 18.
    • If no, no further action is required.

  18. Run the shell command gs_ctl query -P database administrator password to check whether the active and standby databases are disconnected.

    • If yes, go to Step 19.
    • If no, contact Huawei technical support.

      If information similar to the following is displayed, the active and standby databases are disconnected:

       Ha state: 
              LOCAL_ROLE                     : Standby 
              STATIC_CONNECTIONS             : 1 
              DB_STATE                       : NeedRepair 
              DETAIL_INFORMATION             : repl1: Disconnected 
       
       Senders info: 
              No information 
       Receiver info: 
              No information
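
The disconnection check above can be sketched as a filter over the query output. The function name replication_state is hypothetical. The sketch matches only the DETAIL_INFORMATION line that reports Disconnected in the sample and makes no claim about other values that the command may print.

```shell
# Hypothetical helper: reads "gs_ctl query" output on standard input.
# Prints "disconnected" if the DETAIL_INFORMATION line reports
# "Disconnected", otherwise "not-disconnected".
replication_state() {
    if grep -q 'DETAIL_INFORMATION.*Disconnected'; then
        echo disconnected
    else
        echo not-disconnected
    fi
}

# Example usage; DB_ADMIN_PASSWORD is a hypothetical variable that holds
# the database administrator password:
#   gs_ctl query -P "$DB_ADMIN_PASSWORD" | replication_state
```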

  19. Log in to the database server where the alarm is generated using the database administrator account.
  20. Run the shell command gs_ctl restart to restart the database service.
  21. On the standby database server, run the shell command gs_ctl restart to restart the database service.
  22. Choose FusionManager > Monitoring or FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, contact Huawei technical support.
    • If no, go to Step 23.

  23. Check whether the database data is complete using the following methods:

    • Check whether data such as the number of VMs and the user assignment relationships is correct.
    • Check whether the operation logs are complete and whether any recent operation logs are missing.
    • Check whether the uncleared alarm records in FusionManager > Monitoring and FusionAccess > Alarm are the same.

  24. Based on the preceding checks, determine whether the database data is complete.

    • If yes, no further operation is required.
    • If no, contact Huawei technical support.

Related Information

None

Updated: 2019-03-01

Document ID: EDOC1100010511
