FusionAccess 6.5 Alarm Handling 04

1004001 Database Server Abnormal

Description

The database server sends a heartbeat message to the IT adapter (ITA) every 2 minutes. The heartbeat message contains CPU usage and memory usage. This alarm is generated when the ITA fails to receive the database heartbeat message three consecutive times.

This alarm is cleared when the ITA receives the database heartbeat message again.
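
The timing rule described above can be sketched in shell. This is only an illustration, not the ITA's actual implementation: the function name heartbeat_alarm_state, the variable names, and the assumption that misses are counted as whole 2-minute intervals since the last received heartbeat are all hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only. Names and the counting logic are assumptions,
# not taken from FusionAccess.
HEARTBEAT_INTERVAL=120   # seconds: the database server reports every 2 minutes
MISSED_LIMIT=3           # the alarm is generated after three consecutive misses

# heartbeat_alarm_state LAST_HEARTBEAT_EPOCH NOW_EPOCH
# Prints "alarm" if at least MISSED_LIMIT full intervals have elapsed
# since the last heartbeat was received, otherwise prints "normal".
heartbeat_alarm_state() {
    last=$1
    now=$2
    missed=$(( (now - last) / HEARTBEAT_INTERVAL ))
    if [ "$missed" -ge "$MISSED_LIMIT" ]; then
        echo "alarm"
    else
        echo "normal"
    fi
}
```

Under these assumptions, the alarm would be generated roughly 6 minutes (3 x 2 minutes) after the last heartbeat was received, and the state returns to normal as soon as a new heartbeat arrives.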

Attribute

Alarm ID    Alarm Severity    Auto Clear
1004001     Critical          Yes

Parameters

Name

Meaning

Alarm ID

Identifies an alarm. Each alarm is uniquely identified by an alarm ID and an alarm name.

Alarm Severity

Indicates the severity of an alarm. Value:

  • Critical: indicates that a fault affecting the services provided by the system has occurred. You need to rectify the fault immediately. If a device or resource is faulty, rectify it immediately even if the fault occurs during non-working hours.
  • Major: indicates that a fault affecting the service quality of the system has occurred. You need to rectify the fault immediately. If the service quality of a device or resource is degraded, rectify it immediately during working hours.
  • Minor: indicates a fault that does not affect service quality. To prevent more serious faults, this type of alarm needs to be observed or handled if necessary.
  • Warning: indicates a fault that may affect service quality. This type of alarm must be handled based on the error type.

Alarm Name

Identifies an alarm. Each alarm is uniquely identified by an alarm ID and an alarm name.

Object Type

Specifies the type of the object for which the alarm is generated.

Alarm Object Name

Specifies the name of the object for which the alarm is generated.

Generation Time

Specifies the time when the alarm is generated.

Clear Time

Specifies the time when the alarm is cleared.

Clear Mode

Specifies whether the alarm is manually or automatically cleared.

Operation

Specifies the operation that can be performed on the alarm.

Value: Manually Clear Alarm

Impact on the System

An abnormal database server can have serious consequences. For example, the standby database service may become unavailable, and the data in the active and standby databases may become inconsistent. The database server must remain in the running state. If this alarm is generated, handle it on the same day.

Possible Causes

  • An IP address conflict exists.
  • The IP address of the database server has been changed. Alarms generated for the previous IP address must be manually cleared.
  • The database service is not running properly.
  • The network is faulty.
  • The HA service of the database server that generated the alarm is abnormal.

Procedure

  1. Choose FusionAccess > Alarm to check whether alarm 1000033 HA Active/Standby Heartbeat Fault exists and whether the IP address displayed in peer IP address in Detailed Alarm Information is the same as that of the abnormal database server.

  2. Choose FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to 3.
    • If no, no further operation is required.

  3. Log in to the server where the alarm is generated using an administrator account and run the arping -c 3 -f -D -I eth0 IP address of the server where the alarm is generated command to check whether an IP address conflict exists.

    • If yes, go to 4.
    • If no, go to 6.

      If information similar to the following is displayed, no IP address conflict exists:

      ARPING 192.168.162.11 from 0.0.0.0 eth0 
      Sent 3 probes (3 broadcast(s)) 
      Received 0 response(s) 
      (Note: The IP addresses are only examples. Use the actual IP addresses.)     

      If information similar to the following is displayed, an IP address conflict exists:

      ARPING 192.168.162.11 from 0.0.0.0 eth0 
      Unicast reply from 192.168.162.11 [12:6E:D4:AB:CD:EF]  1.022ms 
      Sent 1 probes (1 broadcast(s)) 
      Received 1 response(s) 
      (Note: The preceding IP addresses and MAC addresses are only examples. Use the actual IP addresses and MAC addresses.)     

  4. Log in to the server that causes the IP conflict, shut down the server or change the server IP address, and run the arping -c 3 -f -D -I eth0 IP address of the server where the alarm is generated command again on the server where the alarm is generated to check whether the IP conflict persists.

    • If yes, contact Huawei technical support.
    • If no, go to 5.

  5. Choose FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to 6.
    • If no, no further operation is required.

  6. Log in to the ITA server as user gandalf and run ping -c 3 IP address of the database server for which the alarm is generated to check whether communication with the database server is normal.

    • If yes, go to 8.
    • If no, go to 7.

      The communication is normal if the command output is as follows:

      PING 192.168.190.2 (192.168.190.2) 56(84) bytes of data. 
      64 bytes from 192.168.190.2: icmp_seq=1 ttl=64 time=0.047 ms 
      64 bytes from 192.168.190.2: icmp_seq=2 ttl=64 time=0.057 ms 
      64 bytes from 192.168.190.2: icmp_seq=3 ttl=64 time=0.058 ms 
      (Note: The IP addresses are only examples. Use the actual IP addresses.)     

  7. Locate and rectify the network fault based on the actual situation on site.
  8. Choose FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, go to 9.
    • If no, no further operation is required.

  9. Log in to the DB server for which the alarm is generated using a database administrator account, and run the shell command gs_ctl status -P database administrator password to check whether the database service is normal.

    • If yes, contact Huawei technical support.
    • If no, run the shell command gs_ctl restart to restart the database service.

      If information similar to the following is displayed, the database service is normal:

      gs_ctl: server is running     

  10. Repeat 9 to check whether the database service is normal.

    • If yes, go to 11.
    • If no, contact Huawei technical support.

  11. Choose FusionAccess > Alarm to check whether the alarm still exists.

    • If yes, contact Huawei technical support.
    • If no, no further operation is required.
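
The yes/no decisions in steps 3, 6, and 9 depend on reading command output. The following is a minimal sketch of how that output could be classified in a shell script, assuming the output resembles the samples shown in the procedure. All function names (arping_received, has_ip_conflict, ping_replies, db_is_running) are hypothetical helpers introduced here for illustration; they are not part of FusionAccess.

```shell
#!/bin/sh
# Illustrative helpers only. Function names are hypothetical, and the
# parsing assumes output similar to the samples in the procedure.

# arping_received OUTPUT
# Prints N from the "Received N response(s)" line of arping output.
arping_received() {
    printf '%s\n' "$1" | sed -n 's/^ *Received \([0-9][0-9]*\) response.*/\1/p'
}

# has_ip_conflict OUTPUT
# Succeeds if the arping probe received at least one response (step 3).
has_ip_conflict() {
    count=$(arping_received "$1")
    [ -n "$count" ] && [ "$count" -gt 0 ]
}

# ping_replies OUTPUT
# Prints the number of echo-reply lines in ping output (step 6).
ping_replies() {
    printf '%s\n' "$1" | grep -c 'bytes from'
}

# db_is_running OUTPUT
# Succeeds if the gs_ctl status output reports a running server (step 9).
db_is_running() {
    printf '%s\n' "$1" | grep -q 'server is running'
}

# Example usage (SERVER_IP is a placeholder; run on the server where the
# alarm is generated):
#   out=$(arping -c 3 -f -D -I eth0 "$SERVER_IP")
#   if has_ip_conflict "$out"; then echo "IP conflict: go to step 4"; fi
```

The helpers take captured output as an argument instead of running the commands themselves, so the same checks can be applied to saved output. Text matching of this kind is fragile and would need adjusting if the command output differs on site.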

Related Information

None

Updated: 2019-10-11

Document ID: EDOC1100061083
