HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-37024 Clusters Are Unbalanced

Description

This alarm is generated when active and standby instances in a cluster switch over and the resulting active/standby relationship is inconsistent with the initial cluster status.

Attribute

Alarm ID    Alarm Severity    Auto Clear
37024       Major             Yes

Parameters

Name           Meaning
ServiceName    Identifies the service for which the alarm is generated.
RoleName       Identifies the role for which the alarm is generated.

Impact on the System

When this alarm is reported, active and standby GTMs or DataNodes in the cluster have switched over, and the current active/standby relationship differs from the initial status. Too many active instances may end up on a single node, which unbalances the cluster load and degrades cluster performance.

Possible Causes

The relationship of the active and standby DataNode instances is abnormal.

  • The active DataNode instance is invalid and cannot provide external services.
  • The active and standby DataNodes are disconnected.
  • The active and standby DataNodes are manually switched over.

The relationship of the active and standby GTM instances is abnormal.

  • The active GTM instance is invalid and cannot provide external services.
  • The active and standby GTM instances are disconnected.
  • The active and standby GTM instances are manually switched over.

Procedure

Locate the alarm cause.

  1. Log in to the FusionInsight Manager.

    1. Log in to the ManageOne OM plane using a browser, then choose Alarms.
      • Login address: https://<URL of the ManageOne OM plane homepage>:31943. Example: https://oc.type.com:31943.
      • Default username: admin, default password: Huawei12#$.
    2. In the alarm list, locate and click the target alarm name in the Name column. The Alarm Details and Handling Recommendations dialog box is displayed.
    3. Locate the value in the IP Address/URL/Domain Name column. This value is the floating IP address of the FusionInsight Manager.
    4. Log in to the FusionInsight Manager using a browser.
      • Login address: https://<floating IP address of the FusionInsight Manager>:28443/web. Example: https://10.10.192.100:28443/web.
      • Default username: admin, default password: obtain it from the system administrator.

  2. On FusionInsight Manager, choose Services > MPPDB > Instances, and obtain the nodes where the MPPDB instances reside.
  3. Log in to any LibrA instance node as user omm. Run the source command to configure the environment variables, and then run the gs_om -t status --detail command to check the cluster status. The following commands assume that the cluster installation directory is /opt/huawei/Bigdata.

    Default user: omm, default password: Bigdata123@.

    source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile

    gs_om -t status --detail

  4. If cluster_state is Normal and balanced is No, as shown in the following output, an active/standby switchover has occurred. In the Datanode State area, the letter before the state shows the initial role of an instance: P indicates that the instance was initially the primary. If such an instance has been switched over to standby, its state is displayed as Standby Normal (see instance 6011 in the example). Rectify the fault by referring to "Resetting Instance Status" in the Product Documentation.

    [  CMServer State   ]
    
    node              node_ip       instance                                          state
    -------------------------------------------------------------------------------------------
    1  SZX1000071373  10.90.57.221  1    /opt/huawei/Bigdata/mppdb/cm/cm_server       Primary
    2  SZX1000071374  10.90.57.222  2    /opt/huawei/Bigdata/mppdb/cm/cm_server       Standby
    
    [   Cluster State   ]
    
    cluster_state   : Normal
    redistributing  : No
    balanced        : No
    
    [ Coordinator State ]
    
    node              node_ip       instance                                       state
    ------------------------------------------------------------------------------------------
    1  SZX1000071373  10.90.57.221  5001 /srv/BigData/mppdb/data1/coordinator     Normal
    2  SZX1000071374  10.90.57.222  5002 /srv/BigData/mppdb/data1/coordinator     Normal
    3  SZX1000071375  10.90.57.223  5003 /srv/BigData/mppdb/data1/coordinator     Normal
    
    [ Central Coordinator State ]
    node          node_ip         instance                                   state
    --------------------------------------------------------------------------------
    2  SZX1000071374  10.90.57.222  5002 /srv/BigData/mppdb/data1/coordinator     Normal  
    
    [     GTM State     ]
    
    node              node_ip       instance                               state                    sync_state
    ------------------------------------------------------------------------------------------------------------
    2  SZX1000071374  10.90.57.222  1001 /opt/huawei/Bigdata/mppdb/gtm     P Primary Connection ok  Sync
    1  SZX1000071373  10.90.57.221  1002 /opt/huawei/Bigdata/mppdb/gtm     S Standby Connection ok  Sync
    
    [  Datanode State   ]
    node             node_ip         instance                                state            | node             node_ip      instance                                  state            | node             node_ip      instance                                      state
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  SZX1000071373 10.90.57.221 6001 /srv/BigData/mppdb/data1/master1     P Primary Normal | 2  SZX1000071374 10.90.57.222 6002 /srv/BigData/mppdb/data1/slave1     S Standby Normal | 3  SZX1000071375 10.90.57.223 3002 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6003 /srv/BigData/mppdb/data2/master2     P Primary Normal | 3  SZX1000071375 10.90.57.223 6004 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 2  SZX1000071374 10.90.57.222 3003 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6005 /srv/BigData/mppdb/data3/master3     P Primary Normal | 2  SZX1000071374 10.90.57.222 6006 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 3  SZX1000071375 10.90.57.223 3004 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6007 /srv/BigData/mppdb/data4/master4     P Primary Normal | 3  SZX1000071375 10.90.57.223 6008 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 2  SZX1000071374 10.90.57.222 3005 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6009 /srv/BigData/mppdb/data1/master1     P Primary Normal | 3  SZX1000071375 10.90.57.223 6010 /srv/BigData/mppdb/data1/slave1     S Standby Normal | 1  SZX1000071373 10.90.57.221 3006 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6011 /srv/BigData/mppdb/data2/master2     P Standby Normal | 1  SZX1000071373 10.90.57.221 6012 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 3  SZX1000071375 10.90.57.223 3007 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6013 /srv/BigData/mppdb/data3/master3     P Primary Normal | 3  SZX1000071375 10.90.57.223 6014 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 1  SZX1000071373 10.90.57.221 3008 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6015 /srv/BigData/mppdb/data4/master4     P Primary Normal | 1  SZX1000071373 10.90.57.221 6016 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 3  SZX1000071375 10.90.57.223 3009 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6017 /srv/BigData/mppdb/data1/master1     P Primary Normal | 1  SZX1000071373 10.90.57.221 6018 /srv/BigData/mppdb/data1/slave1     S Standby Normal | 2  SZX1000071374 10.90.57.222 3010 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6019 /srv/BigData/mppdb/data2/master2     P Primary Normal | 2  SZX1000071374 10.90.57.222 6020 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 1  SZX1000071373 10.90.57.221 3011 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6021 /srv/BigData/mppdb/data3/master3     P Primary Normal | 1  SZX1000071373 10.90.57.221 6022 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 2  SZX1000071374 10.90.57.222 3012 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6023 /srv/BigData/mppdb/data4/master4     P Primary Normal | 2  SZX1000071374 10.90.57.222 6024 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 1  SZX1000071373 10.90.57.221 3013 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
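
The check in steps 4 and 5 can be sketched as a small shell helper that reads the gs_om -t status --detail output on standard input. This is only an illustration: classify_cluster_state and its three result words are hypothetical names, not part of gs_om, and the parsing assumes the "key : value" layout shown in the Cluster State area above.

```shell
#!/bin/sh
# Sketch: classify the [ Cluster State ] area of `gs_om -t status --detail`.
# Reads the status text on stdin and prints one of:
#   switched  - cluster_state is Normal and balanced is No (step 4)
#   degraded  - cluster_state is Degraded (step 5)
#   other     - anything else
# classify_cluster_state is a hypothetical helper name, not part of gs_om.
classify_cluster_state() {
    awk -F':' '
        # Remove whitespace around the key or value of a "key : value" line.
        function trim(s) { gsub(/^[ \t]+|[ \t\r]+$/, "", s); return s }
        trim($1) == "cluster_state" { state = trim($2) }
        trim($1) == "balanced"      { balanced = trim($2) }
        END {
            if (state == "Normal" && balanced == "No") print "switched"
            else if (state == "Degraded")              print "degraded"
            else                                       print "other"
        }
    '
}

# Example use on a LibrA node as user omm (paths as in step 3):
#   source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile
#   gs_om -t status --detail | classify_cluster_state
```

A result of switched corresponds to step 4, and degraded corresponds to step 5.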

  5. If cluster_state is Degraded, as shown in the following output, go to 6.

    [  CMServer State   ]
    
    node              node_ip       instance                                          state
    -------------------------------------------------------------------------------------------
    1  SZX1000071373  10.90.57.221  1    /opt/huawei/Bigdata/mppdb/cm/cm_server       Primary
    2  SZX1000071374  10.90.57.222  2    /opt/huawei/Bigdata/mppdb/cm/cm_server       Standby
    
    [   Cluster State   ]
    
    cluster_state   : Degraded
    redistributing  : No
    balanced        : No
    
    [ Coordinator State ]
    
    node              node_ip       instance                                       state
    ------------------------------------------------------------------------------------------
    1  SZX1000071373  10.90.57.221  5001 /srv/BigData/mppdb/data1/coordinator     Normal
    2  SZX1000071374  10.90.57.222  5002 /srv/BigData/mppdb/data1/coordinator     Normal
    3  SZX1000071375  10.90.57.223  5003 /srv/BigData/mppdb/data1/coordinator     Normal
    
    [ Central Coordinator State ]
    node          node_ip         instance                                   state
    --------------------------------------------------------------------------------
    2  SZX1000071374  10.90.57.222  5002 /srv/BigData/mppdb/data1/coordinator     Normal  
    
    [     GTM State     ]
    
    node              node_ip       instance                               state                    sync_state
    ------------------------------------------------------------------------------------------------------------
    2  SZX1000071374  10.90.57.222  1001 /opt/huawei/Bigdata/mppdb/gtm     P Primary Connection ok  Sync
    1  SZX1000071373  10.90.57.221  1002 /opt/huawei/Bigdata/mppdb/gtm     S Standby Connection ok  Sync
    
    [  Datanode State   ]
    node             node_ip         instance                                state            | node             node_ip      instance                                  state            | node             node_ip      instance                                      state
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  SZX1000071373 10.90.57.221 6001 /srv/BigData/mppdb/data1/master1     P Primary Normal | 2  SZX1000071374 10.90.57.222 6002 /srv/BigData/mppdb/data1/slave1     S Standby Normal | 3  SZX1000071375 10.90.57.223 3002 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6003 /srv/BigData/mppdb/data2/master2     P Primary Normal | 3  SZX1000071375 10.90.57.223 6004 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 2  SZX1000071374 10.90.57.222 3003 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6005 /srv/BigData/mppdb/data3/master3     P Primary Normal | 2  SZX1000071374 10.90.57.222 6006 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 3  SZX1000071375 10.90.57.223 3004 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    1  SZX1000071373 10.90.57.221 6007 /srv/BigData/mppdb/data4/master4     P Primary Normal | 3  SZX1000071375 10.90.57.223 6008 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 2  SZX1000071374 10.90.57.222 3005 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6009 /srv/BigData/mppdb/data1/master1     P Down    Disk damaged | 3  SZX1000071375 10.90.57.223 6010 /srv/BigData/mppdb/data1/slave1     S Primary Normal | 1  SZX1000071373 10.90.57.221 3006 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6011 /srv/BigData/mppdb/data2/master2     P Primary Normal | 1  SZX1000071373 10.90.57.221 6012 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 3  SZX1000071375 10.90.57.223 3007 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6013 /srv/BigData/mppdb/data3/master3     P Primary Normal | 3  SZX1000071375 10.90.57.223 6014 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 1  SZX1000071373 10.90.57.221 3008 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    2  SZX1000071374 10.90.57.222 6015 /srv/BigData/mppdb/data4/master4     P Primary Normal | 1  SZX1000071373 10.90.57.221 6016 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 3  SZX1000071375 10.90.57.223 3009 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6017 /srv/BigData/mppdb/data1/master1     P Primary Normal | 1  SZX1000071373 10.90.57.221 6018 /srv/BigData/mppdb/data1/slave1     S Standby Normal | 2  SZX1000071374 10.90.57.222 3010 /srv/BigData/mppdb/data1/dummyslave1     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6019 /srv/BigData/mppdb/data2/master2     P Primary Normal | 2  SZX1000071374 10.90.57.222 6020 /srv/BigData/mppdb/data2/slave2     S Standby Normal | 1  SZX1000071373 10.90.57.221 3011 /srv/BigData/mppdb/data2/dummyslave2     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6021 /srv/BigData/mppdb/data3/master3     P Primary Normal | 1  SZX1000071373 10.90.57.221 6022 /srv/BigData/mppdb/data3/slave3     S Standby Normal | 2  SZX1000071374 10.90.57.222 3012 /srv/BigData/mppdb/data3/dummyslave3     R Secondary Normal
    3  SZX1000071375 10.90.57.223 6023 /srv/BigData/mppdb/data4/master4     P Primary Normal | 2  SZX1000071374 10.90.57.222 6024 /srv/BigData/mppdb/data4/slave4     S Standby Normal | 1  SZX1000071373 10.90.57.221 3013 /srv/BigData/mppdb/data4/dummyslave4     R Secondary Normal
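
In a large cluster it can be hard to spot the rows whose current role no longer matches the initial role. The following sketch lists them, assuming the row layout shown above (node ID, host name, IP address, instance ID, path, initial-role letter, current state) with instance groups separated by "|". list_role_changes is a hypothetical helper name, and the mapping of R to Secondary is inferred from the sample output.

```shell
#!/bin/sh
# Sketch: list instances whose current role differs from their initial role.
# Reads `gs_om -t status --detail` output on stdin and prints one line per
# deviating instance: host, instance ID, initial-role letter, current state.
# list_role_changes is a hypothetical helper name, not part of gs_om.
list_role_changes() {
    awk '
        BEGIN {
            # Initial-role letter -> state expected when nothing has switched.
            want["P"] = "Primary"; want["S"] = "Standby"; want["R"] = "Secondary"
        }
        {
            # A Datanode State row holds several instances separated by "|".
            n = split($0, group, "[|]")
            for (i = 1; i <= n; i++) {
                m = split(group[i], f, " ")
                # Fields: id host ip instance path initial-role current-state ...
                if (m >= 7 && (f[6] in want) && f[7] != want[f[6]])
                    print f[2], f[4], f[6], f[7]
            }
        }
    '
}

# Example use on a LibrA node as user omm:
#   gs_om -t status --detail | list_role_changes
```

For the Degraded output above, this would report instance 6009 on SZX1000071374 (P, now Down) and instance 6010 on SZX1000071375 (S, now Primary). GTM rows use the same field layout, so a switched GTM instance would be listed as well.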

  6. In the Datanode State area, the row for instance 6009 shows that the active dn_6009 is in the Down state (Disk damaged) and that the standby dn_6010 has been switched over to active. As a result, the SZX1000071375 node, where dn_6010 resides, hosts excessive active DN instances. Run the gs_replace command to rectify the faulty dn_6009.

    NOTE:

    The switchover of a DataNode instance is used as an example. An abnormal switchover of a GTM instance is handled in the same way.

    omm@SZX1000071374:/srv/BigData/mppdb/data2> gs_replace -t config -h SZX1000071374
    Fixing all the CMAgents instances.
    There are [0] CMAgents need to be repaired in cluster.
    Configuring replacement instances.
    Successfully configured replacement instances.
    Successfully fixed all the CMAgents instances.
    Configuring
    Waiting for promote peer instances.
    
    Successfully upgraded standby instances.
    Deleting failed CN from pgxc_node.
    No CN needs to be fixed.
    Configuring replacement instances.
    Successfully configured replacement instances.
    Setting the SCTP.
    Successfully set the SCTP.
    Configuration succeeded.

  7. On the host where the instance is to be replaced, run the following command to start the instance.

    omm@SZX1000071374:/srv/BigData/mppdb/data2> gs_replace -t start -h SZX1000071374
    Starting.
    ======================================================================
    Successfully started instance process. Waiting to become Normal.
    ======================================================================
    .
    ======================================================================
    Start succeeded on all nodes.
    Start succeeded.

  8. Reset the instance status.

    NOTE:

    Switchover is performed for maintenance. Before a switchover, ensure the cluster is running properly, all services are stopped, and the pgxc_get_senders_catchup_time() view shows no ongoing catchup between the primary and standby nodes.

    omm@SZX1000071374:/srv/BigData/mppdb/data2> gs_om -t switch --reset
    Operating: Switch reset.
    cm_ctl: cmserver is rebalancing the cluster automatically.
    .....
    cm_ctl: switchover successfully.
    Operation succeeded: Switch reset.

  9. Wait for a while and check whether the alarm persists.

    • If yes, go to step 1 under Collect fault information.
    • If no, no further action is required.
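
While waiting, the cluster status can also be polled from the command line. The following is a minimal sketch under one assumption that the sample output in this document does not confirm: that the balanced field reads Yes once the initial roles are restored. wait_until_balanced is a hypothetical helper name. It complements the alarm check on ManageOne and does not replace it.

```shell
#!/bin/sh
# Sketch: poll a status command until it reports "balanced : Yes".
# Usage: wait_until_balanced ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the output contains "balanced : Yes", or 1 after
# ATTEMPTS unsuccessful tries.
# Assumption: "Yes" is the value shown for a balanced cluster.
# wait_until_balanced is a hypothetical helper name, not part of gs_om.
wait_until_balanced() {
    wb_attempts=$1
    wb_delay=$2
    shift 2
    wb_i=1
    while [ "$wb_i" -le "$wb_attempts" ]; do
        if "$@" | grep -Eq '^[[:space:]]*balanced[[:space:]]*:[[:space:]]*Yes[[:space:]]*$'; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$wb_i" -lt "$wb_attempts" ]; then
            sleep "$wb_delay"
        fi
        wb_i=$((wb_i + 1))
    done
    return 1
}

# Example use on a LibrA node as user omm (10 tries, 30 seconds apart):
#   wait_until_balanced 10 30 gs_om -t status --detail && echo "cluster balanced"
```

The attempt count and delay in the example are arbitrary values chosen for illustration.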

Collect fault information.

  1. On FusionInsight Manager, choose System > Log Download.
  2. Select MPPDB from the Services drop-down list box and click OK.
  3. Set Start Time for log collection to 1 hour before the alarm generation time and End Time to 1 hour after the alarm generation time, and click Download.
  4. Contact Technical Support and send the collected logs.
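
The Start Time and End Time in step 3 can be worked out with a short shell helper. This sketch assumes GNU date (for the -d option) and uses a made-up alarm time. log_window is a hypothetical helper name. Converting to epoch seconds first avoids the ambiguity of appending a relative offset such as "-1 hour" directly to a timestamp string.

```shell
#!/bin/sh
# Sketch: print the log collection window for an alarm generation time.
# Usage: log_window 'YYYY-MM-DD HH:MM:SS'
# Prints the Start Time (1 hour before) and then the End Time (1 hour after).
# Assumes GNU date; log_window is a hypothetical helper name.
log_window() {
    # Convert the alarm time to seconds since the epoch (local time zone).
    lw_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((lw_epoch - 3600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((lw_epoch + 3600))" '+%Y-%m-%d %H:%M:%S'
}

# Example with a hypothetical alarm generation time:
#   log_window '2019-08-30 10:15:00'
```

The times are interpreted in the time zone of the machine where the helper runs, so it should be run in the same time zone that FusionInsight Manager displays.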

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
