Causes of an Automatic Active/Standby Switchover While the HA System Is Running
Question
What can cause an active/standby switchover while the HA system is running?
Answer
- The resources on the active node are abnormal. As a result, the weights of the resources monitored by the active and standby nodes are inconsistent, triggering an active/standby switchover.
You can open the HA runlog file on the node with a smaller heartbeat IP address, search for the keyword sure, and check whether error logs exist in the context. If yes, the resources on the active node are abnormal.
- If the CPU, memory, or I/O usage on the active node is too high, the resource script execution may time out. As a result, resources are abnormal and an active/standby switchover is triggered.
You can check whether the HA run logs contain error code 129. If yes, the resource script execution timed out.
- The heartbeat between the active and standby nodes is interrupted for longer than the deadtime (the deadtime value is configured in the /home/data/ha/module/haarb/conf/haarb.xml file and is 500 milliseconds by default). If the standby node then cannot ping the floating IP address but can ping the arbitration IP address, it takes over as the active node, and an active/standby switchover is triggered.
You can check whether the HA run logs contain exception logs about heartbeat interruption. If yes, the heartbeat between the active and standby nodes is interrupted.
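The first two log checks above can be scripted. The following is a minimal sketch, not a vendor tool: it assumes the HA run log is plain text with one entry per line, that error entries contain the word "error" (in any letter case), and that error code 129 appears as a standalone number in the line. The function names and the context window size are hypothetical; adjust them to the actual log format.

```python
import re
from typing import Iterable, List, Tuple


def find_errors_near_keyword(
    lines: Iterable[str], keyword: str = "sure", context: int = 5
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for error lines within `context` lines
    of any line containing `keyword`. Line numbers are 1-based."""
    lines = list(lines)
    seen = set()
    hits = []
    for i, line in enumerate(lines):
        if keyword not in line:
            continue
        start = max(0, i - context)
        end = min(len(lines), i + context + 1)
        for j in range(start, end):
            # Assumption: error entries contain the word "error".
            if j not in seen and "error" in lines[j].lower():
                seen.add(j)
                hits.append((j + 1, lines[j].rstrip("\n")))
    return hits


def find_error_code(
    lines: Iterable[str], code: int = 129
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines containing `code` as a number
    that is not part of a longer run of digits (so 1290 does not match)."""
    pattern = re.compile(r"(?<!\d)" + str(code) + r"(?!\d)")
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

To use it, read the run log into a list of lines (for example with `open(path).readlines()`, where `path` is the location of the HA run log on the node) and pass the list to both functions. A non-empty result from the first function suggests abnormal resources on the active node; a non-empty result from the second suggests a resource script timeout. A bare number match can give false positives, for example a timestamp whose millisecond field is 129, so review each reported line before drawing a conclusion.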