Running Status of the Two-node Cluster Is Abnormal
Symptom
In normal cases, one device is the active node and the other is the standby node. If both devices are active nodes, or both are standby nodes, the two-node cluster is running abnormally.
Possible Causes
- The heartbeat IP address or floating IP address used by the two-node cluster is occupied by another device.
- The local device cannot detect the status of the peer device because the heartbeat link between the two devices is abnormal.
- The firewall configuration blocks the establishment of the heartbeat link of the two-node cluster.
- Container resources are abnormal.
- The arbitration IP address cannot be pinged on both ends.
Procedure
If both nodes are active nodes, perform the following steps:
- Check whether the heartbeat IP address and floating IP address are occupied by other devices.
- If no, go to the next step.
- If yes, change the heartbeat IP address and floating IP address to ensure that they are not used by other devices.
- Check whether the heartbeat link between the two devices is normal.
Ping the heartbeat IP address of the peer device from the local device. If the ping succeeds, the route between the two devices is reachable and the two devices can communicate with each other.
- If yes, go to the next step.
- If no, check the network and IP address configuration.
- Check whether the arbitration IP address can be pinged from both devices.
- If yes, go to the next step.
- If no, check the network and IP address configuration.
- If the fault persists, the firewall configuration may be blocking the establishment of the heartbeat link between the two devices. (For the Atlas 500 AI edge station, skip this step.)
Run the following command on the server to add the client to the firewall allowlist, replacing <client heartbeat IP address> with the heartbeat IP address of the client:
iptables -I INPUT 1 -s <client heartbeat IP address> -p tcp -j ACCEPT
The device with the smaller heartbeat IP address is the server, and the device with the larger heartbeat IP address is the client. For example, if the heartbeat IP addresses of the two devices are 10.90.130.11 and 10.90.130.12, the device at 10.90.130.11 is the server and the device at 10.90.130.12 is the client.
To make the allowlist rule persist across restarts, you are advised to add the command to the startup script. For example, on EulerOS, run the vi /etc/rc.local command to open the /etc/rc.local file and add the command to the end of the file.
- If the two-node cluster is still abnormal after the preceding problems are solved, contact Huawei technical support.
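The server/client rule in the firewall step can be sketched in shell as follows. This is a minimal sketch, not part of the product tooling: the function names ip_to_int, pick_roles, and allowlist_rule are hypothetical, the addresses are the example values from the step above, and it assumes that "smaller" means a numeric comparison of the IPv4 addresses. The script only prints the rule; it does not change the firewall.

```shell
#!/bin/sh
# Sketch: decide which node is the heartbeat server (smaller heartbeat IP)
# and print the allowlist rule to run on it.
# Assumption: addresses are compared numerically, octet by octet.

# Convert a dotted IPv4 address to an integer so two addresses can be compared.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "<server> <client>" for two heartbeat IP addresses.
pick_roles() {
    if [ "$(ip_to_int "$1")" -lt "$(ip_to_int "$2")" ]; then
        echo "$1 $2"
    else
        echo "$2 $1"
    fi
}

# Print the allowlist command for the given client heartbeat IP address.
allowlist_rule() {
    echo "iptables -I INPUT 1 -s $1 -p tcp -j ACCEPT"
}

# Example values from the text; replace with the real heartbeat IP addresses.
roles=$(pick_roles 10.90.130.12 10.90.130.11)
server=${roles% *}
client=${roles#* }
echo "Run on the server ($server):"
allowlist_rule "$client"
```

Printing the command instead of executing it keeps the sketch safe to run on either node; the printed line can then be run on the server and appended to the startup script as described above.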
If both nodes are standby nodes, perform the following steps:
- Check whether the /home/data/ha/module/harm/conf/docker.xml file contains container resources.
- If yes, run the docker ps command to check whether the container resources are started.
- If the container resources are started, go to the next step.
- If the container resources are not started, check whether the container is abnormal. If the container is abnormal, rectify the container resource exception by referring to "Troubleshooting > Containers Are Running Abnormally" in the MindX Edge Application Deployment and Model Update Guide.
- If no (the file contains no container resources), go to the next step.
- Check whether the arbitration IP address can be pinged from both devices.
- If yes, go to the next step.
- If no, check the network and IP address configuration.
- If the fault persists, contact Huawei technical support.
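The container and arbitration checks above can be scripted roughly as follows. This is a sketch under stated assumptions: the function names missing_containers and can_ping are hypothetical, the expected container names must be read manually from /home/data/ha/module/harm/conf/docker.xml (the file format is not described here, so the sketch does not parse it), and the ping options assume a Linux ping that supports -c and -W.

```shell
#!/bin/sh
# Sketch: helper functions for the "both nodes are standby" checks.
# Run the same checks on both devices.

# Print each expected container name that "docker ps" does not list as running.
# If docker itself fails, every expected name is reported as missing.
missing_containers() {
    running=$(docker ps --format '{{.Names}}')
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
    done
}

# Succeed if the given address answers a ping (3 probes, 2 s wait per reply).
can_ping() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}
```

For example, calling missing_containers with the container names taken from docker.xml prints the names that are not running, and can_ping with the arbitration IP address returns a nonzero status when the address is unreachable from the device on which it is run.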