When the cm_ctl stop -n num command is used to stop an instance of FusionInsight LibrA C80, an error message is displayed indicating that the instance failed to stop, and the cluster becomes unavailable, as shown in the following figure.
1. Check the CM run logs on the isolated host:
2. It is found that the shutdown command is executed cyclically. As a result, the stop command times out.
3. Check the cluster status. It is found that the standby GTM has not been promoted to active. As a result, the cluster is unavailable.
4. Locate the log according to the error message displayed in Step 2.
5. It is found that the remote connection cannot be stopped.
6. Query the GTM running status.
It is found that the GTM process is still running.
7. Manually kill the GTM process. It is found that the cluster status is restored to Degraded.
8. Rectify the faulty instance.
9. Start the restored host.
10. Reconfigure the load balancing status.
11. It is found that the cluster is running properly.
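Steps 6 and 7 above can be sketched as a small shell helper. This is a minimal illustration, not the documented procedure: the helper name find_pids is hypothetical, and the process name gtm is an assumption that should be confirmed against the actual process list on the host before anything is killed.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs whose command name is exactly "$1".
# It reads "PID COMMAND" lines on stdin, so it can be checked offline
# against sample input before being used on a live host.
find_pids() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Assumed usage on the host being isolated (process name "gtm" is an assumption):
#   ps -e -o pid= -o comm= | find_pids gtm   # Step 6: is the GTM still running?
#   kill <pid>                               # Step 7: stop the leftover process
#   kill -9 <pid>                            # only if the process does not exit
```

Keeping the lookup separate from the kill makes it possible to review the matched PIDs first, which matters because killing the wrong process on a cluster node could cause further instance failures.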
A host where the active GTM is located cannot be stopped by the cm_ctl stop command, because the command logic does not handle the case in which the host holds the active GTM. The GTM can be stopped only when no other connections to it remain, but in this case the GTM is not the last instance that the command stops. Therefore, the active GTM process cannot be completely stopped and the standby GTM is not promoted to active. As a result, the host where the active GTM process resides fails to be isolated.
Manually kill the GTM process and restore the instance. For details, see Step 6 to Step 11 in the preceding handling process.
Fix the defect in the cm_ctl stop command logic so that it handles hosts where the active GTM is located. If the host to be isolated is not a management node, isolate it through the FusionInsight GUI.