Repeated restarts of the CLC process on the ESC node prevent the CNA nodes’ virtual machines from appearing in the OMS Portal

Publication Date: 2012-11-21
Issue Description
Platform version: V100R002C00SPC100. The management node uses single-node networking and manages two clusters. The engineers reported that they could see the virtual machines in the user self-service system, but could not see them when viewing each cluster’s CNA nodes in the OMS Portal. The virtual machines also could not be logged in to normally. The engineers asked for help resolving the problem.

Alarm Information
Alarm 002016 is reported: the elastic computing service on the ESC node has stopped or failed.

Handling Process
1. The Watchdog on the ESC node fails to delete the restart flag file “/opt/eucalyptus/restartFlag”, so the file persists on the ESC node and the Watchdog keeps judging the CLC process to be abnormal. Manually delete the file “/opt/eucalyptus/restartFlag” on the ESC node, and then start the ESC process:
sh /opt/setup/eucalyptus-cloud start
After that, the ESC process status remains normal, and the OMS Portal shows the virtual machines on the CNA nodes being reported normally.
2. The Watchdog’s failure to delete the “/opt/eucalyptus/restartFlag” restart flag file is fixed in version V100R002C00SPC200.
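The manual workaround in step 1 can be sketched as a small shell function. This is only an illustration of the two actions described above, not a vendor-supplied script: the function name `clear_flag_and_start` and the `FLAG_FILE` and `START_CMD` variables are assumptions introduced here, and their defaults are the path and command quoted in this case.

```shell
#!/bin/sh
# Sketch of the manual workaround: remove the stale restart flag, then start
# the ESC process. FLAG_FILE and START_CMD are illustrative names; the defaults
# are the path and command quoted in this case and can be overridden.
FLAG_FILE="${FLAG_FILE:-/opt/eucalyptus/restartFlag}"
START_CMD="${START_CMD:-sh /opt/setup/eucalyptus-cloud start}"

clear_flag_and_start() {
    if [ -e "$FLAG_FILE" ]; then
        echo "Removing stale flag file: $FLAG_FILE"
        rm -f "$FLAG_FILE" || return 1
    else
        echo "No flag file found at $FLAG_FILE"
    fi
    # Start the ESC process only after the flag file is gone.
    # START_CMD is left unquoted on purpose so that it splits into words.
    $START_CMD
}
```

On the ESC node, the function would be run once as `clear_flag_and_start`. Removing the flag before starting the process matters, because a flag file left in place causes the Watchdog to treat the newly started process as abnormal.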

Root Cause
1. VMSUSER queries virtual machine information from the “eucalyptus” database on the ESC node, whereas the OMS Portal displays the status of the virtual machines on a CNA node only after each cluster’s CRM node has reported it to the ESC. The symptom shows that the ESC node database holds the virtual machine information, but the ESC has not received the information reported by the cluster CRM nodes. The possible causes are as follows:
(1) The process on the CRM node is faulty and has not reported the virtual machines’ status to the ESC node;
(2) The process on the ESC node is faulty and has not received the virtual machine information reported by the CRM node;
(3) The network is faulty, so the information is not transmitted normally.
2. Check the process on the CRM node. Its status is normal.


3. Check the process on the ESC node. Its status is abnormal.


Restart the process on the ESC node and check its status again. It quickly changes from normal to abnormal.


4. After analyzing the ESC logs in depth, the R&D engineers found that the CLC process is abnormal because the Watchdog restarts it repeatedly. If the Watchdog finds that any one of the following three conditions is met, it judges the CLC process to be abnormal and restarts it:
(1) The number of TCP connections on port 8773 is 0;
(2) The “/opt/eucalyptus/restartFlag” file exists;
(3) The “eucalyptus” certificate is missing.
5. Check the ESC node. The “restartFlag” file is present in the “/opt/eucalyptus/” directory. When the CLC process starts, it launches a number of startup items, and if any of them fails to start, the “/opt/eucalyptus/restartFlag” flag file is created. Before starting the CLC process, the Watchdog is supposed to delete the “/opt/eucalyptus/restartFlag” flag file. However, because of a software bug in R002C00SPC100, the Watchdog deletes the flag file “/opt/flag”, the path used in version R001C01, so the actual restart flag file is never deleted. As a result, each time the CLC process starts it is judged to be abnormal, and it is restarted repeatedly.
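The three Watchdog conditions in step 4 and the cleanup bug in step 5 can be illustrated with a short shell sketch. This is a hypothetical reconstruction based only on the description above, not the product’s Watchdog code. The function names, the `CERT_FILE` placeholder path (the real certificate location is not given in this case), the order of the checks, and the use of `netstat` to count connections (which here also counts listening sockets) are all assumptions.

```shell
#!/bin/sh
# Hypothetical reconstruction of the Watchdog logic described above.
FLAG_FILE="${FLAG_FILE:-/opt/eucalyptus/restartFlag}"
CERT_FILE="${CERT_FILE:-/path/to/eucalyptus-cert.pem}"  # placeholder, not from the source
PORT="${PORT:-8773}"

# Count TCP sockets whose local address ends in :PORT (assumes netstat exists;
# prints 0 if netstat is missing or nothing matches).
count_port_connections() {
    netstat -ant 2>/dev/null |
        awk -v p=":$PORT" '$4 ~ (p "$") { n++ } END { print n + 0 }'
}

# Print why the CLC process would be judged abnormal; print nothing if healthy.
# An optional first argument overrides the measured connection count.
clc_abnormal_reason() {
    conns="${1:-$(count_port_connections)}"
    if [ "$conns" -eq 0 ]; then
        echo "no TCP connections on port $PORT"
    elif [ -e "$FLAG_FILE" ]; then
        echo "restart flag present: $FLAG_FILE"
    elif [ ! -e "$CERT_FILE" ]; then
        echo "certificate missing: $CERT_FILE"
    fi
}

# Pre-start cleanup: delete whichever flag path the Watchdog is built to use.
# Per step 5, the faulty version passes the R001C01 path "/opt/flag" here.
prestart_cleanup() {
    rm -f "$1"
}
```

In this sketch, calling `prestart_cleanup /opt/flag` leaves “/opt/eucalyptus/restartFlag” in place, so `clc_abnormal_reason` keeps reporting the flag after every start, which matches the restart loop seen on the ESC node. Calling `prestart_cleanup "$FLAG_FILE"` instead clears the condition.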

Suggestions
None

END