ALM-14105 Abnormal Instance Node Status
Description
This alarm is generated when the standby node is faulty. If the alarm is not cleared and the status of the active node is abnormal, the instance service is unavailable or the data is abnormal.
Attribute
Alarm ID | Alarm Severity | Alarm Type |
---|---|---|
14105 | Major | Communications alarm |
Alarm Parameters
Parameter | Description |
---|---|
instanceId | Specifies the instance ID. |
alarmId | Specifies the alarm ID. |
causeId | Specifies the alarm cause ID. |

Parameter | Description |
---|---|
hostName | Specifies the name of the host for which the alarm is generated. |
hostIP | Specifies the IP address of the host for which the alarm is generated. |
floatingIpAddress | Specifies the IP address of the faulty node. |
cause | Specifies the description of the alarm cause. |
Impact on the System
- Data cannot be automatically synchronized between the active and standby instances.
Possible Causes
The application that generates the alarm is not running as expected.
Procedure
- Obtain the information about the application that generates the alarm.
Log in to FusionStage.
- Use a browser to log in to ManageOne Operation Portal (ManageOne Tenant Portal in B2B scenarios) as a VDC administrator or VDC operator. Login address:
- Login address in a non-B2B scenario: https://Address for accessing ManageOne Operation Portal, for example, https://console.demo.com.
- Login address in the B2B scenario: https://Address for accessing ManageOne Tenant Portal, for example, https://tenant.demo.com.
- Select your region and then project from the drop-down list on the top menu bar.
Choose Console > Application > FusionStage from the main menu.
- Choose Application Publishing > Application Management from the main menu.
- Select the application that generates the alarm and copy the application name.
- Use PuTTY to log in to the manage_lb1_ip node.
The default username is paas, and the default password is QAZ2wsx@123!.
- Run the following command and enter the password of the root user to switch to the root user:
su - root
Default password: QAZ2wsx@123!
- Run the following command to query the container, where dcs-server-component-90478e21 is an example application name:
kubectl get pod --all-namespaces |grep dcs-server-component-90478e21
Information similar to the following is displayed:
06954f029140408d844c21beb9a04407 dcs-server-component-90478e21-869fbc6d77-7gh9g 1/1 Running 0 4h
06954f029140408d844c21beb9a04407 dcs-server-component-90478e21-869fbc6d77-g7bjh 1/1 Running 0 4h
- Run the following command to access the container:
kubectl exec -ti dcs-server-component-90478e21-869fbc6d77-7gh9g -n 06954f029140408d844c21beb9a04407 /bin/bash
NOTE:
In the preceding command, dcs-server-component-90478e21-869fbc6d77-7gh9g and 06954f029140408d844c21beb9a04407 are the application name and the namespace, respectively, obtained in 4.
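The namespace and name used in the access command can be taken from the query output instead of being copied by hand. The following is a minimal sketch, assuming the output has the namespace in column 1, the name in column 2, and the status in column 4, as in the sample output shown earlier. The function name pick_running_pod is hypothetical and not part of the product.

```shell
# Hypothetical helper: read `kubectl get pod --all-namespaces` output on stdin
# and print "<namespace> <name>" for the first Running entry whose name starts
# with the given application name.
pick_running_pod() {
  awk -v app="$1" 'index($2, app) == 1 && $4 == "Running" { print $1, $2; exit }'
}

# Usage on the manage_lb1_ip node (shown as comments, not executed here):
#   set -- $(kubectl get pod --all-namespaces | pick_running_pod dcs-server-component-90478e21)
#   kubectl exec -ti "$2" -n "$1" /bin/bash
```

If the helper prints nothing, no matching entry is in the Running state, and the access command should not be attempted.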
- Go to the /opt/dcs/logs/dcs directory and run the following command to view the error information:
grep "${instance_id}" /opt/dcs/logs/dcs/dcs_status.log | grep "WARN"
In the preceding command, replace ${instance_id} with the value of instanceId in the alarm location information.
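If ${instance_id} is left unset, the first grep in the command above receives an empty pattern and matches every line in the log. The following is a minimal sketch of a wrapper that guards against this; the function name dcs_warn_lines is hypothetical, and only the log path comes from this procedure.

```shell
# Hypothetical helper: print WARN lines for one instance from the DCS status log.
# Refuses an empty instance ID, because grep with an empty pattern matches every line.
dcs_warn_lines() {
  instance_id=$1
  log_file=${2:-/opt/dcs/logs/dcs/dcs_status.log}
  if [ -z "$instance_id" ]; then
    echo "usage: dcs_warn_lines <instance_id> [log_file]" >&2
    return 2
  fi
  # -F treats the ID as a fixed string rather than a regular expression.
  grep -F -- "$instance_id" "$log_file" | grep -F "WARN"
}
```

For example, dcs_warn_lines followed by the instanceId value from the alarm prints only the WARN entries for that instance.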
- Perform the following operations based on the log information:
- If the system displays a message indicating that the DCS-Server process cannot access the service VM, perform the following operations:
- Log in to the faulty node in VNC mode.
The default username is paas, and the default password is QAZ2wsx@123!.
- Check whether the SSH service runs properly.
systemctl status sshd
- If the service is running properly, the following information is displayed. Go to 7.d.
- If the service is not running properly, run the following command to restart the SSH service:
systemctl restart sshd
- Run the following command and enter the password QAZ2wsx@123! of the root user to switch to the root user:
su - root
- Run the following command to check whether the paas user can log in remotely:
vim /etc/ssh/sshd_config
- If the following information is specified in the configuration file, the paas user can remotely log in to the service VM.
- If the information cannot be found, add it to the file.
- Check whether the network information is correctly configured.
- Run the following command to check whether the disk space is used up:
df -h
If the disk space is used up, the program cannot run properly.
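The disk check can be narrowed to the file systems that are actually full or nearly full. The following is a minimal sketch, assuming the POSIX output format of df -P (usage percentage in column 5, mount point in column 6) and mount points without spaces. The function name full_filesystems and the 90 percent default are illustrative assumptions, not values from the product.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every mount point
# whose usage is at or above the given percentage (default 90).
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)            # strip the trailing percent sign
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage on the service VM (shown as a comment, not executed here):
#   df -P | full_filesystems 90
```

An empty result means no file system has reached the threshold.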
- If the system displays a message indicating that the om command fails to be executed, perform the following operations to check whether the Redis process on the service VM is normal:
- Use PuTTY to log in to the Redis node.
The default username is paas, and the default password is QAZ2wsx@123!.
- Run the following command to check whether the Redis process is started:
ps -ef |grep redis
- Run the following command to start the Redis service:
/opt/dcs/redis/redis/data/ctl/redis_ctl start
- Open the /var/log/dcs/redis/redis_run.log file, view the service run logs, and check whether Redis is running properly.
- If Redis is running properly, contact technical support.
- If Redis is not running properly, go to 7.e.
- Run the following commands to restart the Redis service:
/opt/dcs/redis/redis/data/ctl/redis_ctl stop
/opt/dcs/redis/redis/data/ctl/redis_ctl start
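When checking for the Redis process, note that the output of ps -ef |grep redis usually includes the grep command itself, so a non-empty result does not by itself prove that Redis is running. The following is a minimal sketch of a check that ignores the grep line; the function name redis_process_listed is hypothetical.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and succeed only if some
# line mentions redis and is not the grep command itself.
redis_process_listed() {
  grep -F "redis" | grep -v -F "grep" > /dev/null
}

# Usage on the Redis node (shown as comments, not executed here):
#   if ps -ef | redis_process_listed; then
#     echo "Redis process found"
#   else
#     echo "Redis process not found"
#   fi
```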
- If the issue persists, contact technical support.
Alarm Clearing
After the fault is rectified, the system automatically clears the alarm.
Related Information
None