HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-38003 etcd Is Abnormal

Description

This alarm is reported when etcd is abnormal.

Attribute

Alarm ID    Alarm Severity    Alarm Type
38003       Major             Operational alarms

Parameters

Parameter       Description
serviceName     Indicates the name of the service that reports the alarm.
namespace       Indicates the namespace of the service that reports the alarm.
instanceName    Indicates the name of the service instance for which the alarm is reported.
srvAddr         Indicates the access address of the service that reports the alarm.

Impact on the System

The etcd function is unavailable, and data cannot be added, deleted, or updated.

Possible Causes

  • etcd is faulty due to a container network fault.
  • etcd is abnormal due to insufficient disk space.
  • The etcd cluster is abnormal due to a disk I/O fault.

Procedure

Figure 17-8 shows the procedure for handling the alarm.

Figure 17-8 Process of handling an alarm
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Check whether the etcd container network is normal.

    1. Run the following command to query the IP address of the node (etcd-server is used as an example):

      kubectl get pod -nfst-manage -owide|grep etcd-server

      etcd-server-paas-10-31-30-182     1/1       Running   0          31m       10.31.30.182    paas-10-31-30-182
      etcd-server-paas-10-31-30-175     1/1       Running   0          31m       10.31.30.175    paas-10-31-30-175
      etcd-server-paas-10-31-30-217     1/1       Running   0          31m       10.31.30.217    paas-10-31-30-217 
    2. Run the following command to log in to the etcd-server-paas-10-31-30-182 node:

      ssh 10.31.30.182

      10.31.30.182 is the IP address of the node queried in 3.a.

    3. Run the following command to query the ID of the etcd-server container:

      docker ps | grep etcd-server | grep -v pause

      ddcb24c2ebf9        cfe-etcd:1.12.25                                             "/bin/sh -c 'umask 07"   5 hours ago         Up 5 hours                              k8s_etcd-container.edd2eda1_etcd-server-172.31.30.182_om_f83f596e83c4489260098a6163385718_ed83203e
    4. Run the following command to access the etcd-server container:

      docker exec -ti ddcb24c2ebf9 bash

      ddcb24c2ebf9 is the container ID obtained in 3.c.

    5. Run the following commands to check whether etcd-server-paas-10-31-30-182 can reach etcd-server-paas-10-31-30-175 and etcd-server-paas-10-31-30-217 over the network:

      ping 10.31.30.175

      ping 10.31.30.217

      10.31.30.175 and 10.31.30.217 are the node IP addresses obtained in 3.a.

      64 bytes from 10.31.30.175: icmp_seq=317 ttl=64 time=0.146 ms
      64 bytes from 10.31.30.175: icmp_seq=318 ttl=64 time=0.159 ms
      64 bytes from 10.31.30.175: icmp_seq=319 ttl=64 time=0.200 ms
      64 bytes from 10.31.30.175: icmp_seq=320 ttl=64 time=0.155 ms
      ....
      • If information similar to the preceding is displayed, the network connection is normal. Go to 5.
      • Otherwise, go to 4.
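
The connectivity check in step 3 can be sketched as a small script. This is a minimal sketch, not part of the product: the function names etcd_pod_ips and check_peers are hypothetical, and the IP address is assumed to be the sixth column, as in the sample output of 3.a.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the product.

# Print the IP column for etcd-server pods.
# Assumes the column layout of the sample output in 3.a (IP is field 6).
etcd_pod_ips() {
    awk '$1 ~ /^etcd-server/ { print $6 }'
}

# Ping each IP address read from stdin three times and report the result.
check_peers() {
    local ip
    while read -r ip; do
        [ -n "$ip" ] || continue
        if ping -c 3 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip unreachable"
        fi
    done
}

# Usage, on a node where kubectl is available:
#   kubectl get pod -nfst-manage -owide | etcd_pod_ips | check_peers
```

Run from the node, this tests node-to-node reachability only. To test from inside the etcd-server container as in 3.e, pass the IP addresses to check_peers there instead.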

  4. Rectify the container network fault.

    1. Run the following command to check whether IP forwarding is enabled (the value is 1):

      cat /proc/sys/net/ipv4/ip_forward

      • If yes, go to 5.
      • If no, go to 4.b.
    2. Run the following command to enable IP forwarding by setting the value to 1:

      echo "1" > /proc/sys/net/ipv4/ip_forward

    3. Repeat the connectivity check in 3.e.
      • If the container network is normal, go to 4.d.
      • If the container network is abnormal, go to 8.
    4. Check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, run the exit command to exit the container and go to 5.
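
The check in 4.a can be wrapped in a helper that returns success only when forwarding is enabled. This is a minimal sketch: the function name ip_forward_enabled and its optional path argument are hypothetical, added only to make the check easy to test.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the optional path argument are hypothetical.

# Return 0 if IPv4 forwarding is enabled (the file contains 1), non-zero otherwise.
ip_forward_enabled() {
    local file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# Usage:
#   if ip_forward_enabled; then
#       echo "IP forwarding is enabled"
#   else
#       echo "IP forwarding is disabled"
#   fi
```

A value written to /proc/sys/net/ipv4/ip_forward takes effect immediately but does not survive a reboot unless it is also set in the sysctl configuration.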

  5. Run the following commands to check whether the disk space is full:

    cd /var/paas/run

    df -h . | grep 100%

    /dev/mapper/opt_vg-vol_opt 102687672 22225952  75202456  100% /opt
    • If information similar to the preceding is displayed, the disk space is used up. Go to 6.
    • Otherwise, go to 7.
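
Filtering on the string 100% only catches a file system that is completely full. As a hedged alternative, the sketch below reads the POSIX-format output of df -P and reports every file system at or above a threshold. The function name full_filesystems and the default threshold of 90 are assumptions, not values from the product documentation.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and default threshold are assumptions.

# Read `df -P` output on stdin and print "mount-point use%" for each
# file system whose usage is at or above the threshold (default 90).
full_filesystems() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header row
            use = $5                  # Capacity column, for example 100%
            sub(/%/, "", use)
            if (use + 0 >= t) print $6, $5
        }'
}

# Usage:
#   df -P /var/paas/run | full_filesystems 90
```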

  6. Clean up the disk space.

    1. Run the following command to go to the disk directory:

      cd /opt

      /opt is the directory in the command output of 5.

    2. Run the following command to delete a file or subdirectory from the directory:

      rm -rf file-or-directory-name

      • You can run the ls command to view the files and subdirectories in the directory.
      • Before running the command, confirm that deleting the files or subdirectories will not affect product functions.
    3. After the deletion is complete, check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 7.
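
Before deleting anything in step 6, it can help to see which entries use the most space. This is a minimal sketch: the function name largest_entries is hypothetical, and it only lists candidates without deleting anything.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical. Nothing is deleted.

# List the largest entries directly under a directory, largest first.
# Sizes are in kilobytes. Hidden files are not matched by the glob.
largest_entries() {
    local dir="${1:-.}" count="${2:-10}"
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage:
#   largest_entries /opt 10
```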

  7. Run the following command to query the system I/O status:

    iostat -x 1

    Linux 3.10.0-327.36.58.4.x86_64 (CFE-ETCD03)  12/05/2017  _x86_64_  (4 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               8.82    0.00   15.05    0.07    0.08   75.98

    Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    vda        0.02    5.86   0.08   9.44    3.13  104.29    22.59     0.02    1.69    5.57    1.66   0.70   0.67
    vdb        0.00    0.00   0.00   0.00    0.00    0.00    14.12     0.00    0.35    0.35    0.00   0.34   0.00
    dm-0       0.00    0.00   0.00   0.51    0.00    0.69     2.74     0.00    0.61    4.17    0.61   0.50   0.03
    dm-1       0.00    0.00   0.08   1.17    2.96   40.82    69.98     0.01    6.56   24.30    5.36   0.42   0.05
    dm-2       0.00    0.00   0.08   1.17    2.96   40.82    69.98     0.01    6.56   24.31    5.36   0.42   0.05
    dm-4       0.00    0.00   0.00   0.00    0.00    0.00    37.06     0.00    0.75    0.62    3.75   0.59   0.00
    dm-5       0.00    0.00   0.00   0.02    0.02    1.59   163.47     0.00    2.58    4.79    2.55   2.27   0.00
    dm-3       0.00    0.00   0.00   0.00    0.02    0.00   139.63     0.00    9.81    6.76   33.40   2.12   0.00
    dm-6       0.00    0.00   0.00   0.00    0.00    0.00    37.06     0.00    1.41    0.99   10.94   1.01   0.00
    dm-7       0.00    0.00   0.00   0.00    0.04    0.00   111.52     0.00   13.56    8.79   32.81   3.45   0.00

    If many values in the %util column are 99 or 100, the system I/O resources are exhausted.
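
Scanning the %util column by eye is error-prone when iostat prints a report every second. The sketch below filters the output so that only saturated devices are shown. It assumes that %util is the last column of each device row, as in the sample output above. The function name busy_devices is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical.

# Read `iostat -x` output on stdin and print "device %util" for every device
# row whose last column is at or above the threshold (default 99).
busy_devices() {
    local threshold="${1:-99}"
    awk -v t="$threshold" '
        /^Device/ { in_dev = 1; next }   # header row starts a device table
        NF == 0   { in_dev = 0; next }   # blank line ends the table
        in_dev && $NF + 0 >= t { print $1, $NF }
    '
}

# Usage (five reports at one-second intervals):
#   iostat -x 1 5 | busy_devices 99
```

A device that appears in most of the reports is persistently saturated. A device that appears only once may reflect a short burst.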

  8. Contact technical support for assistance.

Alarm Clearing

This alarm will be automatically cleared after the fault is rectified.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
