HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Multiple Abnormal ETCD Pods

Symptom

Fault Symptom

Multiple etcd pods are abnormal.

Fault Locating
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the status of the etcd pods in the Tenant Management zone:

    kubectl get pod -n fst-manage|grep etcd

    etcd-event-server-paas-10-31-30-153         1/1       err       0          5d
    etcd-event-server-paas-10-31-30-162         1/1       err       0          5d
    etcd-event-server-paas-10-31-30-214         1/1       Running   0          5d
    etcd-network-server-paas-10-31-30-153       1/1       Running   0          5d
    etcd-network-server-paas-10-31-30-162       1/1       Running   0          5d
    etcd-network-server-paas-10-31-30-214       1/1       Running   0          5d
    etcd-server-paas-10-31-30-153               1/1       Running   0          5d
    etcd-server-paas-10-31-30-162               1/1       Running   0          5d
    etcd-server-paas-10-31-30-214               1/1       Running   0          5d
    NOTE:

    In the preceding command output, two of the three etcd-event pods are not in the Running state. If two or more pods in the same etcd cluster are abnormal, the fault is classified as multiple abnormal etcd pods, and this section applies.
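
The "two or more abnormal pods in the same cluster" rule in the note can be checked mechanically. The following is a minimal sketch, not part of the product: the function name count_abnormal_etcd is hypothetical, and it assumes that pod names follow the <cluster>-paas-<ip> pattern seen in the sample output and that the third column is the pod status.

```shell
# Reads `kubectl get pod` output on stdin and prints "<cluster> <count>"
# for every etcd cluster that has two or more pods not in the Running state.
# Assumption: pod names look like <cluster>-paas-<ip>, as in the sample output.
count_abnormal_etcd() {
  awk '/etcd/ && $3 != "Running" {
         cluster = $1
         sub(/-paas-.*/, "", cluster)   # strip the node suffix to get the cluster name
         n[cluster]++
       }
       END { for (c in n) if (n[c] >= 2) print c, n[c] }'
}

# Usage on the manage_lb1_ip node:
#   kubectl get pod -n fst-manage | count_abnormal_etcd
```

If the function prints nothing, no etcd cluster has reached the two-pod threshold and this section may not be the right starting point.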

Troubleshooting

Prerequisites

The paas user has been added to the whitelist. For details, see Operations When the sudo Command Failed to Be Run.

Procedure
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Perform the following steps to query the IP address of the node where an etcd-event pod resides:

    1. Run the following command to query the pod IP addresses (podip) of the etcd-event pods:

      kubectl get po -n fst-manage -owide | grep etcd-event

    2. Run the following command to query the IP address of the node where a pod resides. The pod etcd-event-server-paas-10-120-168-211 is used as an example.

      kubectl get pod etcd-event-server-paas-10-120-168-211 -n fst-manage -oyaml | grep hostIP

      hostIP: 10.120.168.211

      etcd-event-server-paas-10-120-168-211 is the name of the pod.

  4. Log in as the paas user to the node where etcd-event-server-paas-10-120-168-211 resides.
  5. Perform the following operations to locate the root cause of the etcd fault and rectify the fault accordingly:

    • Container network problems
      1. Run the following commands to query the container ID of etcd-event and log in to the container:

        sudo docker ps |grep etcd-event

        0a1c9946060f        10.184.42.33:20202/root/cfe-etcd:2.2.4                   "/bin/sh -c 'umask 06"   3 minutes ago       Up 3 minutes                            k8s_etcd.902abe6d_etcd-0_manage_4072c181-888c-11e7-9423-286ed489be96_29e8d83b
        08131be7509a        10.184.42.33:20202/root/default/cfe-pause:2.8.7          "/pause"                 2 hours ago         Up 2 hours                              k8s_POD.2cdee072_etcd-0_manage_4072c181-888c-11e7-9423-286ed489be96_f5970a1f

        The container ID is displayed in the first column of the command output. In this example, the ID of the etcd container is 0a1c9946060f. Run the following command to log in to the container:

        sudo docker exec -it 0a1c9946060f sh

      2. Run the following command to check whether the network connections between the etcd-event pods are normal:

        ping 10.120.168.214

        10.120.168.214 is a pod IP address (podip) obtained in 3.a.

        If the ping operation fails, contact technical support to rectify container network problems.

      3. Exit the container.

        exit

    • Disk space problems

      Run the following commands to check disk space:

      cd /var/paas/run

      df -h . | grep 100%

      If any information is displayed, the disk space is used up. Clear the disk space.

    • Disk I/O problems

      Run the following command to check system I/O status:

      iostat -x 1

      Linux 3.12.49-11-default (SZV1000269249)  04/23/17  _x86_64_  (16 CPU)
        
       avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
                  8.37    0.04   13.11    4.63    0.00   73.85 
        
       Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
       xvda              0.27    83.42    1.12   28.17    22.88   551.59    39.22     0.26    8.99    5.95    9.11   0.52    100
       xvde              0.75   219.17   12.87  355.79   276.25  8284.19    46.44     1.04    2.83    4.69    2.76   0.35     99
       dm-0              0.00     0.00    0.07    0.77     0.27     3.08     8.00     0.00    4.06    1.45    4.29   0.94   0.08
       dm-1              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.80   0.46   1.25
       dm-2              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.81   0.46    100
       xvdf              0.00   240.41    4.03  610.73   169.29  4264.36    14.42     1.71    2.78    4.61    2.76   0.97    100

      In the command output, if the values in the %util column are frequently 100 or close to 100 (for example, 99), system I/O is saturated. In this case, contact IaaS technical support for system optimization.

    • If none of the preceding problems exists, see (Optional) Restoring a Faulty Tenant Management Zone cfe-etcd Cluster in Backup and Restoration Guide.
    NOTE:

    This section uses one etcd node as an example to show how to locate and rectify faults.
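
The host IP lookup in step 3 and the disk checks in step 5 can be scripted so that the results do not have to be read by eye. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 99 threshold is only an illustrative default, and each function reads command output from standard input so that it can be tried on saved output first.

```shell
# Print the value of the first "hostIP:" line from `kubectl get pod ... -oyaml`
# output on stdin.
extract_host_ip() {
  sed -n 's/^[[:space:]]*hostIP:[[:space:]]*//p' | head -n 1
}

# Print the mount points whose usage is 100% from `df -P` output on stdin.
# Unlike `grep 100%`, only the Capacity column (field 5) is tested.
full_filesystems() {
  awk 'NR > 1 && $5 == "100%" { print $6 }'
}

# Print "<device> <%util>" for devices whose %util (last column) is at or
# above the threshold given as $1 (assumed default: 99) from `iostat -x`
# output on stdin.
saturated_devices() {
  awk -v limit="${1:-99}" '
    $1 == "Device:" || $1 == "Device" { in_table = 1; next }  # start of a device table
    NF == 0 { in_table = 0 }                                  # blank line ends the table
    in_table && $NF + 0 >= limit + 0 { print $1, $NF }'
}

# Usage on the node where the etcd pod resides:
#   kubectl get pod etcd-event-server-paas-10-120-168-211 -n fst-manage -oyaml | extract_host_ip
#   df -P /var/paas/run | full_filesystems
#   iostat -x 1 3 | saturated_devices 99
```

Running iostat with a sample count (here 3) lets the command finish on its own. The first report shows averages since boot, so the later reports are the ones that reflect the current load.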

Updated: 2019-06-01

Document ID: EDOC1100062375
