FusionCloud 6.3.1.1 Troubleshooting Guide 03

Multiple Abnormal ETCD Pods

Symptom

Fault Symptom

Multiple etcd pods are abnormal.

Fault Locating
  • Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

    Run the following command to query the status of etcd pods in the OM zone:

    kubectl get pod -nmanage|grep etcd

    etcd-events-server-paas-10-109-173-143        1/1       err   0          22h 
    etcd-events-server-paas-10-109-173-244        1/1       err   1          22h 
    etcd-events-server-paas-10-109-173-62         1/1       Running   3          23h 
    etcd-server-paas-10-109-173-143               1/1       Running   0          22h 
    etcd-server-paas-10-109-173-244               1/1       Running   1          22h 
    etcd-server-paas-10-109-173-62                1/1       Running   3          23h 
    network-etcd-server-paas-10-109-173-143       1/1       Running   0          23h 
    network-etcd-server-paas-10-109-173-244       0/1       Running   1          22h 
    network-etcd-server-paas-10-109-173-62        0/1       Running   0          23h
    NOTE:
    • In the preceding command output, the pods whose STATUS is not Running or whose READY value is 0/1 are abnormal.
    • If two or more pods in the same etcd cluster are abnormal, as in the preceding command output, handle the fault as described in this section.
  • Run the following command to query the status of etcd pods in the tenant management zone:

    kubectl get pod -n manage|grep etcd|grep -v cse

    etcd-0           4/4       CrashLoopBackOff   50          2h
    etcd-1           4/4       ErrImagePull       55          2h
    etcd-2           4/4       Running            52          2h
    NOTE:
    • In the preceding command output, the pods whose STATUS is not Running are abnormal.
    • If two or more pods in the same etcd cluster are abnormal, as in the preceding command output, handle the fault as described in this section.
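
The manual inspection of the pod list can be approximated with a small filter. The following is a minimal sketch, not part of the product tooling: the function name abnormal_etcd_pods is hypothetical, and the sketch assumes the column layout shown in the sample output (NAME, READY, STATUS, RESTARTS, AGE). It prints the name of every pod whose STATUS is not Running or whose READY count is incomplete.

```shell
# Hypothetical helper: reads "kubectl get pod" rows on stdin and prints
# the names of pods that look abnormal.
# Assumes the columns NAME READY STATUS ..., as in the sample output.
abnormal_etcd_pods() {
  awk '{
    # READY has the form ready/total, for example 0/1
    n = split($2, ready, "/")
    if (n != 2) next    # skip rows that are not pod rows, such as a header
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

For example, kubectl get pod -n manage | grep etcd | abnormal_etcd_pods lists the abnormal pods only. Two or more names from the same etcd cluster correspond to the fault described in this section.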

Troubleshooting

Prerequisites

The paas user has been added to the whitelist. For details, see Operations When the sudo Command Failed to Be Run.

Procedure
  • Handling etcd Faults in the Tenant Management Zone
    1. Use PuTTY to log in to the om_core1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command to query the IP address of the node where etcd-0 resides:

      kubectl get pod etcd-0 -nmanage -oyaml | grep hostIP

      NOTE:

      In the preceding command, etcd-0 is the name of the abnormal pod obtained in section Symptom.

    3. Log in as the paas user to the node where etcd-0 resides.
    4. Perform the following operations to locate the root cause of the etcd fault and rectify the fault accordingly:
      • Container network problems
        1. Run the following commands to query the container ID of etcd-0 and log in to the container:

          sudo docker ps |grep etcd-0

          0a1c9946060f        10.184.42.33:20202/root/cfe-etcd:2.2.4                   "/bin/sh -c 'umask 06"   3 minutes ago       Up 3 minutes                            k8s_etcd.902abe6d_etcd-0_manage_4072c181-888c-11e7-9423-286ed489be96_29e8d83b
          08131be7509a        10.184.42.33:20202/root/default/cfe-pause:2.8.7          "/pause"                 2 hours ago         Up 2 hours                              k8s_POD.2cdee072_etcd-0_manage_4072c181-888c-11e7-9423-286ed489be96_f5970a1f

          The container ID is displayed in the first column of the command output. In this example, the container ID of the etcd container is 0a1c9946060f. Use it to log in to the container:

          sudo docker exec -it 0a1c9946060f sh

        2. Run the following command to check whether the network connections between etcd-0, etcd-1, and etcd-2 are normal:

          ping etcd-1.etcd.manage

          4.2$ ping etcd-1.etcd.manage
          PING etcd-1.etcd.manage.svc.cluster.local (10.184.41.116) 56(84) bytes of data.
          64 bytes from etcd-network-0.etcd-network.manage.svc.cluster.local (10.184.41.116): icmp_seq=1 ttl=63 time=1.53 ms
          64 bytes from etcd-network-0.etcd-network.manage.svc.cluster.local (10.184.41.116): icmp_seq=2 ttl=63 time=1.97 ms

          If the preceding information is not displayed, contact technical support to rectify container network problems.

        3. Exit the container.
      • Log in to the node where the abnormal etcd pod resides.
      • Disk space problems

        Run the following commands to check disk space:

        cd /var/paas/run

        df -h . | grep 100%

        If any output is displayed, the file system that holds /var/paas/run is full. Free up disk space.

      • Disk I/O problems

        Run the following command to query system I/O status:

        iostat -x 1

        Linux 3.12.49-11-default (SZV1000269249)  04/23/17  _x86_64_  (16 CPU)

         avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                    8.37    0.04   13.11    4.63    0.00   73.85

         Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
         xvda              0.27    83.42    1.12   28.17    22.88   551.59    39.22     0.26    8.99    5.95    9.11   0.52   100
         xvde              0.75   219.17   12.87  355.79   276.25  8284.19    46.44     1.04    2.83    4.69    2.76   0.35    99
         dm-0              0.00     0.00    0.07    0.77     0.27     3.08     8.00     0.00    4.06    1.45    4.29   0.94   0.08
         dm-1              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.80   0.46   1.25
         dm-2              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.81   0.46   100
         xvdf              0.00   240.41    4.03  610.73   169.29  4264.36    14.42     1.71    2.78    4.61    2.76   0.97   100

        If many devices show a value of 99 or 100 in the %util column of the command output, system I/O is saturated. In this case, contact IaaS technical support for system optimization.

      • If none of the preceding problems is found, see section "(Optional) Restoring a Faulty Tenant Management Zone cfe-etcd Cluster" or "(Optional) Restoring a Faulty OM Zone cfe-etcd Cluster" in Backup and Restoration Guide.
  • Handling etcd Faults in the OM Zone
    1. Use PuTTY to log in to the om_core1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command to query the IP address of the node where etcd-events-server-paas-10-109-173-143 resides:

      kubectl get pod etcd-events-server-paas-10-109-173-143 -nom -oyaml | grep hostIP

      hostIP: 10.109.173.143

      NOTE:

      In the preceding command, etcd-events-server-paas-10-109-173-143 is the name of the abnormal pod.
    3. Log in to the node where etcd-events-server-paas-10-109-173-143 resides as the paas user.
    4. Perform the following operations to locate the root cause of the etcd-event fault and rectify the fault.
      • Container network problems
        1. Run the following command to query the container ID of etcd-event:
          sudo docker ps |grep etcd-event
          63c330bf5eb8 cfe-etcd:2.10.29 "/bin/sh -c 'umask 07" 10 hours ago Up 10 hours k8s_etcd-event-container.c757e6c4_etcd-event-server-10.120.244.156_fst-manage_427c78d484816d9042b227363cf68205_d9037bbc
          3dd0f4b89bb5 paas-cfe-pause-bootstrap "/pause" 10 hours ago Up 10 hours k8s_POD.6d5cdc5e_etcd-event-server-10.120.244.156_fst-manage_427c78d484816d9042b227363cf68205_64b94875

          Record the container ID in the command output. In this example, the container ID is 63c330bf5eb8.

        2. Run the following command to log in to the etcd-event container:

          sudo docker exec -it 63c330bf5eb8 sh

        3. Run the following command to check whether the etcd-event container can communicate with the nodes where the other etcd-events-server pods (etcd-events-server-paas-10-109-173-244 and etcd-events-server-paas-10-109-173-62) reside:

          ping 10.109.173.244

          In the preceding command, 10.109.173.244 is the IP address of the node where etcd-events-server-paas-10-109-173-244 resides. To check the connection to etcd-events-server-paas-10-109-173-62, run the command again with the IP address of the node where that pod resides. If the network is normal, information similar to the following is displayed:

          PING 10.109.173.244 (10.109.173.244) 56(84) bytes of data.
          64 bytes from 10.109.173.244: icmp_seq=1 ttl=64 time=0.568 ms
          64 bytes from 10.109.173.244: icmp_seq=2 ttl=64 time=0.454 ms
          64 bytes from 10.109.173.244: icmp_seq=3 ttl=64 time=0.390 ms
          64 bytes from 10.109.173.244: icmp_seq=4 ttl=64 time=0.403 ms
          64 bytes from 10.109.173.244: icmp_seq=5 ttl=64 time=0.225 ms

          If the preceding information is not displayed, contact technical support to rectify the container network fault.

        4. Run the following command to exit the container:

          exit

      • Insufficient disk space

        Run the following commands to check the disk space:

        cd /var/paas/run

        df -h . | grep 100%

        If any output is displayed, the file system that holds /var/paas/run is full. In this case, free up disk space.

      • Disk I/O fault

        Run the following command to query the system I/O status:

        iostat -x 1

        Linux 3.12.49-11-default (SZV1000269249)  04/23/17  _x86_64_  (16 CPU)
          
         avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
                    8.37    0.04   13.11    4.63    0.00   73.85 
          
         Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
         xvda              0.27    83.42    1.12   28.17    22.88   551.59    39.22     0.26    8.99    5.95    9.11   0.52   100
         xvde              0.75   219.17   12.87  355.79   276.25  8284.19    46.44     1.04    2.83    4.69    2.76   0.35    99
         dm-0              0.00     0.00    0.07    0.77     0.27     3.08     8.00     0.00    4.06    1.45    4.29   0.94   0.08 
         dm-1              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.80   0.46   1.25 
         dm-2              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.81   0.46   100
         xvdf              0.00   240.41    4.03  610.73   169.29  4264.36    14.42     1.71    2.78    4.61    2.76   0.97   100

        If many devices show a value of 99 or 100 in the %util column of the command output, system I/O is saturated. In this case, contact IaaS technical support for system optimization.

      • If none of the preceding issues is found, see section "(Optional) Restoring a Faulty Tenant Management Zone cfe-etcd Cluster" or "(Optional) Restoring a Faulty OM Zone cfe-etcd Cluster" in Backup and Restoration Guide.
      NOTE:

      This section uses one etcd node as an example to show how to locate and rectify faults.
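
The two node-level checks in the procedure (disk space and disk I/O) can also be scripted. The following is a minimal sketch under stated assumptions, not a vendor-provided tool: the function names disk_usage_pct and saturated_devices are hypothetical, the sketch assumes that %util is the last column of the iostat -x device table (as in the sample output), and the threshold passed to saturated_devices is a value to choose locally. Unlike grep 100%, the disk check returns the actual usage percentage, so a file system that is nearly full can be noticed as well.

```shell
# Hypothetical helper: prints the usage percentage (digits only) of the
# file system that holds the given path, based on POSIX-format df output.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: reads "iostat -x" output on stdin and prints each
# device whose %util (assumed to be the last column) is at or above the
# threshold given as the first argument.
saturated_devices() {
  awk -v limit="$1" '
    /^ *Device/ { in_table = 1; next }   # header row starts a device table
    NF == 0     { in_table = 0; next }   # blank line ends the table
    in_table && $NF + 0 >= limit + 0 { print $1 }
  '
}
```

For example, disk_usage_pct /var/paas/run prints the usage of the file system checked in the procedure, and iostat -x 1 5 | saturated_devices 99 | sort | uniq -c counts how often each device reached the threshold across five reports. A device that appears in most reports matches the saturated I/O condition described above.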

Updated: 2019-08-16

Document ID: EDOC1100063248
