
HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Single Abnormal ETCD Pod

Symptom

Fault Symptom

A single etcd pod is abnormal.

Fault Locating
Checking the status of etcd pods in the tenant management zone
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the status of etcd pods in the tenant management zone:

    kubectl get pod -n fst-manage -owide | grep etcd | grep -v cse | grep -v elb

    apm-etcd-0                                    1/1       Running   0          1d        10.115.0.179    paas-10-118-29-115
    apm-etcd-1                                    1/1       Running   0          1d        10.115.0.194    paas-10-118-29-61
    apm-etcd-2                                    1/1       Running   0          1d        10.115.0.209    paas-10-118-29-229
    etcd-backup-server-paas-10-118-29-146        1/1       Running   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-backup-server-paas-10-118-29-196        1/1       Running   0          1d        10.118.29.196   paas-10-118-29-196
    etcd-backup-server-paas-10-118-29-67         1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67
    etcd-event-server-paas-10-118-29-146         1/1       Running   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-event-server-paas-10-118-29-196         1/1       err       0          1d        10.118.29.196   paas-10-118-29-196
    etcd-event-server-paas-10-118-29-67          1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67
    etcd-network-server-paas-10-118-29-146       1/1       Running   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-network-server-paas-10-118-29-196       1/1       Running   0          1d        10.118.29.196   paas-10-118-29-196
    etcd-network-server-paas-10-118-29-67        1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67
    etcd-server-paas-10-118-29-146               1/1       Running   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-server-paas-10-118-29-196               1/1       Running   0          1d        10.118.29.196   paas-10-118-29-196
    etcd-server-paas-10-118-29-67                1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67
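    NOTE:

    The pods whose status is not Running are abnormal. In the preceding example, etcd-event-server-paas-10-118-29-196 is abnormal because its status is err.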
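To scan a long pod list quickly, the status check in 3 can be scripted. The following is a minimal sketch, not a product tool: the function name list_abnormal_pods is hypothetical, and it assumes the column order in the example output above (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE), so STATUS is the third field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name and STATUS of every pod that is not Running.
# Assumes the `kubectl get pod -owide` column order, so STATUS is field 3.
list_abnormal_pods() {
  awk 'NF < 3 { next }              # skip blank or truncated lines
       $1 == "NAME" { next }        # skip the kubectl header row, if present
       $3 != "Running" { print $1, $3 }'
}

# Usage on the manage_lb1_ip node, as root:
#   kubectl get pod -n fst-manage -owide | grep etcd | grep -v cse | grep -v elb | list_abnormal_pods
# With the example output above, this would print:
#   etcd-event-server-paas-10-118-29-196 err
```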

Troubleshooting

Prerequisites

The paas user has been added to the whitelist. For details, see Operations When the sudo Command Failed to Be Run.

Locating the Root Cause of a Fault
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the IP addresses of the nodes where the etcd-event pods reside (etcd-event is used as an example in this section):

    kubectl get po -n fst-manage -owide | grep etcd-event

    Information similar to the following is displayed.

    etcd-event-server-paas-10-118-29-146         1/1       Running   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-event-server-paas-10-118-29-196         1/1       Running   0          1d        10.118.29.196   paas-10-118-29-196
    etcd-event-server-paas-10-118-29-67          1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67

  4. Use PuTTY to log in, as the paas user, to the node where etcd-event-server-paas-10-118-29-146 resides.
  5. Perform the following operations to locate the root cause of the etcd fault and rectify the fault accordingly:

    • Container network problems
      1. Run the following command to query the ID of the etcd-event container:

        sudo docker ps |grep etcd-event

        b628b368e9a9 cfe-etcd:2.12.20 "/bin/sh -c 'umask 07" 3 weeks ago Up 3 weeks k8s_etcd-event-container_etcd-event-server-10-118-29-146_fst-manage_098fa64b945131dd4376065a3339502c_0
        fbdedd9c9389 paas-cfe-pause-bootstrap "/pause" 3 weeks ago Up 3 weeks k8s_POD_etcd-event-server-10-118-29-146_fst-manage_098fa64b945131dd4376065a3339502c_0

        In the command output, the ID of the etcd-event container is b628b368e9a9. Run the following command to access the container:

        sudo docker exec -it b628b368e9a9 sh

      2. Run the following command to check whether the network connection between this pod and the other pods in the etcd-event cluster is normal:

        ping 10.118.29.196

        10.118.29.196 is the IP address of another etcd-event pod obtained in 3. Repeat the command for the IP address of each of the other etcd-event pods.

        If no replies are received, contact technical support to rectify the container network problem.

      3. Exit the container.

        exit

    • Disk space problems

      Run the following commands to check disk space:

      cd /var/paas/run

      df -h . | grep 100%

      If any output is displayed, the disk is full. Free up disk space.

    • Disk I/O problems

      Run the following command to check system I/O status:

      iostat -x 1

      Linux 3.12.49-11-default (SZV1000269249)   04/23/17   _x86_64_   (16 CPU)
        
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle 
                  8.37    0.04   13.11    4.63    0.00   73.85 
        
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util 
      xvda              0.27    83.42    1.12   28.17    22.88   551.59    39.22     0.26    8.99    5.95    9.11   0.52   100 
      xvde              0.75   219.17   12.87  355.79   276.25  8284.19    46.44     1.04    2.83    4.69    2.76   0.35  99 
      dm-0              0.00     0.00    0.07    0.77     0.27     3.08     8.00     0.00    4.06    1.45    4.29   0.94   0.08 
      dm-1              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.80   0.46   1.25 
      dm-2              0.00     0.00   11.79   15.72   265.92   251.92    37.64     0.25    8.91    5.05   11.81   0.46   100 
      xvdf              0.00   240.41    4.03  610.73   169.29  4264.36    14.42     1.71    2.78    4.61    2.76   0.97 100

      In the command output, if the %util column shows 99 or 100 for many devices, disk I/O is saturated. In this case, contact IaaS technical support to optimize the system.

    • If the root cause cannot be located or the fault persists, see Handling etcd Faults in the Tenant Management Zone and Pods Are Removed from the Cluster Due to etcd Faults.
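The disk space and disk I/O checks above can be wrapped as reusable filters, as in the following sketch. The function names are hypothetical. full_filesystems assumes `df -P` output, which keeps each file system on one line, and saturated_devices assumes the `iostat -x` layout shown above, where %util is the last column; other sysstat versions may order columns differently.

```shell
#!/usr/bin/env bash
# Hypothetical filters for the disk checks; both read command output on stdin.

# Read `df -P` output; print the mount point of every file system whose Use% is 100%.
full_filesystems() {
  awk 'NR > 1 && $5 == "100%" { print $6 }'
}

# Read `iostat -x` output; print devices whose %util (assumed to be the last
# column) is at or above THRESHOLD (default 99).
saturated_devices() {
  awk -v t="${1:-99}" '
    $1 ~ /^Device/ { in_dev = 1; next }   # a device table starts
    NF == 0        { in_dev = 0; next }   # a blank line ends the table
    in_dev && ($NF + 0) >= (t + 0) { print $1, $NF }'
}

# Usage on the etcd node (not inside the container):
#   df -P /var/paas/run | full_filesystems
#   iostat -x 1 2 | saturated_devices 99
```

With the iostat output shown above, `saturated_devices 99` would list xvda, xvde, dm-2, and xvdf. Note that the first report printed by iostat covers the time since boot, so the second report of `iostat -x 1 2` better reflects the current load.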

Handling etcd Faults in the Tenant Management Zone
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the statuses of the pods where etcd-event resides (etcd-event is used as an example in this section):

    kubectl get pod -n fst-manage -owide|grep etcd-event

    etcd-event-server-paas-10-118-29-146         1/1       err       0          1d        10.118.29.146   paas-10-118-29-146
    etcd-event-server-paas-10-118-29-196         1/1       Running   0          1d        10.118.29.196   paas-10-118-29-196
    etcd-event-server-paas-10-118-29-67          1/1       Running   0          1d        10.118.29.67    paas-10-118-29-67

  4. Use PuTTY to log in, as the paas user, to the node where the abnormal pod queried in 3 resides.
  5. Run the following command to stop the abnormal etcd container:

    sudo docker ps | grep etcd-event-server-paas-10-118-29-146| awk '{print $1}' | xargs sudo docker kill

  6. Run the following commands to create the temporary directory tmp and move the data directory of the faulty etcd-event into it:

    cd /var/paas/run

    mkdir ../tmp

    mv etcd-event/ ../tmp

  7. Run the command in 3 to check whether the abnormal pod is restored.

    • If the pod status is Running, the fault is rectified. Go to 8.
    • If the pod status is not Running, contact technical support for assistance.

  8. Run the following commands to delete the temporary directory:

    cd /var/paas

    rm -rf tmp/
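Steps 6 and 8 move the etcd data directory aside instead of deleting it, so the data is kept until the pod is confirmed to be Running again. The following sketch wraps both steps with basic guards. The function names and the PAAS_DIR parameter (/var/paas on a real node) are hypothetical, and it uses `mkdir -p`, so unlike `mkdir ../tmp` it does not fail if tmp already exists.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers for steps 6 and 8; PAAS_DIR is /var/paas on a real node.

# Move PAAS_DIR/run/NAME into PAAS_DIR/tmp rather than deleting it.
move_etcd_data_aside() {
  local paas_dir=$1 name=$2
  [ -n "$paas_dir" ] && [ -n "$name" ] || { echo "usage: PAAS_DIR NAME" >&2; return 1; }
  if [ ! -d "$paas_dir/run/$name" ]; then
    echo "no data directory: $paas_dir/run/$name" >&2
    return 1
  fi
  if [ -e "$paas_dir/tmp/$name" ]; then
    echo "backup already exists: $paas_dir/tmp/$name" >&2
    return 1
  fi
  mkdir -p "$paas_dir/tmp" && mv "$paas_dir/run/$name" "$paas_dir/tmp/"
}

# Remove the backup; call this only after the pod is Running again.
remove_etcd_backup() {
  local paas_dir=$1
  [ -n "$paas_dir" ] || { echo "PAAS_DIR is empty" >&2; return 1; }
  rm -rf -- "$paas_dir/tmp"
}

# Usage on the node where the abnormal pod resides, after stopping the container:
#   move_etcd_data_aside /var/paas etcd-event
#   (check on manage_lb1_ip that the pod returns to Running)
#   remove_etcd_backup /var/paas
```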

Pods Are Removed from the Cluster Due to etcd Faults
  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the statuses of the pods where etcd-event resides (etcd-event is used as an example in this section):

    kubectl get pod -n fst-manage -owide|grep etcd-event

    etcd-event-server-paas-10-118-29-146         1/1       CrashLoopBackOff   0          1d        10.118.29.146   paas-10-118-29-146
    etcd-event-server-paas-10-118-29-196         1/1       Running            0          1d        10.118.29.196   paas-10-118-29-196
    etcd-event-server-paas-10-118-29-67          1/1       Running            0          1d        10.118.29.67    paas-10-118-29-67

  4. Use PuTTY to log in, as the paas user, to a node where one of the normally running etcd-event pods displayed in 3 resides.
  5. Run the following command to query the container ID:

    sudo docker ps |grep etcd-event
    b628b368e9a9 cfe-etcd:2.12.20 "/bin/sh -c 'umask 07" 3 weeks ago Up 3 weeks k8s_etcd-event-container_etcd-event-server-10.118.29.196_fst-manage_098fa64b945131dd4376065a3339502c_0
    fbdedd9c9389 paas-cfe-pause-bootstrap "/pause" 3 weeks ago Up 3 weeks k8s_POD_etcd-event-server-10.118.29.196_fst-manage_098fa64b945131dd4376065a3339502c_0

    In the command output, b628b368e9a9 is the ID of the etcd-event container.

  6. Run the following command to access the container:

    sudo docker exec -it b628b368e9a9 sh

    NOTE:

    b628b368e9a9 is the container ID obtained in 5.

  7. Run the following command to list the members of the etcd-event cluster:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member list -w table

    +------------------+---------+---------------+-----------------------------+-----------------------------+
    |        ID        | STATUS  |  NAME         |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+---------+---------------+-----------------------------+-----------------------------+
    | 94fa27049d3af500 | started | etcd-event-1  | https://10.118.29.196:2381  | https://10.118.29.196:4002  |
    | 9efccae7ac52ae8c | started | etcd-event-2  | https://10.118.29.67:2381   | https://10.118.29.67:4002   |
    +------------------+---------+---------------+-----------------------------+-----------------------------+
    NOTE:

    In the preceding command, 4002 is the client port of etcd-event. Replace 4002 with 4001 or 4003 to query the members of etcd-server or etcd-network, respectively.

  8. Run the following command to add the abnormal etcd-event instance back to the cluster:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member add etcd-event-0 --peer-urls="https://10.118.29.146:2381"

    ETCD_NAME="etcd-event-0"
    ETCD_INITIAL_CLUSTER="etcd-event-1=https://10.118.29.196:2381,etcd-event-0=https://10.118.29.146:2381,etcd-event-2=https://10.118.29.67:2381"
    ETCD_INITIAL_CLUSTER_STATE="existing"

    If information similar to the preceding is displayed, the etcd-event instance is successfully added.

    NOTE:

    etcd-event-0 and https://10.118.29.146:2381 are the NAME and PEER ADDRS of the abnormal etcd-event instance, which is missing from the member list queried in 7.

  9. Run the following command to exit the container:

    exit

  10. Use PuTTY to log in, as the paas user, to the node where the abnormal pod queried in 3 resides, and run the following commands to create the temporary directory tmp and move the data directory of the faulty etcd-event into it:

    cd /var/paas/run

    mkdir ../tmp

    mv etcd-event/ ../tmp

    NOTE:

    Replace etcd-event in the preceding command with etcd-server or etcd-network to restore the pods where etcd-server or etcd-network resides, respectively.

  11. Run the following commands to move the etcd-event.manifest file to the upper-level directory so that the etcd-event pod on this node is deleted:

    cd /var/paas/kubernetes/manifests/

    mv etcd-event.manifest ..

  12. Run the command in 3 to check the statuses of the pods where etcd-event resides.

    If the abnormal etcd-event pod no longer appears in the command output, it has been deleted. Go to 13.

  13. Run the following commands to move the etcd-event.manifest file back to the original directory:

    cd /var/paas/kubernetes/manifests/

    mv ../etcd-event.manifest .

  14. Run the command in 3 to check whether the pod where abnormal etcd-event resides is restored.

    • If the pod status is Running, the fault is rectified. Go to 15.
    • If the pod status is not Running, contact technical support for assistance.

  15. Run the following commands to delete the temporary directory tmp:

    cd /var/paas

    rm -rf tmp/
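Steps 7 and 8 rely on reading the member table by eye. As a hedged sketch, the following helpers map each etcd cluster to its client port (4001, 4002, and 4003, per the NOTE in 7) and check whether a member name appears in the `member list -w table` output. The function names are hypothetical, and awk is assumed to be available on the node where the output is piped.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 7 and 8 of this procedure.

# Map an etcd cluster name to its client port, per the NOTE in 7.
etcd_client_port() {
  case "$1" in
    etcd-server)  echo 4001 ;;
    etcd-event)   echo 4002 ;;
    etcd-network) echo 4003 ;;
    *) echo "unknown etcd cluster: $1" >&2; return 1 ;;
  esac
}

# Read `member list -w table` output on stdin; succeed if NAME is listed.
member_listed() {
  awk -F'|' -v want="$1" '
    NF >= 6 {                              # data and header rows only
      name = $4                            # 4th field is the NAME cell
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the padding around it
      if (name == want) found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Usage on a healthy node (b628b368e9a9 is the container ID from 5):
#   port=$(etcd_client_port etcd-event)
#   sudo docker exec b628b368e9a9 sh -c "ETCDCTL_API=3 /start-etcd \
#     --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer \
#     --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:$port \
#     member list -w table" | member_listed etcd-event-0 \
#     || echo "etcd-event-0 is not a member; re-add it as in step 8"
```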

Updated: 2019-06-01

Document ID: EDOC1100062375
