HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Failure in etcd Backup and Restoration Due to Power-Off or Restart of the Node Where the etcd-backup Service Is Deployed

Symptom

The etcd backup in the OM zone is successful. When one node of the etcd cluster is powered off, etcd fails to be restored, the etcd pods may work improperly, and Kubernetes commands are unavailable. After the node is powered on again and etcd is rectified manually, etcd is restored successfully.
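
To check whether Kubernetes commands are unavailable without waiting on a hung call, a probe such as the following may help (a sketch; the --request-timeout flag simply bounds how long the query is allowed to take):

    # Sketch: a bounded query; a timeout or connection error indicates that the Kubernetes command is unavailable.
    kubectl get node --request-timeout=10s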

Possible Causes

The powered-off node is the node where the etcd-backup service is deployed. After this node is powered off, the other two etcd-backup nodes cannot connect to it. As a result, the restoration fails.
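
To confirm that the powered-off node is the one hosting the etcd-backup service, a quick check such as the following may help. This is only a sketch: it assumes that the etcd-backup service runs as pods in the fst-manage namespace and that the pod names contain "backup", which may differ in your environment.

    # Sketch (assumption): list pods whose names contain "backup"; the last column shows the hosting node.
    kubectl get pod -n fst-manage -owide | grep -i backup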

Troubleshooting Method

  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.
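
    If an SSH client is used instead of PuTTY, the equivalent login is a plain SSH session (a sketch; <manage_lb1_ip> is a placeholder for the actual address of the node in your environment):

    # <manage_lb1_ip> is a placeholder; use the actual address of the manage_lb1_ip node.
    ssh paas@<manage_lb1_ip>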

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the etcd status:

    kubectl get pod -n fst-manage -owide | grep etcd

    Information similar to the following is displayed:

    ...
    etcd-event-server-paas-10-168-20-187         1/1       Running   0          2d        10.168.20.187   paas-10-168-20-187
    etcd-event-server-paas-10-168-20-204         1/1       Running   0          2d        10.168.20.204   paas-10-168-20-204
    etcd-event-server-paas-10-168-20-239         1/1       Running   0          2d        10.168.20.239   paas-10-168-20-239
    etcd-network-server-paas-10-168-20-187       1/1       Running   0          2d        10.168.20.187   paas-10-168-20-187
    etcd-network-server-paas-10-168-20-204       1/1       Running   0          2d        10.168.20.204   paas-10-168-20-204
    etcd-network-server-paas-10-168-20-239       1/1       Running   0          2d        10.168.20.239   paas-10-168-20-239
    etcd-server-paas-10-168-20-187               1/1       Running   0          2d        10.168.20.187   paas-10-168-20-187
    etcd-server-paas-10-168-20-204               1/1       Running   0          2d        10.168.20.204   paas-10-168-20-204
    etcd-server-paas-10-168-20-239               1/1       Running   0          2d        10.168.20.239   paas-10-168-20-239
    • If the etcd status is Running, go to 7.
    • If any etcd pod is abnormal, log in to the node that hosts the abnormal pod (shown in the last column of the output) and go to 4.
    • If this command is unavailable, go to 10.
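
    To quickly spot abnormal pods in a long listing, the healthy lines can be filtered out (a sketch; it reuses the query from this step and only hides entries whose status is Running):

    kubectl get pod -n fst-manage -owide | grep etcd | grep -v Running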

  4. Log in to each etcd node and run the following command to check whether etcd-event.manifest, etcd.manifest, and etcd-network.manifest exist in /var/paas/kubernetes/:

    ll /var/paas/kubernetes/

    • If yes, run the following command in /var/paas/kubernetes/ to move the three files to the manifests subdirectory, and then go to 5:

      mv etcd* manifests

    • If no, go to 5.
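
    For reference, the check and move on a single etcd node could look as follows (a sketch; it assumes the misplaced files sit directly under /var/paas/kubernetes/ and carry the .manifest names listed above):

    cd /var/paas/kubernetes/
    # List any misplaced manifest files (etcd.manifest, etcd-event.manifest, etcd-network.manifest).
    ls etcd*.manifest 2>/dev/null
    # Move them back into the manifests directory.
    mv etcd*.manifest manifests/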

  5. Wait for 2 minutes, and then run the following command on the manage_lb1_ip node to check the etcd status:

    kubectl get pod -n fst-manage -owide | grep etcd

    • If the pod status of etcd, etcd-event, and etcd-network is Running, go to 7.
    • If the pod status is Pending, go to 6.
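
    The wait and the check in this step can also be chained in one line (a sketch; 120 seconds matches the 2-minute wait above):

    sleep 120; kubectl get pod -n fst-manage -owide | grep etcd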

  6. Log in to the three etcd nodes one after another as the paas user and run the following commands to restart kubelet and wait until the kubelet status is Running:

    monit restart kubelet

    monit summary

    ┌────────────────┬────────────┬─────────┐
    │ Service Name   │ Status     │ Type    │
    ├────────────────┼────────────┼─────────┤
    │ szvp000303953  │ Running    │ System  │
    │ kubelet        │ Running    │ Process │
    │ kube-proxy     │ Running    │ Process │
    │ psm-daemon     │ Status ok  │ Program │
    │ ovsdb-server   │ Status ok  │ Program │
    │ ovs-vswitchd   │ Status ok  │ Program │
    │ docker         │ Status ok  │ Program │
    │ canal          │ Status ok  │ Program │
    └────────────────┴────────────┴─────────┘
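
    A simple way to wait for kubelet to come back is to poll monit until it reports Running (a sketch; it re-checks every 5 seconds and can be stopped with Ctrl+C):

    until monit summary | grep kubelet | grep -q Running; do
        sleep 5
    done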

  7. Access the OM zone as the op_svc_pom user and perform the restoration operations again.
  8. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  9. After the restoration operations succeed, log in to the manage_lb1_ip node and run the following command to check the etcd status:

    kubectl get pod -n fst-manage -owide | grep etcd

    If any pod is in the Pending state, log in to the three etcd nodes as the paas user. Run the following commands to restart kubelet and wait until the kubelet status is Running:

    monit restart kubelet

    monit summary

    If the pod status is Running, the pods have been restored and the fault is rectified.
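
    Instead of re-running the check manually, the following loop polls until no etcd pod remains in a non-Running state (a sketch; it checks every 10 seconds and can be stopped with Ctrl+C):

    while kubectl get pod -n fst-manage -owide | grep etcd | grep -qv Running; do
        sleep 10
    done
    kubectl get pod -n fst-manage -owide | grep etcd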

  10. If the fault persists, contact technical support for assistance.
Updated: 2019-06-01

Document ID: EDOC1100062375
