FusionCloud 6.3.1.1 Troubleshooting Guide 02

etcd Restart Due to Inconsistent Data

Symptom

  • The following log information indicates that etcd restarts repeatedly:
    2018-05-15 16:00:45.985584 C | etcdmain: database file (/var/etcd-data/etcd-event/etcd-event-2/member/snap/db index 16737187) does not match with snapshot (index 21909430).
  • Alternatively, etcd restarts repeatedly because of a panic in bbolt, and a log similar to the following is generated:
    panic: xxx
    xxx/github.com/coreos/bbolt/xxx

Possible Causes

The etcd data is corrupted.

Troubleshooting Method

  1. Use PuTTY to log in to the om_paas_vip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to obtain the name of the node in the management zone that hosts the etcd pod instance that is not in the Running state:

    kubectl get pods -n manage -owide | grep etcd | grep -v "elb\|cse\|flow\|backup"

    Information similar to the following is displayed:
     etcd-event-0                     1/1       error     0          37d       10.186.53.125    manage-cluster2-fff303d2-xv98d
     etcd-event-1                     1/1       Running   0          37d       10.120.173.182   manage-cluster1-fff303d2-tzz7z
     etcd-event-2                     1/1       Running   0          37d       10.120.173.65    manage-cluster1-fff303d2-9swgx
    NOTE:

    In the preceding command, manage indicates the namespace in the tenant management zone. You need to replace manage with om when obtaining abnormal etcd node information in the OM zone. An example command for the OM zone is as follows:

    kubectl get pods -n om -owide | grep etcd | grep -v "elb\|cse\|flow\|backup"
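
    Optionally, to list only the etcd pod instances that are not in the Running state, the same filter chain can be extended. This is a variant of the command above, not part of the original procedure:

    kubectl get pods -n manage -owide | grep etcd | grep -v "elb\|cse\|flow\|backup" | grep -v Running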

  3. Run the following command to query the IP address of the node where the etcd container is located:

    kubectl get node manage-cluster2-fff303d2-xv98d -n manage -oyaml | grep addr

    Information similar to the following is displayed:

      address: 10.186.53.125
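
    Alternatively, the node IP address can be printed directly with a kubectl JSONPath expression. This is an optional sketch, assuming the node address is reported as an InternalIP:

    kubectl get node manage-cluster2-fff303d2-xv98d -n manage -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'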

  4. Log in to the node where the etcd pod instance that is not in the Running state is located and go to the log directory:

    cd /var/paas/sys/log/etcd-event/

    NOTE:

    In the preceding command, etcd-event indicates the name of the abnormal pod, which may be etcd or etcd-network.
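
    A minimal login sketch, assuming SSH access with the paas user from step 1 to the node IP address obtained in step 3:

    ssh paas@10.186.53.125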

  5. Run the following command to view critical-level entries in the run log etcd-event.log in this directory:

    vi /var/paas/sys/log/etcd-event/etcd-event.log

    Information similar to the following is displayed:
    2018-05-15 16:00:45.985584 C | etcdmain: database file (/var/etcd-data/etcd-event/etcd-event-2/member/snap/db index 16737187) does not match with snapshot (index 21909430).

    Or, information similar to the following is displayed:

    panic: xxx
    xxx/github.com/coreos/bbolt/xxx

    The log entries show that the etcd data is corrupted.
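
    Instead of paging through the file in vi, the critical-level entries and bbolt panics can be filtered directly; a minimal sketch using the same log file:

    grep " C | " /var/paas/sys/log/etcd-event/etcd-event.log
    grep -n "panic:" /var/paas/sys/log/etcd-event/etcd-event.log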

  6. The path in the log is the path inside the etcd container. Run the following commands to find the corresponding data directory on the node where the etcd container is located:

    cd /var/paas/run/etcd-event

    ls

    Information similar to the following is displayed:

    config.ini  etcd-event-2

  7. Run the following commands to rename the data directory on the abnormal node:

    cd /var/paas/run/etcd-event

    mv etcd-event-2 etcd-event-2-old

    NOTE:

    In the preceding command, etcd-event-2 indicates the data directory found on the abnormal etcd node in step 6.
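
    After the rename, the original data directory should no longer appear in the listing from step 6; an illustrative check:

    ls /var/paas/run/etcd-event

    Information similar to the following is displayed:

    config.ini  etcd-event-2-old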

  8. Wait until kubelet restarts the etcd instance, and then run the following command to query the run log of the instance:

    vi /var/paas/sys/log/etcd-event/etcd-event.log

    If no critical-level log is found, the etcd data synchronization is complete.
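
    To watch the run log for new critical-level entries while the instance resynchronizes, the file can also be followed; a minimal sketch:

    tail -f /var/paas/sys/log/etcd-event/etcd-event.log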

  9. Check the etcd data directory to confirm that the following data directory has been regenerated:

    /var/paas/run/etcd-event/etcd-event-2

    If the etcd instance is running properly, the fault is rectified.
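
    As a final check, the query from step 2 can be repeated from the om_paas_vip node to confirm that all etcd instances are in the Running state:

    kubectl get pods -n manage -owide | grep etcd | grep -v "elb\|cse\|flow\|backup"

    All etcd-event instances should show READY 1/1 and STATUS Running.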
