FusionCloud 6.3.1.1 Troubleshooting Guide 02

CFE-ETCD Emergency Restoration of Abnormal Nodes

Tenant Management Zone

Symptom

The pod is in Error, CrashLoopBackOff, Unknown, or NodeLost status.
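
To identify the affected pods quickly, you can filter the pod list for these states (a minimal sketch; it assumes grep is available on the om_core1_ip node):

    kubectl get pod --all-namespaces | grep -E 'Error|CrashLoopBackOff|Unknown|NodeLost'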

Troubleshooting
  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to query node information:

    kubectl get node --all-namespaces

    Information similar to the following is displayed:

    NAMESPACE   NAME                                     STATUS     AGE
    manage      paas-manage-apm1-7d03ba8e-823f-1-14x5m   Ready      1d
    manage      paas-manage-apm2-7d03ba8e-823f-1-pt1kx   Ready      1d
    manage      paas-manage-core1-7d03ba8e-823f-18qbn    Ready      1d
    manage      paas-manage-core2-7d03ba8e-823f-02k5p    Ready      1d
    manage      paas-manage-core3-7d03ba8e-823f-f3j5m    Ready      1d
    manage      paas-manage-core4-7d03ba8e-823f-cp30f    Ready      1d
    manage      paas-manage-core5-7d03ba8e-823f-gklq7    NotReady   1d
    manage      paas-manage-db1-7d03ba8e-823f-11-wb46f   Ready      1d
    manage      paas-manage-db2-7d03ba8e-823f-11-c7vxn   Ready      1d
    manage      paas-manage-elb-lvs1-7d03ba8e-82-bm58d   Ready      1d
    manage      paas-manage-elb-lvs2-7d03ba8e-82-75t05   Ready      1d
    manage      paas-manage-elb-nginx1-7d03ba8e-xhs1w    Ready      1d
    manage      paas-manage-elb-nginx2-7d03ba8e-njqxm    Ready      1d
    manage      paas-manage-elb-svc1-7d03ba8e-82-x7xvm   Ready      1d
    manage      paas-manage-elb-svc2-7d03ba8e-82-28lwm   Ready      1d
    manage      paas-manage-iam1-7d03ba8e-823f-1-58wzr   Ready      1d
    manage      paas-manage-iam2-7d03ba8e-823f-1-5d9l9   Ready      1d
    manage      paas-manage-swr1-7d03ba8e-823f-1-q8gl7   Ready      1d
    manage      paas-manage-swr2-7d03ba8e-823f-1-03l5g   Ready      1d
    manage      paas-manage-swr3-7d03ba8e-823f-1-k9k37   Ready      1d
    manage      paas-manage-swr4-7d03ba8e-823f-1-3kwf3   Ready      1d
    manage      paas-manage-tenant1-7d03ba8e-823-kfx7w   Ready      1d
    manage      paas-manage-tenant2-7d03ba8e-823-1g296   Ready      1d
    om          paas-10-177-119-155                      Ready      1d
    om          paas-10-184-42-132                       Ready      1d
    om          paas-10-184-43-79                        Ready      1d
    om          paas-om-apm1-c47c95f1-823c-11e7-tglpm    Ready      1d
    om          paas-om-apm2-c47c95f1-823c-11e7-t31w0    Ready      1d

    Record the abnormal node name. In this example, the abnormal node name is paas-manage-core5-7d03ba8e-823f-gklq7.
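
    If the node list is long, you can filter it for abnormal nodes instead of scanning the output manually (a minimal sketch; grep is assumed to be available on the node):

    kubectl get node --all-namespaces | grep -vw Ready

    This prints the header line and any node whose status is not Ready.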

  3. Run the following command to export the description file of the abnormal node:

    kubectl get no paas-manage-core5-7d03ba8e-823f-gklq7 -n manage -oyaml > manage-core5.yaml
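
    Before deleting the node, you can confirm that the description file was exported, for example (a quick sanity check; the exact YAML layout depends on the kubectl version in use):

    grep 'name: paas-manage-core5-7d03ba8e-823f-gklq7' manage-core5.yaml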

  4. Run the following command to delete the abnormal node:

    kubectl delete no paas-manage-core5-7d03ba8e-823f-gklq7 -n manage

    Information similar to the following is displayed:

    node "paas-manage-core5-7d03ba8e-823f-gklq7" deleted

  5. After the VM is restarted, run the following command to manage the node again:

    kubectl create -f manage-core5.yaml

    Information similar to the following is displayed:

    node "paas-manage-core5-7d03ba8e-823f-gklq7" created

  6. Run the following command to query node information until the status of node paas-manage-core5-7d03ba8e-823f-gklq7 is Ready:

    kubectl get node --all-namespaces

    Information similar to the following is displayed:

    NAMESPACE   NAME                                     STATUS     AGE
    manage      paas-manage-apm1-7d03ba8e-823f-1-14x5m   Ready      1d
    manage      paas-manage-apm2-7d03ba8e-823f-1-pt1kx   Ready      1d
    manage      paas-manage-core1-7d03ba8e-823f-18qbn    Ready      1d
    manage      paas-manage-core2-7d03ba8e-823f-02k5p    Ready      1d
    manage      paas-manage-core3-7d03ba8e-823f-f3j5m    Ready      1d
    manage      paas-manage-core4-7d03ba8e-823f-cp30f    Ready      1d
    manage      paas-manage-core5-7d03ba8e-823f-gklq7    Ready      2m
    manage      paas-manage-db1-7d03ba8e-823f-11-wb46f   Ready      1d
    manage      paas-manage-db2-7d03ba8e-823f-11-c7vxn   Ready      1d
    manage      paas-manage-elb-lvs1-7d03ba8e-82-bm58d   Ready      1d
    manage      paas-manage-elb-lvs2-7d03ba8e-82-75t05   Ready      1d
    manage      paas-manage-elb-nginx1-7d03ba8e-xhs1w    Ready      1d
    manage      paas-manage-elb-nginx2-7d03ba8e-njqxm    Ready      1d
    manage      paas-manage-elb-svc1-7d03ba8e-82-x7xvm   Ready      1d
    manage      paas-manage-elb-svc2-7d03ba8e-82-28lwm   Ready      1d
    manage      paas-manage-iam1-7d03ba8e-823f-1-58wzr   Ready      1d
    manage      paas-manage-iam2-7d03ba8e-823f-1-5d9l9   Ready      1d
    manage      paas-manage-swr1-7d03ba8e-823f-1-q8gl7   Ready      1d
    manage      paas-manage-swr2-7d03ba8e-823f-1-03l5g   Ready      1d
    manage      paas-manage-swr3-7d03ba8e-823f-1-k9k37   Ready      1d
    manage      paas-manage-swr4-7d03ba8e-823f-1-3kwf3   Ready      1d
    manage      paas-manage-tenant1-7d03ba8e-823-kfx7w   Ready      1d
    manage      paas-manage-tenant2-7d03ba8e-823-1g296   Ready      1d
    om          paas-10-177-119-155                      Ready      1d
    om          paas-10-184-42-132                       Ready      1d
    om          paas-10-184-43-79                        Ready      1d
    om          paas-om-apm1-c47c95f1-823c-11e7-tglpm    Ready      1d
    om          paas-om-apm2-c47c95f1-823c-11e7-t31w0    Ready      1d
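
    Instead of re-running the query by hand, you can poll until the node reports Ready (a simple shell sketch; the node name and the 10-second interval are taken from this example):

    until kubectl get node --all-namespaces | grep paas-manage-core5-7d03ba8e-823f-gklq7 | grep -qw Ready; do
        sleep 10
    done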

  7. Go to the normal node. Assume that the normal node is etcd-event-1. Run the following command to search for the etcd-event container:

    sudo docker ps | grep etcd-event

    The command output is displayed. The container ID in the first column of the etcd-event container line is the value of containerId used in 8.

  8. Run the following command to access the container:

    sudo docker exec -it containerId bash

    Run the following command to check the status of each node in the etcd cluster:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member list -w table
    NOTE:
    • By default, 4001, 4002, and 4003 are the corresponding client ports of the etcd, etcd-event, and etcd-network clusters.
    • When recovering both etcd and etcd-network clusters, change the client ports in the following commands to the client ports of etcd and etcd-network respectively.

    The command output lists the ID, status, name, and peer and client URLs of each member in the cluster.
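
    To run the same check against the etcd and etcd-network clusters, only the endpoint port changes (a sketch based on the default ports listed in the note above; it assumes the same certificate paths apply to all three clusters):

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4001 member list -w table
    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4003 member list -w table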

    Run the following commands to query the details of each node in the cluster:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://10.184.43.79:4002,https://10.177.119.155:4002,https://10.184.42.132:4002 endpoint status

    Replace the value of endpoints with the client addresses listed in the member list output in 8.

    If the status of each node is displayed normally in the command output, the abnormal node in the etcd cluster has been restored to the normal state.
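
    As an additional check, you can also query the health of each endpoint, assuming the /start-etcd wrapper accepts the standard etcdctl endpoint health subcommand:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 endpoint health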

  9. Repeat 1 to 8 to restore the abnormal nodes in the etcd and etcd-network clusters.

OM Zone

Symptom

The pod is in Error, CrashLoopBackOff, Unknown, or NodeLost status.

Troubleshooting
Prerequisites

The paas user has been added to the whitelist. For details, see Operations When the sudo Command Failed to Be Run.

Procedure
  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to query the node information:

    kubectl get node -n om

    NAMESPACE   NAME                                     STATUS     AGE
    om          paas-10-177-119-155                      Ready      1d
    om          paas-10-184-42-132                       NotReady   1d
    om          paas-10-184-43-79                        Ready      1d
    om          paas-om-apm1-c47c95f1-823c-11e7-tglpm    Ready      1d
    om          paas-om-apm2-c47c95f1-823c-11e7-t31w0    Ready      1d

    Record the name of the abnormal node. In this example, the node name is paas-10-184-42-132.
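
    Optionally, the abnormal node name can be captured in a shell variable for the following steps (a sketch that assumes the four-column output format shown above):

    BAD_NODE=$(kubectl get node -n om | grep -w NotReady | awk '{print $2}')
    echo "$BAD_NODE"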

  3. Run the following command to export the description file of the abnormal node:

    kubectl get no paas-10-184-42-132 -n om -oyaml > paas-10-184-42-132.yaml

  4. Run the following command to delete the abnormal node:

    kubectl delete no paas-10-184-42-132 -n om

    If the following information is displayed, the abnormal node is deleted:

    node "paas-10-184-42-132" deleted

  5. After the VM is restarted, run the following command to manage the node again:

    kubectl create -f paas-10-184-42-132.yaml

    If the following information is displayed, the node is managed successfully:

    node "paas-10-184-42-132" created

  6. Run the following command to query the node information and repeat this operation until the status of the paas-10-184-42-132 node is Ready:

    kubectl get node --all-namespaces

    NAMESPACE   NAME                                     STATUS     AGE
    om          paas-10-177-119-155                      Ready      1d
    om          paas-10-184-42-132                       Ready      2m
    om          paas-10-184-43-79                        Ready      1d
    om          paas-om-apm1-c47c95f1-823c-11e7-tglpm    Ready      1d
    om          paas-om-apm2-c47c95f1-823c-11e7-t31w0    Ready      1d

  7. Assume that the normal node is etcd-event-1. Run the following command to query the IP address of the etcd-event-1 node:

    kubectl get pod etcd-event-1 -n manage -oyaml | grep hostIP

    hostIP: 10.154.248.63
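
    If needed, the host IP can be extracted directly into a variable (a sketch assuming the output format shown above):

    HOST_IP=$(kubectl get pod etcd-event-1 -n manage -oyaml | grep hostIP | sed 's/.*hostIP:[[:space:]]*//')
    echo "$HOST_IP"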

  8. Log in to the etcd-event-1 node and run the following command to query the container ID of etcd-event:

    sudo docker ps | grep etcd-event

    6d774ac2ac2e        cfe-etcd:2.8.7               "/bin/sh -c 'umask 06"   2 days ago          Up 2 days                               k8s_etcd-container.d6f90091_etcd-event-server-10.184.42.132_om_9f4b2d62d846556015bb495930f7fa4f_6a546c2e
    b577e0f5e45a        paas-cfe-pause-bootstrap     "/pause"                 2 days ago          Up 2 days                               k8s_POD.6d5cdc5e_etcd-event-server-10.184.42.132_om_9f4b2d62d846556015bb495930f7fa4f_561795ae

    In this example, the container ID of etcd-event is 6d774ac2ac2e.
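
    The container ID can also be picked up in one step by excluding the pause container (a sketch that assumes the container naming pattern shown in the output above):

    CONTAINER_ID=$(sudo docker ps | grep etcd-event-server | grep -v k8s_POD | awk '{print $1}')
    echo "$CONTAINER_ID"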

  9. Run the following commands to access the container and query the status of each node in the etcd cluster:

    sudo docker exec -it 6d774ac2ac2e bash

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member list -w table

    NOTE:

    By default, 4001, 4002, and 4003 are the client ports of the etcd, etcd-event, and etcd-network clusters, respectively. When recovering both the etcd and etcd-network clusters, change the client ports in the following commands to the client ports of the etcd and etcd-network clusters respectively.

    1f4397f9956e1e8b, started, infra1, https://10.184.43.79:2381, https://10.184.43.79:4002
    9a3dd24ebfc5c212, started, infra2, https://10.177.119.155:2381, https://10.177.119.155:4002
    fc4a4cd2cf50cbf1, started, infra0, https://10.184.42.132:2381, https://10.184.42.132:4002
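
    The client URLs in the last column are the endpoints used in 10. They can be joined into a single endpoint list, for example (a sketch assuming the comma-separated output format shown above):

    ENDPOINTS=$(ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member list | awk -F', ' '{print $5}' | paste -sd, -)
    echo "$ENDPOINTS"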

  10. Run the following commands to query the details of each node in the clusters:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://10.184.43.79:4002,https://10.177.119.155:4002,https://10.184.42.132:4002 endpoint status -w table

    NOTE:

    https://10.184.43.79:4002, https://10.177.119.155:4002, and https://10.184.42.132:4002 are the client addresses of the nodes obtained in 9.

    2017-10-17 19:17:05.436874 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
    https://10.184.42.132:4002, 789255c1b33cdf6c, 3.1.9, 2.9 MB, false, 9, 208381
    https://10.184.43.79:4002, 6d4b75513d41feef, 3.1.9, 2.9 MB, true, 9, 208381
    https://10.177.119.155:4002, 4a8a968eaefcca8b, 3.1.9, 2.9 MB, false, 9, 208381

    If the status of each node is displayed, the abnormal node in the etcd cluster is restored to the normal state.

  11. Repeat 1 to 10 to restore the abnormal nodes in the etcd cluster and etcd-network cluster to the normal state.