HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Damaged ETCD Data Disks

Fault Symptom

The data disk of an etcd cluster node in the Tenant Management zone is damaged. As a result, the node is faulty.

Prerequisites

The paas user has been added to the whitelist. For details, see Operations When the sudo Command Failed to Be Run.

Fault Locating

The Tenant Management zone has three cfe-etcd clusters: etcd, etcd-event, and etcd-network. These clusters are deployed on three different manage_db nodes. This section uses the etcd-event cluster as an example and is for reference only.

This section uses the paas-db01 node as an example.

  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to query the ID of the etcd-event container, and then log in to the container:

    sudo docker ps | grep etcd

    Information similar to the following is returned:

    a84671a67e40        10.184.42.33:20202/root/cse-etcd:2.1.14                   "/bin/sh -ec 'umask 0"   2 days ago          Up 2 days                               k8s_cse-etcd.7229286c_cse-etcd-2_manage_23f4f878-823d-11e7-9423-286ed489be96_db347cc5
    3364cf75b35a        paas-cfe-pause-bootstrap                                  "/pause"                 2 days ago          Up 2 days                               k8s_POD.6d5cdc5e_cse-etcd-2_manage_23f4f878-823d-11e7-9423-286ed489be96_a36de45a
    2efa4ad27935        cfe-etcd:2.8.7                                            "/bin/sh -c 'umask 06"   2 days ago          Up 2 days                               k8s_etcd-container.502905f8_etcd-network-server-10.184.42.132_manage_f49345c84316bbc47684f697fb6f64f0_7b38a9b8
    157908ceec74        cfe-etcd:2.8.7                                            "/bin/sh -c 'umask 06"   2 days ago          Up 2 days                               k8s_etcd-container.d8d1f291_etcd-server-10.184.42.132_manage_36b5d9f798751abad8dc291a4bf46865_6de02f29
    6d774ac2ac2e        cfe-etcd:2.8.7                                            "/bin/sh -c 'umask 06"   2 days ago          Up 2 days                               k8s_etcd-container.d6f90091_etcd-event-server-10.184.42.132_manage_9f4b2d62d846556015bb495930f7fa4f_6a546c2e
    3f3546f93b65        paas-cfe-pause-bootstrap                l
    b577e0f5e45a        paas-cfe-pause-bootstrap                                  "/pause"                 2 days ago          Up 2 days                               k8s_POD.6d5cdc5e_etcd-event-server-10.184.42.132_manage_9f4b2d62d846556015bb495930f7fa4f_561795ae
    e26634d14881        paas-cfe-pause-bootstrap                                  "/pause"                 2 days ago          Up 2 days                               k8s_POD.6d5cdc5e_etcd-server-10.184.42.132_manage_36b5d9f798751abad8dc291a4bf46865_5cbb2d73

    In this case, the container ID is 6d774ac2ac2e.

    sudo docker exec -it 6d774ac2ac2e bash

  3. Check the IP address and port of each node in the etcd cluster.

    NOTE:

    By default, 4001 is the client port of the etcd cluster, 4002 is the client port of the etcd-event cluster, and 4003 is the client port of the etcd-network cluster.

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.cer --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4002 member list

    Information similar to the following is returned:

    1f4397f9956e1e8b, started, infra1, https://10.184.43.79:2381, https://10.184.43.79:4002
    9a3dd24ebfc5c212, started, infra2, https://10.177.119.155:2381, https://10.177.119.155:4002
    fc4a4cd2cf50cbf1, started, infra0, https://10.184.42.132:2381, https://10.184.42.132:4002

    Record the IP address and port of each node. In this case, the IP addresses and ports are https://10.184.43.79:4002, https://10.177.119.155:4002, and https://10.184.42.132:4002.

  4. Query the status of the etcd cluster:

    NOTE:

    By default, 4001 is the client port of the etcd cluster, 4002 is the client port of the etcd-event cluster, and 4003 is the client port of the etcd-network cluster.

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.cer --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://10.184.42.132:4002,https://10.184.43.79:4002,https://10.177.119.155:4002 endpoint status

    Information similar to the following is returned:

    2017-08-18 20:14:32.663688 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
    Failed to get the status of endpoint https://10.177.119.155:4002 (context deadline exceeded)
    https://10.184.42.132:4002, fc4a4cd2cf50cbf1, 3.1.9, 8.2 MB, false, 17, 2617441
    https://10.184.43.79:4002, 1f4397f9956e1e8b, 3.1.9, 8.4 MB, true, 17, 2617441

    The status of https://10.177.119.155:4002 cannot be queried, which indicates that this node is abnormal. Look up the ID of this node in the member list obtained in Step 3. In this case, the node ID is 9a3dd24ebfc5c212.

  5. Run the following command to exit the container.

    exit
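
The lookups in Steps 2 to 4 can also be scripted. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the parsing assumes that the command output has the same layout as the sample output shown above.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not shipped with the product).
# They only parse text on stdin, in the layout of the sample output above.

# Print the ID of the etcd-event container from `sudo docker ps` output.
# Matches on the container name (last column) and skips the pause ("POD") container.
etcd_event_container_id() {
    awk '$NF ~ /^k8s_etcd-container\./ && $NF ~ /_etcd-event-server-/ { print $1 }'
}

# Print each endpoint URL that `endpoint status` reported as unreachable.
failed_endpoints() {
    sed -n 's/^ *Failed to get the status of endpoint \([^ ]*\) .*/\1/p'
}

# Print the ID of the member whose client URL equals $1,
# given `member list` output on stdin.
member_id_for_endpoint() {
    awk -F', ' -v url="$1" 'NF == 5 && $5 == url { sub(/^ +/, "", $1); print $1 }'
}
```

For example, on the node, sudo docker ps | etcd_event_container_id prints the container ID used in Step 2. Inside the container, the endpoint status output can be piped to failed_endpoints, and the member list output to member_id_for_endpoint together with the failed URL.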

Troubleshooting

This section uses the faulty paas-db03 node as an example.

  1. Use PuTTY to log in to the manage_db3_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Remove the manifest file.

    NOTE:

    To restore the etcd or etcd-network cluster, replace etcd-event in the command with etcd or etcd-network, respectively.

    cd /var/paas/kubernetes/manifests/

    mv etcd-event.manifest ..

  3. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  4. Log in to the etcd-event container.

    NOTE:

    When restoring etcd and etcd-network, change etcd-event in the command to etcd and etcd-network respectively.

    The container ID was obtained in Step 2 in Fault Locating.

    sudo docker exec -it 6d774ac2ac2e bash

  5. Replace the damaged disk.

    Before starting restoration, replace the damaged disk and ensure that the disk specifications remain unchanged. For example, if the size of the damaged disk is 50 GB, the size of the replacement disk must also be 50 GB.

  6. Delete the node whose status cannot be queried.

    NOTE:

    By default, 4001 is the client port of the etcd cluster, 4002 is the client port of the etcd-event cluster, and 4003 is the client port of the etcd-network cluster.

    The ID of the node to be deleted was obtained in Step 4 in Fault Locating.

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.cer --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://10.184.42.132:4002,https://10.184.43.79:4002,https://10.177.119.155:4002 member remove 9a3dd24ebfc5c212

    Information similar to the following is returned:

    2017-08-18 20:20:08.659346 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
    Member 9a3dd24ebfc5c212 removed from cluster b2d484e5f23f7a6e

  7. Use PuTTY to log in to the manage_db3_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  8. Move the etcd-event.manifest file back to the manifests directory.

    cd /var/paas/kubernetes/manifests/

    mv ../etcd-event.manifest .

  9. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  10. Access the etcd-event container.

    1. Run the following command to query the container ID:

      sudo docker ps | grep etcd-event

      In the information returned, the first column is the container ID. Use it as the value of {containerId} in the next substep.

    2. Run the following command to access the container:

      sudo docker exec -it {containerId} bash

  11. Query the status of each node in the etcd cluster:

    NOTE:

    By default, 4001 is the client port of the etcd cluster, 4002 is the client port of the etcd-event cluster, and 4003 is the client port of the etcd-network cluster.

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.cer --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://10.184.42.132:4002,https://10.184.43.79:4002,https://10.177.119.155:4002 endpoint status

    If the following information is displayed, the fault has been rectified:

    2017-08-18 20:24:14.201480 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
    https://10.184.42.132:4002, fc4a4cd2cf50cbf1, 3.1.9, 8.2 MB, false, 17, 2623883
    https://10.184.43.79:4002, 1f4397f9956e1e8b, 3.1.9, 8.4 MB, true, 17, 2623883
    https://10.177.119.155:4002, 5f95cab8bc69abd6, 3.1.9, 8.4 MB, false, 17, 2623883

  12. Repeat the preceding steps to restore faulty nodes in the etcd and etcd-network clusters.
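
When the procedure is repeated for the other clusters, only the cluster name and the client port change. The following minimal sketch shows how the --endpoints value could be built for each cluster and how the endpoint status output in Step 11 could be checked automatically. The function names are hypothetical. The recovery criteria (every member reachable and exactly one leader) are an assumption derived from the sample output in Step 11, not a rule stated by the product.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not shipped with the product).

# Default client port of each cluster, as listed in the NOTE in the steps above.
cluster_client_port() {
    case "$1" in
        etcd)         echo 4001 ;;
        etcd-event)   echo 4002 ;;
        etcd-network) echo 4003 ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Build the value for --endpoints.
# Arguments: cluster name, followed by the IP address of each node.
cluster_endpoints() {
    port=$(cluster_client_port "$1") || return 1
    shift
    out=""
    for ip in "$@"; do
        out="${out:+$out,}https://$ip:$port"
    done
    echo "$out"
}

# Return 0 if the `endpoint status` output on stdin lists $1 reachable members,
# exactly one leader (fifth column is "true"), and no failed endpoint.
# Assumption: these criteria match the sample output in Step 11.
cluster_recovered() {
    awk -F', ' -v want="$1" '
        /Failed to get the status of endpoint/ { failed++ }
        NF == 7 { rows++; if ($5 == "true") leaders++ }
        END { exit !(rows == want && leaders == 1 && failed == 0) }
    '
}
```

For example, cluster_endpoints etcd-network followed by the three node IP addresses yields the endpoint list with port 4003, and piping the endpoint status output to cluster_recovered 3 returns a zero exit status only if all three members respond.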
Updated: 2019-06-01

Document ID: EDOC1100062375
