HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-43155 etcd no space

Description

This alarm is generated when the etcd storage space is insufficient.

Attribute

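+----------+----------------+---------------------+
| Alarm ID | Alarm Severity | Alarm Type          |
+----------+----------------+---------------------+
| 43155    | Major          | Environmental alarm |
+----------+----------------+---------------------+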

Alarm Parameter

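+----------------+-----------------------------------------------------------------------------+
| Parameter Name | Parameter Description                                                       |
+----------------+-----------------------------------------------------------------------------+
| service        | Specifies the name of the service for which the alarm is reported.          |
| instance       | Specifies the name of the service instance for which the alarm is reported. |
+----------------+-----------------------------------------------------------------------------+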

Impact on the System

The etcd storage space is insufficient. As a result, services cannot be provided, which adversely affects the availability of the entire cluster.

Possible Causes

The storage space occupied by one or more etcd nodes exceeds the maximum storage space. The maximum storage space of the earlier etcd version is 2 GB, and that of the new version is 8 GB.
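
The limit described above can be compared with the DB SIZE value shown by the endpoint status command. The following is a minimal sketch and not part of the product procedure. The helper names size_to_bytes and near_limit are hypothetical, and the sketch assumes that DB SIZE is printed in decimal units (1 MB = 1000000 bytes) and that the limit (2 GB or 8 GB) is expressed in the same units.

```shell
#!/bin/sh
# Hypothetical helpers; names and unit handling are assumptions for illustration.

# Convert a human-readable size such as "4.1 MB" to a whole number of bytes.
# Assumption: decimal units (kB = 10^3, MB = 10^6, GB = 10^9).
size_to_bytes() {
    printf '%s\n' "$1" | awk '{
        unit["B"] = 1
        unit["kB"] = 1000
        unit["MB"] = 1000000
        unit["GB"] = 1000000000
        if (!($2 in unit)) { exit 1 }      # unknown unit: print nothing, fail
        printf "%.0f\n", $1 * unit[$2]     # round to the nearest byte
    }'
}

# Return 0 when the DB size is at or above PERCENT of the limit, 1 otherwise.
# Returns 2 when either size cannot be parsed.
# Usage: near_limit "<db size>" "<limit>" <percent>
near_limit() {
    db=$(size_to_bytes "$1") || return 2
    max=$(size_to_bytes "$2") || return 2
    [ $((db * 100)) -ge $((max * $3)) ]
}
```

For example, near_limit "4.1 MB" "2 GB" 80 returns a non-zero status because the database is far below 80% of the limit, whereas near_limit "1.9 GB" "2 GB" 80 returns 0.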

Procedure

NOTE:

For a CFE-ETCD alarm, perform 1 to 13. For a CSE-ETCD alarm, perform 1 and then go directly to 14 to 23.

  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. Run the following command to query the IP address of the node where the etcd-server pod resides:

    kubectl get pod -nfst-manage -owide|grep etcd-server

    etcd-server-paas-172-31-30-182     1/1       Running   0          31m       172.31.30.182    paas-172-31-30-182
    etcd-server-paas-172-31-30-175     1/1       Running   0          31m       172.31.30.175    paas-172-31-30-175
    etcd-server-paas-172-31-30-217     1/1       Running   0          31m       172.31.30.217    paas-172-31-30-217 

    Log in to the node using SSH.

  4. Run the following command to query the container ID of etcd-server:

    docker ps | grep etcd-server | grep -v pause

    ddcb24c2ebf9        cfe-etcd:1.12.25                                             "/bin/sh -c 'umask 07"   5 hours ago         Up 5 hours                              k8s_etcd-container.edd2eda1_etcd-server-172.31.30.182_om_f83f596e83c4489260098a6163385718_ed83203e

  5. Run the following command to switch to the etcd-server container:

    docker exec -ti ddcb24c2ebf9 bash

    ddcb24c2ebf9 is the container ID obtained in 4.

  6. Run the following command to query the members of the etcd cluster:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://127.0.0.1:4001 member list -w table

    Information similar to the following is displayed:

  7. Run the following command to view the node details. In the command output, DB SIZE indicates the storage space size.

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001,https://172.31.30.175:4001,https://172.31.30.217:4001 endpoint status -w table

    The values of --endpoints are the CLIENT ADDRS values queried in 6, separated by commas (,) with no spaces. The addresses in the preceding command are examples.

    Information similar to the following is displayed:

  8. Run the following command to query the latest revision:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001,https://172.31.30.175:4001,https://172.31.30.217:4001 endpoint status -w json

    Information similar to the following is displayed:

    "cluster_id":12288572135368435870,"member_id":3030248849092740997,"revision":122902,"raft_term":3

  9. Run the following command to compact the historical revisions:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001,https://172.31.30.175:4001,https://172.31.30.217:4001 compact 121902

    Value 121902 is obtained by deducting 1000 from the value of revision queried in 8.

  10. Run the following command to defragment the etcd storage of a node:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints={CLIENT ADDRS of the node to be defragmented} defrag

    For example, run the following command:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints=https://172.31.30.182:4001 defrag

  11. Run the following command to check whether an alarm is reported:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001 alarm list

    • If yes, go to 12.
    • If no, go to 13.

  12. Run the following command to clear the alarm:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001 alarm disarm

  13. Run the following command to check the storage space again:

    ETCDCTL_API=3 /start-etcd --cacert /srv/kubernetes/ca.crt --cert /srv/kubernetes/server.cer --key /srv/kubernetes/server_key.pem --endpoints https://172.31.30.182:4001,https://172.31.30.175:4001,https://172.31.30.217:4001 endpoint status -w table

    Defragmentation may take some time. You can repeat 8 to 13 until the storage space is within the threshold. If the storage space is still insufficient after the operations have been performed multiple times, contact technical support for assistance.

  14. Run the following command to query the etcd pods:

    kubectl get pod -n fst-manage | grep etcd
    etcd-0                                     1/1       Running   0          2h 
    etcd-1                                     1/1       Running   0          2h 
    etcd-2                                     1/1       Running   2          2h

  15. Run the following command to switch to the etcd container:

    kubectl exec -ti etcd-0 -n fst-manage sh

  16. Run the following command to query the members of the etcd cluster:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://127.0.0.1:4001 member list -w table

    Information similar to the following is displayed:

    +------------------+---------+------------+---------------------------------------------------------------+---------------------------------------------------------------+
    |        ID        | STATUS  |    NAME    |                          PEER ADDRS                           |                         CLIENT ADDRS                          |
    +------------------+---------+------------+---------------------------------------------------------------+---------------------------------------------------------------+
    | 2cee88b5cea8553f | started | cse-etcd-1 | https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:2380 | https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001 |
    | 9de5f9ad19499f2d | started | cse-etcd-0 | https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:2380 | https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001 |
    | ad2decf385c39443 | started | cse-etcd-2 | https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:2380 | https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 |
    +------------------+---------+------------+---------------------------------------------------------------+---------------------------------------------------------------+

  17. Run the following command to view the node details. In the command output, DB SIZE indicates the storage space size.

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 endpoint status -w table

    https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001, https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001, and https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 are the values of CLIENT ADDRS queried in 16 and are separated by commas (,).

    Information similar to the following is displayed:

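    +---------------------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
    |                           ENDPOINT                            |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |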
    +---------------------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001 | 2cee88b5cea8553f | 3.1.9   | 4.1 MB  | false     |         2 |     619618 |
    | https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001 | 9de5f9ad19499f2d | 3.1.9   | 4.1 MB  | true      |         2 |     619618 |
    | https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 | ad2decf385c39443 | 3.1.9   | 4.1 MB  | false     |         2 |     619618 |
    +---------------------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

  18. Run the following command to query the latest revision:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 endpoint status -w json

    Information similar to the following is displayed:

    "cluster_id":13721544636744488490,"member_id":3237675496563561791,"revision":206803,"raft_term":2}

  19. Run the following command to compact the historical revisions:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 compact 205803

    Value 205803 is obtained by deducting 1000 from the value of revision queried in 18.

  20. Run the following command to defragment the etcd storage of a node:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints={CLIENT ADDRS of the node to be defragmented} defrag

    For example, run the following command:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints=https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001 defrag

  21. Run the following command to check whether an alarm is reported:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints=https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001 alarm list

    • If yes, go to 22.
    • If no, go to 23.

  22. Run the following command to clear the alarm:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints=https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001 alarm disarm

  23. Run the following command to check the storage space again:

    ETCDCTL_API=3 /start-etcd --cacert /var/paas/kubernetes/cert/ca.crt --cert /var/paas/kubernetes/cert/tls.crt --key /var/paas/kubernetes/cert/tls.key --endpoints https://cse-etcd-1.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-0.cse-etcd.fst-manage.svc.cluster.local:4001,https://cse-etcd-2.cse-etcd.fst-manage.svc.cluster.local:4001 endpoint status -w table

    Defragmentation may take some time. You can repeat 18 to 23 until the storage space is within the threshold. If the storage space is still insufficient after the operations have been performed multiple times, contact technical support for assistance.
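
The revision arithmetic used in 9 and 19, and the comma-separated value passed to --endpoints, can be sketched as small shell helpers. This is an illustrative sketch only. The function names extract_revision, compact_target, and join_endpoints are hypothetical, the margin of 1000 revisions is taken from the steps above, and the status text is assumed to contain a "revision":<number> field with no spaces, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helpers; they do not call etcd and only process text.

# Print the first "revision" value found in endpoint status JSON text.
# Assumption: the field appears as "revision":<digits> with no spaces.
extract_revision() {
    printf '%s\n' "$1" | grep -o '"revision":[0-9]*' | head -n 1 | cut -d: -f2
}

# Print the compaction target: REVISION minus the number of revisions to keep
# (default 1000). Returns 2 for non-numeric input and 1 (printing nothing)
# when the revision does not exceed the margin.
compact_target() {
    rev=$1
    keep=${2:-1000}
    case "$rev" in
        ''|*[!0-9]*) return 2 ;;
    esac
    if [ "$rev" -le "$keep" ]; then
        return 1
    fi
    echo $((rev - keep))
}

# Join endpoint URLs with commas and no spaces, as --endpoints expects.
join_endpoints() {
    old_ifs=$IFS
    IFS=,
    printf '%s\n' "$*"     # "$*" joins arguments with the first IFS character
    IFS=$old_ifs
}
```

With the sample output in 8, extract_revision yields 122902 and compact_target 122902 prints 121902, which matches the value used in 9. Similarly, compact_target 206803 prints 205803, the value used in 19.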

Alarm Clearing

After the insufficient storage space fault is rectified, manually clear the alarm on the Alarm List page.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
