FusionCloud 6.3.1.1 Troubleshooting Guide 02

Node Exceptions in the Management Zone

This section describes how to perform emergency recovery for management zone nodes (except OM-Core01, OM-Core02, and OM-Core03).

Symptom

The node cannot be recovered even after a forcible restart and needs to be rebuilt.

Troubleshooting

Prerequisites
  • One OM-Core node (OM-Core01, OM-Core02, or OM-Core03) is faulty and has been rebuilt. The other two OM-Core nodes and the OM-Core cluster are running properly.

    For details about how to rebuild nodes, see the relevant Infrastructure as a Service (IaaS) documentation.

  • The physical and virtual IP addresses of the rebuilt OM-Core node remain unchanged.
  • The specifications of the rebuilt OM-Core node remain unchanged.
Deleting a Faulty Node
Procedure
  1. Log in to OM-Core01 as the paas user.
  2. Run the following command to update VM configurations:

    cd /opt/paas/bootstrap/bin

    ./fsadm addvm CorebaseHA -m base -f ../knowledge/fusionstage_CorebaseHA.yaml

  3. Perform the following operations to delete a faulty node:

    1. Run the following command to view the name of the faulty node:

      kubectl get no --all-namespaces

    2. Run the following command to export information about the faulty node:

      kubectl get no manage-cluster3-4f5b4eaa-4vsjv -n manage -oyaml > /tmp/manage-cluster3.yaml

      • manage-cluster3-4f5b4eaa-4vsjv is an example name of the rebuilt node obtained in 3.a.
      • manage-cluster3.yaml is an example file to which information about the rebuilt node is exported. The file name is user-defined; it is recommended that it contain the first half of the rebuilt node name (manage-cluster3 in this example).
    3. Depending on the zone, run the applicable command twice to delete the faulty node:

      Command for deleting a faulty node in the tenant management zone:

      kubectl delete no <node name> -n manage --grace-period=0 --force

      Command for deleting a faulty node in the OM zone:

      kubectl delete no <node name> -n om --grace-period=0 --force

      NOTE:

      In the preceding command, manage is used for nodes in the tenant management zone, and om is used for nodes in the OM zone.
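
The export and deletion in 3.b and 3.c can be sketched as a small script. The helper names (backup_path, delete_faulty_node) are illustrative, not part of the product; the sketch only assumes kubectl and a POSIX shell, and derives the recommended backup file name from the node name.

```shell
# Hypothetical helpers for steps 3.b and 3.c of "Deleting a Faulty Node".

# Derive the recommended backup file name: the first half of the node
# name, i.e. the node name with its two trailing suffix fields removed
# (manage-cluster3-4f5b4eaa-4vsjv -> /tmp/manage-cluster3.yaml).
backup_path() {
    node="$1"
    printf '/tmp/%s.yaml\n' "${node%-*-*}"
}

# Export the node definition, then force-delete the node twice as the
# procedure requires. $2 is the namespace: "manage" for the tenant
# management zone, "om" for the OM zone.
delete_faulty_node() {
    node="$1"; ns="$2"
    kubectl get no "$node" -n "$ns" -oyaml > "$(backup_path "$node")"
    kubectl delete no "$node" -n "$ns" --grace-period=0 --force
    kubectl delete no "$node" -n "$ns" --grace-period=0 --force
}

# Example (run only against the actual faulty node):
#   delete_faulty_node manage-cluster3-4f5b4eaa-4vsjv manage
```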

Installing a Rebuilt Node in the Management Zone
Prerequisites

The faulty node has been rebuilt. The IP address, image, hard disk, and OS of the node remain unchanged.

Procedure
  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to switch to user root:

    su - root

    Enter the password of the root user when prompted.

  3. Check whether any residual volume creation log files ending with .flag or named create_vol.log exist in the /home/paas/create_vol_tool/, /var/log/tools/create_vol/, or /tmp directory. If such files exist, delete them; otherwise, skip this check. Then switch to the paas user and mount disks to the rebuilt node.

    For more information, see Mounting Disks in the "Preparing for Installation" section of the Installation Guide.

  4. Add the rebuilt node to the management node cluster.

    1. Run the following command to modify the node configuration file:

      vi /tmp/manage-cluster3.yaml

      manage-cluster3.yaml is an example configuration file used in 3.b in Deleting a Faulty Node.

      Delete the creationTimestamp, resourceVersion, selfLink, uid, and status fields.
      NOTE:

      If unschedulable: true is displayed, change true to false. Otherwise, skip the operation.

    2. Save the modifications and exit.
    3. Run the following command to add the rebuilt node to the node cluster:

      kubectl create -f /tmp/manage-cluster3.yaml
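
The manual edit in 4.a can also be done non-interactively. The sketch below is an assumption-laden convenience, not part of the product: it assumes a Node manifest exported with -oyaml, where the listed metadata fields are indented two spaces under metadata and status is the last top-level block; the function name is illustrative.

```shell
# Illustrative alternative to editing the exported node manifest in vi:
# strip the fields that must not be re-submitted and flip
# "unschedulable: true" to false.
clean_node_yaml() {
    file="$1"
    # Drop the metadata fields kubectl create must not see (indented two
    # spaces under "metadata:" in the exported manifest), and make the
    # node schedulable again if it was cordoned.
    sed -i \
        -e '/^  creationTimestamp:/d' \
        -e '/^  resourceVersion:/d' \
        -e '/^  selfLink:/d' \
        -e '/^  uid:/d' \
        -e 's/unschedulable: true/unschedulable: false/' \
        "$file"
    # "status:" starts the last top-level block of an exported Node
    # manifest, so delete from there to the end of the file.
    sed -i '/^status:/,$d' "$file"
}

# Example:
#   clean_node_yaml /tmp/manage-cluster3.yaml
```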

  5. Check the status of the rebuilt node.

    1. Run the following command to view the status of the rebuilt node:

      kubectl get no --all-namespaces

      The STATUS column in the command output displays node statuses: Unknown, NotReady, or Ready.

    2. Log in to the rebuilt node as the paas user. Run the following command to view service status of the node:

      monit summary

      • If all services on the rebuilt node are in the Running state, the rebuilt node has been successfully added to the management node cluster.
      • If a service on the rebuilt node is not in the Running state, run the monit restart {ServiceName} command to restart the service.

        In the preceding command, {ServiceName} indicates the name of the faulty service queried in the previous step.
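
When many services are listed, a filter can help spot the ones to restart. The awk pattern below is an assumption about the classic "monit summary" table layout (service name in quotes, status at the end of the line), not a documented interface, and the function name is illustrative.

```shell
# Print the names of monit-managed services that are not Running.
# Reads the output of "monit summary" on stdin.
not_running() {
    # Service lines start with the entry type; the quoted field after it
    # is the service name; the line ends with the status column.
    awk -F"'" '/^(Process|Program|File|System)/ && $0 !~ /Running[[:space:]]*$/ {print $2}'
}

# Example:
#   monit summary | not_running | while read -r svc; do
#       monit restart "$svc"
#   done
```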

  6. (Optional) Recover the cfe-etcd.

    This step is required only if manage.etcd:etcd is listed below the labels section of the manage-cluster3.yaml file. For more information, see CFE-ETCD Restoration.

  7. (Optional) Recover the MySQL database.

    This step is required only if manage.mysql:mysql is listed below the labels section of the manage-cluster3.yaml file. For more information, see DBM Database Recovery.

  8. Run the following command to check pod status of the rebuilt node:

    kubectl get pods -n manage -o wide | grep <node name>

    Replace <node name> with the node name obtained in 3.a in Deleting a Faulty Node.

    If all pods on the rebuilt node are in the Running state, the node has successfully recovered.

    NOTE:
    • The commands for recovering a node in the OM zone are the same as those for the tenant management zone, except that manage in the commands must be replaced with om.
    • If an abnormal pod exists (that is, a pod that is not in the Running state), delete the pod so that it is re-created and the node recovers.

      For example, if the pod of the ICAgent service on a node in the OM zone is abnormal, run the kubectl delete pod <pod name> -n om --grace-period=0 --force command to delete the pod.

      In the preceding command, <pod name> indicates the name of the queried abnormal pod, that is, the content in the NAME column.

    • If needed, perform the following steps to change the host name of the rebuilt node.
      1. Log in to the rebuilt node as the paas user and switch to the root user. Run the following command to temporarily change the host name:

        hostname {New host name}

      2. Run the following command to modify the configuration file so that the new host name takes effect permanently:

        echo '{New host name}' > /etc/hostname

      3. Log in to the rebuilt node again as the paas user. The new host name takes effect.
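
The pod check in step 8 can likewise be filtered so only abnormal pods are shown. The sketch assumes the default kubectl get pods column layout, where NAME is the first column and STATUS is the third; the function name is illustrative.

```shell
# Print the names of pods that are not in the Running state. Reads the
# output of "kubectl get pods -n <namespace> -o wide | grep <node name>"
# on stdin; with default formatting, NAME is column 1 and STATUS is
# column 3. The header line, if present, is skipped.
abnormal_pods() {
    awk '$3 != "Running" && $1 != "NAME" {print $1}'
}

# Example:
#   kubectl get pods -n manage -o wide | grep manage-cluster3 | abnormal_pods
```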

Updated: 2019-06-10

Document ID: EDOC1100063248