HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Troubleshooting When Failover Is Not Enabled on the Pod for a Long Time

Symptom

After a node becomes faulty, the applications on that node are not migrated to other nodes for a long time (more than 10 minutes), and the applications remain in the not-ready state.
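A quick way to confirm the symptom (a sketch; <node-name> is a placeholder for the name of the faulty node) is to check whether pods are still bound to that node and are not ready:

kubectl get pod --all-namespaces -o wide | grep <node-name>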

Possible Causes

  • A large number of nodes in the cluster are in the NotReady state, triggering fault protection.
  • The pod fails to be scheduled.

Troubleshooting Method

  • To rectify the fault when a large number of nodes in the cluster are in the NotReady state, perform the following steps:
    1. Use PuTTY to log in to the manage_lb1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command and enter the password of the root user to switch to the root user:

      su - root

      Default password: QAZ2wsx@123!

    3. Run the following command to check the status of the nodes in the cluster:

      kubectl get node

      Information similar to the following is displayed:

      PROJECT        NAME                                STATUS                        ROLES     AGE       VERSION
      abc            10.60.21.18-e113d08f               NotReady,SchedulingDisabled   <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      apig           apig-1356940-fdb44e36               NotReady                      <none>    1d        
      cfe            hsbn-4321470-89853241               NotReady                      <none>    5d        
      kube-system    k-9133340-cfd73152                  NotReady                      <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      abc            lwx382934-public-4299580-31bf3f7d   Ready                         <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      cfe-test       lwx382959-6117981-0c8d89f5          Ready                         <none>    5d        v2.12.27-FusionStage6.5.RP2-B053-dirty
      cfe-test       lwx382959-6117982-5ab939de          Ready                         <none>    5d        v2.12.27-FusionStage6.5.RP2-B053-dirty
      hz-cfe         node-10-60-20-107-404ed636-bmcst   NotReady                      <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      cfe-test       node-10-60-20-107-c504f75a-gbvsw   Ready                         <none>    12h       v2.12.27-FusionStage6.5.RP2-B053-dirty
      cfe-test       node-10.60.21.215                  Ready                         <none>    14h       v2.12.27-FusionStage6.5.RP2-B053-dirty
      hz-cfe         node-8422160-8ce94d2e               NotReady                      <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      hz-cfe         node1-8664030-f47b8a22              NotReady                      <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      hz-cfe         node3-bf363074                      NotReady                      <none>    10d       
      fst-manage     paas-10-60-20-122                  Ready                         <none>    10d       v2.12.27-FusionStage6.5.RP2-B053-dirty
      ......

      Count the nodes by status as follows:

      • ReadyNum indicates the number of nodes in the Ready state.
      • NotReadyNum indicates the number of nodes in the NotReady state.
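      The counting can also be done automatically. The following one-liner is a minimal sketch that assumes the STATUS column is the third field of the kubectl get node output, as in the example above:

      kubectl get node --no-headers | awk 'BEGIN {r = 0; nr = 0} $3 ~ /^NotReady/ {nr++; next} $3 ~ /^Ready/ {r++} END {print "ReadyNum=" r, "NotReadyNum=" nr}'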
    4. Determine whether fault protection has been triggered based on the node counts.

      When both of the following conditions are met, fault protection is triggered, and the system stops evicting applications from the faulty nodes and does not reschedule their containers to other nodes:

      • NotReadyNum ≥ 3
      • NotReadyNum/(ReadyNum + NotReadyNum) ≥ 0.55

        If the preceding conditions are met, restore the nodes in the NotReady state to the Ready state so that fault protection is canceled and failover can resume. For details, see Node in NotReady State.

        If the fault protection conditions are not met, the failover continues and no operation is required.
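        For example (an illustrative calculation only): if a cluster has 6 nodes in the Ready state and 8 nodes in the NotReady state, then NotReadyNum = 8 ≥ 3 and NotReadyNum/(ReadyNum + NotReadyNum) = 8/(6 + 8) ≈ 0.57 ≥ 0.55, so both conditions are met and fault protection is triggered.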

  • To rectify the fault when the pod fails to be scheduled, do as follows:

    For details, see Pod Scheduling Failures.
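    Before going to that section, a quick initial check (a sketch; <pod-name> and <namespace> are placeholders) is to list the pods that are stuck in the Pending state and inspect their scheduling events:

    kubectl get pod --all-namespaces | grep Pending
    kubectl describe pod <pod-name> -n <namespace>

    In the kubectl describe output, the Events section usually contains a FailedScheduling record that explains why the pod cannot be scheduled (for example, insufficient CPU or memory, or an unsatisfied node selector).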

  • If the fault persists, contact technical support for assistance.
Updated: 2019-06-01

Document ID: EDOC1100062375
