HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Pod Start Failures

Symptom

A pod fails to start and is in the CrashLoopBackOff, ExecuteCommandFailed, or ErrPackagePull state.

Possible Causes

  • Health check rules are set for the application and the health check fails. The pod enters the CrashLoopBackOff state and restarts continuously.
  • The startup script fails to execute when a process application is started. The pod enters the ExecuteCommandFailed state.
  • The path of the image or software package is incorrect. The pod enters the ErrPackagePull state.

Troubleshooting Method

  1. Locate the CrashLoopBackOff fault.

    1. Use PuTTY to log in to the manage_lb1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    2. Run the following command and enter the password of the root user to switch to the root user:

      su - root

      Default password: QAZ2wsx@123!

    3. Run the following command to check whether the container is in the unhealthy state:

      kubectl describe pod {podname} -n {namespace}

      {podname} indicates the name of the pod that fails to be started, and {namespace} indicates the namespace the pod belongs to.

      ...
      Events:
        Type     Reason            Age                   From                                   Message
        ----     ------            ----                  ----                                   -------
        Warning  Unhealthy         5m (x6087 over 6d)    kubelet, node-zwx-02-8875970-873691f3  Liveness probe failed: Get http://10.16.19.146:88/: dial tcp 10.16.19.146:88: getsockopt: connection refused
        Warning  BackOffStart      45s (x36428 over 6d)  kubelet, node-zwx-02-8875970-873691f3  Back-off restarting failed container
        Normal   Pulled            12s (x3046 over 6d)   kubelet, node-zwx-02-8875970-873691f3  Successfully pulled image "10.68.0.82:20202/dcs-zwx685599/nginx:latest"
        Normal   Killing           12s (x3045 over 6d)   kubelet, node-zwx-02-8875970-873691f3  Killing container with id docker://container1:Container failed liveness probe.. Container will be killed and recreated.
        Normal   Pulling           12s (x3046 over 6d)   kubelet, node-zwx-02-8875970-873691f3  pulling image "10.68.0.82:20202/dcs-zwx685599/nginx:latest"
        Normal   SuccessfulCreate  11s (x3046 over 6d)   kubelet, node-zwx-02-8875970-873691f3  Created container
        Normal   Started           11s (x3046 over 6d)   kubelet, node-zwx-02-8875970-873691f3  Started container
    4. Run the following command to view the pod configuration and check the execution status of the service health check script:

      kubectl get pod {podname} -n {namespace} -oyaml

      {podname} indicates the name of the pod that fails to be started, and {namespace} indicates the namespace the pod belongs to.

      The health check script is configured by the service module, so this guide provides no method for handling abnormal script execution. If the cause of the failure cannot be identified, contact technical support for assistance.
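
When a pod has been restarting for days, the Events table from kubectl describe pod can be long. The following is a minimal sketch for narrowing it to the Warning rows and counting liveness probe failures. It assumes the column layout shown in the sample output above; warning_events and liveness_failures are hypothetical helper names, not kubectl features.

```shell
# Print only the Warning rows of the Events table from
# `kubectl describe pod` output read on stdin.
# Assumption: the table follows a line beginning with "Events:" and
# the first column of each row is the event type, as in this guide.
warning_events() {
    awk '
        /^[ \t]*Events:/ { in_events = 1; next }   # start of the Events table
        in_events && $1 == "Warning" { sub(/^[ \t]+/, ""); print }
    '
}

# Count the Warning rows that report a failed liveness probe.
liveness_failures() {
    warning_events | grep -c 'Liveness probe failed'
}

# Usage (run on the management node after switching to root):
#   kubectl describe pod {podname} -n {namespace} | warning_events
#   kubectl describe pod {podname} -n {namespace} | liveness_failures
```

A non-zero count from liveness_failures suggests that the restarts are driven by the health check rather than by the application exiting on its own.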

  2. Locate the ExecuteCommandFailed fault.

    Run the following command:

    kubectl describe pod {podname} -n {namespace}

    {podname} indicates the name of the pod that fails to be started, and {namespace} indicates the namespace the pod belongs to.

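    If information similar to the following is displayed, note the process name (icagent in this example) and the pod ID in the KUBERNETES_SERVICE_TOKEN_DIR path (9eb4b065-3a3f-11e7-ba5b-286ed48926f2 in this example). They identify the log directory used later in this step.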

    ...
    Processes:
      icagent:
        Process ID:     3d0f2297-34b2-11e9-95b6-fa163e22b02b
        Package:        https://10.247.245.47:20202/swr/v2/domains/op_svc_apm/namespaces/op_svc_apm/repositories/default/packages/icagent-repo/versions/7.1.53/file_paths/ICProbeAgent-7.1.53.zip
        Port:           
        State:          Waiting
          Reason:       ExecuteCommandFailed
        Ready:          False
        Restart Count:  1
        Liveness:       http-get https://10.0.0.1:28002/health delay=15s timeout=5s period=10s #success=1 #failure=1
        Environment Variables:
          cluster_mode:                  public
          scale_mode:                    0
          KUBERNETES_SERVICE_TOKEN_DIR:  /var/lib/kubelet/pods/9eb4b065-3a3f-11e7-ba5b-286ed48926f2/volumes/kubernetes.io~secret/default-token-d5mhr
    Conditions:
      Type          Status
      Initialized   True 
      Ready         False 
    ...

    Run the following command to query the IP address of the node where the pod is located:

    kubectl get pod {podname} -n {namespace} -owide

    Information similar to the following is displayed.

    NAME                         READY     STATUS        RESTARTS   AGE       IP               NODE
    pom-backupserver-2832619091   1/1      Running       0          16d       10.120.175.173   paas-10.120.175.173

    Log in to the node as user paas, switch to the root user, and go to the log directory of the process in the pod, for example, /var/lib/kubelet/pods/9eb4b065-3a3f-11e7-ba5b-286ed48926f2/processes/{Process name}/log/

    Check the *.stderr logs in this directory to locate the cause.

    total 16
    drwxr-x--- 2 root root 4096 May 10 09:06 ./
    drwxr-x--- 4 root root 4096 May 16 21:58 ../
    -rw-r----- 1 root root    0 May 16 21:58 Install.stderr
    -rw-r----- 1 root root   41 May 16 21:58 Install.stdout
    -rw-r----- 1 root root    0 May 16 21:58 PostStartProcess.stderr
    -rw-r----- 1 root root    0 May 16 21:58 PostStartProcess.stdout
    -rw-r----- 1 root root    0 May 16 21:58 StartProcess.stderr
    -rw-r----- 1 root root   51 May 16 21:58 StartProcess.stdout
    -rw-r----- 1 root root    0 May 16 21:58 probe.stderr
    -rw-r----- 1 root root    0 May 16 21:58 probe.stdout
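
The log lookup above can be sketched as two small shell helpers: one builds the log directory from the pod ID and process name, and the other lists only the *.stderr files that contain output. This is a rough sketch that assumes the directory pattern from the example path in this step; process_log_dir and nonempty_stderr_logs are hypothetical names.

```shell
# Build the log directory of a process from the pod ID ($1) and the
# process name ($2), following the example path pattern in this guide.
process_log_dir() {
    printf '/var/lib/kubelet/pods/%s/processes/%s/log\n' "$1" "$2"
}

# Print every *.stderr file in the directory $1 that is not empty.
nonempty_stderr_logs() {
    for f in "$1"/*.stderr; do
        # -s is true only for an existing file larger than 0 bytes
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (as root on the node that hosts the pod):
#   dir=$(process_log_dir {pod ID} {Process name})
#   nonempty_stderr_logs "$dir"
```

If every *.stderr file is empty, as in the listing above, nonempty_stderr_logs prints nothing; in that case the *.stdout files in the same directory may still be worth reading.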

  3. Locate the ErrPackagePull fault.

    Run the following command to check whether the pod is in the ErrPackagePull state:

    kubectl get pod {podname} -n {namespace}

    {podname} indicates the name of the pod that fails to be started, and {namespace} indicates the namespace the pod belongs to.

    Run the following command to view the path and version of the software package or the image as required:

    kubectl describe pod {podname} -n {namespace}

    {podname} indicates the name of the pod that fails to be started, and {namespace} indicates the namespace the pod belongs to.

    Events:
      Type     Reason            Age                   From                                     Message
      ----     ------            ----                  ----                                     -------
      Warning  BackOffPullImage  5m (x21422 over 3d)   kubelet, w12345180-20.64.1.111-0ad09743  Back-off pulling image "10.68.0.82:20202/dzq11/nginx:latest"
      Normal   Pulling           1m (x975 over 3d)     kubelet, w12345180-20.64.1.111-0ad09743  pulling image "10.68.0.82:20202/dzq11/nginx:latest"
      Warning  FailedCreate      11s (x21445 over 3d)  kubelet, w12345180-20.64.1.111-0ad09743  Error: ImagePullBackOff
    1. Check whether the software or image exists in the software or image repository. If the software or image does not exist, upload it.
    2. On the node where the faulty pod is deployed, run curl -k {Software repository address}, for example, curl -k 10.120.193.73:2567, to check whether the software repository is reachable over the network. If it is not, troubleshoot the network connection.
    3. If the fault still cannot be solved, contact technical support for assistance.
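
To decide which image to look for in the repository and which address to test with curl, the image reference can be pulled out of the BackOffPullImage events. The following is a minimal sketch, assuming the event message format shown in the sample output above and an image reference that begins with the repository address; failing_images and repository_of are hypothetical helper names.

```shell
# Print each distinct image named in "Back-off pulling image" events,
# reading `kubectl describe pod` output on stdin.
failing_images() {
    sed -n 's/.*Back-off pulling image "\([^"]*\)".*/\1/p' | sort -u
}

# Print the repository address part of an image reference, that is,
# everything before the first "/".
# Assumption: the reference starts with an address such as host:port.
repository_of() {
    printf '%s\n' "${1%%/*}"
}

# Usage:
#   kubectl describe pod {podname} -n {namespace} | failing_images
#   curl -k "$(repository_of 10.68.0.82:20202/dzq11/nginx:latest)"
```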

Updated: 2019-06-01

Document ID: EDOC1100062375
