HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-9 Failed to Restart Containerized Application

Description

This alarm is reported when a pod fails to start containerized applications.

Pod: In Kubernetes, pods are the smallest unit of creation, scheduling, and deployment. A pod is a group of relatively tightly coupled containers. Pods are always co-located and run in a shared application context. Containers within a pod share a namespace, IP address, port space, and volume.
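The shared application context described above can be illustrated with a minimal pod manifest. This is a generic, hypothetical example (all names are invented and are not part of this product's configuration): both containers share the pod's network namespace and IP address, and mount the same volume.

```yaml
# Hypothetical two-container pod, for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod        # invented name
  namespace: testns
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /data
    - name: sidecar
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
```

Because the containers share one network namespace, the sidecar could reach the app container on 127.0.0.1 rather than over the cluster network.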

Attribute

Alarm ID: 9
Alarm Severity: Minor
Alarm Type: Environmental alarm

Parameters

Parameter Name       Parameter Description
kind                 Resource type.
namespace            Name of the project to which the resource belongs.
name                 Resource name.
uid                  Unique ID of the resource.
OriginalEventTime    Event generation time.
EventSource          Name of the component that reports an event.
EventMessage         Supplementary information about an event.

Impact on the System

Functions provided by the containerized application may be unavailable.

System Actions

The pod keeps restarting.

Possible Causes

The container startup command configured for the pod is incorrect. As a result, the containers cannot run properly.

Procedure

  1. Obtain the name of the pod that fails to be started.

    1. Use a browser to log in to the FusionStage OM zone console.
      1. Log in to ManageOne Maintenance Portal.
        • Login address: https://<Address for accessing the homepage of ManageOne Maintenance Portal>:31943, for example, https://oc.type.com:31943.
        • The default username is admin, and the default password is Huawei12#$.
      2. On the O&M Maps page, click the FusionStage link under Quick Links to go to the FusionStage OM zone console.
    2. Choose Application Operations > Application Operations from the main menu.
    3. In the navigation pane on the left, choose Alarm Center > Alarm List and query the alarm by setting query criteria.
    4. Click the alarm to expand its details. Record the values of name and namespace in Location Info; these are the podname and namespace used in the following steps.

  2. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  3. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  4. Run the following command to obtain the IP address of the node on which the pod runs:

    kubectl get pod podname -n namespace -oyaml | grep -i hostip:

    In the preceding command, podname is the pod name and namespace is the namespace recorded in step 1.

    Then log in to the node at the obtained IP address using SSH.
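The grep filter in the preceding step can be checked offline. The YAML fragment below is a made-up sample of kubectl output (the IP addresses are invented), not captured from a real cluster:

```shell
# Simulated (hypothetical) output of: kubectl get pod podname -n namespace -oyaml
pod_status='status:
  hostIP: 192.168.0.11
  phase: Running
  podIP: 172.16.0.5'

# Same filter as above: keep the hostIP line, then strip the key,
# leaving only the node IP address to log in to using SSH.
host_ip=$(printf '%s\n' "$pod_status" | grep -i 'hostip:' | awk '{print $2}')
echo "$host_ip"
```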

  5. Search for the error information based on the pod name and correct the container startup configuration.

    1. Run the following commands to view the kubelet log:

      cd /var/paas/sys/log/kubernetes/

      vi kubelet.log

      Press the / key, enter the pod name, and then press Enter to search. If output similar to the following is displayed, the container startup command failed to be executed:

      I0113 14:19:29.497459   70092 docker_manager.go:2703] 
      checking backoff for container "container1" in pod "nginx-run-1869532261-29px2"  
      I0113 14:19:29.497620   70092 docker_manager.go:2717] 
      Back-off 20s restarting failed container=container1 pod=nginx-run-1869532261-29px2_testns(b01b9e9c-f829-11e7-aa58-286ed488d1d4) 
      E0113 14:19:29.497673   70092 pod_workers.go:226] 
      Error syncing pod nginx-run-1869532261-29px2-b01b9e9c-f829-11e7-aa58-286ed488d1d4, 
      skipping: failed to "StartContainer" for "container1" with CrashLoopBackOff:
      "Back-off 20s restarting failed container=container1 pod=nginx-run-1869532261-29px2_testns(b01b9e9c-f829-11e7-aa58-286ed488d1d4)
    2. Run the following command to query the container ID:

      docker ps -a

    3. Run the following command to check the specific error information:

      docker logs containerID

      containerID is the container ID obtained in 5.b.

      Output similar to the following indicates that the startup script does not exist:
      # docker logs 128acfd300c8  
      container_linux.go:247: starting container process caused "exec: \"bash /tmp/test.sh\": stat bash /tmp/test.sh: no such file or directory"
    4. Correct the container startup command based on the error information, delete and redeploy the application, and then check whether the alarm is cleared. For the preceding error, specify the correct path for the startup script or place the script in that path.
      • If yes, no further action is required.
      • If no, go to 6.
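The vi search in step 5 can also be sketched with grep. The log fragment below is a shortened, hypothetical sample in the same format as the kubelet.log excerpt above; the pod name is reused from that excerpt for illustration:

```shell
# Write a hypothetical kubelet.log fragment to a temporary file.
cat > /tmp/kubelet_sample.log <<'EOF'
I0113 14:19:29.497459 70092 docker_manager.go:2703] checking backoff for container "container1" in pod "nginx-run-1869532261-29px2"
E0113 14:19:29.497673 70092 pod_workers.go:226] Error syncing pod, skipping: failed to "StartContainer" for "container1" with CrashLoopBackOff: "Back-off 20s restarting failed container=container1 pod=nginx-run-1869532261-29px2_testns"
EOF

# Equivalent of searching for the pod name in vi: list its log lines,
# then check whether any of them report a CrashLoopBackOff.
podname=nginx-run-1869532261-29px2
if grep "$podname" /tmp/kubelet_sample.log | grep -q CrashLoopBackOff; then
  echo "startup command failed for $podname"
else
  echo "no CrashLoopBackOff found for $podname"
fi
```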

  6. Contact technical support for assistance.
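The "no such file or directory" failure shown in the docker logs output above can be caught before redeployment by verifying the script path used in the container startup command. The path below is taken from that sample error; in practice, substitute the path from your own startup command:

```shell
# Path from the sample error message; replace with your own script path.
script=/tmp/test.sh

# Create a placeholder script if it is missing (for this offline sketch),
# then confirm it is present and executable before redeploying.
if [ ! -f "$script" ]; then
  printf '#!/bin/sh\necho "app started"\n' > "$script"
fi
chmod +x "$script"
[ -x "$script" ] && echo "startup script $script is ready"
```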

Alarm Clearing

This alarm is automatically cleared after the fault is rectified.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365