HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-43151 Backup failed

Description

This alarm is reported when a backup task created using the backup framework fails to be executed.

Attribute

Alarm ID    Alarm Severity    Alarm Type
43151       Minor             Quality of service (QoS) alarm

Parameters

Parameter Name    Parameter Description
service           Specifies the name of the service for which the alarm is reported.
instance          Specifies the name of the service instance for which the alarm is reported.

Impact on the System

The backup fails, and no backup data is generated for recovery.

Possible Causes

  • The connection configurations of the remote backup server are incorrect.
  • The username or password is incorrect.
  • The network of the remote server is unreachable.
  • The remote backup directory does not exist.
  • The permission on the remote backup directory is incorrect.
  • The space of the remote backup directory is insufficient.

The backup may also fail because of a fault in the container network between the backup framework and the services. In that scenario the entire system is abnormal, and system monitoring reports a container network fault alarm. Therefore, no backup failure alarm is generated automatically.
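
Three of the causes listed above (the directory does not exist, its permission is incorrect, or its space is insufficient) can be checked directly on the remote backup server. The following is a minimal shell sketch, assuming you can open a shell on that server as the backup user. The function name check_backup_dir and its two parameters are illustrative and are not part of the product.

```shell
#!/bin/sh
# check_backup_dir DIR MIN_FREE_KB
# Hypothetical helper: DIR and MIN_FREE_KB are illustrative parameters,
# not settings read from the backup framework.
# Return codes: 0 ok, 1 missing, 2 permission, 3 space, 4 df output unreadable.
check_backup_dir() {
    dir=$1
    min_free_kb=$2

    if [ ! -d "$dir" ]; then
        echo "missing: $dir does not exist"
        return 1
    fi
    # Creating files and subdirectories needs write and search (x) permission.
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "permission: current user cannot write to $dir"
        return 2
    fi
    # df -Pk prints one POSIX-format line per file system;
    # field 4 of line 2 is the available space in KB.
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    case $free_kb in
        '' | *[!0-9]*)
            echo "unknown: could not read free space for $dir"
            return 4
            ;;
    esac
    if [ "$free_kb" -lt "$min_free_kb" ]; then
        echo "space: only ${free_kb} KB free in $dir"
        return 3
    fi
    echo "ok: $dir"
    return 0
}
```

For example, check_backup_dir /path/to/backup 1048576 would require at least 1048576 KB (1 GB) of free space; both values are placeholders. The permission check reflects the user running the script, so run it as the same user that the backup uses.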

Procedure

  1. Use a browser to log in to the FusionStage OM zone console.

    1. Log in to ManageOne Maintenance Portal.
      • Login address: https://Address for accessing the homepage of ManageOne Maintenance Portal:31943, for example, https://oc.type.com:31943.
      • The default username is admin, and the default password is Huawei12#$.
    2. On the O&M Maps page, click the FusionStage link under Quick Links to go to the FusionStage OM zone console.

  2. Choose Backup > Backup task from the main menu. Find the backup task named in the alarm and view its failed subtasks.
  3. Choose Backup > Instance query and determine the instance type based on the instance name.
  4. Based on the error information of the failed subtasks, rectify the backup fault for instances of the ETCD type as follows:

    • cluster unhealthy
      Indicates that the etcd cluster is unhealthy. Check the health status of the corresponding etcd cluster.
      1. Use PuTTY to log in to the manage_lb1_ip node.

        The default username is paas, and the default password is QAZ2wsx@123!.

      2. Run the following command and enter the password of the root user to switch to the root user:

        su - root

        Default password: QAZ2wsx@123!

      3. Run the following command to check whether the pods are Running properly:

        kubectl get pods -n Namespace where the etcd cluster is located | grep etcd

        If the pod status is normal, the cluster may have been temporarily abnormal while the backup task was running. Recreate the task. If the pod status is abnormal, restore the cluster first.

    • failed to save snapshot

      Indicates that the snapshot failed to be saved to the local disk, possibly because of an I/O exception. This fault occurs only occasionally. Recreate the task and collect logs so that developers can locate the fault.

      Log path: /var/paas/sys/log/etcd-backup/etcd-backup.log on the host where the failed instance is located. Logs are dumped or aged out to limit their size, so collect them promptly.

    • failed to download file
      Indicates that an internal error occurs during file downloading. In the SFTP scenario, check whether the backup file exists and whether the user has the read permission on the backup directory and backup file.
      1. Run the following commands to check whether the file exists:

        cd Backup directory/etcd_backup/etcd/

        ls | grep Backup file name
        NOTE:

        In the preceding command, etcd indicates the name of the etcd cluster that is backed up. Backup file name indicates the name of the backup file corresponding to the backup task during restoration. You can view the name on the portal.

        If no file is displayed in the command output, recreate a task to back up the file. If the command output contains the file, go to 4.b.

      2. Run the following command to check whether the SFTP user has the read permission on the file:

        ls -al | grep Backup file name

        Check whether the SFTP user has the read (r) permission in the command output. If not, use the chmod command to grant the read (r) permission to the SFTP user. Alternatively, configure the user who created the backup file as the SFTP user and recreate the restoration task.

    • failed to upload file

      Indicates that an internal error occurs during file uploading in the SFTP scenario.

      Run the following command to check whether the file backup directory exists and whether the user has the write permission on the directory:

      cd Backup directory

      • If No such file or directory is displayed in the command output, the backup directory does not exist. Run the mkdir command as the SFTP user to create the directory level by level and run the chmod 750 level-by-level directory command to modify the permissions for directories at each level.
      • If Permission denied is displayed, the user does not have the permission to access the directory. In this case, run the chmod command to assign the read/write permission to the user or use the account of the directory creator to reconfigure the SFTP information on the portal. Then, recreate a backup task.
      • If no abnormality is displayed in the command output, run the following command to check whether the SFTP user has the write (w) permission for the directory:

        cd ../ ; ll

        If the SFTP user does not have the permission, assign the write (w) permission to this user using the chmod command.

    • have task running

      Indicates that another task is still being processed. Create the task again after the running task is complete.

    • sftp not available

      The sftp not available error can have the following causes. Check them one by one.

      Log in to the SFTP server as the SFTP user configured on the portal. In the following commands, Backup directory indicates the SFTP data backup directory configured on the portal.
      • The backup directory is not created on the SFTP server or the user does not have the permission to access the backup directory.

        cd Backup directory

        If No such file or directory is displayed in the command output, the backup directory does not exist. Run the mkdir command as the SFTP user to create the directory level by level and run the chmod 750 level-by-level directory command to modify the permissions for directories at each level. If Permission denied is displayed, the user does not have the permission to access the directory. In this case, run the chmod command to assign the read/write permission to the user or use the account of the directory creator to reconfigure the SFTP information on the portal. Then, recreate a backup task.

      • The current SFTP user cannot create subdirectories in the backup directory.

        Run the following commands to check whether the SFTP user has the permission to create subdirectories:

        cd Backup directory

        mkdir etcd_backup

        If an error is reported in the command output, the SFTP user has no permission to create subdirectories. Run the chmod command to assign the read/write permission to this user.

      • The file to be restored does not exist or is damaged before restoration.
        1. Run the following commands to check whether the file exists:

          cd Backup directory/etcd_backup/etcd/

          ls | grep Backup file name

          If no file is displayed in the command output, recreate a task to back up the file. If the command output contains the file, go to 4.b.

        2. If the file exists, run the following commands to check whether the file can be opened:

          tar -xzf Backup file name

          cd backup

          vi snapshot.db

          If the file can be opened and the data is displayed, the file is normal. Otherwise, the file is damaged and cannot be repaired. Back up the file again.

      • The SFTP username or password is incorrect.

        Use PuTTY to log in to the SFTP server and check whether the password is correct.

    • execute ssh command failed
      The execute ssh command failed error can have the following causes. Check them one by one.
      • Check whether mutual trust relationships have been added among the manage_lb1_ip, manage_lb2_ip, and manage_lb3_ip nodes.
        1. Check whether mutual trust relationships have been added among nodes.

          Log in to each of the manage_lb1_ip, manage_lb2_ip, and manage_lb3_ip nodes as the paas user and run the following command to check whether you are prompted for a password:

          scp -o StrictHostKeyChecking=no paas@IP addresses of other manage_lb nodes:/opt/paas/backup_cfe/etcd_backup/test file /var/paas/run/

          If the command succeeds without prompting for a password, the mutual trust relationships have been added. Otherwise, perform 4.c to add mutual trust relationships to the nodes again.

        2. Check whether you can log in to the local node without entering the password.

          Log in to each of the manage_lb1_ip, manage_lb2_ip, and manage_lb3_ip nodes as the paas user and run the following command to check whether you are prompted for a password:

          scp -o StrictHostKeyChecking=no paas@IP address of the local node:/opt/paas/backup_cfe/etcd_backup/test file /var/paas/run/

          If the command succeeds without prompting for a password, the mutual trust relationships have been added. Otherwise, perform 4.c to add mutual trust relationships to the nodes again.

        3. Add mutual trust relationships among nodes.
          1. Log in to the nodes among which the mutual trust relationships need to be added as the paas user.
          2. Run the following command to add the mutual trust relationships:

            ./fsadm solution HEC -c addvm -p all

      • Log in to the backend and check whether the password of the paas user has expired.

        Log in to the manage_lb1_ip, manage_lb2_ip, and manage_lb3_ip nodes as the paas user. If the output at login indicates that the password has expired, handle the password expiration problem.

    • timeout to get cluster status

      Indicates that the etcd cluster status is abnormal. Check the status of the corresponding etcd cluster.

      Run the following command to check whether the pods are Running properly:

      kubectl get pods -n Namespace where the etcd cluster is located | grep etcd

      If the pod status is normal, the cluster may have been temporarily abnormal while the backup task was running. Recreate the task. If the pod status is abnormal, restore the cluster first.

    • not supported storage type

      Indicates that the storage type is not supported. Check the storage type in the backup policy. The supported storage types are sftp and local.

    • Other prompt information

      Contact technical support engineers.
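
The pod check used for cluster unhealthy and timeout to get cluster status in step 4 can be partly automated. The following is a minimal shell sketch, assuming the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE). The function name report_unhealthy_etcd_pods is illustrative and is not part of the product.

```shell
#!/bin/sh
# report_unhealthy_etcd_pods
# Hypothetical helper: reads `kubectl get pods` rows on stdin and prints
# every pod whose name contains "etcd" and whose STATUS is not "Running".
# Exit status: 0 if all etcd pods are Running, 1 if any is not,
# 2 if no etcd pod appears in the input at all.
report_unhealthy_etcd_pods() {
    awk '
        $1 ~ /etcd/ {
            seen++
            if ($3 != "Running") {
                print $1 ": " $3
                bad++
            }
        }
        END {
            if (!seen) {
                print "no etcd pods found"
                exit 2
            }
            exit (bad ? 1 : 0)
        }
    '
}
```

To use it, pipe the output of the kubectl get pods -n command from step 4 (with the namespace where the etcd cluster is located) into report_unhealthy_etcd_pods. The sketch looks only at the STATUS column, so a pod that is Running but not ready still passes; check the READY column separately if needed.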

Alarm Clearing

After locating and rectifying the backup fault, manually create a backup task and ensure that the backup is successful. Then, manually clear the alarm on the Alarm List page.
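
In the SFTP scenario, one way to double-check that the new backup is usable before clearing the alarm is to confirm that the backup file lists cleanly as an archive. The following is a minimal shell sketch, assuming the backup file is a gzip-compressed tar archive that contains snapshot.db, as implied by the tar -xzf and vi snapshot.db check in the procedure. The function name verify_backup_archive is illustrative and is not part of the product.

```shell
#!/bin/sh
# verify_backup_archive FILE
# Hypothetical helper: FILE is the path of a backup file on the SFTP server.
# Return codes: 0 ok, 1 unreadable, 2 damaged archive, 3 no snapshot.db inside.
verify_backup_archive() {
    archive=$1

    if [ ! -r "$archive" ]; then
        echo "unreadable: $archive"
        return 1
    fi
    # tar -tzf lists the members without extracting anything and
    # exits with a nonzero status if the archive cannot be read.
    if ! members=$(tar -tzf "$archive" 2>/dev/null); then
        echo "damaged: $archive is not a valid gzip-compressed tar archive"
        return 2
    fi
    # Look for a member whose path ends in snapshot.db.
    if ! printf '%s\n' "$members" | grep -q 'snapshot\.db$'; then
        echo "incomplete: no snapshot.db in $archive"
        return 3
    fi
    echo "ok: $archive"
    return 0
}
```

This check only confirms that the archive is readable and contains the expected file. It does not validate the contents of snapshot.db itself.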

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
