HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Backup


A Backup Job May Be in the In progress State and Cannot Be Interrupted

Symptom

A backup job may be in the In progress state and cannot be interrupted.

On a backup proxy, run the ps aux|grep BackupNode command. Some processes named BackupNode are in the D (uninterruptible sleep) state.
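The check above can be scripted. The following is a minimal sketch, assuming the usual `ps aux` column layout in which STAT is the eighth field. The function name d_state_pids is hypothetical and is not part of eBackup.

```shell
# d_state_pids NAME
# Reads `ps aux` output on stdin and prints the PID of every process whose
# line contains NAME and whose STAT column (8th field) starts with "D",
# that is, a process in uninterruptible sleep.
d_state_pids() {
    awk -v name="$1" 'NR > 1 && $8 ~ /^D/ && index($0, name) > 0 { print $2 }'
}

# Usage on a backup proxy:
#   ps aux | d_state_pids BackupNode
```

An empty result means that no matching process is currently in the D state.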

Possible Causes

While data is being written to the storage unit, the backup process waits for the system to respond. If the network of the NFS or CIFS storage unit is constantly or intermittently interrupted, the write operation cannot complete and the process keeps waiting. As a result, the backup job stays in the In progress state and cannot be interrupted.

Procedure
  1. Refer to Logging In to the eBackup Server to log in to the backup proxy.
  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the reboot command to restart the backup proxy operating system.

    • If the operating system is restarted successfully, no further action is required.
    • If the operating system fails to be restarted, go to 4.

  4. Run the following two commands one by one to forcibly restart the backup proxy operating system.

    Restarting the backup proxy operating system will interrupt services. If HA is configured and the active HA node is restarted, an HA switchover will occur, and services will be interrupted for a maximum of two minutes.

    echo 1 > /proc/sys/kernel/sysrq 
    echo b > /proc/sysrq-trigger      
    • If the active HA node can be restarted, no further action is required.
    • If the active HA node cannot be restarted, contact technical support engineers.
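For reference, the two commands in step 4 use the Linux magic SysRq interface: writing 1 to /proc/sys/kernel/sysrq enables all SysRq functions, and writing b to /proc/sysrq-trigger reboots the system immediately without syncing or unmounting file systems. The following is a minimal sketch that wraps them in a function. The function name and the optional path arguments are hypothetical; the path arguments exist only so that the function can be tried against scratch files instead of the real kernel interface.

```shell
# sysrq_reboot [SYSRQ_CTL] [SYSRQ_TRIGGER]
# Enables all SysRq functions, then requests an immediate reboot.
# With no arguments it writes to the real kernel files and the host
# reboots at once, so run it as root only when a normal reboot has failed.
sysrq_reboot() {
    ctl="${1:-/proc/sys/kernel/sysrq}"
    trigger="${2:-/proc/sysrq-trigger}"
    echo 1 > "$ctl" || return 1   # enable all SysRq functions
    echo b > "$trigger"           # reboot without sync or unmount
}
```

Because the reboot skips the normal shutdown sequence, use it only as the fallback described in step 4.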

A Backup Job May Be in the In progress State for a Long Time

Symptom

A backup job may be in the In progress state for a long time and the backup progress remains unchanged.

Possible Causes

When data is read from storage devices, if the networks of the storage devices are constantly or intermittently interrupted, read operations enter the waiting state. As a result, the backup job is in the In progress state for a long time.

Procedure
  1. Perform recovery operations based on the type of network between eBackup and backup storage:

    • If a Fibre Channel network is used, go to 2.
    • If an IP network is used, go to 3.

  2. If the network between eBackup and backup storage is a Fibre Channel network, perform the following steps:

    1. Use PuTTY to log in to the backup proxy running the backup task with the management IP address.

      Default account: hcp; default password: PXU9@ctuNov17!

    2. Run the su root command and enter the password of account root to switch to user root.

      The default password of the root account is Cloud12#$.

    3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.
      NOTE:

      After you run the preceding command, the system continues to run even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

    4. Run the cat /sys/class/fc_host/host*/port_state command to view status of Fibre Channel connections.
      • If none of the Fibre Channel connections is in the Online state:

        Check and restore Fibre Channel connections between backup storage and eBackup hosts.

        NOTE:

        After Fibre Channel connections become normal, execute the backup jobs again. If backup jobs are not restored, go to 4.

      • If any Fibre Channel connection is in the Online state:

        Contact technical support engineers.

  3. If the network between eBackup and backup storage is an IP network, perform the following steps:

    1. Use PuTTY to log in to the backup proxy running the backup task with the management IP address.

      Default account: hcp; default password: PXU9@ctuNov17!

    2. Run the su root command and enter the password of account root to switch to user root.

      The default password of the root account is Cloud12#$.

    3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.
      NOTE:

      After you run the preceding command, the system continues to run even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

    4. Run the ping Storage service IP address command to check the network connectivity.
      • If the network is interrupted constantly or intermittently:

        Check and reconnect networks between backup storage and eBackup hosts.

        NOTE:

        After network connections become normal, execute the backup jobs again. If backup jobs are not restored, go to 4.

      • If the network connections are normal:

        Contact technical support engineers.

  4. After the network connections are restored by performing the preceding operations, if backup jobs are not restored, run the reboot command to restart the operating system of the backup proxy. If the operating system fails to be restarted, run the following commands to forcibly restart it.

    Restarting the operating system of the backup proxy interrupts services. If HA is configured and the active node is restarted, an HA active/standby switchover will be triggered. Services may be interrupted for a maximum of 2 minutes.

    echo 1 > /proc/sys/kernel/sysrq 
    echo b > /proc/sysrq-trigger
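The Fibre Channel check in step 2 can be sketched as a small function that reads every port_state file and reports whether at least one port is Online. This is an illustrative sketch only: the function name fc_any_online is hypothetical, and the optional base directory argument is an assumption added so that the logic can be tried against a scratch directory.

```shell
# fc_any_online [BASE_DIR]
# Prints "<host>: <state>" for every port_state file under BASE_DIR
# (default /sys/class/fc_host). Returns 0 if at least one port reports
# "Online" and 1 otherwise.
fc_any_online() {
    base="${1:-/sys/class/fc_host}"
    online=0
    for f in "$base"/host*/port_state; do
        [ -r "$f" ] || continue          # skip when nothing matches
        state=$(cat "$f")
        echo "$(basename "$(dirname "$f")"): $state"
        if [ "$state" = "Online" ]; then
            online=1
        fi
    done
    [ "$online" -eq 1 ]
}
```

A return value of 1 corresponds to the case in which the Fibre Channel connections need to be checked and restored.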

The Progress Stays at 0% During Backup of VMware VMs

Symptom

The progress stays at 0% during backup of VMware VMs.

Possible Causes

The vCenter server or ESXi host has insufficient resources such as CPU and memory, and therefore cannot respond to the backup job sent by an eBackup server.

Fault Diagnosis

Check whether resource usage on the vCenter server or ESXi host exceeds the threshold set by VMware, or log in to the vCenter server or ESXi host to view the error information.

Procedure
  1. Log in to the vCenter server or ESXi host.
  2. Check whether the usage of resources such as CPU and memory exceeds the threshold set by VMware (such as 80%).

    • If yes, contact VMware technical support engineers to upgrade the resources. After the upgrade, use PuTTY to log in to the backup proxy as user hcp through the management IP address, run the service hcp restart command to restart the related backup proxy, and then perform the backup job again.
    • If no, go to 3.

  3. Log in to the backup server GUI using a browser.

    Login address: https://IP address corresponding to the datamover_management_float_ip field:8088

    Default account: admin. Default password: Cloud12#$ for installation using HUAWEI CLOUD Stack Deploy, and PXU9@ctuNov17! for manual installation.

  4. In the navigation tree, choose Configuration > Advanced Settings, and reduce the maximum number of concurrent tasks (20 by default).
  5. Use PuTTY to log in to the backup proxy using the IP address corresponding to the datamover_externalom_iplist field.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  6. Run the service hcp restart command to restart the related backup proxy, and execute the backup job again.
  7. If the problem persists, contact technical support.

A Message Is Displayed Indicating that a Disk Fails to Be Opened When a VMware VM Is Being Backed Up

Symptom

When a VMware virtual machine (VM) is being backed up, a message is displayed indicating that a disk fails to be opened.

Possible Causes

If the ESXi host where the VM resides was added using its domain name but no corresponding domain name server (DNS) is configured in eBackup, eBackup cannot resolve the domain name and therefore cannot discover the ESXi host. After the DNS is configured on the backup server and backup proxies, eBackup can discover the ESXi host using the domain name.

Procedure
  1. Refer to Logging In to the eBackup Server to log in to the backup server and backup proxy respectively.
  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system keeps running even when no operation is performed, resulting in security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

  4. Run the vi /etc/resolv.conf command to add the IP address of a DNS server.

    Example (the IP address is 10.10.10.10):

    [root@localhost ~]# vi /etc/resolv.conf 
     nameserver 10.10.10.10

    Press Esc to exit the editing mode. Type :wq, and press Enter.

  5. After the DNS server is configured, perform the backup job again. Check whether the backup job is successful.

    • If the backup job is successful, no further action is required.
    • If the backup job fails, contact technical support engineers.
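As an alternative to editing the file in vi, the nameserver entry in step 4 can be added with a short function. The following is a minimal sketch, assuming a plain-text resolv.conf that is not managed by another tool. The function name add_nameserver and the optional file argument are hypothetical.

```shell
# add_nameserver IP [FILE]
# Appends "nameserver IP" to FILE (default /etc/resolv.conf) unless an
# identical nameserver entry is already present.
add_nameserver() {
    ip="$1"
    file="${2:-/etc/resolv.conf}"
    if awk -v ip="$ip" '$1 == "nameserver" && $2 == ip { found = 1 }
                        END { exit !found }' "$file" 2>/dev/null; then
        return 0                      # entry already exists
    fi
    echo "nameserver $ip" >> "$file"
}

# Usage as root, with the example address from step 4:
#   add_nameserver 10.10.10.10
```

Running the function twice leaves a single entry, so it is safe to repeat on the backup server and on each backup proxy.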

The Job Running Mode Is Non-VPP Protocol Acceleration When the VPP Protocol Acceleration Configuration Is Enabled

Symptom

On the eBackup GUI, choose Monitoring > Jobs. The number of blocks transmitted by VPP in the job details is 0, and the VPP protocol acceleration configuration is enabled, but the job running mode is non-VPP acceleration.

Possible Causes
  • The configuration of the VPP protocol acceleration parameters is incorrect.
  • The firewall of the VPP proxy node restricts the network connection between the backup server and backup proxy.
Procedure
  1. Log in to the backup server GUI using a browser.

    Login address: https://IP address corresponding to the datamover_management_float_ip field:8088

    Default account: admin. Default password: Cloud12#$ for installation using HUAWEI CLOUD Stack Deploy, and PXU9@ctuNov17! for manual installation.

  2. In the navigation bar, choose Configuration > WAN acceleration.
  3. Check the WAN acceleration configuration items: Check whether the S3 server and remote acceleration node are correctly configured.

    • If they are correctly configured, go to 4.
    • If they are not correctly configured, configure correct parameters and try again.

  4. Use PuTTY to log in to the VPP proxy node.
  5. Run the following command to view configuration file vppconf.ini:

    cat /opt/huawei-vpp-proxy-svr/vppProxy/conf/vppconf.ini

  6. Check whether the VppListenIp and S3Path parameters are correctly configured.

    • If they are correctly configured, go to 7.
    • If they are not correctly configured, configure the parameters correctly and try again.

  7. Run the following command to query firewall rules:

    iptables -L -n --line-numbers

    A sample command output is as follows:

  8. If the following rule is displayed under Chain INPUT and Chain FORWARD:

    REJECT all   -- anywhere       anywhere   reject-with icmp-host-prohibited

    Run the following commands to delete the rule from Chain INPUT and Chain FORWARD.

    In the following command, num indicates the number of the num column in the command output of 7.

    iptables -D INPUT num

    iptables -D FORWARD num

    NOTE:

    If the operating system of the VPP proxy node is Euler, the firewall rule is generated again after the server restarts. Delete the rule again after each restart.

  9. If the fault persists, contact technical support engineers.
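Finding the rule numbers in steps 7 and 8 can be sketched as a filter over the listing produced in step 7. This is an illustrative sketch: the function name reject_rule_numbers is hypothetical, and it assumes the standard listing layout in which each chain starts with a line beginning with "Chain" and the rule number is the first column.

```shell
# reject_rule_numbers
# Reads `iptables -L -n --line-numbers` output on stdin and prints
# "<chain> <num>" for every REJECT rule that uses icmp-host-prohibited.
reject_rule_numbers() {
    awk '
        $1 == "Chain" { chain = $2; next }
        $2 == "REJECT" && /icmp-host-prohibited/ { print chain, $1 }
    '
}

# Usage on the VPP proxy node:
#   iptables -L -n --line-numbers | reject_rule_numbers
```

Each output line gives the chain and the num value to pass to iptables -D. If a chain contains more than one matching rule, delete the highest number first, because the remaining rules are renumbered after each deletion.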

The Job Progress Stays at 0% for a Long Time or the Job Execution Is Slow During a Backup, Restore, or Copy Job

Symptom

The job progress stays at 0% for a long time or the job execution speed is slow during a backup, restore, or copy job using the VPP protocol acceleration.

Possible Cause

The network between the VPP proxy node and the S3 server is poor.

Procedure
  1. Use PuTTY to log in to a VPP proxy node.
  2. Run the following command to check whether the network connection between the VPP proxy node and the S3 server is normal:

    ping IP address or domain name of the S3 server

    • If the network connection is abnormal or the delay is long and packet loss occurs, rectify the network connection and try again.
    • If the network is normal, contact technical support engineers.
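To quantify packet loss in step 2, the summary line printed by ping can be parsed. The following is a minimal sketch, assuming the Linux iputils summary format that contains the text "N% packet loss". The function name ping_loss_percent is hypothetical.

```shell
# ping_loss_percent
# Reads ping output on stdin and prints the packet loss percentage taken
# from the summary line. Prints nothing if no summary line is found.
ping_loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Usage on the VPP proxy node (replace the placeholder with the S3 server
# IP address or domain name):
#   ping -c 10 <S3 server address> | ping_loss_percent
```

Any value other than 0 indicates that packets are being lost between the VPP proxy node and the S3 server.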

An Inactivated Snapshot Results In Backup Failure When OceanStor V3/V5 Is the Production Storage

Description

A backup task fails. The following information is displayed in the task details: The snapshot is not in the activated state.

Possible Cause

When the production storage is OceanStor V3/V5, the production storage sets the snapshot generated by the backup to the inactive state.

Procedure
  1. In the task details, collect the IP address of the backup proxy and Request ID.
  2. Use PuTTY to log in to the backup proxy using the IP address collected in 1.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  3. Run the su root command and enter the password of user root to switch to user root.

    The default password of user root is Cloud12#$.

  4. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After the preceding command is executed, the system remains running even when no operation is performed, which results in security risks. For security purposes, run the exit command to exit the system after completing your operations.

  5. Run the following command to enter the backup microservice log directory:

    cd /opt/huawei-data-protection/ebackup/microservice/ebk_backup*/logs

  6. Run the following command to query the snapshot ID generated during backup:

    zgrep RequestID ebk_backup*|grep BASELUNID |awk -F? '{print $2}'|awk -F"[&]" '{print $1,$3}'

    RequestID is the value of Request ID collected in 1.

    Information similar to the following is displayed. In this example, the snapshot IDs are 831 and 832.

  7. Use a browser to log in to DeviceManager as user admin.
  8. Choose Data Protection > Snapshot and search for the desired LUN based on the snapshot ID queried in 6.

    If Running Status of the LUN is Inactive, select the LUN, click More, and select Activate to activate the LUN.

  9. Perform the backup job again.
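The two awk stages of the query command take the text after the "?" in each matching log line, split it on "&", and print the first and third fields. The following sketch isolates that logic so it can be tried on a sample line. The function name extract_query_fields is hypothetical, and the sample line in the usage comment is invented for illustration; the real log line format is not shown in this guide. The field separator is quoted here so that the shell cannot expand the "?" as a wildcard.

```shell
# extract_query_fields
# Reads lines on stdin, keeps the second "?"-separated field of each line,
# splits that field on "&", and prints its first and third parts.
extract_query_fields() {
    awk -F'?' '{ print $2 }' | awk -F'[&]' '{ print $1, $3 }'
}

# Usage with an invented sample line (not a real eBackup log entry):
#   echo 'url=/v1/demo?BASELUNID=831&foo=1&SNAPID=900' | extract_query_fields
```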

CIFS Storage Unit Is Inaccessible

Description

Log in to the backup server GUI and choose Backup Storage > Storage Unit. Accessibility Status of the CIFS storage unit is Inaccessible.

Possible Cause

If the production storage is OceanStor V3/V5 of version V300R006C10SPC100 or later and SMB1 is disabled on the storage array, the CIFS storage unit cannot be accessed.

Procedure
  1. Use PuTTY to log in to the storage array using a management IP address.
  2. Run the following command to check the status of the SMB1 function:

    show service cifs

    • If the value of SMB1 Enabled is No in the command output, SMB1 is disabled. Go to 3.
    • If the value of SMB1 Enabled is Yes in the command output, SMB1 is enabled. Contact technical support engineers.

  3. Run the following command to enable the SMB1 function.

    change service cifs smb1_enable=yes

  4. Log in to the backup server GUI again, choose Backup Storage > Storage Unit, and check whether the storage unit can be accessed.

    • If yes, no further action is required.
    • If no, contact technical support engineers.

Updated: 2019-06-01

Document ID: EDOC1100062375
