FusionCloud 6.3.1.1 Troubleshooting Guide 02

Application


When a User Runs the rescan-scsi-bus.sh Command to Scan for Newly Mapped LUNs, the Task Keeps Running for a Long Time

Symptom

After an eBackup server is added to a FusionStorage cluster and new LUNs are mapped, running the rescan-scsi-bus.sh command to scan for the newly mapped LUNs starts a task that keeps running for a long time and does not stop.

Possible Causes

The version of FusionStorage Agent does not match that of FusionStorage. The mismatch causes LUN scanning on the eBackup server to malfunction.

Procedure
  1. Use PuTTY to log in to the backup server and backup proxy in sequence using the IP addresses corresponding to the datamover_externalom_iplist field.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the reboot command to restart the operating system of the eBackup server.
  4. Add an eBackup server to a FusionStorage cluster.

    For details, see section Adding eBackup Servers to a FusionStorage Cluster.
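
To confirm the symptom without leaving a terminal blocked indefinitely, the scan can be wrapped in a time limit. The following is a minimal sketch, assuming the coreutils timeout utility is available on the eBackup server. The helper name run_with_limit and the 300-second limit are illustrative choices, not values taken from this guide.

```shell
# Run a command under a time limit using coreutils `timeout`.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# Returns 124 if the command was still running when the limit expired,
# otherwise the exit status of the command itself.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Example on the eBackup server (assumed 300-second limit; not executed here):
#   run_with_limit 300 rescan-scsi-bus.sh
#   [ $? -eq 124 ] && echo "Scan did not finish within the limit."
```

An exit status of 124 only shows that the scan exceeded the chosen limit. It does not by itself prove a version mismatch, so the procedure above still applies.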

The eBackup Service Stops When the Capacity Usage of the /opt Partition on the eBackup server Exceeds 96%

Symptom

The eBackup service stops when the capacity usage of the /opt partition on the eBackup server exceeds 96%, and you cannot log in to the eBackup GUI.

Possible Causes

Available space of the /opt partition is insufficient.

Procedure
  1. Use PuTTY to log in to the workflow-eBackup01 node (backup manager), workflow-eBackup02 node (backup workflow server), backup server, or backup proxy.

    Login addresses:

    • Management IP address of the workflow-eBackup01 node: IP address corresponding to the Workflow-PublicService-IP0 field.
    • Management IP address of the workflow-eBackup02 node: IP address corresponding to the Workflow-PublicService-IP1 field.

    In the CSHA or management plane cross-AZ HA scenario, the backup manager name is workflow-eBackup. The name of the backup workflow server is dr-workflow-eBackup.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  2. Run the su root command and enter the password of user root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run TMOUT=0 to prevent the system from exiting due to timeout.
  4. Run df -h /opt to check the remaining space of /opt. If the capacity usage exceeds 96%, run rm File name or rm -rf Folder name to delete files or folders that are no longer needed. Do not delete any files that belong to the eBackup software.
  5. Run the service hcp start command to start the eBackup service.
  6. Log in to eBackup GUI again.
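
The usage check in step 4 can also be scripted. The following is a minimal sketch that reads the output of df -P (the POSIX output format, which keeps each file system on one line) and compares the usage with the 96% threshold from this case. The function names usage_percent and exceeds_threshold are illustrative.

```shell
# Read `df -P <mount point>` output on stdin and print the usage percentage
# of the first listed file system as a bare integer.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the usage ($1) is above the threshold ($2, 96 in this case).
exceeds_threshold() {
    [ "$1" -gt "$2" ]
}

# Example on the eBackup server (not executed here):
#   pct=$(df -P /opt | usage_percent)
#   exceeds_threshold "$pct" 96 && du -sk /opt/* | sort -n | tail
```

The du pipeline in the example only lists the largest entries under /opt. Deciding which of them can be deleted safely remains a manual step.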

A Standby Node Fails to Be Removed When the HA Function Is Being Used and the Standby Node Is in the Irrecoverable Inaccessible State

Symptom

A standby node fails to be removed on the GUI when the HA function is being used and the standby node is in the irrecoverable Inaccessible state.

Possible Causes
  • An error occurs on the network between the active and standby nodes.
  • A process on the active or standby node is faulty.
Procedure

Possible cause 1: A process on the active or standby node is faulty.

  1. Use PuTTY to log in to the standby node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run when no operation is performed, posing a security risk. For security purposes, you are advised to run exit to exit the system after completing your operations.

  4. Run the service hcp status command to check whether the eBackup service is normal.

    • If yes => Go to 1 under Possible cause 2 to clear HA configuration information.
    • If no => Run the cd /opt/huawei-data-protection/ebackup/bin command to go to the /bin directory, and then run the sh uninstall.sh command to uninstall eBackup.

Possible cause 2: An error occurs on the network between the active and standby nodes.

  1. Clear HA configuration information.

    1. Run the cd /opt/huawei-data-protection/ebackup/bin command to go to the /bin directory.
    2. Run the sh ha_tool.sh clear command to clear HA configuration information.

  2. Use PuTTY to log in to the active node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  3. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  4. Run the TMOUT=0 command to prevent the system from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run when no operation is performed, posing a security risk. For security purposes, you are advised to run exit to exit the system after completing your operations.

  5. Refer to 1 to clear HA configuration information on the active node.
  6. Stop the HA process.

    1. Run the cd /opt/huawei-data-protection/ebackup/ha/module/hacom/script command to go to the /script directory.
    2. Run the sh stop_ha_process.sh command to stop the HA process.

  7. Wait several minutes, and check whether the standby node has been removed on the GUI.

    • If the standby node has been removed, no further action is required.
    • If the fault persists, contact technical support engineers.
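
Step 4 under Possible cause 1 relies on reading the service hcp status output by eye. The following is a minimal sketch of a filter that lists only the components reported as not running. It assumes the output format shown in the sample output elsewhere in this guide (lines such as "gaussdb is running" and "AdminNode isn't running."). The function name not_running is illustrative.

```shell
# Read `service hcp status` output on stdin and print the name of every
# component reported as not running, one per line.
# Assumes lines of the form "<name> is running" / "<name> isn't running".
not_running() {
    awk '/isn.t running/ { print $1 }'
}

# Example on a node (not executed here):
#   service hcp status | not_running
```

Empty output means that no component was reported as not running.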

In an HA Scenario, After Command sh status_ha.sh Is Executed on Both Active and Standby Nodes, the Command Output Indicates Abnormal GaussDB Resources

Symptom

In an HA scenario, after command sh status_ha.sh is executed on both active and standby nodes, the command output indicates abnormal GaussDB resources.

Normal command output:

If the command output is not similar to that in the red rectangle in the preceding figure, the resources are abnormal. The following figure is an example.

Possible Causes

Synchronization between active and standby databases is abnormal.

The following operations may result in loss of some data.

Procedure
  1. Use PuTTY to log in to either of the active and standby nodes with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run when no operation is performed, posing a security risk. For security purposes, you are advised to run exit to exit the system after completing your operations.

  4. Run the cd /opt/huawei-data-protection/ebackup/bin command to go to the path saving the command for monitoring data synchronization between active and standby GaussDB databases.
  5. Run the sh db_sync_monitor.sh get_status command, and record the command output.

    DB last online role : Primary 
    DB last online time : 2016-04-14 16:38:31     

  6. Use PuTTY to log in to the other node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  7. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  8. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run when no operation is performed, posing a security risk. For security purposes, you are advised to run exit to exit the system after completing your operations.

  9. Run the cd /opt/huawei-data-protection/ebackup/bin command to go to the path saving the command for monitoring data synchronization between active and standby GaussDB databases.
  10. Run the sh db_sync_monitor.sh get_status command, and record the command output.

    DB last online role : Standby 
    DB last online time : 2016-04-14 16:38:31     

  11. Compare the preceding recorded command outputs to determine the active node.

    • Choose the node whose DB last online role is Primary as the active node.
    • If DB last online role is Primary on both nodes, choose the node whose DB last online time is later as the active node.

  12. Use PuTTY to log in to the active node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  13. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  14. On the active node, run the service hcp restart force command to forcibly restart the hcp process.
  15. Log in to the eBackup GUI and check whether node status is normal:

    • If the node status is normal, no further action is required.
    • If the node status is abnormal, contact technical support.
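
The comparison in step 11 can be sketched as a small helper. The following sketch assumes the get_status output has the two-line form shown in steps 5 and 10, and that the timestamp format (YYYY-MM-DD hh:mm:ss) sorts correctly as plain text. The function names status_field and pick_active are illustrative. The guide does not say what to do when neither node reports Primary, so the sketch prints "undetermined" in that case.

```shell
# Extract one field from `db_sync_monitor.sh get_status` output on stdin.
# $1 is the label, for example "DB last online role".
status_field() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*:[[:space:]]*//p" | sed 's/[[:space:]]*$//'
}

# Decide which node to treat as active.
# Usage: pick_active ROLE_A TIME_A ROLE_B TIME_B
# Prints "A", "B", or "undetermined".
pick_active() {
    role_a=$1 time_a=$2 role_b=$3 time_b=$4
    if [ "$role_a" = "Primary" ] && [ "$role_b" != "Primary" ]; then
        echo A
    elif [ "$role_b" = "Primary" ] && [ "$role_a" != "Primary" ]; then
        echo B
    elif [ "$role_a" = "Primary" ] && [ "$time_a" != "$time_b" ]; then
        # Both nodes report Primary: the later timestamp wins.
        latest=$(printf '%s\n%s\n' "$time_a" "$time_b" | LC_ALL=C sort | tail -n 1)
        if [ "$latest" = "$time_a" ]; then echo A; else echo B; fi
    else
        echo undetermined
    fi
}
```

For example, save the recorded output of each node to a file, extract the role and time with status_field, and pass the four values to pick_active.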

In an HA Scenario, the Active and Standby Nodes Are Correctly Configured. However, Services on the Active Node Fail to Be Started

Symptom
  1. In a high availability (HA) scenario, services on the active node fail to be started. The following command output is displayed after the service hcp start command is executed on the active node:
    eBackup: /opt/huawei-data-protection/ebackup/conf #service hcp start 
    Starting Huawei eBackup Service 
    This is primary node, but synchronized status is not correct. Restore the environment by seeing related fault cases in the corresponding product documentation.
  2. After the service hcp start command is executed on the standby node, services on the standby node are started properly. About 2 minutes later, the standby node becomes the active node because the original active node fails to be started. The service hcp status command is executed again to check the eBackup process. The command output indicates that AdminNode is not running. As a result, the system login fails.
    eBackup: /home #service hcp start  
    Starting Huawei eBackup Service  
     
    eBackup: /home #service hcp status 
    Checking for Huawei eBackup Service 
    gaussdb is running 
    AdminNode isn't running. 
    BackupNode is running 
    hcplogrotate is running 
    apache/iBase is running 
    dsware_agent is running 
    HCPProcessMonitor is running 
    OmmHaMonitor is running 
Possible Causes

Services on the active node, standby node, and backup proxies were stopped, or an unexpected power outage occurred. The services on all nodes were then restarted more than 10 minutes later, or after the system time had been changed by more than 10 minutes. As a result, the services on the active node fail to start.

Procedure
  1. Use PuTTY to log in to the active and standby nodes with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run TMOUT=0 to prevent the system from exiting due to timeout.

    NOTE:

    After you run this command, the system continues to run when no operation is performed, posing a risk. For security, run exit to exit the system after completing your operations.

  4. Run the sh /opt/huawei-data-protection/ebackup/bin/db_sync_monitor.sh get_status command on both nodes to determine the active node.

    • If the role of one node is Primary, and that of the other node is Standby, the node whose role is Primary is the active node.
    • If the roles of both nodes are Primary, the node whose DB last online time is later is the active node.

  5. On the original active node, run the service hcp start force command to forcibly start services.
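
Before using start force, it can help to confirm that a failed start matches this case. The following is a minimal sketch that looks for the error text quoted in the Symptom in the output of service hcp start. The function name sync_status_blocked is illustrative.

```shell
# Succeed if `service hcp start` output (read on stdin) contains the
# synchronization error quoted in the Symptom of this case.
sync_status_blocked() {
    grep -q "synchronized status is not correct"
}

# Example on the original active node (not executed here):
#   out=$(service hcp start 2>&1)
#   printf '%s\n' "$out" | sync_status_blocked && echo "Matches this case."
```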

After a VMware Protected Environment Is Added, an Error Message Is Displayed Frequently Indicating that Communicating With the Device Failed

Symptom

After a VMware protected environment is added, an error message is displayed frequently indicating that communicating with the device failed.

Possible Causes

The protected environment contains a VMware host whose domain name is longer than 32 bytes.

Fault Diagnosis
  1. Use PuTTY to log in to the backup server.

    Login address: The management IP address can be obtained from the address of backup server GUI.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of user root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.
    NOTE:

    After you run the preceding command, the system continues to run even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

  4. Run the cd /opt/huawei-data-protection/ebackup/logs/ command to go to the path saving eBackup logs.
  5. Run the cat HCP_AdminNode.log |grep "Got error from ODBC" command. Multiple logs are displayed.

  6. Log in to the VMware protected environment that you have added using VMware vSphere Client.
  7. Choose Home > Inventory > Hosts and Clusters, and view all hosts.
  8. Check whether a host whose domain name contains a string of more than 32 bytes exists. If yes, the fault is identified.

    Delete the host whose domain name contains a string of more than 32 bytes. Then re-add it either by using its IP address or by using a new domain name of up to 32 bytes, as described in Procedure.
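
When many hosts are listed, the length check in step 8 is easier to script than to count by eye. The following is a minimal sketch that measures a name in bytes and compares it with the 32-byte limit from this case. The function names and the host names in the usage example are hypothetical.

```shell
# Print the length of a host name in bytes.
name_bytes() {
    printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# Succeed if the name is longer than the 32-byte limit named in this case.
name_too_long() {
    [ "$(name_bytes "$1")" -gt 32 ]
}

# Example with hypothetical host names:
#   for h in esx01.example.com esx-host-01.datacenter-east.example.com; do
#       name_too_long "$h" && echo "$h exceeds 32 bytes"
#   done
```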

Procedure
  1. Log in to the VMware protected environment that you have added using VMware vSphere Client.
  2. Right-click the host whose domain name contains a string of more than 32 bytes and choose Disconnect from the shortcut menu that is displayed.

  3. Right-click the host and choose Remove from the shortcut menu that is displayed.

  4. Add the host again.

  5. On the Add Host Wizard page, enter the host IP address. Using the IP address is recommended, although you can instead use a new domain name of up to 32 bytes.

  6. After adding the host, log in to the eBackup GUI and initiate a VMware environment scanning task.

    • If the scanning task is successfully executed, the fault is rectified.
    • If the scanning task fails to be executed, contact technical support engineers.

Backup Proxies (or Backup Workflow Servers) Cannot Be Registered to the Backup Server (or Backup Manager)

Symptom
  • Symptom 1: The backup server and backup proxies are deployed on different nodes. When disaster recovery is performed for the backup server or the configurations of backup proxies are incorrect, the backup proxies may fail to be registered to the backup server.
  • Symptom 2: After the backup server (or backup manager) is upgraded from V200R001C00 or V200R001C10 to V200R001C30, the deployed backup proxies (or backup workflow servers) cannot be registered to the backup server (or backup manager).
Possible Causes
  • Possible cause of symptom 1: The public or private key information changes after eBackup is reconfigured.
  • Possible cause of symptom 2: The public or private key information about the backup server (or backup manager) is different from that of backup proxies (or backup workflow servers).
Procedure
  1. Use PuTTY to log in to the backup proxy (or the workflow-eBackup02 node) with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

    If the symptom is that the backup proxy cannot be registered to the backup server, log in to the backup proxy.

    If the symptom is that the backup workflow server cannot be registered to the backup manager, log in to the workflow-eBackup02 node.

    Management IP address of the workflow-eBackup02 node: IP address corresponding to the Workflow-PublicService-IP1 field

    In the CSHA or management plane cross-AZ HA scenario, the backup manager name is workflow-eBackup. The name of the backup workflow server is dr-workflow-eBackup.

  2. Run the su root command and enter the password of user root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the cat /opt/huawei-data-protection/ebackup/conf/cert/BackupNode.pub command to obtain the public key of the backup proxy (or backup workflow server).
  4. Use PuTTY to log in to the workflow-eBackup01 node (backup manager) or backup server through a management IP address.

    Login addresses:
    • Management IP address of the workflow-eBackup01 node: IP address corresponding to the Workflow-PublicService-IP0 field
    • Management IP address of the backup server: IP address corresponding to the datamover_externalom_iplist field

    In the CSHA or management plane cross-AZ HA scenario, the backup manager name is workflow-eBackup.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  5. Run cd /opt/huawei-data-protection/ebackup/cli/ to go to the /opt/huawei-data-protection/ebackup/cli/ directory.
  6. Run sh hcpcli.sh admin and enter the password to log in to the CLI.
  7. Run the management command to enter the management view.
  8. Run the add public_key public_key command to add the public key of the backup proxy (or backup workflow server) to the backup server (or backup manager). In the preceding command, public_key is the public key obtained in 3. The public key consists of 40 characters.
  9. Run the service hcp restart command to restart the eBackup process.
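
Because a truncated copy of the key is an easy mistake to make, the 40-character length stated in step 8 can be checked before the key is added. The following is a minimal sketch. It assumes the key consists of single-byte characters and that BackupNode.pub contains nothing but the key. The function name key_length_ok is illustrative.

```shell
# Read a public key on stdin and succeed if it is exactly 40 characters
# long after whitespace (including the trailing newline) is removed.
key_length_ok() {
    len=$(tr -d '[:space:]' | wc -c | tr -d '[:space:]')
    [ "$len" -eq 40 ]
}

# Example on the backup proxy (not executed here):
#   key_length_ok < /opt/huawei-data-protection/ebackup/conf/cert/BackupNode.pub \
#       || echo "Unexpected key length"
```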

The License Becomes Unavailable After HA Is Enabled and an Active/Standby Switchover Is Performed

Symptom

The license becomes unavailable after HA is enabled and an active/standby switchover is performed.

Possible Causes

After the switchover, the ESNs of the current active node do not exist in the license file.

Procedure
  1. Apply for a new license or change the license ESNs. The new license or modified license must contain ESNs of the active and standby nodes.

    For details, see section OceanStor BCManager eBackup in FusionCloud 6.3.1.1 License Guide.

  2. Import the new license or modified license.

    If the problem persists, contact technical support engineers.

Backup Images Are Lost After the HA Function Is Enabled and an Active/Standby Switchover Is Performed

Symptom

Backup images are lost when the HA function is enabled and an active/standby switchover is performed after the services on the active node are stopped.

Possible Causes

After the switchover, database information is not synchronized.

Procedure
  1. Use PuTTY to log in to the current active node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

  4. Run the cd /opt/huawei-data-protection/ebackup/cli command to go to the /opt/huawei-data-protection/ebackup/cli directory.
  5. Run the sh hcpcli.sh admin command to log in to the CLI.
  6. Run the management command to enter the management view.
  7. Retrieve lost backup images.

    • In the virtual scenario
      1. Run the show protected_environment command to query the ID of the protected environment whose backup images are lost.

        The following shows the command output.

      2. Run the show protected_environment details=verbose ID=ID of the protected environment command to query the ID of the protected object.

        In the preceding command, ID of the protected environment is obtained in 7.a.

        Example:

        show protected_environment details=verbose ID=1

        The following shows the command output.

      3. Run the retrieve backup_image protected_object_id command to retrieve lost backup images.

        In the preceding command, protected_object_id is obtained in 7.b.

        The following shows a command output example:

        retrieve backup_image d2477874-97b9-5578-bb3c-8dfded73a32a       
        Command send successfully.

        You can query the job progress on eBackup.

        1. Log in to eBackup GUI.
        2. On the navigation bar, choose > Job.
        3. Check the progress of the backup image retrieval job.

          After the job is complete, the following page is displayed.

          On the navigation bar, choose > All Backup Images, and check whether the backup images are retrieved.

          If the backup images fail to be retrieved, perform Step 8.

    • In the private cloud scenario, you need to query the volume UUIDs in the upper-layer OpenStack system and then restore backup images according to 7.c.
    • To obtain UUIDs of volumes or VMs to which lost backup images belong, see Step 8.

  8. Restore storage units to retrieve lost backup images.

    For details, see section Restoring eBackup Data in

    If the problem persists, contact technical support engineers.

Failed to Delete Backup Images

Symptom

After storage space is used up, backup images fail to be deleted. In job details, message "Failed to delete information of backup image from database" is displayed.

Possible Causes

Space of storage units is used up.

Procedure
  1. Deactivate backup plans and copy plans associated with faulty storage units.

    1. Log in to the backup server GUI using a browser.

      Login address: https://IP address corresponding to the datamover_management_float_ip field:8088

      Default account: admin. Default password: Cloud12#$ for installation using FCD, and PXU9@ctuNov17! for manual installation.

    2. On the navigation bar, choose > Backup Plan.
    3. Click the backup plan associated with a faulty storage unit, and click the drop-down arrow of Active in the preview area on the right to deactivate the backup plan.
    4. On the navigation bar, choose > Copy Plan.
    5. Click the copy plan associated with the faulty storage unit, and click the drop-down arrow of Inactive in the preview area on the right to deactivate the copy plan.

  2. Use PuTTY to log in to the backup server.

    Login address: The management IP address can be obtained from the address of backup server GUI.

    Default account: hcp, default password: PXU9@ctuNov17!

  3. Run the su root command and enter the password of user root to switch to user root.

    The default password of the root account is Cloud12#$.

  4. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system continues to run even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

  5. Run the df -h command to query the mount point of the storage unit.

    Example:

  6. Run the cd /opt/huawei-data-protection/ebackup/bricks/Mount point of the storage unit command to go to the mount point of the storage unit.

    Example:

    cd /opt/huawei-data-protection/ebackup/bricks/94500ea0-8273-4015-9f07-3e75bf16e9ea

  7. Run the du -sk DummyFileForDisasteryRecovery.tmp command to check whether the size of file DummyFileForDisasteryRecovery.tmp is 100 MB.

    Example:

    eBackup:/opt/huawei-data-protection/ebackup/bricks/94500ea0-8273-4015-9f07-3e75bf16e9ea # du -sk DummyFileForDisasteryRecovery.tmp
    103172 DummyFileForDisasteryRecovery.tmp

  8. Run the > DummyFileForDisasteryRecovery.tmp command to clear content of file DummyFileForDisasteryRecovery.tmp.
  9. Wait for 10 seconds, and run the du -sk DummyFileForDisasteryRecovery.tmp command again.

    Ensure that the file size is less than 10 MB.

    eBackup:/opt/huawei-data-protection/ebackup/bricks/94500ea0-8273-4015-9f07-3e75bf16e9ea # du -sk DummyFileForDisasteryRecovery.tmp
    4 DummyFileForDisasteryRecovery.tmp

  10. The system will perform the delivered backup image deletion job and backup image expiration job associated with the faulty storage unit within two hours, and the jobs will be successfully executed.

    Alternatively, you can delete backup images on eBackup.

  11. Run the df -h command, and check whether the released storage unit space exceeds 200 MB.

    • If the released space exceeds 200 MB, go to Step 12.

    • Otherwise, repeat 10.

  12. Restore file DummyFileForDisasteryRecovery.tmp.

    1. Run the rm DummyFileForDisasteryRecovery.tmp command.
    2. Wait while the system performs the restore job.
    3. After six minutes, run the du -sk DummyFileForDisasteryRecovery.tmp command to check whether the file is re-generated and the file size is 100 MB.

  13. Activate the backup plan and copy plan associated with the faulty storage unit.

    For details about activation operations, see 1.
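
The size checks in steps 7, 9, and 12 all read the first column of du -sk, which is in KB. The following is a minimal sketch that classifies that value. It assumes 1 MB = 1024 KB and treats "100 MB" as "at least 102400 KB", which matches the sample value 103172 above. The guide does not state how strictly the 100 MB figure is meant. The function name dummy_file_state is illustrative.

```shell
# Read `du -sk <file>` output on stdin and classify the file size.
# Prints "full" at 100 MB or more, "cleared" below 10 MB, else "partial".
dummy_file_state() {
    awk '{
        if ($1 >= 102400) { print "full" }
        else if ($1 < 10240) { print "cleared" }
        else { print "partial" }
        exit
    }'
}

# Example in the storage unit mount point (not executed here):
#   du -sk DummyFileForDisasteryRecovery.tmp | dummy_file_state
```

With the sample outputs shown above, 103172 is classified as "full" and 4 as "cleared".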

HA Nodes Are Faulty

Symptom
  • The HA active node is faulty and the system automatically completes an active/standby switchover. After the switchover, you need to recover the faulty node.
  • The HA standby node is faulty and needs to be recovered.
Possible Causes
  • Hardware is faulty.
  • The operating system is faulty.
  • The eBackup software is faulty.
Procedure
  1. Use PuTTY to log in to the backup server and backup proxy before the fault with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.

    NOTE:

    After you run the preceding command, the system keeps running even when no operation is performed, posing security risks. For security purposes, you are advised to run the exit command to exit the system after completing your operations.

  4. Run the service hcp status command, and check whether the eBackup service is normal:

    • If normal, no further action is required.
    • If abnormal, choose a method to rectify the fault based on the specific scenario:
      • The eBackup services on both the active and standby nodes are faulty: Perform disaster recovery by following instructions in Disaster Recovery.
      • The original active node is faulty but the original standby node is normal: Wait until the system completes the active/standby switchover and then perform 5.

        Method for checking whether the active/standby switchover is successful:

        1. Use PuTTY to log in to the original standby node with the management IP address.

          Default account: hcp, default password: PXU9@ctuNov17!

        2. Run the su root command and enter the password of account root to switch to user root.

          The default password of the root account is Cloud12#$.

        3. Run the TMOUT=0 command to prevent PuTTY from exiting due to timeout.
        4. Run the service hcp status command.

          If the command output shown in the following figure is displayed, the HA active/standby switchover is successful.

      • The original standby node is faulty but the original active node is normal: Perform 5.

  5. Remove an HA member.

    1. Log in to the backup manager or backup server GUI using a browser.

      Login address: https://IP address corresponding to the Workflow-Management-Float-IP field or IP address corresponding to the datamover_management_float_ip field:8088

      Default account: admin. Default password: Cloud12#$ for installation using FCD, and PXU9@ctuNov17! for manual installation.

    2. On the navigation bar, choose > Server.
    3. Select a node that you want to remove. Then from the HA Management drop-down list, select Delete HA member.

  6. Perform the following steps on the faulty node based on the fault cause:

    • The operating system cannot be started or the hardware is faulty. Reinstall the operating system or replace the hardware, and then go to 8.
    • The operating system is started normally but the eBackup service cannot be started. Go to 7.

  7. To uninstall eBackup, contact technical support engineers.
  8. To install and configure eBackup and configure the HA function, contact technical support engineers.
  9. Wait for several minutes and check whether the faulty HA node is successfully added on the GUI. If the problem persists, contact technical support engineers.

A GaussDB Exception Causes User Operation Failures Such as Login and Query Failures

Symptom

After an automatic HA switchover, the GaussDB service of the active node becomes abnormal. As a result, user operations, such as logins and queries, fail.

Possible Causes

After the HA switchover, the GaussDB process of the active node fails to start, causing database operation failures.

Procedure
  1. Use PuTTY to log in to the active node with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent system timeout.

    NOTE:

    After the preceding command is executed, the system remains running even when no operation is performed, which poses security risks. For security purposes, run the exit command to exit the system after you finish performing operations.

  4. Run the cd /opt/huawei-data-protection/ebackup/microservice/ebk_xxx_V200R001C30xx_1.0.0/logs command to enter the /opt/huawei-data-protection/ebackup/microservice/ebk_xxx_V200R001C30xx_1.0.0/logs directory.
  5. Run the vi ebk_xxx.log command to open the operation log.

    Search the log file for ODBC driver and check whether GaussDB returns the error code -2.

    • If the error code -2 is returned, perform the following steps to handle this problem:
      1. Run the cd /opt/huawei-data-protection/ebackup/microservice/ebk_xxx_V200R001C30xx_1.0.0/script command to enter the /opt/huawei-data-protection/ebackup/microservice/ebk_xxx_V200R001C30xx_1.0.0/script directory.
      2. Run the sh ebackup_stop.sh command to stop the microservice.
      3. Run the sh ebackup_start.sh command to start the microservice.
    • If an error code other than -2 is returned, perform the following steps to handle this problem:
      1. Run the cd /opt/huawei-data-protection/ebackup/bin command to enter the /opt/huawei-data-protection/ebackup/bin directory.
      2. Run the python make_report.py command to collect operation logs.
      3. Contact technical support engineers.
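
As a read-only alternative to opening the log in vi in step 5, the relevant lines can be filtered with grep. The following is a minimal sketch. Because the exact layout of the log lines is not shown in this guide, it only prints the lines that mention "ODBC driver" with their line numbers, and the error code still has to be read from those lines manually. The function name odbc_lines is illustrative.

```shell
# Print every line (with its line number) that mentions the ODBC driver.
# Reads the log on stdin; the log file is not modified.
odbc_lines() {
    grep -n "ODBC driver"
}

# Example in the logs directory (not executed here):
#   odbc_lines < ebk_xxx.log
```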

The States of Non-Active Nodes Are Partially Accessible When All eBackup Nodes Are Powered Off

Symptom

After all eBackup nodes are powered off and then powered on again, eBackup is started. However, when the user logs in to the graphical user interface (GUI) of eBackup to view eBackup node information, all non-active nodes are found to be in the Partially Accessible state and services are abnormal.

Possible Causes

Port 5569 is not open on the firewall of the active node, so other nodes cannot access the active node. In this case, the BackupNode process is not started, and the states of all non-active nodes are Partially Accessible.

Procedure
  1. Use PuTTY to log in to a node that is in the Partially Accessible state with the management IP address.

    Default account: hcp, default password: PXU9@ctuNov17!

  2. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  3. Run the TMOUT=0 command to prevent the system from exiting due to timeout.

    NOTE:

    After the preceding command is executed, the system remains running even when no operation is performed, which results in security risks. For security purposes, run the exit command to exit after you finish the operations.

  4. Run the service hcp status command to check whether eBackup provides services normally.

    In the command output, check whether the BackupNode line reads "isn't running".

    ... 
    BackupNode isn't running 
    ...
    • If yes, go to 5.
    • If no, contact technical support engineers.

  5. Run the ps -ef|grep BackupNode command to check whether the BackupNode process was started just before the query.

    • If yes, go to 6.
    • If no, contact technical support engineers.

    The startup time is as shown in the following figure:

  6. Use PuTTY to log in to the workflow-eBackup01 node (backup manager) or backup server through a management IP address.

    Login addresses:
    • Management IP address of the workflow-eBackup01 node: IP address corresponding to the Workflow-PublicService-IP0 field
    • Management IP address of the backup server: IP address corresponding to the datamover_externalom_iplist field

    In the CSHA or management plane cross-AZ HA scenario, the backup manager name is workflow-eBackup.

    Default account: hcp. Default password: PXU9@ctuNov17!.

  7. Run the su root command and enter the password of account root to switch to user root.

    The default password of the root account is Cloud12#$.

  8. Run the cd /opt/huawei-data-protection/ebackup/bin command to go to the directory where the iptables script resides.
  9. Run the iptables -nL|grep -w 5569 command to view firewall information.

    • If no command output is displayed, go to 10.
    • If the command output is displayed, check whether the configuration is correct.
      ACCEPT tcp -- 172.28.0.0/20 172.28.0.0/20 tcp dpt:5569
      • If the above information is displayed, namely, the internal communication plane IP address and subnet mask are displayed in the command output and the state is ACCEPT, the configuration is correct.
      • If the internal communication plane IP address and subnet mask are not displayed in the command output, or the state is not ACCEPT, go to 10.

  10. Run the iptables -I INPUT -s Internal communication plane IP address/Internal communication plane subnet mask -d Internal communication plane IP address/Internal communication plane subnet mask -p tcp -m tcp --dport 5569 -j ACCEPT command to add a rule that opens port 5569 in iptables.

    Example:

    iptables -I INPUT -s 172.28.0.0/20 -d 172.28.0.0/20 -p tcp -m tcp --dport 5569 -j ACCEPT

  11. Run the iptables-save > /etc/sysconfig/iptables command to save the rules.
  12. Run the iptables -nL|grep -w 5569 command to view firewall information.

    • If the internal communication plane IP address and subnet mask are displayed in the command output, and the state is ACCEPT, wait for several minutes and then log in to the GUI of eBackup to view states of other nodes.
      • If the states of other nodes are Accessible, no further operation is required.
      • If the states of other nodes are still Partially Accessible, contact technical support engineers.
    • If the internal communication plane IP address and subnet mask are not displayed in the command output, or the state is not ACCEPT, contact technical support engineers.
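
The check in steps 9 and 12 can be sketched as a filter over the iptables -nL output. This is an illustrative sketch only: the helper name port_5569_rule_ok is hypothetical, and it assumes the column layout of the example line in step 9 (target, protocol, options, source, destination, match details). The subnet 172.28.0.0/20 is the example value from step 10 and must be replaced with the actual internal communication plane IP address and subnet mask.

```shell
# Hypothetical helper: read `iptables -nL` output on stdin and succeed only
# if some rule ACCEPTs port 5569 with the given subnet as both source and
# destination. Assumes the column layout shown in step 9.
port_5569_rule_ok() {
    subnet="$1"
    awk -v net="$subnet" '
        # Column 1 is the target, columns 4 and 5 are source and destination.
        $1 == "ACCEPT" && $4 == net && $5 == net && /dpt:5569([^0-9]|$)/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage as user root (example subnet from step 10; substitute your own):
# iptables -nL | port_5569_rule_ok 172.28.0.0/20 && echo "rule present"
```

A non-zero exit status corresponds to the cases in which no matching output is displayed or the state is not ACCEPT.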

Updated: 2019-06-10

Document ID: EDOC1100063248
