Geographic Redundancy Emergency Troubleshooting
Failure to Access the Management Plane
Symptom
The login page of the management plane is not displayed properly, or you cannot log in to the management plane.
This section uses the primary site in a DR scenario as an example. If the management plane at the secondary site cannot be accessed, perform the same operations at the secondary site.
Possible Causes
- The management node is powered off or its network is abnormal.
- The database of the management plane is abnormal.
- Services of the management plane are abnormal.
Troubleshooting Procedure
This section provides only the basic troubleshooting method. If the fault persists after troubleshooting using this method, contact Huawei technical support.
- Check whether the management node is powered off.
Contact the administrator to power on the node if it is powered off.
In a DR scenario, contact the administrator to check the power supply status of the management nodes at the primary and secondary sites. If the management nodes at both sites are powered on, go to the step "Check the network status of the management plane." If any node is powered off, contact the administrator to power it on and then perform the following operations:
- Log in to the management plane. For details, see Logging In to the Management Plane.
- On the management plane, choose HA > Remote High Availability System > Manage DR System from the main menu.
- In the Operation column of the row that contains the product with data to be synchronized, click the icon (not reproduced here) and select the product data synchronization direction.
After you specify the data synchronization direction, the DR system performs full data synchronization in that direction, and data at the destination site is overwritten. You are advised to specify the product with the latest data as the active site product and synchronize data from it to the peer site product. If the direction is from standby to active, the standby product is switched to active and then synchronizes data to the product at the peer site.
- Perform operations as prompted.
- Check the operation result. If the operation result is not as expected, contact Huawei technical support.
- On the management plane, choose HA > Remote High Availability System > Manage DR System from the main menu.
- On the Manage Remote DR System page, check that the heartbeat status between the primary and secondary sites is normal (the status icon is not reproduced here).
- On the Manage Remote DR System page, check that Data Synchronization Status of all products is Synchronized or Synchronizing. If Data Synchronization Status is Delayed, a large volume of data is being synchronized between the primary and secondary sites. Check the status after data synchronization is complete.
- Verify that you can log in to the service plane of the active site. For details, see Logging In to service plane.
- Check the network status of the management plane.
Contact the administrator to check the network status and rectify the fault.
- Check the database instance status of the management plane.
- Use PuTTY to log in to the management node as the sopuser user in SSH mode.
If the management plane is deployed in cluster mode, perform operations on OMP_01 and then on OMP_02. For details about how to obtain the IP address of a node, see How Do I Query the IP Address of a Node?
- Run the following command to switch to the ossadm user:
> su - ossadm
Password: password for the ossadm user
- Run the following command to check the database status of the management plane:
> cd /opt/oss/manager/apps/DBAgent/bin
> bash dbsvc_adm -cmd query-db-instance
Information similar to the following is displayed:
DBInstanceId           ClassId  InstNumber         Tenant   AzName         IP           Port   State  DBType  ...
backuprdb-0-999        single   backuprdb-0-999    manager  cn-global-1-a  10.7.162.90  26522  --     redis   ...
cloudsopdbsvr-1-0@2-0  primary  cloudsopdbsvr-1-0  cdo      service        10.7.162.93  32080  Up     gauss   ...
cloudsopdbsvr-1-0@2-0  primary  cloudsopdbsvr-2-0  cdo      service        10.7.162.92  32080  Up     gauss   ...
dbmgr_rdb-0-999        single   dbmgr_rdb-0-999    manager  cn-global-1-a  10.7.162.90  32091  --     redis   ...
...
- If the value of State is Up or --, the database instance is running properly. Go to the step "Check the service status of the management plane."
- If the value of State is Down, the database instance is stopped. Perform the following operations.
- If the management plane or the product is deployed in cluster mode, run the following commands in PuTTY as the ossadm user to disable switchover between the master and slave database instances for 180 minutes. Skip this operation in other deployment modes.
> cd /opt/oss/manager/agent/bin
> bash dbha_switch_tool.sh -cmd set-ignore-nodes -nodes all -expire 180
If Successful is not displayed, the command execution fails. Contact Huawei technical support.
- Run the following commands to start the database of the management plane.
> source /opt/oss/manager/bin/engr_profile.sh
> ipmc_adm -cmd startdc -tenant manager
If information similar to the following is displayed and success is displayed for all processes, the database is restarted successfully. Otherwise, contact Huawei technical support.
============================
Starting data container processes...
Starting redis process woadapterrdb-1-14 ... success
...
Starting redis process serviceinspectionrdb-1-3 ... success
Starting redis process privilegerdb-1-28 ... success
============================
Starting data container processes is complete.
- If the management plane or the product is deployed in cluster mode, run the following commands to re-enable switchover between the master and slave database instances. Skip this operation in other deployment modes.
> cd /opt/oss/manager/agent/bin
> bash dbha_switch_tool.sh -cmd del-ignore-nodes
If Successful is not displayed, the command execution fails. Contact Huawei technical support.
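The State check in this step can be sketched as a small filter over the query output. This is a minimal sketch, not part of the product tooling: the function name down_instances is hypothetical, and it assumes the output is whitespace-separated with a header row that contains the DBInstanceId, InstNumber, IP, and State columns, as in the sample above. Real output may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the product): list database
# instances whose State column is Down.
# Assumption: input is the whitespace-separated table shown in this
# section, including its header row.
down_instances() {
  awk '
    # Locate the needed columns by name from the header row.
    $1 == "DBInstanceId" {
      for (i = 1; i <= NF; i++) {
        if ($i == "State")      s = i
        if ($i == "InstNumber") n = i
        if ($i == "IP")         p = i
      }
      next
    }
    # Print instance number and IP for every row whose State is Down.
    s && n && p && $s == "Down" { print $n, $p }
  '
}
```

One way to use it, as the ossadm user, is to pipe the query command into the function: bash dbsvc_adm -cmd query-db-instance | down_instances. Empty output means that no instance reported Down under the stated assumptions.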
- Check the service status of the management plane.
- Use PuTTY to log in to the management node as the sopuser user in SSH mode.
If the management plane is deployed in cluster mode, perform operations on OMP_01 and then on OMP_02.
- Run the following command to switch to the ossadm user:
> su - ossadm
Password: password for the ossadm user
- Run the following commands to check the running status of the management plane:
> source /opt/oss/manager/bin/engr_profile.sh
> ipmc_adm -cmd statusapp -tenant manager
Information similar to the following is displayed:
Process Name          Process Type      App Name          Tenant Name  Process Mode  IP            PID     Status
backupwebsite-0-0     backupwebsite     BackupWebsite     manager      cluster       10.93.95.239  341187  RUNNING
unideploywebsite-0-0  unideploywebsite  UniDeployWebsite  manager      cluster       10.93.95.239  341202  RUNNING
mcfebservice-0-0      mcfebservice      MCFEBService      manager      cluster       10.93.95.239  341553  RUNNING
...
[All Processes: 16] [Running: 16] [Not Running: 0]
- If the value of Not Running is 0, all processes are running properly.
- If the value of Not Running is not 0, certain processes are not running or are faulty. Start the management plane service as described in the next operation.
- Run the following command to start the management plane service:
> ipmc_adm -cmd startapp -tenant manager
If information similar to the following is displayed and success is displayed for all processes, the management plane service is restarted successfully. Otherwise, contact Huawei technical support.
Starting process backupwebsite-0-0 ... success
Starting process smapp-0-0 ... success
Starting process cron-0-0 ... success
...
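The two checks in this step (reading the Not Running count and confirming that every process reports success) can be sketched as shell filters. This is an illustrative sketch only: the function names not_running_count and all_started are hypothetical, and the parsing assumes the output formats shown in the samples above, which may vary between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the product) for reading
# ipmc_adm output. Both read the command output on stdin.

# Print N from the "[Not Running: N]" summary line of statusapp output.
not_running_count() {
  sed -n 's/.*\[Not Running: \([0-9][0-9]*\)].*/\1/p' | tail -n 1
}

# Return 0 only if at least one "Starting ... process NAME ... RESULT"
# line was seen and every such line ends with "success".
all_started() {
  awk '
    /^Starting .*process / && / \.\.\. / {
      seen++
      if ($NF != "success") bad++
    }
    END { exit !(seen > 0 && bad == 0) }
  '
}
```

For example, ipmc_adm -cmd statusapp -tenant manager | not_running_count prints the number of processes that are not running, and ipmc_adm -cmd startapp -tenant manager | all_started returns a nonzero exit status if any listed process did not report success. If either check fails after a restart, contact Huawei technical support as described above.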