
FusionInsight Patch Package Fails to Be Uploaded

Publication Date: 2019-04-12
Issue Description
When the SPC002 patch is installed on FusionInsight V100R002C60U10, the patch upload progress is stuck at 94% and the upload eventually fails. The patch file has been successfully uploaded to the active node, but the synchronization to the standby node fails.
Handling Process

1. Analyze the conditions before the problem occurred.

At about 06:00:00 on November 19, 2016, node 11 was the active node and node 12 was the standby node.

The backup directory /srv/BigData/LocalBackup was deleted from the active node (node 11). The synchronization directory was configured as follows:

<file name="/srv/BigData/LocalBackup" auto="no" delete="no"/> -- The delete="no" setting means that when the directory was deleted from the active node, it was not deleted from the standby node automatically and had to be deleted manually.

The following figure shows the logs of node 12 (the standby node).


2. Analyze the upgrade process.

The corresponding HA configuration file was not updated when the IP address was changed in the old version. As a result, the HA process on node 12 failed to start after the upgrade.

During this period, node 12 failed to start. As a result, the contents of the backup directory /srv/BigData/LocalBackup on the active and standby nodes became inconsistent.

3. Analyze the patch process.

After the upgrade, node 12 became the active node. When node 12 was restarted, a full file synchronization was triggered immediately and all files were synchronized. The HA logs on the standby node (node 11) recorded the file synchronization from 13:45:55 to 15:18:58.

The synchronization of the backup directory on the active node (node 12) was recorded in the log, as shown in the following:

All of the preceding files were synchronized. At 15:18:58, the synchronization was interrupted because node 11 (the standby node) was restarted. The operation logs indicate that the node was restarted manually. The following figure shows the detailed log records.


4. After node 11 was started, the active and standby nodes recovered.

The system automatically triggered a full file synchronization and resumed the interrupted file synchronization. The synchronization lasted from 15:22:11 to 16:48:23.

At 16:48:23, node 11 was still downloading files in the backup directory during file synchronization.

The following figure shows the log records.


Root Cause

The patch was uploaded at 15:27:30. At that time, the full file synchronization between the active and standby nodes was still in progress, so synchronization of the patch file had not yet started. As a result, the patch upload timed out.
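For reference, the key times recorded in the preceding logs line up as follows:

13:45:55 to 15:18:58  First full synchronization, interrupted when node 11 (standby) was restarted
15:22:11 to 16:48:23  Second full synchronization after node 11 came back
15:27:30              Patch uploaded to the active node

The upload at 15:27:30 therefore fell inside the second synchronization window, so the patch file could not be synchronized to the standby node before the upload timed out.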


Solution

Preventive measures

1. Workaround for the problem that the backup directory on the active/standby OMS nodes is full.

1. Log in to the active OMS node.

2. Open the following file:

/opt/huawei/Bigdata/OMSV100R001C00x8664/workspace/ha/module/hasync/plugin/conf/filesync.xml

3. Delete the delete="no" parameter from the /srv/BigData/LocalBackup, /opt/huawei/Bigdata/LocalBackup, and /srv/BigData/Manager/bak configuration items. The modified content is shown in Figure 1.

Figure 1 Modified contents in the synchronization configuration file of the OMS
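Based on the entry format shown earlier for /srv/BigData/LocalBackup, the modified entries in filesync.xml are expected to look roughly like the following sketch (an assumption; only delete="no" is removed and the other attributes, such as auto="no", are left unchanged):

<file name="/srv/BigData/LocalBackup" auto="no"/>
<file name="/opt/huawei/Bigdata/LocalBackup" auto="no"/>
<file name="/srv/BigData/Manager/bak" auto="no"/>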



4. Run the ps -ef | grep ha.bin | grep OMS command to query the PID of the HA process, and then run the kill -9 <hapid> command to stop the HA process. In the command, <hapid> indicates the value in the second column of the ps command output (see the command sketch after this procedure).

5. About two minutes later, run the ps -ef | grep ha.bin | grep OMS command again to check whether the HA process has been restarted.

6. Log in to the standby OMS node and check whether the modification made in step 3 has been synchronized to the /opt/huawei/Bigdata/OMSV100R001C00x8664/workspace/ha/module/hasync/plugin/conf/filesync.xml file. If it has not been synchronized, delete the delete="no" parameter manually.

7. Repeat steps 4 and 5 on the standby OMS node to restart the HA process.

8. Log in to the active OMS node, switch to user omm, and run the /opt/huawei/Bigdata/OMSV100R001C00x8664/workspace/ha/module/hacom/tools/ha_client_tool --syncallfile command to synchronize all files.

9. After the full synchronization is complete, wait for about two minutes and check whether the redundant files in the preceding directories are deleted from the standby OMS node.
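The commands in steps 4, 5, and 8 can be run on the active OMS node roughly as follows (a sketch, assuming the omm user and the paths given above; <hapid> stands for the PID found in step 4):

ps -ef | grep ha.bin | grep OMS    # query the HA process; the PID is the second column
kill -9 <hapid>                    # stop the HA process
sleep 120                          # wait about two minutes for the HA process to restart
ps -ef | grep ha.bin | grep OMS    # confirm that the HA process is running again
# After the standby node is handled (steps 6 and 7), trigger the full synchronization of step 8:
/opt/huawei/Bigdata/OMSV100R001C00x8664/workspace/ha/module/hacom/tools/ha_client_tool --syncallfile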

2. Workaround for the problem that the backup directory on the active/standby DBServer nodes is full.

1. Log in to the active DBServer node.

2. Open the /opt/huawei/Bigdata/FusionInsight/dbservice/setup/conf/ha_plugin/ha_sync_conf/dbservice_sync.xml file. Delete the delete="no" parameter from the #DBSERVICE_INSTALL_HOME#/bak configuration item. The modified content is shown in Figure 2.

Figure 2 Modified contents in the initial configuration file of the DBService
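Assuming the same entry format as in the OMS filesync.xml shown earlier, the modified item is expected to look roughly like the following sketch (only delete="no" is removed; any other attributes are left unchanged):

<file name="#DBSERVICE_INSTALL_HOME#/bak" auto="no"/>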


3. Open the /opt/huawei/Bigdata/FusionInsight/dbservice/ha/module/hasync/plugin/conf/dbservice_sync.xml file.

4. Delete the delete="no" parameter from the /opt/huawei/Bigdata/FusionInsight_V100R002C60XXX/dbservice/bak configuration item. For details about the modified contents, see Figure 3. XXX varies depending on the specific version.

Figure 3 Modified contents in the synchronization configuration file of the DBService
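Again assuming the same entry format, the modified item in the runtime configuration file is expected to look roughly like the following sketch (XXX is the version-specific string and stays as it appears in the file):

<file name="/opt/huawei/Bigdata/FusionInsight_V100R002C60XXX/dbservice/bak" auto="no"/>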


5. Run the ps -ef | grep ha.bin | grep dbservice command to query the PID of the HA process, and then run the kill -9 <hapid> command to stop the HA process. In the command, <hapid> indicates the value in the second column of the ps command output (see the command sketch after this procedure).

6. Wait for about two minutes and run the ps -ef | grep ha.bin | grep dbservice command again to check whether the HA process has been restarted.

7. Log in to the standby DBServer node and check whether the modification made in step 4 has been synchronized to the /opt/huawei/Bigdata/FusionInsight/dbservice/ha/module/hasync/plugin/conf/dbservice_sync.xml file. If it has not been synchronized, delete the delete="no" parameter manually.

8. Repeat steps 5 and 6 on the standby DBServer node to restart the HA process.

9. Log in to the active DBServer node, switch to user omm, and run the /opt/huawei/Bigdata/FusionInsight/dbservice/ha/module/hacom/tools/ha_client_tool --syncallfile command to perform full synchronization of HA files.

After the full synchronization is complete, wait for about two minutes and check whether the redundant files in the preceding directories are deleted from the standby DBServer node.
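As with the OMS procedure, the commands in steps 5, 6, and 9 can be run on the active DBServer node roughly as follows (a sketch, assuming the omm user and the paths given above; <hapid> stands for the PID found in step 5):

ps -ef | grep ha.bin | grep dbservice    # query the HA process; the PID is the second column
kill -9 <hapid>                          # stop the HA process
sleep 120                                # wait about two minutes for the HA process to restart
ps -ef | grep ha.bin | grep dbservice    # confirm that the HA process is running again
# After the standby node is handled (steps 7 and 8), trigger the full synchronization of step 9:
/opt/huawei/Bigdata/FusionInsight/dbservice/ha/module/hacom/tools/ha_client_tool --syncallfile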

Check whether the files in the backup directory /srv/BigData/LocalBackup on the active and standby OMS nodes are consistent. If the files are inconsistent, back up the files in the backup directory to other disks and delete the files from the directories on both active and standby nodes.

Check whether the files in /opt/huawei/Bigdata/FusionInsight/dbservice/bak on the active and standby DBService nodes are consistent. If the files are inconsistent, back up the files in the backup directory to other disks and delete the files from the directories on both active and standby nodes.

The problem that the number of files in the backup directory on the standby OMS node and the standby DBServer node increases continuously is rectified in FusionInsight HD V100R002C60U10SPC003.
Suggestions

This problem is caused by a software bug. Due to the bug, the number of files in the backup directory increases continuously, and the disk space may become fully occupied over time. This problem can be avoided at the initial deployment stage or by installing the C60U10SPC003 patch.

END