Node Replacement
Replacing a FusionInsight Node
Replacing a Data Node
Constraints and Precautions
- If FusionInsight and iMaster NCE-Campus are co-deployed, back up iMaster NCE-Campus data before replacing a FusionInsight node.
- The new node must have the same OS and OS encoding scheme as the faulty node.
- The new node must have the same disk names and disk size as well as the same OS partition names and partition size as the faulty node.
- The new node must have the same number of NICs, NIC names, and IP addresses as the faulty node.
- The new node must have the same host name as the faulty node.
- The new node must be configured with the same time zone and time as the faulty node.
Check Before Node Replacement
- Configure the permission of the omm user to start scheduled tasks on the new node.
- If only the /etc/cron.allow file is available in the system, add the omm user to this file.
- If only the /etc/cron.deny file is available in the system, delete the omm user from this file.
- If both the /etc/cron.allow and /etc/cron.deny files are available, add the omm user to the /etc/cron.allow file.
- If the /etc/cron.deny file is available in the system and the omm user is not added to this file, no operation is required.
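The checks above can be scripted if you prefer. The following is a minimal sketch, assuming the standard cron.allow/cron.deny semantics described in this list; it covers only the cases where a file has to be changed.
# Hedged sketch: adjust the omm user's cron permission according to which control files exist.
if [ -f /etc/cron.allow ]; then
    grep -qx omm /etc/cron.allow || echo omm >> /etc/cron.allow
elif [ -f /etc/cron.deny ]; then
    sed -i '/^omm$/d' /etc/cron.deny
fi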
Preparations for Node Replacement
- Install a new node. For details, see FusionInsight Product Documentation.
- Log in to the new node as the root user.
- (Optional) Expand the disk space of the /opt directory.
This step is mandatory if FusionInsight and iMaster NCE-Campus are co-deployed. Before replacing a node, you must expand the disk space of the /opt directory.
- VM
- Obtain the disk_expand.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the disk_expand.sh script to the /opt directory of the VM.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash disk_expand.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- Physical Machine
- Obtain the lvm.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the lvm.sh script to the /opt directory of the physical machine.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash lvm.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- Partition the data disk on the new node. Change the permission of the /opt/fi_tools folder to 755.
- Run the following command on the new node to create the /opt/fi_tools directory.
mkdir -p /opt/fi_tools/
chmod 755 /opt/fi_tools
- Find the primary management node in the FusionInsight cluster. Log in to FusionInsight, click
- Log in to the primary management node of the iMaster NCE-Campus cluster as the sopuser user. Switch to the root user and delete the host key specified in the management IP address record of the new node from the /root/.ssh/known_hosts file.
su - root
vi /root/.ssh/known_hosts
Run the vi /root/.ssh/known_hosts command and delete the host key specified in the management IP address record of the new node.
- Log in to the primary management node of the cluster as the sopuser user. Switch to the root user and copy all files in /opt/fi_tools of the primary management node to /opt/fi_tools and /opt of the new node. The IP address specified in the following commands is the IP address of the new node.
su - root
cd /opt/fi_tools
scp ./* root@192.168.57.70:/opt/fi_tools
scp -r /opt/fi_install root@192.168.57.70:/opt/
- After the command is executed, log in to the new node and run the following command to check whether the files are successfully copied.
cd /opt/fi_tools && ll
- Execute the disk partition script on the new node as the root user.
cd /opt/fi_tools/ && sh create_vol.sh "/dev/xxx"
In the preceding command, xxx indicates the data disk configured for FusionInsight with a size greater than 1.8 TB, for example, vdb. The data disk must be a new disk without partitions configured.
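Before running create_vol.sh, you can optionally confirm that the target disk is indeed unpartitioned. The check below is a hedged sketch that assumes the data disk is /dev/vdb; replace it with your actual device.
lsblk /dev/vdb    # the output should show only the bare disk, with no child partitions such as vdb1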
- Run the df -h command to check whether the disk partitions are created successfully. If the command output contains the five newly created data partitions, the disk partitions are created successfully.
If multiple nodes need to be installed, create disk partitions for all of them.
- Log in to the new node as the root user and add the mappings between host names and service IP addresses of all nodes (including the new node) in the FusionInsight cluster to the /etc/hosts file.
You can view the mapping between hostnames and service IP addresses in the /etc/hosts file on a normal node.
After the configuration is complete, run the hostname -i command. If the service distribution plane IP address is displayed in the command output, the configuration is successful.
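For reference, the mappings use the standard one-entry-per-line /etc/hosts format. The host names and IP addresses below are placeholders only; copy the real values from the /etc/hosts file of a normal node.
# Hypothetical example entries (placeholder values)
192.168.57.10    FI-Node01
192.168.57.11    FI-Node02
192.168.57.70    FI-Node03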
- Log in to the new node and execute the preset script.
cd /opt/fi_tools/
tar -zxvf FusionInsight_SetupTool_x.x.x.tar.gz    //x.x.x indicates the FusionInsight version number.
cd /opt/fi_tools/FusionInsight_SetupTool/preset/
sh preset.sh
The following information indicates that the script is executed successfully.
- Restart the SSH service.
sed -i '/ssh_host_ecdsa_key/s/^/#&/' /etc/ssh/sshd_config && service sshd restart
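If you want to confirm that the command above took effect, a quick hedged check is to verify that the ssh_host_ecdsa_key line is now commented out and that sshd restarted:
grep ssh_host_ecdsa_key /etc/ssh/sshd_config    # the line should now start with '#'
service sshd status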
Installing the New FusionInsight Data Node
- Log in to FusionInsight Manager, choose , select the new node, and choose More > Reinstall to install the node.
If services fail to be started during reinstallation, manually restart the cluster.
- Click Finish when the message Operation Success is displayed.
- (Optional) Log in to the new node and run the following commands.
Perform this step if the version is earlier than V100R19C10SP203. For V100R19C10SP203 and later versions, the problem has been resolved in patches. Therefore, you do not need to perform this step.
sed -i 's/SPARK_HISTORY_OPTS -Djava/SPARK_HISTORY_OPTS -Djava.io.tmpdir=\/opt\/huawei\/Bigdata\/tmp\/spark2x -Djava/g'
Replacing a Management Node
Constraints and Precautions
- If FusionInsight and iMaster NCE-Campus are co-deployed, back up iMaster NCE-Campus data before replacing a FusionInsight node.
- The new node must have the same OS and OS encoding scheme as the faulty node.
- The new node must have the same disk names and disk size as well as the same OS partition names and partition size as the faulty node.
- The new node must have the same number of NICs, NIC names, and IP addresses as the faulty node.
- The new node must have the same host name as the faulty node.
- The new node must be configured with the same time zone and time as the faulty node.
Check Before Node Replacement
- Configure the permission of the omm user to start scheduled tasks on the new node.
- If only the /etc/cron.allow file is available in the system, add the omm user to this file.
- If only the /etc/cron.deny file is available in the system, delete the omm user from this file.
- If both the /etc/cron.allow and /etc/cron.deny files are available, add the omm user to the /etc/cron.allow file.
- If the /etc/cron.deny file is available in the system and the omm user is not added to this file, no operation is required.
Preparations for Node Replacement
- Install a new node. For details, see FusionInsight Product Documentation.
- Log in to the new node as the root user.
- (Optional) Expand the disk space of the /opt directory.
This step is mandatory if FusionInsight and iMaster NCE-Campus are co-deployed. Before replacing a node, you must expand the disk space of the /opt directory.
- VM
- Obtain the disk_expand.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the disk_expand.sh script to the /opt directory of the VM.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash disk_expand.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- Physical Machine
- Obtain the lvm.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the lvm.sh script to the /opt directory of the physical machine.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash lvm.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- Partition the data disk on the new node. Change the permission of the /opt/fi_tools folder to 755.
- Run the following command on the new node to create the /opt/fi_tools directory.
mkdir -p /opt/fi_tools/
chmod 755 /opt/fi_tools
- Find the primary management node in the FusionInsight cluster. Log in to FusionInsight, click
- Log in to the primary management node of the iMaster NCE-Campus cluster as the sopuser user. Switch to the root user and delete the host key specified in the management IP address record of the new node from the /root/.ssh/known_hosts file.
su - root
vi /root/.ssh/known_hosts
Run the vi /root/.ssh/known_hosts command and delete the host key specified in the management IP address record of the new node.
- Log in to the primary management node of the cluster as the sopuser user. Switch to the root user and copy all files in /opt/fi_tools of the primary management node to /opt/fi_tools and /opt of the new node. The IP address specified in the following commands is the IP address of the new node.
su - root
cd /opt/fi_tools
scp ./* root@192.168.57.70:/opt/fi_tools
scp -r /opt/fi_install root@192.168.57.70:/opt/
- After the command is executed, log in to the new node and run the following command to check whether the files are successfully copied.
cd /opt/fi_tools && ll
- Execute the disk partition script on the new node as the root user.
cd /opt/fi_tools/ && sh create_vol.sh "/dev/xxx"
In the preceding command, xxx indicates the data disk configured for FusionInsight with a size greater than 1.8 TB, for example, vdb. The data disk must be a new disk without partitions configured.
- Run the df -h command to check whether the disk partitions are created successfully. If the command output contains the five newly created data partitions, the disk partitions are created successfully.
If multiple nodes need to be installed, create disk partitions for all of them.
- Log in to the new node as the root user and add the mappings between host names and service IP addresses of all nodes (including the new node) in the FusionInsight cluster to the /etc/hosts file.
You can view the mapping between hostnames and service IP addresses in the /etc/hosts file on a normal node.
After the configuration is complete, run the hostname -i command. If the service distribution plane IP address is displayed in the command output, the configuration is successful.
- Log in to the new node and execute the preset script.
cd /opt/fi_tools/
tar -zxvf FusionInsight_SetupTool_x.x.x.tar.gz    //x.x.x indicates the FusionInsight version number.
cd /opt/fi_tools/FusionInsight_SetupTool/preset/
sh preset.sh
The following information indicates that the script is executed successfully.
- Restart the SSH service.
sed -i '/ssh_host_ecdsa_key/s/^/#&/' /etc/ssh/sshd_config && service sshd restart
Reinstalling FusionInsight Manager
- Log in to another normal management node of the iMaster NCE-Campus cluster as the sopuser user and switch to the root user.
su - root
- Perform the following operations.
vi /root/.ssh/known_hosts
Delete the host key specified in the management IP address record of the new node.
vi /opt/huawei/Bigdata/om-server/om/etc/om/known_hosts
Delete the host key specified in the management IP address record of the normal node.
- Restart the controller service.
su - omm -c "sh /opt/huawei/Bigdata/om-server/om/sbin/restart-controller.sh"
- Go to /opt/fi_install/FusionInsight_Manager/, and copy the directory to the new node as the root user.
cd /opt/fi_install/FusionInsight_Manager/
scp -r /opt/fi_install/FusionInsight_Manager/ root@192.168.103.213:/opt/fi_tools
After the directory is copied successfully, run the following command to change the permission on the /opt/fi_tools/FusionInsight_Manager folder and files in the folder to 755.
chmod -R 755 /opt/fi_tools/FusionInsight_Manager
- Log in to the new node as the root user, go to /opt/fi_tools/FusionInsight_Manager/software, and modify the install.ini file based on the cluster information.
- Change the value of local_ip1 to the IP address of the new node.
- Change the value of peer_ip1 to the IP address of the primary management node.
If an external NTP server is configured for FusionInsight, set ntp_server_ip in the install.ini file to the IP address of the external NTP server. Otherwise, you do not need to set ntp_server_ip.
cd /opt/fi_tools/FusionInsight_Manager/software
vi install.ini
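As an illustration only, the edited parameters in install.ini might look like the following. The IP addresses are placeholders, ntp_server_ip is needed only if an external NTP server is configured, and all other parameters in the file should be left unchanged.
local_ip1=192.168.103.213
peer_ip1=192.168.103.212
ntp_server_ip=192.168.103.100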
- Log in to the new node, perform the preset operation, and run the following commands to install FusionInsight Manager:
sh /opt/fi_tools/FusionInsight_SetupTool/preset/preset.sh
cd /opt/fi_tools/FusionInsight_Manager/software/
sh install.sh -f install.ini
Enter y twice.
If the message "failed to config and start nodeagent" is displayed, switch to the omm user and run the following commands to restart the nodeagent process. If the message "The node agent is running" is displayed, the installation is successful. Otherwise, the installation fails.
su - omm
sh /opt/huawei/Bigdata/om-agent/nodeagent/bin/stop-agent.sh
sh /opt/huawei/Bigdata/om-agent/nodeagent/bin/start-agent.sh
sh /opt/huawei/Bigdata/om-agent/nodeagent/bin/status-agent.sh | grep 'The node agent is running'
- Log in to the primary management node and the new node as the omm user, and run the ls -l $BIGDATA_HOME/common command to check whether the runtime soft links point to the same targets.
For details about how to obtain the information and initial passwords of the omm and ommdba users, see FusionInsight HD Product Documentation.
- If so, go to 8.
- If not, switch the common workspace of the new node to make it consistent with that of the primary management node. Then, go to 9.
To switch the common workspace, log in to the new node as the root user and go to the directory where the workspace switching script is located. Run the script to switch to the desired workspace (the target to which the runtime soft link of the primary management node points, for example, runtime0 or runtime1).
cd /opt/fi_tools/FusionInsight_Manager/software/om/script/
./switchCommonWorkspace.sh Target workspace
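A hedged way to compare the two nodes is to list the runtime soft links on each node and check that they resolve to the same runtimeN target, for example:
ls -l $BIGDATA_HOME/common | grep runtime    # run on both nodes; the '->' targets should match, for example runtime0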
- Log in to the new node as the omm user, and run the following commands to delete the default user data and synchronize Lightweight Directory Access Protocol (LDAP) data from the primary management node:
rm -f ${BIGDATA_DATA_HOME}/ldapData/oldap/data/*
cp ${CONTROLLER_HOME}/ldapserver/ldapserver/local/conf/DB_CONFIG ${BIGDATA_DATA_HOME}/ldapData/oldap/data/
- Log in to the new node as the omm user, query the PID of the oldap process, and stop the process. Then the system automatically starts the process.
ps -ef |grep ldap |grep om-server
kill -2 PID
- Log in to the new node as the omm user, query the PIDs of two kerberos processes krb5kdc and kadmind, and stop the processes. Then, the system automatically starts the two processes.
If the processes do not exist, they are being restarted. In this case, skip this step.
ps -ef | grep krb5kdc | grep om-server
kill -9 PID
ps -ef | grep kadmind | grep om-server
kill -9 PID
- Log in to the primary management node as the omm user, and run the following command to check the GaussDB status on the primary and secondary management nodes. If the resources are abnormal, wait for 1 to 3 minutes.
sh ${CONTROLLER_HOME}/sbin/status-oms.sh
- If the database password is different from the default password before you replace the secondary management node, perform the following operations:
Log in to the secondary management node as the omm user. Run the following command to stop the management node:
bash ${BIGDATA_HOME}/om-server/om/sbin/stop-oms.sh
- Log in to the secondary management node as the omm user, switch to the root user and then to the ommdba user, and run the following commands to synchronize data between the databases on the primary and secondary management nodes:
su - root
su - ommdba
gs_ctl build
If the following information is displayed, the data is synchronized successfully:
ommdba@192-168-64-154:~> gs_ctl build
waiting for server to shut down.... done
server stopped
gs_ctl: connect to server, build started.
xlog start point: 1/49000020
gs_ctl: starting background WAL receiver
1525133/1525133 kB (100%), 1/1 tablespace
xlog end point: 1/4906B908
gs_ctl: waiting for background process to finish streaming...
gs_ctl: build completed.
server starting.... done
server started
- Log in to the secondary management node as the omm user. Run the following command to start the management node.
bash ${BIGDATA_HOME}/om-server/om/sbin/start-oms.sh
- Log in to the secondary management node as the omm user, switch to the root user and then to the ommdba user, and check whether database data is synchronized between the primary and secondary management nodes.
su - root
su - ommdba
gs_ctl query -P Password of the management node database administrator
The default username and password are available in iMaster NCE-Campus Default Usernames and Passwords (Enterprise Network or Carrier). If you have not obtained the access permission of the document, see Help on the website to find out how to obtain it.
If the value of SYNC_PERCENT is 99% or 100%, data synchronization is complete.
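If you only need the synchronization percentage, the query output can be filtered. This is a hedged sketch that assumes the field appears as sync_percent in the gs_ctl query output; run it as the ommdba user.
gs_ctl query -P Password of the management node database administrator | grep -i sync_percent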
- Synchronize files between the primary and secondary management nodes. Log in to the primary management node as the omm user, and run the following commands to check whether file synchronization between the primary and secondary management nodes is complete:
cd ${OMS_RUN_PATH}/workspace0/ha/module/hacom/tools
./ha_client_tool --syncallfile
./ha_client_tool --getsyncfilestatus
If the following information is displayed, the files are synchronized successfully:
- Write data to a file to disable active/standby switchover.
The following operations use the current version as an example. For other versions, operations may be different.
Log in to the active OMS node as user omm and run the following command to create the forbid.txt file:
touch ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
Write data to the forbid.txt file.
Write the current time in the first line.
echo $(date +%s) > ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
Write 72000 to the second line.
echo 72000 >> ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt
Restart the HA process to load the forbid.txt file.
Stop and restart the HA process.
sh ${OMS_RUN_PATH}/workspace0/ha/module/hacom/script/stop_ha.sh
sh ${OMS_RUN_PATH}/workspace0/ha/module/hacom/script/config_ha.sh -j active
sh ${OMS_RUN_PATH}/workspace0/ha/module/hacom/script/start_ha.sh
Wait about 2 to 3 minutes for the HA process restart to complete.
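To confirm that the switchover lock was written as intended, you can inspect forbid.txt. Based on the two echo commands above, it should contain exactly two lines: the Unix timestamp that was written and the value 72000.
cat ${OMS_RUN_PATH}/workspace0/ha/local/haarb/conf/forbid.txt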
- (Optional) Log in to the primary FusionInsight node as the omm user and check whether the patch package and verification file exist.
For an ARM-based server: Check whether the files exist in the /opt/huawei/Bigdata/packaged-distributables/patch/redhat-aarch64 directory. If the patch package and verification file do not exist, copy them from the /opt/huawei/Bigdata/packaged-distributables/patch/aarch64 directory.
For an x86-based server: Check whether the files exist in the /opt/huawei/Bigdata/packaged-distributables/patch/redhat-x86_64 directory. If the patch package and verification file do not exist, copy them from the /opt/huawei/Bigdata/packaged-distributables/patch/x86_64 directory.
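As a hedged illustration for an ARM-based server, the check-and-copy could look like the following; the x86 case is identical except for the x86_64 directory names.
ls /opt/huawei/Bigdata/packaged-distributables/patch/redhat-aarch64
# If the patch package and verification file are missing, copy them:
cp -a /opt/huawei/Bigdata/packaged-distributables/patch/aarch64/. /opt/huawei/Bigdata/packaged-distributables/patch/redhat-aarch64/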
- Install the new node.
Log in to FusionInsight Manager and choose
. Select the new node and choose More > Reinstall.
If services fail to be started during reinstallation, manually restart the cluster.
- Log in to the new node as the omm user, and run the following commands to delete the default user data and synchronize LDAP data from the primary management node:
rm -rf ${BIGDATA_DATA_HOME}/ldapData/oldap/data/*
cp ${BIGDATA_HOME}/om-server/om/ldapserver/ldapserver/local/conf/DB_CONFIG ${BIGDATA_DATA_HOME}/ldapData/oldap/data
- Run the following command to check whether user passwords on the new node are short ciphertext:
vi ${CONTROLLER_HOME}/ldapserver/ldapserver/local/cert/password.property
- Example of a short ciphertext:
password=90E173DD8BB8939CBF672548418D6B4F
- Example of a long ciphertext:
password=d2NjX2NyeXB0ATQxNDU1MzVGNDM0MjQzOzMyMzQ0MjQ0Mzg0MTM4MzEzNTQxMzUzNzQxMzAzMjMxMzMzNzM5NDM0MTM0Mzk0N
jM3MzQ0NDQzNDEzMTM5Mzg7OzMyMzUzMDMwOzg3NUY4RjRBMDk5QzUwOTdFOTlCMTJCMTM4OTQxNTUxOzdCNFBNzVFNThBM0IwNjA7MzY
zODM3MzgzODY0NjYzOTJENjU2NDY0NjUyRDM0MzkzMzY2MkQzOTMwNjMzODJEMzAzODY2MzUzMDYxMzY2NDM2MzUzNTMwOw
If user passwords are short ciphertext, run the following commands to delete the password.changed files from the new node. Otherwise, no action is required.
rm -f ${CONTROLLER_HOME}/security/cert/subcert/certFile/password.changed
rm -f ${CONTROLLER_HOME}/ldapserver/ldapserver/local/cert/password.changed
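If you prefer a scriptable check over inspecting the file in vi, the length of the password value can be examined. This hedged sketch treats a 32-character hexadecimal value, as in the short-ciphertext example above, as short ciphertext.
grep '^password=' ${CONTROLLER_HOME}/ldapserver/ldapserver/local/cert/password.property | awk -F= '{print length($2)}'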
- Query the PID of the oldap process and stop the process. Then the system automatically starts the process.
ps -ef |grep ldap |grep om-server_6.5.1/om
kill -2 PID
- Log in to the new node. Run the following commands to query the PIDs of two kerberos processes krb5kdc and kadmind, and stop the processes. Then, the system automatically starts the two processes.
If the processes do not exist, they are being restarted. In this case, skip this step.
ps -ef | grep kerberos_user_specific_binay/kerberos/sbin/krb5kdc | grep -v grep
kill -9 PID
ps -ef | grep kerberos_user_specific_binay/kerberos/sbin/kadmin | grep -v grep
kill -9 PID
- Log in to the primary management node as the omm user, and run the following commands to enable active/standby switchover:
cd ${OMS_RUN_PATH}/workspace/ha/module/hacom/tools/
./ha_client_tool --cancelforbidswitch --name=product
Verifying the Replacement
- Log in to FusionInsight Manager. If you can log in successfully and information is correctly displayed, the management nodes are running properly.
- Click Host. Check whether the newly added host is in Normal state and whether the CPU, memory, and disk information is correct.
- Click Cluster. On the Host and Service pages, check whether instances and services are in normal state.
- If a two-node management cluster is installed, choose to check for the alarm "ALM-12010 Manager Heartbeat Communication Between the Active and Standby Nodes Interrupted". If the alarm does not exist, the primary and secondary management nodes can communicate with each other properly.
- Log in to the primary management node as the root user, run the su - omm command to switch to the omm user, and run the following script to check whether the ResHAStatus resource is in Normal state on both the primary and secondary management nodes:
su - omm
sh ${BIGDATA_HOME}/om-server/om/sbin/status-oms.sh
- Data can be accessed and active/standby switchover can be performed after data is synchronized between the primary and secondary management nodes. Within several minutes (depending on the data volume to be synchronized in the database) after the secondary management node is replaced, the database on the secondary management node may be in Repairing state, which means that database data is being synchronized. After data synchronization is complete, the database transitions to Standby_normal state.
Replacing an iMaster NCE-Campus Node
Context
If a node in an iMaster NCE-Campus cluster is faulty, you can replace the node.
Constraints and Precautions
- Ensure that the power supply and network are stable during node replacement.
- Ensure network connectivity during node replacement.
- The new node must have the same operating system (OS) and OS encoding scheme as the faulty node.
- The new node must have the same disk names and disk size as well as the same OS partition names and partition size as the faulty node.
- The new node must have the same number of NICs and the same NIC names as the faulty node.
- The new node must have the same IP addresses and host name as the faulty node.
- The dependency package *.ICMR.zip required during node replacement must be the same as that used when the faulty node was installed.
- Node replacement can be performed only if one node fails and is not supported if multiple nodes fail at the same time.
- The new node must be configured with the same time zone and time as the faulty node.
- If the faulty node runs both iMaster NCE-Campus and FusionInsight, restore FusionInsight data before iMaster NCE-Campus data on the new node.
- Before backing up data, ensure that a minimum of 500 GB disk space is available on the backup server.
- Ensure that the management plane and database services of at least one iMaster NCE-Campus node are running properly.
- The configuration and data that can be modified cannot be stored on the local host for any service.
- Currently, only one faulty node can be recovered at a time.
- The reinstallation script file cannot be executed on non-management nodes. You must run the reinstallation script file on a management node. The primary management node is recommended.
- Obtain the passwords for all non-root users on the faulty node in advance. After the faulty node is recovered, change the passwords for non-root users to be the same as those on the other iMaster NCE-Campus nodes.
Prerequisites
A backup server has been configured and can communicate with nodes in the cluster.
Procedure
- Install a new node.
- Configure RAID Groups for the 2288H V5 or TaiShan 2280.
- VM: For details, see "iMaster NCE-Campus Installation (OS+Product, 2288H V5, FusionCompute)" or "iMaster NCE-Campus Installation (OS+Product, TaiShan, FusionCompute)" in the Software Installation Guide.
- Physical Machine: For details, see "iMaster NCE-Campus Installation (OS+Product, 2288H V5, Physical Machine)" or "iMaster NCE-Campus Installation (OS+Product, TaiShan, Physical Machine)" in the Software Installation Guide.
- Install an OS on the new node.
- VM: For details, see "iMaster NCE-Campus Installation (OS+Product, 2288H V5, FusionCompute)" or "iMaster NCE-Campus Installation (OS+Product, TaiShan, FusionCompute)" in the Software Installation Guide.
- Physical Machine: For details, see "iMaster NCE-Campus Installation (OS+Product, 2288H V5, Physical Machine)" or "iMaster NCE-Campus Installation (OS+Product, TaiShan, Physical Machine)" in the Software Installation Guide.
- The new node must have the same OS and OS encoding scheme as the faulty node.
- The new node must have the same disk names and disk size as well as the same OS partition names and partition size as the faulty node.
- (Optional) Expand the capacity of the /opt partition.
This step is mandatory if iMaster NCE-Campus and FusionInsight are co-deployed. Before replacing a FusionInsight node, expand the disk space of the /opt directory.
- VM
- Obtain the disk_expand.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the disk_expand.sh script to the /opt directory of the VM.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash disk_expand.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- Physical Machine
- Obtain the lvm.sh script from the \tools\osconfig\maintain_tools directory in the EasySuite installation directory.
- Upload the lvm.sh script to the /opt directory of the physical machine.
- Log in to the OS as the root user, go to the /opt directory, and run the following commands to expand the disk capacity:
# cd /opt
# bash lvm.sh
- Run the df -h command to view the size of the /opt partition and check whether the capacity expansion is successful.
If the size of the /opt partition increases, the capacity expansion is successful.
- After the OS is installed on the new node, configure network information.
- The new node must have the same number of NICs and the same NIC names as the faulty node.
- The new node must have the same IP addresses and host name as the faulty node.
- Restore data on the new node.
- Pre-process the new node.
- (Optional) Log in to the new node as the sopuser user and install the dependency package. Switch to the ossadm user.
If you prepare a server by yourself, install the required dependency packages. For details, see "iMaster NCE-Campus Installation (OS+Product, VMWare)" in the Software Installation Guide.
- Obtain the NCEV100R019C00_ICMR_linux_x64.zip or NCEV100R019C00_ICMR_linux_arm.zip package from the iMaster NCE-Campus installation package, and then upload the NCEV100R019C00_ICMR_linux_x64.zip or NCEV100R019C00_ICMR_linux_arm.zip package to the /opt directory on the new node.
- Log in to a normal management node as the sopuser user, and switch to the ossadm user.
- Go to the following directory:
su ossadm
cd /opt/tools/recoverNode/
- Run the following command to execute the reinstallation script file on the faulty node.
bash oneButtonRepairNode.sh
- Enter the internal communication IP address of the new node.
2019-10-26 20:31:59| Please enter the fault node manageIP
- Enter the password of the root user of the new node.
Please enter the 192.168.62.82 root password :
- If the following information is displayed, the new node is reinstalled successfully:
20:33 || INFO || Repair node 192.168.62.82 successfully
- Change the passwords for all non-root users on the new node.
The passwords for all non-root users on the new node must be the same as those on the faulty node.
Enter a new password as prompted to change the password for a non-root user. Press q or Q to exit password change for the current user and modify the password for another non-root user as prompted.
- (Optional) Log in to the new node as the root user and run the following commands to modify parameters and restart the NTP service:
Run the cat /usr/lib/systemd/system/ntpd.service |grep -v "^#"|grep "ntp:ntp" command. If the query result is empty, run the following commands. Otherwise, you do not need to run the following commands.
sed -i 's/log -f/log -u ntp:ntp -f/g' /etc/sysconfig/ntpd
service ntpd restart
- Log in to the new node as the root user, create the /var/share-disk directory, set the folder permission to 755, and change the owner and owner group of the folder.
mkdir -p /var/share-disk
chmod 755 /var/share-disk
chown -R ossadm:ossgroup /var/share-disk
- Log in to the OS of any normal node and check the IP address bound to ethx:on. Access the management plane at https://IP address of ethx:on:18102.
- Verify the restoration. Wait for about one minute, and then log in to the management plane. Choose to check the service status. If the connection, database, and service status of all nodes is normal, the node replacement is successful.
If a service is not running, select the abnormal node and click Start. Wait for several minutes until the task is complete. If the fault persists, contact technical support engineers.
- Pre-process the new node.
- Log in to the normal OMP_01 or OMP_02 node as the sopuser user, switch to the root user, and run the following commands:
su root
scp -r /opt/sudobin/campus/manager sopuser@IP address of the faulty node:/tmp
scp -r /etc/crontab sopuser@IP address of the faulty node:/tmp
Log in to the faulty node as the sopuser user, switch to the root user, and run the following commands:
su root
mv /tmp/manager /opt/sudobin/campus/
mv /tmp/crontab /etc
chown -R root:root /opt/sudobin/campus/manager
chown -R root:root /etc/crontab
service cron reload
swapoff -a
sed -i 's,^[^#].*swap,#&,g' /etc/fstab
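To confirm that swap is now disabled and will remain disabled after a reboot, a quick hedged check is:
free -h | grep -i swap     # the swap total should be 0
grep swap /etc/fstab       # any remaining swap entries should be commented out with '#'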
- Perform security hardening on the new node as required. Log in to the new node as the root user and run the following commands to perform security hardening:
# cd /opt/SEK
# bash RunSEK.sh
Replacing an iMaster NCE-Campus Node in a Geographic Redundancy System
Context
Before replacing a faulty node in the geographic redundancy scenario, you need to remove the geographic redundancy relationship between the primary and secondary iMaster NCE-Campus clusters. After the faulty node is replaced, you need to re-establish a geographic redundancy relationship between the primary and secondary clusters.
Procedure
- Remove the geographic redundancy relationship between the primary and secondary iMaster NCE-Campus clusters. For details, see Separating the Primary and Secondary Site Products.
- Replace the faulty node. For details, see Replacing an iMaster NCE-Campus Node.
After replacing a node in a DR system, re-configure mutual trust between the primary and secondary clusters.
- Re-establish a geographic redundancy relationship between the primary and secondary iMaster NCE-Campus clusters. For details, see Connecting the Primary and Secondary Site Products.