eSight V300R010C00SPC200, 300, and 500 Maintenance Guide 20

Veritas HA System FAQs

What Can I Do If the Active/Standby Switchover Fails Due to Misoperations

Question

During a switchover between the active and standby servers, if you perform an incorrect operation in Veritas (for example, bringing resources online), the switchover fails and the error message "Unable to switch group AppService. Group is in the middle of a remote operation" is displayed.

Answer

  1. Log in to the server as the root user.
  2. Run the following command to clear all previous operations:

    # hagrp -flush AppService -sys $(hostname)

  3. Run the following command to switch over the active and standby servers again:

    # hagrp -switch AppService -any -clus <Host name of the standby server> -localclus <Host name of the active server>
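
As a rough illustration of the recovery sequence above, the following sketch checks the captured output of a failed switchover for the error text quoted in the question before deciding whether a flush is needed. The helper name needs_flush is hypothetical and is not part of Veritas; only the error text is taken from this FAQ.

```shell
# needs_flush OUTPUT
# Hypothetical helper: succeeds (exit status 0) when the captured output of a
# failed switchover contains the error text quoted in this FAQ.
needs_flush() {
    case "$1" in
        *"Group is in the middle of a remote operation"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the server (commented out; requires Veritas and root):
# out=$(<switch command from step 3> 2>&1)
# if needs_flush "$out"; then
#     hagrp -flush AppService -sys "$(hostname)"
# fi
```

After the flush, the switchover command in step 3 is run again as described.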

What Do I Do When eSight Does Not Receive Alarms from Managed Servers After Remote Two-Node Cluster Switchover

Question

What do I do when eSight does not receive alarms from managed servers after remote two-node cluster switchover?

Answer

  1. Log in to eSight.
  2. Choose Resource > Server > Server Device.
  3. Select a managed server and click in the Operations column.

What Do I Do When Software Sources Do Not Exist for Operating System Deployment Tasks After Two-Node Cluster Switchover

Question

What do I do when software sources do not exist for operating system deployment tasks after two-node cluster switchover?

Answer

  1. Log in to eSight.
  2. Choose Resource > Server > Configuration&Deployment.
  3. Click Software Source Management in the navigation tree on the left.
  4. Click Add and upload an operating system again.
  5. Use the uploaded software source to deploy the operating system for the server again.

What Do I Do When the DHCP Server IP Address Range Is Restored to the Default Value After Remote Two-Node Cluster Switchover

Question

What do I do when the DHCP server IP address range is restored to the default value after remote two-node cluster switchover?

Answer

  1. Log in to eSight.
  2. Choose Resource > Server > Service Settings.
  3. Click DHCP Service in the navigation tree on the left.
  4. Set the DHCP server IP address range.
  5. Click Apply.

What Do I Do When the Stateless Computing Device Activation Progress Is Not Updated After Remote Two-Node Cluster Switchover

Question

What do I do when the stateless computing device activation progress is not updated after remote two-node cluster switchover?

Answer

  1. Log in to eSight.
  2. Choose Resource > Server > Stateless Computing.
  3. Choose Domain Management > Domain.
  4. Select a device that is being activated, and click Configuration in the Operation column.
  5. On the Configuration page, remove the device that is being activated, add the device, associate it to a profile, and activate it again.

What Do I Do If eSight Cannot Provide the Location Function After an Active/Standby Switchover in a Remote Two-Node Cluster

Answer

After an active/standby switchover between eSight servers, you need to modify the eSight location template on the AC or AP to change the destination IP address in the template to the heartbeat IP address of the active eSight server. For detailed commands, see the related product documentation.

Failed to Log In to eSight When the NMSServer Resource Is Online

Symptom

Run the hares -state NMSServer -sys $(hostname) command as the root user. The result is online, but eSight cannot be accessed.

Possible Causes

If the eSight process stops unexpectedly, the system automatically restarts the eSight process. In this case, if you run the hares -state NMSServer -sys $(hostname) command as the root user, the result is online.

Procedure

  1. Wait until the system successfully restarts the eSight process.

    • If the restart succeeds, you can log in to eSight of the active server.
    • If the restart fails, a switchover is automatically triggered. Then you can log in to eSight of the standby server.

    When the eSight process is faulty, the system automatically attempts to restart the process a maximum of three times.
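
While waiting for the automatic restart, a simple polling loop can show when eSight becomes reachable again. The following is a minimal sketch: retry_probe is a hypothetical helper, and the probe command, attempt count, and interval are assumptions to be chosen by the operator. They are not values defined by eSight.

```shell
# retry_probe ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts. Succeeds if COMMAND ever succeeds.
retry_probe() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt will follow
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use (commented out; the URL is a placeholder, not a real value):
# retry_probe 10 30 curl -ksf -o /dev/null "<eSight login URL>"
```

If the probe never succeeds on the active server, try logging in to eSight on the standby server, because a failed restart triggers a switchover.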

Why Is the Veritas HA System Not Automatically Switched Over After the Management Port Is Forcibly Powered Off

Question

Why is the Veritas HA system not automatically switched over after the management port is forcibly powered off?

Answer

The implementation mode of the management port software varies depending on the server. Whether active/standby switchover is triggered for a server when the server is forcibly powered off depends on whether the server operating system sends the normal shutdown signal. If the server operating system does not send a normal shutdown signal, active/standby switchover is triggered. If the server operating system sends a normal shutdown signal, active/standby switchover is not triggered.

  • To verify the automatic switchover capability in the case of abnormal power-off of the active server, use either of the following methods:
    Method 1: Manually remove the power supply cable of the active server.

    Software and hardware may be damaged when the power supply cable is forcibly removed. Therefore, method 1 is not recommended.

    Method 2: Disable the heartbeat network adapter on the active server.

    In a local HA environment where this scenario is simulated, the standby server automatically switches over. However, the floating IP address of the active server still exists. As a result, the standby server cannot be started.

    The active and standby servers communicate with each other through the heartbeat network. If the heartbeat network adapter on the active server is disabled, the standby server cannot detect the status of the active server. Disabling the adapter therefore simulates an abnormal power failure of the active server.

    1. Log in to the active server as the root user, and run the following command:

      # ifdown bond1

    2. (Optional) Run the following command to enable the heartbeat network adapter after the verification:

      # ifup bond1

  • To start the standby server, run the following command as the root user:

    # hagrp -online -force AppService -sys <Host name of the standby server>

    After the standby server is started, the HA system works in active-active mode.

What Do I Do If the Heartbeat IP Connection Between a Remote Server and the Local Two-Node Cluster Is Interrupted

Answer

The following uses bond1 as an example to describe how to check a heartbeat connection.

  1. Log in to the active and standby hosts as the root user and run the following command to check the network cable connection.

    # ethtool bond1

    If the following information is displayed, the cables are correctly connected. If the following information is not displayed, connect the network cables correctly.

    Settings for bond1:
            Link detected: yes

  2. Run the following command on the active and standby servers as the root user to check whether the heartbeat network port is enabled:

    # ifconfig | grep bond1

    If no information is displayed, the heartbeat network port is disabled. In this case, run the ifconfig bond1 up command to enable the port, and then run the ifconfig | grep bond1 command again.

    If the following information is displayed, the heartbeat network port is enabled.

     bond1     Link encap:Ethernet  HWaddr A4:DC:BE:1E:5D:F8

  3. Check whether the heartbeat IP addresses of the active and standby hosts are on the same network segment.

    If they are not, configure heartbeat IP addresses on the same network segment for the active and standby hosts.
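
The checks in steps 1 and 3 can be sketched as small shell helpers. The function names below are hypothetical, and the subnet mask passed to same_segment is an assumption that must match the actual heartbeat network configuration.

```shell
# link_detected: reads ethtool output on stdin and succeeds when it
# contains the "Link detected: yes" line shown in step 1.
link_detected() {
    grep -q 'Link detected: yes'
}

# ip_to_int ADDR: prints a dotted IPv4 address as a single integer.
# Runs in a subshell so that the IFS change does not leak to the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_segment ADDR1 ADDR2 NETMASK: succeeds when both addresses fall in
# the same network segment under the given subnet mask.
same_segment() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ "$(( a & m ))" -eq "$(( b & m ))" ]
}

# Example use (commented out; addresses and mask are placeholders):
# ethtool bond1 | link_detected && echo "cable connected"
# same_segment <active heartbeat IP> <standby heartbeat IP> <subnet mask>
```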

What Can I Do If the VVRService Resource Group of the Veritas HA System Is Faulty and the Replication Enters the passthru State

Question

What can I do if the VVRService resource group of the Veritas HA system is faulty and the replication enters the passthru state?

Symptom

  1. The VVRService resource group is faulty.
  2. When the following command is executed on the active server as the root user to query the RVG resource status, the RVG resource is in the passthru state:

    # vradmin -g datadg repstatus datarvg | grep "RVG state"

    The query result is as follows:

    RVG state:                  enabled for I/O (passthru)

    Or

    RVG state:                  disabled for I/O (passthru)

Prerequisites

The heartbeat IP addresses of the active and standby servers can communicate with each other.

Answer

  1. Log in to the active server as the ossuser user.
  2. Run the following commands:

    cd /opt/eSight/mttools/tools

    ./force_primary.sh

  3. If the message "force primary finish." is displayed, the execution is successful and the replication status is restored.
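
For scripted checks, the symptom and the success message above can be matched as plain text. The following is a minimal sketch with hypothetical helper names. It only inspects output that has already been captured and does not run any Veritas command itself.

```shell
# rvg_in_passthru: reads "vradmin -g datadg repstatus datarvg" output on
# stdin and succeeds when the "RVG state" line reports passthru.
rvg_in_passthru() {
    grep 'RVG state' | grep -q 'passthru'
}

# force_primary_finished: reads the output of force_primary.sh on stdin and
# succeeds when it contains the success message quoted in step 3.
force_primary_finished() {
    grep -q 'force primary finish\.'
}

# Example use (commented out; requires the Veritas HA environment):
# vradmin -g datadg repstatus datarvg | rvg_in_passthru && echo "passthru"
```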

What Can I Do If the Active and Standby Servers of the Veritas System Failed to Be Connected Because the lvdbdata Partition Sizes Are Different

Question

An error message is reported when the maintenance tool is used to connect the active and standby servers, as shown in Figure 3-9. After logging in to the two servers and checking the lvdbdata partition sizes, I found that the sizes are different. What can I do?

# hares -online mountDBRes -sys <Server host name>

# df -h | grep /dev/vx/dsk/datadg/lvdbdata

/dev/vx/dsk/datadg/lvdbdata    345G  9.1G  318G    3% /opt/eSightData

Figure 3-9 Error message reported by the maintenance tool

Answer

  1. Check the lvdbdata partition on the active and standby servers and record the server whose lvdbdata partition size is smaller.

    1. Log in to either server as the root user.
    2. Run the following command to query the lvdbdata partition size and record the size:
      # fdisk -l 2>/dev/null | grep Disk | grep /dev | grep -i VxVM | awk '{print $2 $3}'|head -n 1
      /dev/VxVM27000: 350GiB, 375809638400
    3. Repeat substeps 1 and 2 to check the lvdbdata partition size on the other server, and record the IP address of the server with the smaller partition size.

  2. Based on the information in Step 1, expand the lvdbdata partition of the server with a smaller partition size to be the same as that on the other server.

    1. Log in to the server with a smaller lvdbdata partition size as the root user.
    2. Bring the resource group AppService offline.

      # hagrp -offline AppService -sys `hostname`

      Run the following command to query the AppService resource status. Perform subsequent operations after ensuring that the resource status is displayed as OFFLINE:

      # hagrp -state AppService -sys <Server host name>

    3. Run the following command to expand the lvdbdata partition:

      # /etc/vx/bin/vxresize -f -F ext3 -g datadg lvdbdata XXXg

      Ensure that the lvdbdata partition sizes on the active and standby servers are the same. In the preceding command, replace XXX with the size of the larger lvdbdata partition.

    4. Bring the resource online.

      # hares -online mountDBRes -sys `hostname`

  3. Run the following command on the active and standby servers to check whether the partition sizes are the same:

    # fdisk -l 2>/dev/null | grep Disk | grep /dev | grep -i VxVM | awk '{print $2 $3}'|head -n 1

    /dev/VxVM27000: 350GiB, 375809638400

  4. Connect the active and standby servers again.
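
The size comparison in steps 1 and 3 can be sketched as follows, using the byte counts reported by fdisk (375809638400 in the sample output). The function names are hypothetical. The sketch also assumes that the g suffix accepted by vxresize denotes units of 1024^3 bytes, which should be confirmed against the Veritas documentation for the installed version.

```shell
GIB=1073741824   # bytes per GiB (1024^3); assumed unit of the "g" suffix

# sizes_match BYTES_A BYTES_B: succeeds when both partitions have the
# same size in bytes.
sizes_match() {
    [ "$(( $1 == $2 ))" -eq 1 ]
}

# target_size_gib BYTES_A BYTES_B: prints the larger of the two sizes in
# whole GiB, that is, the candidate value for XXX in the vxresize command.
target_size_gib() {
    if [ "$(( $1 >= $2 ))" -eq 1 ]; then
        echo $(( $1 / GIB ))
    else
        echo $(( $2 / GIB ))
    fi
}

# Example use (commented out; byte counts come from the fdisk output):
# sizes_match <bytes on server A> <bytes on server B> || \
#     echo "expand the smaller partition to $(target_size_gib <A> <B>)g"
```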

What Should I Do If a datarvg Exception Occurs and datarvg Cannot Be Brought Online During Force Active

Symptom

When force active was performed for an HA system in dual-active state, the datarvg resource on the active server became faulty, and the AppService resource group automatically went offline. After the datarvg fault is rectified, the datarvg resource cannot go online.

Possible Causes

During force active, the datarvg resource is automatically created for the active and standby servers. The creation will fail when an exception occurs.

Procedure

  1. Log in to the faulty node as the root user.
  2. Run the following commands to adjust the monitoring parameters of datarvg:

    haconf -makerw

    hatype -modify RVG MonitorInterval 60

    hatype -modify RVG ToleranceLimit 10

    haconf -dump -makero

  3. Run the following command to clear the datarvg fault:

    hares -clear datarvg

  4. Run the following command to check the datarvg status:

    vxprint -g datadg -Vl datarvg

    The command output is similar to the following information:
    Rvg:      datarvg 
    info:     rid=0.1140 version=0 rvg_version=45 last_tag=2 
    state:    state=ACTIVE kernel=ENABLED 
    assoc:    datavols=lvdbdata,lvfiledata 
              srl=srl 
              rlinks=(none) 
              exports=(none) 
              vsets=(none) 
    att:      rlinks=(none) 
    flags:    closed primary enabled attached logging 
    device:   minor=9003 bdev=199/9003 cdev=199/9003 path=/dev/vx/dsk/datadg/datarvg 
    perms:    user=root group=root mode=0600

    According to the information, the datavols and srl configurations of datarvg are normal. You can bring the datarvg resource and AppService resource group online and continue force active.

    If datarvg is not found or datavols=(none) or srl=(none) is displayed, perform the following steps to restore the datarvg resource:

  5. Create the datarvg resource again.

    1. Run the following commands in sequence to clear the datarvg resource:

      vxrlink -g datadg -f det datarlk

      vxrlink -g datadg -f dis datarlk

      vxvol -g datadg -f dis lvdbdata

      vxvol -g datadg -f dis lvfiledata

      vxvol -g datadg -f dis srl

      vxedit -g datadg -rf rm datarlk

      vxedit -g datadg -rf rm datarvg

    2. Run the following commands in sequence to create the datarvg resource:

      vxmake -g datadg rvg datarvg primary=<true|false>

      vxvol -g datadg aslog datarvg srl

      vxrvg -g datadg start datarvg

      vxvol -g datadg assoc datarvg lvdbdata

      vxvol -g datadg assoc datarvg lvfiledata

      To restore datarvg on the active server, use primary=true; to restore datarvg on the standby server, use primary=false.

    If an error occurs during the creation of the datarvg resource, contact Huawei technical support.

  6. Check the datarvg status. For details, see step 4.
  7. Bring the datarvg resource and the AppService resource group online. For details, see "Bringing a Resource Online" in the Maintenance Guide.
  8. Continue force active. For details, see "Forcibly Setting the Local Server as the Active Server" in the Maintenance Guide.
  9. After forcibly setting a server as the active server, run the following commands to restore the monitoring parameters of datarvg:

    haconf -makerw

    hatype -modify RVG MonitorInterval 60

    hatype -modify RVG ToleranceLimit 0

    haconf -dump -makero
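
The decision in step 4 (whether datarvg must be re-created) can be sketched as a text check on the vxprint output. The helper name rvg_config_ok is hypothetical. It only looks for the datavols and srl fields described above and treats missing fields the same as (none).

```shell
# rvg_config_ok: reads "vxprint -g datadg -Vl datarvg" output on stdin.
# Succeeds when both datavols= and srl= are present and neither is (none).
# Fails when datarvg is not found (no such fields) or a field is (none).
rvg_config_ok() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'datavols=' || return 1
    printf '%s\n' "$out" | grep -q 'srl=' || return 1
    if printf '%s\n' "$out" | grep -Eq '(datavols|srl)=\(none\)'; then
        return 1
    fi
    return 0
}

# Example use (commented out; requires the Veritas HA environment):
# if vxprint -g datadg -Vl datarvg 2>/dev/null | rvg_config_ok; then
#     echo "datarvg configuration is normal; skip step 5"
# else
#     echo "re-create datarvg as described in step 5"
# fi
```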

Update Date:2022-08-09
Document ID:EDOC1100044373