N8500 all nodes fall into ADMIN_WAIT status

Publication Date: 2014-09-19
Issue Description
The customer reported that the N8500 was not working. Our engineer checked the system and found that all nodes were in ADMIN_WAIT status and the cluster had not started.
Alarm Information
Run the command "hasys -state" to check the node status. All nodes are in ADMIN_WAIT status.
Handling Process
1. Verify the main.cf file on all nodes (file path: %VCS_HOME%/conf/config/main.cf):

    hacf -verify .

    Note: The argument of this command is a folder name, not a file name. Typically, change into the folder that contains the configuration file and run the command there. If main.cf contains an error, the command reports it. If there is no error, the command prints nothing.

2. After confirming that main.cf is correct on all nodes, choose one node to keep its main.cf, then delete (or rename) the main.cf file on every other node.

3. Power down all nodes.

4. Power on the chosen node. This node reads its own main.cf and tries to start the cluster. Because the storage unit is now working correctly, the cluster can start. Run "hasys -state" to check the cluster status; one node should be in RUNNING status.

5. Power on all other nodes. Because these nodes have no main.cf file, they fetch it from the running node and then join the cluster.

6. After all nodes are powered on, wait a few minutes. Run "hasys -state" to check the cluster status and make sure all nodes are in RUNNING status.
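
The status check in steps 4 and 6 can be scripted. The following is a minimal sketch, not a vendor-supplied tool: it assumes that "hasys -state" prints one row per node in the form "<system> SysState <value>" under a header line that starts with "#". The function name all_nodes_running and the 30-second polling interval are illustrative choices.

```shell
#!/bin/sh
# Read "hasys -state" output on stdin and succeed only if every node
# reports RUNNING. Nodes in any other state are printed.
# Assumed row format (an assumption, check it on your system):
#   <system> SysState <value>
all_nodes_running() {
    awk '
        $2 == "SysState" {
            seen++
            if ($3 != "RUNNING") { bad++; print $1 " is " $3 }
        }
        # Fail when no node rows were seen or any node is not RUNNING.
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}

# Hypothetical usage on a cluster node (requires VCS to be installed):
#   until hasys -state | all_nodes_running; do sleep 30; done
```

The function only reads text, so it can be tried against saved command output before it is used on a live cluster.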
Root Cause
The N8500 log showed that there had been a power outage in the customer's datacenter. When the N8500 restarted, the storage unit started more slowly than the NAS engine, so the NAS engine could not find the fencing disks:

2014/02/14 09:18:31 VCS NOTICE V-16-1-52006 UseFence=SCSI3. Fencing is enabled
2014/02/14 09:18:31 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2014/02/14 09:18:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2014/02/14 09:19:01 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2014/02/14 09:19:16 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...

After several retries, the NAS engine reported that it had failed to find the fencing disks. VCS could not start, and all nodes went into ADMIN_WAIT status:
2014/02/14 09:19:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2014/02/14 09:20:01 VCS CRITICAL V-16-1-10031 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
2014/02/14 09:20:05 VCS ERROR V-16-1-10322 System MPSEAC_01 (Node '0') changed state from LOCAL_BUILD to FAULTED
2014/02/14 09:20:05 VCS NOTICE V-16-1-10322 System MPSEAC_02 (Node '1') changed state from CURRENT_PEER_WAIT to ADMIN_WAIT
2014/02/14 09:20:05 VCS NOTICE V-16-1-10322 System MPSEAC_03 (Node '2') changed state from CURRENT_PEER_WAIT to ADMIN_WAIT
2014/02/14 09:20:05 VCS NOTICE V-16-1-10322 System MPSEAC_04 (Node '3') changed state from CURRENT_PEER_WAIT to ADMIN_WAIT
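
When the log is long, the state transitions can be pulled out with a small filter. This is a rough sketch that relies only on the wording of the log lines quoted above ("System <name> ... changed state from <old> to <new>"); the function name state_changes is made up for illustration, and other VCS versions may word these messages differently.

```shell
#!/bin/sh
# Read VCS log lines on stdin and print "<system> <old state> <new state>"
# for every line that records a node state change.
state_changes() {
    awk '
        / changed state from / {
            sys = ""; old = ""; cur = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "System") sys = $(i + 1)   # word after "System"
                if ($i == "from")   old = $(i + 1)   # state before the change
                if ($i == "to")     cur = $(i + 1)   # state after the change
            }
            print sys, old, cur
        }
    '
}
```

Feeding the excerpt above through state_changes gives one line per node, which makes it easier to see which node left which state.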
Suggestions
The N8000 consists of a NAS engine and a storage unit, with VCS (Veritas Cluster Server) running on top of them. Because it is a complex system, always follow the correct power-on and power-off sequence. Otherwise, the NAS engine may start before the storage unit is ready and fail as described above.

Power on: FC Switch -> Storage Unit -> NAS engine
Power off: NAS engine -> Storage Unit -> FC Switch
