The Cluster Service Fails to Start on Windows 2003

Publication Date: 2015-06-19
Issue Description
Hardware configuration: X6000
Software configuration: Windows 2003

Symptom:

After the server is restarted, the cluster service does not start automatically, so the database clusters that rely on it cannot fail over when a fault occurs. The event logs show that the cluster service failed to connect to domain China, as shown in Figure 1 and Figure 2.

Figure 1 Event log for the cluster service startup failure



Figure 2 Event log for the failure in connecting to domain China


Handling Process
1.  Re-create the database clusters and check the configuration parameters. The problem persists.

2.  Check the event log. As shown in Figure 3, the NETLOGON service cannot find a domain controller in domain China, and an event with ID 5719 is logged.

Figure 3 Event log for the failure in finding a domain controller in domain China



3.  Manually start the cluster service. The service starts successfully.

4.  On the Recovery tab of the cluster service properties, set the failure action to Restart the Service, as shown in Figure 4. Restart the server. The cluster service starts successfully.

Figure 4 Setting cluster service properties



5.  Preliminary conclusion: network components take a long time to initialize, so services that rely on a domain account cannot start properly during system startup. Network component initialization involves the NIC, network parameters, and external network settings, all of which are closely tied to the hardware and the network environment.

6.  Uninstall the NIC driver and install a new driver. The problem persists.

7.  On the S5328 switch that is connected to the X6000, set carrier down-hold-time to 2000 ms, that is, a port down latency of 2 s, so that the switch does not respond immediately to port up and down changes. (By default, the port up latency is 2000 ms and the down latency is 0 ms.) The problem persists. Set carrier down-hold-time to 3000 ms, as shown in Figure 5. The problem is solved, and the cluster service starts properly.

Figure 5 Setting carrier down-hold-time to 3000 ms



8.  Microsoft confirms that a delayed start can be configured for a service on Windows 2008 and later products. The cluster service can therefore be set to start after the other services are ready, which also resolves the startup failure, as shown in Figure 6.

Figure 6 Setting the cluster service to start later
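
Steps 4 and 8 are performed in the Services console. As a rough command-line equivalent, the sketch below builds the matching sc.exe commands. It assumes that the cluster service is registered under the service name ClusSvc; the 60-second restart delay and the one-day failure-count reset period are illustrative values that do not come from this case. The delayed-auto start type applies only to Windows 2008 and later.

```python
import subprocess

# Assumption: the cluster service is registered under this service name.
SERVICE = "ClusSvc"


def recovery_command(service=SERVICE, restart_delay_ms=60000, reset_period_s=86400):
    """Build the sc.exe command that restarts the service after each of
    its first three failures (the command-line form of step 4)."""
    # Illustrative values: 60 s restart delay, failure counter reset after one day.
    actions = "/".join(["restart/%d" % restart_delay_ms] * 3)
    return ["sc", "failure", service,
            "reset=", str(reset_period_s),
            "actions=", actions]


def delayed_start_command(service=SERVICE):
    """Build the sc.exe command for a delayed automatic start
    (the command-line form of step 8, Windows 2008 and later only)."""
    return ["sc", "config", service, "start=", "delayed-auto"]


def apply(command, dry_run=True):
    """Print the command, or run it when dry_run is False."""
    if dry_run:
        print(" ".join(command))
        return 0
    return subprocess.call(command)
```

Calling apply(recovery_command()) prints the command without changing anything; pass dry_run=False from an administrator session to apply it.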

Root Cause
During OS startup, the port status changes in the sequence down-up-down-up. The OS starts before the NIC is ready. While the NIC port state changes between up and down, if a down event is triggered and the link does not recover within the down hold time, the network is not ready when the services start, and the OS services cannot connect to Active Directory (AD) for authentication. As a result, the NETLOGON service fails to start, and the cluster service cannot start.
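
The effect of the hold time can be illustrated with a small simulation. This is a simplified model, not the switch's actual implementation: a link-down is reported only if the port stays down for at least the hold time. The flap timings are hypothetical and were not measured in this case; they are chosen so that a 2000 ms hold time still lets one down event through while 3000 ms suppresses both.

```python
def reported_down_events(events, down_hold_ms):
    """Return the times (ms) at which a link-down would be reported.

    events: list of (time_ms, state) tuples sorted by time,
            where state is "up" or "down".
    A down is reported only if the port stays down for at least
    down_hold_ms; shorter outages are suppressed.
    """
    reported = []
    down_since = None
    for time_ms, state in events:
        if state == "down":
            if down_since is None:
                down_since = time_ms
        else:
            # Port came back up: report only if the outage outlasted the hold time.
            if down_since is not None and time_ms - down_since >= down_hold_ms:
                reported.append(down_since + down_hold_ms)
            down_since = None
    if down_since is not None:
        # Port never recovered, so the down is reported once the hold time expires.
        reported.append(down_since + down_hold_ms)
    return reported


# Hypothetical down-up-down-up sequence during OS startup.
BOOT_FLAP = [(0, "down"), (1000, "up"), (1500, "down"), (4000, "up")]

for hold_ms in (0, 2000, 3000):
    print(hold_ms, reported_down_events(BOOT_FLAP, hold_ms))
```

In this model, the second outage lasts 2500 ms, so it is still reported with a 2000 ms hold time but not with 3000 ms.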

Solution
  • (Recommended) See step 4 of the handling process: set the recovery action of the cluster service to Restart the Service.
  • Use a startup script to start the cluster service after all other services are ready. This prevents an error event from being logged for a cluster service start failure.
  • Set the cluster service to a delayed start on Windows 2008 and later.
  • Set carrier down-hold-time to 3000 ms on the switch that is connected to the server service planes.
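
The startup-script solution could look roughly like the following sketch. It treats "all services are ready" as "a domain controller answers on the LDAP port", which is an assumption rather than something this case specifies. The host name passed to domain_reachable, the ClusSvc service name, and the retry counts are likewise hypothetical.

```python
import socket
import subprocess
import time


def domain_reachable(host, port=389, timeout_s=2.0):
    """Return True if a TCP connection to the domain controller's
    LDAP port succeeds (a stand-in readiness check)."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:
        return False
    conn.close()
    return True


def start_cluster_service():
    """Start the cluster service; ClusSvc is the assumed service name."""
    subprocess.call(["net", "start", "ClusSvc"])


def wait_then_start(is_ready, start_service, attempts=30,
                    interval_s=10.0, sleep=time.sleep):
    """Poll is_ready() up to `attempts` times and call start_service()
    once it returns True. Return False if the check never succeeds."""
    for attempt in range(attempts):
        if is_ready():
            start_service()
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

A script would call, for example, wait_then_start(lambda: domain_reachable("dc01.china.example"), start_cluster_service), with the placeholder host name replaced by a real domain controller.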
Suggestions
Select the first or second solution for Windows 2003 and the third solution for Windows 2008.

END