Replication is normal and the heartbeat is normal, but querying hastatus shows the remote cluster as LOST_CONN

Publication Date: 2012-07-25
Issue Description
Phenomenon description: querying hastatus shows the remote cluster as LOST_CONN.
---------------------------------------------------- Secondary (Online-Active) ----------------------------------------------------

root@Secondary # hastatus
attempting to connect....
attempting to connect....connected


group resource cluster:system message
--------------- -------------------- -------------------- --------------------
PrimaryCluster LOST_CONN
HB:Icmp PrimaryCluster ALIVE
localclus:Secondary LEAVING
^C
root@Secondary #


---------------------------------------------------- Primary (Standby) ----------------------------------------------------

root@Primary # hastatus
attempting to connect....
attempting to connect....connected


group resource cluster:system message
--------------- -------------------- -------------------- --------------------
HB:Icmp SecondaryCluster ALIVE
localclus:Primary RUNNING
AppService localclus:Primary OFFLINE
ClusterService localclus:Primary ONLINE
-------------------------------------------------------------------------
VVRService localclus:Primary ONLINE
NMSServer localclus:Primary OFFLINE
DataFilesystem localclus:Primary OFFLINE
RVGPrimary localclus:Primary OFFLINE
DatabaseServer localclus:Primary OFFLINE
-------------------------------------------------------------------------
BackupServer localclus:Primary OFFLINE
wac localclus:Primary ONLINE
ntfr localclus:Primary ONLINE
datarvg localclus:Primary ONLINE
^C
root@Primary #

Querying the heartbeat and the replication status shows that both are normal:

# vradmin -g datadg repstatus datarvg
------------------- Secondary (Active) server -------------------

root@Secondary # bash
root@Secondary # cat /etc/hosts
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
10.37.192.20 Secondary loghost
root@Secondary # ls -l /etc/hostname*
-rw-r--r-- 1 root root 10 Jan 25 2011 /etc/hostname.e1000g0
root@Secondary # ping 10.10.1.80
10.10.1.80 is alive
root@Secondary # ping -s 10.10.1.80 -------------------------------------> ping to the Primary server
PING 10.10.1.80: 56 data bytes
64 bytes from 10.10.1.80: icmp_seq=0. time=50.3 ms
64 bytes from 10.10.1.80: icmp_seq=1. time=16.3 ms
64 bytes from 10.10.1.80: icmp_seq=2. time=3.90 ms
64 bytes from 10.10.1.80: icmp_seq=3. time=3.95 ms
64 bytes from 10.10.1.80: icmp_seq=4. time=3.85 ms
64 bytes from 10.10.1.80: icmp_seq=5. time=3.77 ms
64 bytes from 10.10.1.80: icmp_seq=6. time=142. ms
64 bytes from 10.10.1.80: icmp_seq=7. time=4.09 ms
^C
----10.10.1.80 PING Statistics----
8 packets transmitted, 8 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 3.77/28.6/142./49.
root@Secondary #

------------------- Primary (Standby) server -------------------


root@Primary # cat /etc/hosts
#
# Internet host table
#
::1 localhost
127.0.0.1 localhost
10.10.1.80 Primary loghost
root@Primary # ls -l /etc/hostname*
-rw-r--r-- 1 root root 8 Jan 12 2011 /etc/hostname.e1000g0
root@Primary # ping -s 10.37.192.20 ------------> ping to the Secondary server
PING 10.37.192.20: 56 data bytes
64 bytes from 10.37.192.20: icmp_seq=0. time=4.04 ms
64 bytes from 10.37.192.20: icmp_seq=1. time=3.71 ms
64 bytes from 10.37.192.20: icmp_seq=2. time=3.92 ms
64 bytes from 10.37.192.20: icmp_seq=3. time=3.71 ms
64 bytes from 10.37.192.20: icmp_seq=4. time=3.90 ms
^C
----10.37.192.20 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 3.71/3.86/4.04/0.14
root@Primary #
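
In addition to the ping tests above, the cluster heartbeat and the replication state can be cross-checked with the standard VCS/VVR commands. The following is a minimal sketch; the disk group datadg and the RVG datarvg are the names used in this setup:

# lltstat -nvv                         # LLT link status of the local cluster nodes
# gabconfig -a                         # GAB port membership (port h = had)
# haclus -state                        # state of the local and remote clusters (GCO)
# hastatus -sum                        # summary of cluster, group and resource states
# vradmin -g datadg repstatus datarvg  # VVR replication status of the data RVG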


Alarm Information

None.


Handling Process

Resolution:
Forcibly restart VCS on both the Primary and the Secondary server:

# hastop -all -force          # stop VCS on all nodes but leave service groups online
# ps -ef | grep had           # find any had processes that are still running
# kill -9 <had process ID>    # kill each had process that did not exit
# hastart -onenode            # restart VCS on this node
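
After VCS has been restarted on both servers, the remote cluster state can be re-checked. A minimal verification sketch using standard VCS commands:

# hastatus -sum               # the remote cluster should no longer show LOST_CONN
# haclus -state               # cross-check the state of the local and remote clusters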


Root Cause

Confirm that the Veritas heartbeat IP addresses can communicate normally; use ping X.X.X.X to check the connection.
If the heartbeat is down, resolve the heartbeat problem first.

If the heartbeat is confirmed to be OK, this is a Veritas bug that has not yet been fixed. It occurs when the network connection is unstable.
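
Because the trigger is an unstable network, it can help to watch the heartbeat address over time. A minimal sketch (the address 10.10.1.80 is the Primary server in this case; the 5-second interval and the Solaris ping syntax are assumptions to adapt as needed):

while true
do
    ping 10.10.1.80 1 > /dev/null || echo "`date` : heartbeat ping to 10.10.1.80 failed"
    sleep 5
done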


Suggestions

END