N8000 NFS Service Abnormality Leads to a Cluster Fault

Publication Date:  2014-09-30 Views:  108 Downloads:  1
Issue Description
When a file system is shared to a specific IP address and that IP address resolves to several host names, the share resource becomes faulted and cannot be accessed, and the cluster status becomes abnormal.
For example:
n8300fsp> nfs share show
/vx/nfs-cp-boss                  cp-boss.tele.co.yu (rw,no_root_squash,sync,secure,wdelay)
/vx/nfs-cp-boss                  cp-boss.tele.co.yu (rw,no_root_squash,sync,secure,wdelay)
.
.
.
Faulted Shares:
/vx/nfs-cs-ackup                          nwcuer.tele.co.rs.:
nwcuer.tele.co.yu.                                            :
nwcuer.tele.rs                      n8300fsp_01  n8300fsp_02  :

Alarm Information
1. The cluster is abnormal and cannot be accessed; all services have stopped.
n8300fsp> nfs share show
/vx/nfs-cp-boss                  cp-boss.tele.co.yu (rw,no_root_squash,sync,secure,wdelay)
/vx/nfs-cp-boss                  cp-boss.tele.co.yu (rw,no_root_squash,sync,secure,wdelay)
.
.
.
Faulted Shares:
/vx/nfs-cs-ackup                          nwcuer.tele.co.rs.:
nwcuer.tele.co.yu.                                            :
nwcuer.tele.rs                      n8300fsp_01  n8300fsp_02  :

2. Verifying the VCS configuration file reports an error like this:
n8300fsp_02:/etc/VRTSvcs/conf/config # hacf -verify /etc/VRTSvcs/conf/config
VCS WARNING V-16-1-12021 Error - Quotes mismatch
(") in file ./main.cf:532
n8300fsp_01:~ # vim /etc/VRTSvcs/conf/config/main.cf    # inspect the VCS configuration file
                 # go to line 532
                   Share share_114 (
                            Critical = 0
                            PathName = "/vx/nfs-cs5-backup"
                            Client = "nwcuerto.tor.co.rs.
                 nwcuerto.tor.co.yu.
                 nwcuerto.tor.rs"
                            Options = "fsid=1424033844,rw,no_root_squash,sync,secure,wdelay"
                            )
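
The excerpt above shows a Client value whose opening quote and closing quote sit on different lines. As a rough aid for spotting such entries, the sketch below lists every line of a file that contains an odd number of double quotes. The function name find_open_quotes is hypothetical, and this is only a heuristic, not a description of how hacf parses main.cf; a quote character inside a comment would also be flagged.

```shell
#!/bin/sh
# find_open_quotes FILE   (hypothetical helper, not part of VCS)
# Print "LINE_NUMBER: text" for every line that holds an odd number of
# double-quote characters, i.e. a quoted value that opens or closes
# without its partner on the same line.
find_open_quotes() {
    awk '{
        n = gsub(/"/, "&")      # gsub returns how many quotes it matched
        if (n % 2 == 1) printf "%d: %s\n", NR, $0
    }' "$1"
}
```

For example, `find_open_quotes /etc/VRTSvcs/conf/config/main.cf` would be run on a node before and after the cleanup to compare the flagged lines.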

Handling Process
Solution:
1. Before making any changes, back up this file:
/etc/VRTSvcs/conf/config/main.cf

2. Verify the VCS configuration file. It reports an error like this:
n8300fsp_01:~ # hacf -verify /etc/VRTSvcs/conf/config
VCS WARNING V-16-1-12021 Error - Quotes mismatch
(") in file ./main.cf:328
n8300fsp_01:~ # vim /etc/VRTSvcs/conf/config/main.cf    # inspect the VCS configuration file
# go to line 328
             Share share_100 (
                     Critical = 0
                     PathName = "/vx/nfs-cs5-backup"
                     Client = "fanct.taoyi.weihu.xiao.tian.qiang.zzs.huawei.symantec.hsnc.com.
     hs.fanct1.hsnc.com" # two host names for one IP address
                     Options = "fsid=1834563753,rw,no_root_squash"
                     )

3. Delete the faulted share.
share delete /vx/nfs-cs5-backup   nwcuerto.telenor.co.rs.

4. Perform the following operations on all nodes.
hares -state | grep share_10
ishare_100                  State                     SPC002B010_01 ONLINE
ishare_100                  State                     SPC002B010_02 ONLINE
share_100                   State                     SPC002B010_01 FAULTED
share_100                   State                     SPC002B010_02 FAULTED
n8300fsp_01:~ # haconf -makerw  
n8300fsp_01:~ # haconf -makerw                    # run a second time
n8300fsp_01:~ # hares -delete share_100      
n8300fsp_01:~ # haconf -dump -makero      

5. Verify the VCS configuration file again. The command should now return no output.
n8300fsp_01:~ # hacf -verify /etc/VRTSvcs/conf/config/

6. In the file "/var/lib/nfs/etab", delete the entries that belong to the faulted shares. Perform the following operations on all nodes.
cat /var/lib/nfs/etab
/vx/nfs-cs5-backup       172.18.182.22/24
(rw,sync,wdelay,hide,nocrossmnt,insecure,no_root_squash,no_all_squash,subtree_check,insecure_locks,acl,fsid=1683389345,mapping=identity,anonuid=65534,anongid=65534)
/vx/nfs-cs5-backup       *
(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,subtree_check,secure_locks,acl,fsid=1683389345,mapping=identity,anonuid=65534,anongid=65534)

7. Change the mapping on the DNS server so that the IP address maps to exactly one host name; this version supports only one-to-one mappings. Alternatively, add the share without any security policy, that is, do not share the file system to a specific address.

8. Re-add the share.
n8300fsp>NFS> share add rw,no_root_squash /vx/nfs-cs5-backup 129.22.50.73
Checking if IP:129.22.50.73 can be resolved. Resolved to nwcuerto.telenor.co.rs.
Exporting /vx/nfs-cs5-backup with options rw,no_root_squash to client nwcuerto.telenor.co.rs.
....Success.
n8300fsp>NFS> share show
/vx/nfs-cs5-backup     nwcuerto.telenor.co.rs. (rw,no_root_squash)
n8300fsp>NFS>
n8300fsp_01:~ # vim /etc/VRTSvcs/conf/config/main.cf
        Share share_100 (
                Critical = 0
                PathName = "/vx/nfs-cs5-backup "
                Client = " nwcuerto.telenor.co.rs. "
                Options = "fsid=1450811618,rw,no_root_squash"
                )
n8300fsp_01:~ # hacf -verify /etc/VRTSvcs/conf/config/
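
Steps 1 and 6 above can be sketched as two small shell helpers. Both function names are hypothetical. The sketch assumes that each etab entry occupies a single line beginning with the export path, as in a standard Linux /var/lib/nfs/etab, and that the wrapped options lines in the listing above are a display artifact. The second helper only prints the filtered content, so the result can be reviewed before anything on a node is changed.

```shell
#!/bin/sh
# backup_file FILE   (hypothetical helper)
# Copy FILE to FILE.bak.<timestamp>, preserving mode and timestamps.
backup_file() {
    cp -p "$1" "$1.bak.$(date +%Y%m%d%H%M%S)"
}

# etab_without_path ETAB EXPORT_PATH   (hypothetical helper)
# Print the content of ETAB with every entry for EXPORT_PATH left out.
# Assumes one entry per line, with the export path as the first field.
etab_without_path() {
    awk -v p="$2" '$1 != p' "$1"
}
```

A possible sequence on each node is `backup_file /etc/VRTSvcs/conf/config/main.cf`, then `backup_file /var/lib/nfs/etab`, then `etab_without_path /var/lib/nfs/etab /vx/nfs-cs5-backup` to preview the entries that would remain.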
Root Cause
1. If an IP address has several host names or aliases, a reverse lookup (nslookup) returns the host names in turn.

2. When an NFS file system is shared to such an IP address, the lookup returns one name at that time, for example hostname_A. A later lookup may return hostname_B instead. The N8000 then finds that the name differs from the one recorded earlier. The system cannot tell which name is the correct one, so the NFS share goes into the faulted state.

3. Why can the system not tell which name is the correct one?
The file that records this information is locked by Veritas Cluster Server (VCS), and the parameter does not support multiple records. This is tied to the safety and stability of the cluster, and the software belongs to Symantec. We consulted Symantec, who said that the file cannot be changed manually.
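
Whether an IP address is affected can be checked from any machine that has a DNS lookup tool. The sketch below assumes the BIND `host` utility is available (an assumption; the article itself only mentions reverse nslookup) and parses its "domain name pointer" output lines. The function names are hypothetical.

```shell
#!/bin/sh
# ptr_names   (hypothetical helper)
# Read the output of `host <ip>` on stdin and print each PTR target once.
ptr_names() {
    awk '/domain name pointer/ { print $NF }' | sort -u
}

# has_multiple_ptr_names   (hypothetical helper)
# Exit 0 when stdin lists more than one distinct PTR name, else exit 1.
has_multiple_ptr_names() {
    n=$(ptr_names | wc -l | tr -d ' ')
    [ "$n" -gt 1 ]
}
```

For example, `host 129.22.50.73 | has_multiple_ptr_names && echo "one-to-many mapping"` would flag the address used in the handling process if it still resolves to several names.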
 
Suggestions
1. On the DNS server or on the host, change the mapping between the IP address and host names from one-to-many to one-to-one.

2. Disable reverse lookup (nslookup) on the N8300.
The drawback is that all NFS file systems can then be shared only by IP address.

3. Print a warning when a user tries to add or create such a share, and block the operation instead of allowing it as now (the N8300 should not allow faulted shares to be created). This is already implemented in the new version.

END