Service Operations Fail Due to Abnormal Tomcat Process on a Computing Node

Publication Date: 2012-07-17
Issue Description
Product and version: CSE V100R001C00.
All operations that involve computing nodes, such as uploading files, creating folders, and performing synchronization, fail.

Operations that involve computing nodes include creating folders, deleting folders, uploading files, deleting files, previewing pictures, playing audio and video files, editing Microsoft Office files, viewing PDF files, replicating files and folders, moving files and folders, backing up files, recovering files, sharing resource repositories, and managing storage resources.
Alarm Information
Handling Process
Step 1     Run the /opt/apache-tomcat-6.0.18/bin/ stop command on all computing nodes to stop the Tomcat process.

Step 2     Run the /opt/apache-tomcat-6.0.18/bin/ start command on all computing nodes to start the Tomcat process.

Step 3     Run the tail -f /opt/apache-tomcat-6.0.18/log/catalina.out command on all computing nodes to view the logs. Verify that the logs contain no error information.

Step 4     Run the ps -ef | grep java command to check whether the Tomcat process is running properly.

Step 5     Perform operations that involve computing nodes to check whether they succeed. If they do, the fault is rectified. If they do not, contact technical support engineers.
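
Steps 3 and 4 can be combined into a small check that is run after the restart. The following is a minimal sketch, not part of the product. The function names count_log_errors and tomcat_in_listing are hypothetical, the log path is the one given in Step 3, and treating SEVERE and Exception as error markers is an assumption about what counts as error information in catalina.out.

```shell
#!/bin/sh
# Hypothetical helpers for Steps 3 and 4; not shipped with CSE.

# Print the number of log lines that look like errors.
# Assumption: "SEVERE" or "Exception" marks error information.
count_log_errors() {
    # $1: path to the log file
    grep -c -E 'SEVERE|Exception' "$1"
}

# Succeed if the ps listing on stdin contains the Tomcat bootstrap class.
# The class name is taken from the sample ps output in this article.
# Lines that belong to a grep command are ignored.
tomcat_in_listing() {
    awk '/org\.apache\.catalina\.startup\.Bootstrap/ && !/grep/ { found = 1 }
         END { exit !found }'
}

# Possible usage on a computing node:
#   count_log_errors /opt/apache-tomcat-6.0.18/log/catalina.out
#   ps -ef | tomcat_in_listing && echo "Tomcat is running"
```

A count of 0 from the first helper corresponds to the expected result of Step 3. The second helper returns success only when a real Tomcat process line is present, which avoids mistaking the grep process itself for Tomcat.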


Root Cause
On the ISM interface, the alarm listed in Table 1-1 is reported.
Table 1-1 Alarm on a computing node response failure
Alarm Severity | Alarm ID | Alarm Name | Alarm Description
Critical | 0xB02230002 | Computing node response failure | The computing node does not respond. (The computing node name will be displayed.)
1. Log in to the computing node by using KVM. Then run the ping command on the Base plane (external). The execution result shows that the network connection between this node and the ISM server is interrupted.
2. Check the switches, T8000 server, and system data nodes. These devices are properly powered on, the network cables between devices are correctly connected, and the network port indicators are in the normal state. Therefore, the computing node response failure is not caused by hardware faults or abnormal network communication.
3. Mount the Client Agent (CA) on the host on which the CA resides. If CA mount ok is displayed, the wushan CA is mounted correctly.

The wushan CA is installed on computing nodes. You need to mount it manually. If the wushan CA has not been mounted on the failed computing node, run the mount.wfs -t wlitex.x.x.xy.y.y.y:/domain/default /mnt/wsfs command to mount it. x.x.x.x indicates the IP address of the active MDS and y.y.y.y indicates the IP address of the standby MDS.
4. Run the ps -ef | grep java command on the failed computing node. The output shows that the Tomcat process is not running. If the Tomcat process is running, information similar to the following is displayed after the command is executed:
root     13481  3468  0 16:41 pts/1    00:00:00 grep java
root     26709     1  1 09:47 ?        00:07:16 /opt/jdk1.6.0_14/bin/java -Djava.util.logging.config.file=/opt/apache-tomcat-6.0.18/conf/ -server -Xms1024M -Xmx8192M -XX:PermSize=512M -XX:MaxNewSize=256m -Dfile.encoding=GBK -XX:MaxPermSize=2048m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/apache-tomcat-6.0.18/endorsed -classpath /opt/apache-tomcat-6.0.18/bin/bootstrap.jar -Dcatalina.base=/opt/apache-tomcat-6.0.18 -Dcatalina.home=/opt/apache-tomcat-6.0.18 org.apache.catalina.startup.Bootstrap run start
Therefore, the computing node response failure is caused by the abnormal Tomcat process on this node.
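
Whether the wushan CA is already mounted (check 3 above) can also be verified without remounting, by looking up the mount point in the mount table. The following is a minimal sketch, assuming a Linux computing node that exposes /proc/mounts. The function name is_mounted is hypothetical, and /mnt/wsfs is the mount point used in the mount command in this article.

```shell
#!/bin/sh
# Hypothetical helper; not shipped with CSE.

# Succeed if the given mount point appears in a mounts table.
# The table is expected in /proc/mounts format, where the second
# whitespace-separated field of each line is the mount point.
is_mounted() {
    # $1: mount point to look for
    # $2: mounts table to read (defaults to /proc/mounts)
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Possible usage on a computing node:
#   is_mounted /mnt/wsfs || echo "wushan CA is not mounted"
```

The comparison is an exact match on the mount point field, so a path such as /mnt/wsfs2 does not count as /mnt/wsfs.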
In CSE V100R001C00, you need to mount the wushan CA manually.

The Network File System operations are canceled in CSE V100R001C00.