Key Process Monitoring
Overview
Regular monitoring is provided for key processes of the system. If any key process ends unexpectedly, the time at which the process ends is recorded in the log, and the system attempts to restore the process. If the process fails to be restored, an alarm will be reported. In this way, system administrators can detect process faults immediately and check whether the process is restored.
Configuration Description
The configuration files for key process monitoring are stored in /etc/sysmonitor/process. Each process or module has its own configuration file.
Example configuration file:
    NAME=cron
    RECOVER_COMMAND=systemctl restart crond
    MONITOR_COMMAND=systemctl status crond
    STOP_COMMAND=systemctl stop crond
Table 1 describes the configuration items in the configuration file.

Table 1 Configuration description
| Configuration Item | Description | Mandatory or Not | Default Value |
|---|---|---|---|
| NAME | Process or module name | Yes | N/A |
| RECOVER_COMMAND | Command used to restore the process | Yes | N/A |
| MONITOR_COMMAND | Command used to monitor the process. NOTE: If value 0 is returned, the process is running normally. If any other value greater than 0 is returned, the process is abnormal. | No | pidof -x process name |
| STOP_COMMAND | Command used to stop the process | No | N/A |
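
The rules in Table 1 can be checked before a file is placed in /etc/sysmonitor/process. The following is a minimal sketch of such a check, not sysmonitor's own parser. The function name `parse_process_config` is hypothetical, and the sketch assumes that "process name" in the default MONITOR_COMMAND refers to the value of NAME.

```python
MANDATORY_ITEMS = ("NAME", "RECOVER_COMMAND")


def parse_process_config(text):
    """Parse KEY=VALUE lines into a dict and apply the rules from Table 1."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        config[key.strip()] = value.strip()

    missing = [item for item in MANDATORY_ITEMS if not config.get(item)]
    if missing:
        raise ValueError("missing mandatory item(s): " + ", ".join(missing))

    # Assumption: the documented default "pidof -x process name" uses NAME.
    config.setdefault("MONITOR_COMMAND", "pidof -x " + config["NAME"])
    return config


example = """\
NAME=cron
RECOVER_COMMAND=systemctl restart crond
MONITOR_COMMAND=systemctl status crond
STOP_COMMAND=systemctl stop crond
"""

if __name__ == "__main__":
    for item, value in parse_process_config(example).items():
        print(f"{item}: {value}")
```

STOP_COMMAND is left unset when it is absent, because Table 1 lists no default value for it.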
- After modifying a key process monitoring configuration file, run the systemctl reload sysmonitor command. The modifications take effect in the next monitoring period.
- Do not repeatedly run restoration and monitoring commands. Otherwise, the monitoring thread of the key process becomes abnormal.
- If the process restoration command runs for more than 90 seconds, the stop command is run to end the process.
- If a key process is abnormal and cannot be started after three attempts, further start attempts are made based on the value of PROCESS_RECALL_PERIOD specified in the configuration file.
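
The notes above can be sketched as one monitoring pass. This is an illustration of the documented behaviour only, not sysmonitor's implementation: the names `run_shell` and `check_and_recover` are hypothetical, and re-running the monitor command after each restoration attempt is an assumption.

```python
import subprocess

RECOVER_TIMEOUT_SECONDS = 90  # limit on the restoration command, from the notes
MAX_ATTEMPTS = 3              # restoration attempts before giving up, from the notes


def run_shell(command, timeout=None):
    """Run a command line; return its exit status, or None if it timed out."""
    try:
        completed = subprocess.run(
            command,
            shell=True,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


def check_and_recover(config, run=run_shell):
    """Return True if the process is normal, possibly after restoration."""
    # Exit status 0 from the monitor command means the process is normal.
    if run(config["MONITOR_COMMAND"]) == 0:
        return True

    for _ in range(MAX_ATTEMPTS):
        status = run(config["RECOVER_COMMAND"], timeout=RECOVER_TIMEOUT_SECONDS)
        if status is None and config.get("STOP_COMMAND"):
            # Restoration exceeded 90 seconds: run the stop command.
            run(config["STOP_COMMAND"])
        # Assumption: the monitor command is re-run to confirm the result.
        if run(config["MONITOR_COMMAND"]) == 0:
            return True

    # All attempts failed; a caller would retry later, with the timing
    # governed by PROCESS_RECALL_PERIOD.
    return False
```

The `run` parameter exists so that the logic can be exercised with a stand-in function instead of real systemctl commands.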
Error Log
When a module or process is detected to be abnormal, the following information is written to /var/log/sysmonitor.log:
    [2015-03-16:18:41:38]sysmonitor[5018]: cron is abnormal, use "systemctl restart crond" to recover
    [2015-03-16:18:41:38]sysmonitor[5018]: cron is abnormal
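
For scripted checks of the log, entries of this kind can be matched with a regular expression. The sketch below is based only on the two sample entries shown here; the pattern, the function name `parse_abnormal_line`, and the assumption that process names contain no spaces are illustrative and may not cover every message sysmonitor writes.

```python
import re
from datetime import datetime

# Pattern derived from the sample entries only; other formats are not handled.
LOG_PATTERN = re.compile(
    r'^\[(?P<time>[^\]]+)\]sysmonitor\[(?P<pid>\d+)\]: '
    r'(?P<name>\S+) is abnormal'
    r'(?:, use "(?P<command>[^"]*)" to recover)?$'
)


def parse_abnormal_line(line):
    """Return a dict describing an 'is abnormal' entry, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match.group("time"), "%Y-%m-%d:%H:%M:%S"),
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        # None when the entry does not mention a restoration command.
        "recover_command": match.group("command"),
    }


sample_lines = [
    '[2015-03-16:18:41:38]sysmonitor[5018]: cron is abnormal, '
    'use "systemctl restart crond" to recover',
    '[2015-03-16:18:41:38]sysmonitor[5018]: cron is abnormal',
]

if __name__ == "__main__":
    for entry in sample_lines:
        print(parse_abnormal_line(entry))
```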