FusionStorage Block Has Alarm "ALM-51804 Abnormal ZooKeeper Process"

Publication Date: 2019-04-22
Issue Description
FusionStorage Block reports the alarm "ALM-51804 Abnormal ZooKeeper Process". The affected ZooKeeper process cannot serve MDC, which reduces the reliability of the MDC service. If more than half of the ZooKeeper processes are faulty, ZooKeeper stops providing services.
Alarm Information

ALM-51804 Abnormal ZooKeeper Process

Handling Process
  1. Check the disk status of the faulty ZK node. In this case the disk status was normal. Because an alarm indicating that disabling the disk write cache had failed was reported earlier, analyze the related script. The script invokes the MegaCli tool to disable the disk write cache, but running the /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL command on the faulty node to obtain disk information returns an error message.

  2. According to the log information, the MegaCli tool is faulty and must be reinstalled. Reinstalling MegaCli requires restarting the OSD process, so schedule the reinstallation for after 6:00 p.m.

  3. Reinstall MegaCli.

Before reinstalling MegaCli, confirm that the storage pool status is normal and that no data migration is in progress, and then put the node into maintenance mode.

  1. On the FSPort page, put the server of the faulty node into maintenance mode.

  2. Run the rpm -qa | grep -i mega command to query the name of the installed MegaCli package.

  3. Run the rpm -e command with the package name returned in step 2 to uninstall the tool (rpm -e MegaCli-8.07.07-1 in this case).

  4. Obtain the new installation package, upload it to the node using WinSCP, and run the rpm -ivh <installation package name> command to install it.

  5. Restart the smio service.

    # cd /opt/dsware/osd/ko/`uname -r`/smio;

    # killall -9 dsware_osd;sleep 3;sh smio_stop;sh smio_start;

  6. Check whether the OSD process is running normally. If the process start time has been reset, the OSD process has restarted successfully.

    ps -ef | grep osd

  7. Log in to the FusionStorage portal and take the server out of maintenance mode. Confirm that the storage pool status is normal, that no data reconstruction task is in progress, and that the alarm is cleared.
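
Steps 2 and 6 above can be sketched as two small shell helpers. This is only an illustration: the function names mega_package_name and osd_start_times are hypothetical and are not part of FusionStorage or MegaCli. The first helper refuses to continue unless exactly one package matches, so that the name passed to rpm -e is unambiguous. The second assumes a procps-style ps that supports the lstart output field, and it assumes the OSD process name is dsware_osd, as used in the killall command in step 5.

```shell
#!/bin/sh
# Hypothetical helper for step 2: read a package list on stdin
# (for example from `rpm -qa`) and print the single package whose
# name contains "mega". Fails if zero or several packages match.
mega_package_name() {
    matches=$(grep -i mega) || {
        echo "no MegaCli package found" >&2
        return 1
    }
    count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
    if [ "$count" -ne 1 ]; then
        echo "expected one matching package, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# Hypothetical helper for step 6: print PID, full start timestamp and
# command line of every dsware_osd process. The [d] in the pattern
# keeps grep from matching its own command line.
osd_start_times() {
    ps -e -o pid= -o lstart= -o args= | grep '[d]sware_osd'
}
```

A possible way to use these: run rpm -qa | mega_package_name to obtain the name for rpm -e, and run osd_start_times once before step 5 and once after it. A later start timestamp in the second run indicates that the OSD process was restarted.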

Root Cause

The disks on the faulty ZK node were normal. An alarm indicating that disabling the disk write cache had failed was reported earlier, and the related script invokes the MegaCli tool to disable the disk write cache. Running the /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL command on the faulty node to obtain disk information returned an error message, and the log information showed that the MegaCli tool itself was faulty.
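
The MegaCli check described above can be wrapped in a short shell function for reuse on other nodes. This is a minimal sketch, not a vendor-supplied script: the function name check_megacli and the MEGACLI override variable are hypothetical, and it assumes that MegaCli64 exits with a non-zero status when the -PDList -aALL query fails, which this article does not confirm.

```shell
#!/bin/sh
# Path taken from this article; MEGACLI can be overridden for testing.
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

# Hypothetical helper: report whether MegaCli can list physical disks.
# Returns 0 if the query succeeds, 1 if MegaCli reports an error,
# and 2 if the binary is missing or not executable.
check_megacli() {
    if [ ! -x "$MEGACLI" ]; then
        echo "MegaCli not found or not executable: $MEGACLI"
        return 2
    fi
    # Assumption: a non-zero exit status means the query failed.
    if "$MEGACLI" -PDList -aALL >/dev/null 2>&1; then
        echo "MegaCli OK"
        return 0
    fi
    echo "MegaCli returned an error; the tool may need to be reinstalled"
    return 1
}
```

Calling check_megacli on a node and inspecting its return value gives a quick indication of whether the tool responds. When it reports an error, the MegaCli output and logs should still be reviewed manually before deciding to reinstall.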
