HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-73010 Faulty File System

Description

The alarm collection module monitors the input/output (I/O) status of the file system. This alarm is generated when an I/O processing exception occurs.

Attribute

Alarm ID: 73010

Alarm Severity: Major

Auto Clear: Yes (restart)

Alarms in this document that are marked as automatically clearable are cleared only after the server is restarted. If the file system fault is not rectified, the alarm is reported again.

Parameters

Fault Location Info

  • host_id: specifies the ID of the host for which the alarm is generated.
  • disk: specifies the name of the file on which the exception occurred.

Additional Info

  • error_info: provides alarm exception information.
  • host_id: specifies the ID of the host for which the alarm is generated.
  • hostname: specifies the name of the host for which the alarm is generated.
  • HostIP: specifies the IP address of the host for which the alarm is generated.

Impact on the System

  • If the root partition of the server becomes read-only, most processes on the server become abnormal and VMs are severely affected.

    The log partition may become fully occupied by error logs. Logs may not be dumped in a timely manner and some may be lost.

  • If the /var/log partition becomes read-only, log writing and log dumping may fail.
  • If the /var/ceilometer partition becomes read-only, the MongoDB service may become abnormal. If the file system is /dev/hioa1, the MongoDB service uses SSDs.

Possible Causes

  • The server was powered off unexpectedly, causing write I/O operations to be lost or data to be written to the wrong sector.
  • The server disk or the RAID controller card is damaged, or the RAID controller card does not have a battery.
  • The disk driver version does not match the hardware version.
  • If the file data is not stored locally, the host is disconnected from the remote storage.
  • The disk data is damaged.

Procedure

  1. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  2. On the Summary page, obtain the management IP address of the host in the OM IP Address column based on the host ID or host name in the alarm additional information.
  3. Use PuTTY to log in to the host for which the alarm is generated using the management IP address of the host.

    The default user name is fsp. The default password is Huawei@CLOUD8.

    The system supports both password and public-private key pair for identity authentication. If the public-private key pair is used for login authentication, see detailed operations in Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

  4. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  5. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  6. Run the following command to import environment variables:

    source set_env

    Information similar to the following is displayed:

      please choose environment variable which you want to import: 
      (1) openstack environment variable (keystone v3) 
      (2) cps environment variable 
      (3) openstack environment variable legacy (keystone v2) 
      (4) openstack environment variable of cloud_admin (keystone v3) 
      please choose:[1|2|3|4] 

  7. Enter 1 to enable Keystone V3 authentication and enter the password of OS_USERNAME as prompted.

    Default account format: DCname_admin; default password: FusionSphere123.

  8. Run the following command to query the device ID of the host accommodating the partition for which the alarm is generated:

    ll /sys/dev/block/ | grep -w xxx$ | awk '{print $9}'

    xxx specifies the target partition, for example, dm-12 or sda1.
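
As a hedged alternative to parsing the `ll` listing, the same lookup can be sketched by reading the symlinks under /sys/dev/block directly. The function name `devnum_of` and its optional directory argument are assumptions made for illustration; only the /sys/dev/block layout comes from this step.

```shell
# Print the device number (MAJ:MIN) of a block device by scanning the
# symlinks in /sys/dev/block. Each link is named MAJ:MIN and points to the
# device directory, whose last path component is the kernel device name.
# The second argument (hypothetical, for testing) overrides the directory.
devnum_of() {
    name=$1
    dir=${2:-/sys/dev/block}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Exact match on the last component, so dm-1 does not match dm-12.
        if [ "${target##*/}" = "$name" ]; then
            printf '%s\n' "${link##*/}"
            return 0
        fi
    done
    return 1
}

# Example (device name is illustrative):
#   devnum_of dm-12
```

Matching on the exact link target avoids the partial matches that a plain text search could return.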

  9. Run the following command to query the physical disk drive letter of the target partition:

    lsblk

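    For example, if the device ID obtained in step 8 is 253:2, find the entry whose MAJ:MIN value is 253:2 in the command output and trace it up to its parent disk. In the documented example, the physical drive letter is sda.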
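
Where reading the tree by eye is error-prone, the walk from a partition up to its physical disk can be sketched with the inverse listing of lsblk. This assumes a util-linux lsblk that supports the -s, -n, -l and -o options; the helper name `physical_disk` is hypothetical, and the device names in the comments are illustrative.

```shell
# Read "NAME TYPE" lines from standard input, as printed by
#   lsblk -s -n -l -o NAME,TYPE <device>
# and print the names of the entries whose type is "disk".
physical_disk() {
    awk '$2 == "disk" { print $1 }'
}

# Typical use on the affected host (the device path is an example):
#   lsblk -s -n -l -o NAME,TYPE /dev/dm-2 | physical_disk
```

The -s option lists the devices that the given device depends on, so the physical disk appears as an entry of type disk.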

  10. Run the following command to determine whether the partition is provided by local storage:

    lsscsi -t | grep -w xxx

    xxx specifies the physical drive letter obtained in 9. Check whether the character string in the third column in the command output starts with fc: (FC SAN) or iqn. (IP SAN).

    • If yes, the target partition is provided by remote storage, go to 13.

    • If no, the target partition is provided by local storage or Huawei multi-path storage, run the following command.

      upadmin show vlun | grep -w xxx

      If "command not found" is displayed, the Huawei UltraPath package is not installed on the server and local storage is used. In this case, go to 11. If the command has output, check whether the "xxx disk" is displayed in the command output.

      • If yes, the target partition is provided by Huawei multi-path storage, go to 13.

      • If no, the target partition is provided by local storage, go to 11.
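
The transport check in this step can be sketched as a small classifier. It relies on fc: transports belonging to FC SAN and iqn. names belonging to iSCSI (IP SAN). The function names `storage_type` and `classify_drive` are assumptions for illustration, and the sketch does not replace the upadmin check for Huawei multi-path storage.

```shell
# Map the transport field (third column of `lsscsi -t`) to a storage type.
storage_type() {
    case $1 in
        fc:*)  echo "FC SAN" ;;
        iqn.*) echo "IP SAN" ;;
        *)     echo "local or multipath" ;;
    esac
}

# Read `lsscsi -t` output on standard input and classify the line whose
# last column is /dev/<drive letter>. Returns 1 if the drive is not listed.
classify_drive() {
    transport=$(awk -v dev="/dev/$1" '$NF == dev { print $3; exit }')
    [ -n "$transport" ] || return 1
    storage_type "$transport"
}

# Example (the drive letter is illustrative):
#   lsscsi -t | classify_drive sda
```

A result of "local or multipath" still requires the upadmin check described above before the partition is treated as local storage.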

  11. Run the last reboot command to check whether the host has been restarted.

    • If the affected host has restarted and can be logged in properly, manually clear the alarm. No further action is required. If the login fails, go to 18.
    • If the affected host has not restarted, go to 12.
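
To compare the most recent restart against the time the alarm was generated, the newest record can be picked out of the `last reboot` output. This is a minimal sketch; the helper name `latest_reboot` is hypothetical.

```shell
# Read `last reboot` output on standard input and print the most recent
# reboot record. `last` lists the newest entries first, so the first line
# that starts with "reboot" is the latest restart.
latest_reboot() {
    awk '$1 == "reboot" { print; exit }'
}

# Example:
#   last reboot | latest_reboot
```

No output means that the login records contain no restart entry.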

  12. Check whether the server disk or RAID controller card has been damaged.

    Check whether a hardware alarm is generated on the BMC of the affected node.

  13. Check whether the disk array has been damaged.

    • If yes, replace the hardware and go to 14.
    • If no, go to 14.

  14. Check whether the host can properly communicate with the remote storage.

    • If yes, go to 15.
    • If no, recover the communication between the host and the remote storage and go to 15.

  15. Check the Swift partition in the XFS format and perform the restoration operation if required.

    • Check whether the Swift partition in the XFS format is read-only.

      Run the following command to check whether the Swift partition is normal:

      ll /opt/HUAWEI/swift

      ls: cannot access /opt/HUAWEI/swift: Input/output error

      If the message "Input/output error" is displayed, the Swift partition is read-only. In this case, perform the restoration operation.

      If a file list is displayed, the Swift partition is normal, and no restoration operation is required.

    • Restore the Swift partition in the XFS format. If any command fails to be executed, contact technical support for assistance.
      1. Run the following command to query the Swift mount directory:

        cat /proc/mounts | grep swift

        /dev/mapper/extend_vg-swift /opt/HUAWEI/swift xfs rw,relatime,attr2,inode64,noquota 0 0

        In this example, the Swift partition is /dev/mapper/extend_vg-swift, and the mount directory is /opt/HUAWEI/swift.

      2. Run the following command to stop the swift-store service:

        cps host-template-instance-operate --service swift swift-store --action stop --host Host ID

        The host ID can be obtained from the alarm additional information.

      3. Run the following command to unmount the Swift partition. The operation is successful if no command output is displayed.

        umount /opt/HUAWEI/swift

      4. Run the following command to restore the Swift partition:

        xfs_repair /dev/mapper/extend_vg-swift

        Phase 7 - verify and correct link counts...
        done
      5. Run the following command to mount the Swift partition:

        mount /dev/mapper/extend_vg-swift /opt/HUAWEI/swift

      6. Run the following command to start the swift-store service:

        cps host-template-instance-operate --service swift swift-store --action start --host Host ID

        The host ID can be obtained from the alarm additional information.
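
The order of the restoration commands above can be reviewed with a dry-run sketch before anything is executed. The wrapper `run`, the function `restore_swift` and the `DRY_RUN` variable are assumptions introduced for illustration; the commands themselves are the ones listed in this step.

```shell
# With DRY_RUN=1 (the default in this sketch) each command is only printed.
# Set DRY_RUN=0 to execute the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Arguments: host ID, Swift device, mount directory (from the earlier steps).
# The && chain stops at the first command that fails.
restore_swift() {
    host_id=$1 dev=$2 dir=$3
    run cps host-template-instance-operate --service swift swift-store --action stop --host "$host_id" &&
    run umount "$dir" &&
    run xfs_repair "$dev" &&
    run mount "$dev" "$dir" &&
    run cps host-template-instance-operate --service swift swift-store --action start --host "$host_id"
}

# Example (the host ID is a placeholder):
#   restore_swift "Host ID" /dev/mapper/extend_vg-swift /opt/HUAWEI/swift
```

If any command fails, the sequence stops, and technical support should be contacted as stated above.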

  16. Check whether the partition in the extX format is read-only and perform the restoration operation if required. MongoDB is used as an example.

    • Check whether the partition in the extX format is read-only.

      Run the following command to check whether the MongoDB partition is read-only:

      cat /proc/mounts | grep extend_vg

      /dev/mapper/extend_vg-ceilometer--data /var/ceilometer ext4 ro,relatime,data=ordered 0 0

      If the partition in the ext4 format is mounted with the ro option, it is read-only and you need to perform the restoration operation. In this example, the MongoDB partition is /dev/mapper/extend_vg-ceilometer--data, and the mount directory is /var/ceilometer.

    • Restore the partition in the extX format. If any command fails to be executed, contact technical support for assistance.
      1. Run the following command to stop the MongoDB service:

        cps host-template-instance-operate --service mongodb mongodb --action stop --host Host ID

        The host ID can be obtained from the alarm additional information.

      2. Run the following command to unmount the MongoDB partition. The operation is successful if no command output is displayed.

        umount /var/ceilometer

      3. Run the following command to restore the MongoDB partition:

        fsck.ext4 /dev/mapper/extend_vg-ceilometer--data

        Pass 5: Checking group summary information
        NOTE:

        If the file system format of the detected read-only partition is ext3, run the following command to restore the ext3 file system:

        fsck.ext3 /dev/mapper/extend_vg-ceilometer--data

      4. Run the following command to mount the MongoDB partition:

        mount /dev/mapper/extend_vg-ceilometer--data /var/ceilometer

      5. Run the following command to start the MongoDB service:

        cps host-template-instance-operate --service mongodb mongodb --action start --host Host ID

        The host ID can be obtained from the alarm additional information.
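
The read-only check and the choice of repair tool in steps 15 and 16 can be sketched together. The function names `ro_mounts` and `repair_cmd_for` are hypothetical; the mapping covers only the file system types named in this procedure.

```shell
# Print the repair command this procedure uses for a file system type.
# Returns 1 for types that the procedure does not cover.
repair_cmd_for() {
    case $1 in
        xfs)  echo "xfs_repair" ;;
        ext3) echo "fsck.ext3" ;;
        ext4) echo "fsck.ext4" ;;
        *)    return 1 ;;
    esac
}

# List read-only mounts as "device mountpoint fstype". The input is a file
# in /proc/mounts format (default: /proc/mounts); the fourth field holds
# the comma-separated mount options.
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2, $3; break }
    }' "${1:-/proc/mounts}"
}

# Example:
#   ro_mounts | grep extend_vg
```

Splitting the option field avoids false matches on options that merely contain the letters ro, such as errors=remount-ro.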

  17. On the FusionSphere OpenStack web client, check whether MongoDB or other components have been recovered.

    • If yes, manually clear the alarm.
    • If no, go to 18.

  18. Reinstall the OS and rebuild VMs for the faulty host. After the system is restored, the alarms related to disk loading in /etc/fstab are automatically cleared. Check whether other disk alarms are cleared upon restart and whether they are reported again. If these alarms are not reported again, no further action is required. If the fault cannot be rectified, contact technical support for assistance.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
