HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-70113 Local NVMe SSD Error

Description

FusionSphere OpenStack periodically checks the health status of the NVMe SSD disk or card on the host. This alarm is generated when the system detects that the disk or card is in an unhealthy state.

Attribute

Alarm ID: 70113

Alarm Severity: Critical

Auto Clear: Yes

Parameters

Fault Location Info

  • host_id: specifies the ID of the host for which the alarm is generated.
  • pci-address: specifies the PCI address of the SSD card for which the alarm is generated.
  • error-type: specifies the exception type of the alarm.

Additional Info

  • host_id: specifies the ID of the host accommodating the VM for which the alarm is generated.
  • Local Nvme SSD: specifies the NVMe SSD used by the VM for which the alarm is generated.
  • instance_uuid: specifies the ID of the VM for which the alarm is generated.

Impact on the System

Services on VMs that use this disk or card are interrupted. In addition, replacing the faulty disk or card interrupts services on all VMs running on the host.

Possible Causes

The NVMe SSD disk or card has a hardware fault.

Procedure

  1. Log in to the Service OM web client.

    For details, see Logging In to and Logging Out of Service OM.

  2. Open the alarm details to obtain the host ID (host_id) and the PCI address (pci-address).

  3. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default user name is fsp. The default password is Huawei@CLOUD8.

    The system supports both password-based and public-private key pair authentication. If a key pair is used for login, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode for details.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy
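
    If a command-line OpenSSH client is used instead of PuTTY, the login is equivalent to the following (192.0.2.10 is a hypothetical example of the External OM plane IP address; replace it with the value obtained above):

    ssh fsp@192.0.2.10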

  4. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  5. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  6. Query the system slot corresponding to the NVMe SSD disk or card.

    1. Run the cps host-list | grep <host ID> command to obtain the management plane IP address of the host.
    2. Run the su fsp command to switch to user fsp.
    3. Run the ssh fsp@<management plane IP address> command to log in to the host over the management plane.
    4. Repeat steps 4 and 5 to switch to user root and disable logout upon system timeout.
    5. Run the ls /sys/bus/pci/slots/ command to obtain all system slot IDs.
    6. Run the following command repeatedly until the command output is the same as the PCI address obtained in 2. The value of $slot that produces the match is the system slot corresponding to the NVMe SSD disk or card (a scripted version of this search is sketched after the NOTE):

      cat /sys/bus/pci/slots/$slot/address

      NOTE:
      • In the command, $slot indicates a system slot ID obtained in 6.e.
      • In most cases, the system slot ID obtained in 6.e matches the physical slot number on the server.
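
      The search in 6.e and 6.f can also be scripted. The following is a minimal sketch, assuming the pci-address obtained in 2 is 0000:3b:00.0 (a hypothetical example value):

      # Hypothetical pci-address from the alarm details (step 2); replace with yours.
      # The address file usually omits the function number, so a trailing ".0" may
      # need to be stripped from the pci-address before comparing.
      TARGET="0000:3b:00"
      for slot in $(ls /sys/bus/pci/slots/); do
          if [ "$(cat /sys/bus/pci/slots/$slot/address)" = "$TARGET" ]; then
              echo "NVMe SSD system slot: $slot"
          fi
      done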

  7. Ask all tenants on this host to stop their VMs so that the faulty disk or card can be replaced.

    Only the data of VMs using the faulty disk or card will be lost.

  8. After all VMs are stopped, run the following command to power off the slot so that the disk or card can be removed safely:

    echo 0 > /sys/bus/pci/slots/$slot/power
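
    To confirm that the slot has actually powered off before the disk or card is pulled, the power attribute can be read back (a hedged check based on the same sysfs interface used above):

    # An output of 0 means the slot is powered off and removal is safe.
    cat /sys/bus/pci/slots/$slot/power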

  9. Observe the NVMe PCIe hard disk indicator. If the fault indicator is blinking slowly (0.5 Hz), remove the disk or card.
  10. Power off the host and replace the faulty disk or card with a functional one of the same disk specifications, disk type, and installation slot.
  11. Clear the alarm on Service OM.
  12. Power on the host. After all the VMs running on this host are started, notify the tenants.

Related Information

None
