ES3000 V3 NVMe PCIe SSD User Guide 20

This document provides the product information about the Huawei ES3000 V3 PCIe SSD (ES3000 V3 for short) and describes how to install, configure, operate, and maintain the ES3000 V3.
Disk Internal Faults or Other Faults

A disk internal fault is a condition in which the disk raises internal alarms even though the OS still identifies the disk correctly. When such a fault occurs, the disk does not function properly or cannot be used. This topic uses an example to describe how to identify such faults.

Example

  1. Determine the disk health status. For details, see Querying Basic Information About a Device.

    # Query the basic information of nvme0.

    [root@localhost tool]# hioadm info -d nvme0 
     
    Namespace<1> size:  1.6TB,    1600321314816Byte  
                 formatted LBA size:         512 Byte  
                 formatted metadata size:    0 Byte  
      
    maximum capacity                 : 1.6TB   
    current capacity                 : 1.6TB   
    volatile write cache             : Enable  
    serial number                    : 0503023HDCN107C80013  
    model number                     : HWE32P430016M00N  
    firmware version                 : 2.14  
    NVMe version                     : 1.2  
    device status                    : healthy
    

    device status in the command output indicates the SSD controller health status.

    The value healthy indicates that the device is healthy.

    The value warning indicates that an exception has occurred on the device. Go to 2 to identify the exception.

  2. Determine the SMART status of the disk. For details, see Querying the SMART Information About a Device.

    # Query the SMART information of nvme0.

    [root@localhost tool]# hioadm info -d nvme0 -s 
    critical warning              : no warning 
    composite temperature         : 308 degrees Kelvin (35 degrees Celsius) 
    available spare               : 100% 
    available spare threshold     : 10% 
    percentage used               : 0% 
    data units read               : 68.8 MB 
    data units written            : 0.0 MB 
    host read commands            : 17748  
    host write commands           : 0  
    controller busy time          : 0 mins 
    power cycles                  : 89 times 
    power on hours                : 1164 h 
    unsafe shutdowns              : 35 times 
    media and data integrity errors: 0  
    number of error information log entries: 0  
    warning composite temperature time: 0 min 
    critical composite temperature time: 0 min 
    data status                   : OK
    

    Critical Warning: Critical exceptions have occurred on the device, and emergency handling is required.

    Table 5-1  Critical Warning parameters

    Parameter:    critical warning
    Description:  Critical warning, such as overtemperature and insufficient redundant space.
    Bit:
    • 0: No warning.
    • 1: The available spare space has fallen below the threshold.
    • 2: The temperature is above an over temperature threshold or below an under temperature threshold.
    • 3: The NVM subsystem reliability has been degraded due to significant media related errors or any internal error that degrades NVM subsystem reliability.
    • 4: The media has been placed in read only mode.
    • 5: The volatile memory backup device has failed.

    Typical fault causes are as follows:
    • Available spare space below the threshold: Causes include an excessive number of damaged blocks.
    • Temperature above the overtemperature threshold or below the undertemperature threshold: Causes include a disk temperature below 0°C or above 78°C.
    • NVM subsystem reliability degraded due to internal errors: Causes include an excessive number of failed disk granules or internal subsystem operating exceptions.
    • Media placed in read only mode: Causes include capacitor failures.
    • Volatile memory backup device failure: Causes include a capacitor voltage below 28 V or above 35 V.
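
The two checks above can be scripted. The following is a minimal sketch, assuming the output of hioadm info has been captured as text; the function names are hypothetical and are not part of hioadm. The guide does not show how hioadm prints a non-zero critical warning, so the lookup below takes the numeric value from Table 5-1 as an assumption.

```python
# Sample text copied from the `hioadm info -d nvme0` output shown in step 1.
SAMPLE_INFO = """\
maximum capacity                 : 1.6TB
current capacity                 : 1.6TB
volatile write cache             : Enable
serial number                    : 0503023HDCN107C80013
model number                     : HWE32P430016M00N
firmware version                 : 2.14
NVMe version                     : 1.2
device status                    : healthy
"""

# Values and descriptions as listed in Table 5-1.
CRITICAL_WARNING = {
    0: "No warning.",
    1: "The available spare space has fallen below the threshold.",
    2: "The temperature is above an over temperature threshold "
       "or below an under temperature threshold.",
    3: "The NVM subsystem reliability has been degraded.",
    4: "The media has been placed in read only mode.",
    5: "The volatile memory backup device has failed.",
}


def parse_fields(text):
    """Turn 'name : value' lines into a dict keyed by lower-cased name."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip().lower()] = value.strip()
    return fields


def needs_smart_check(info_text):
    """Return True when device status is anything other than 'healthy'."""
    return parse_fields(info_text).get("device status") != "healthy"


def describe_critical_warning(value):
    """Look up a Table 5-1 value (assumed to be available as an integer)."""
    return CRITICAL_WARNING.get(value, "Unknown value; not listed in Table 5-1.")


if __name__ == "__main__":
    print(needs_smart_check(SAMPLE_INFO))
    print(describe_critical_warning(2))
```

When needs_smart_check returns True, continue with the SMART query in step 2.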

Fault Diagnosis Examples

The available spare space is below the threshold.
  1. Check whether the value of available spare in the SMART information is below the value of available spare threshold (10% in the preceding example). If it is, stop using the disk and back up data immediately. Otherwise, go to 2.
  2. Obtain disk logs by following instructions in One-Click Log Collection and contact Huawei technical support.
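
As a rough illustration of the spare-space check, the sketch below compares available spare with available spare threshold, both read from captured hioadm info -d nvme0 -s output. It follows Table 5-1, which treats the fault as the spare space falling below the threshold. The helper names are hypothetical.

```python
# A few lines copied from the SMART output shown in the example.
SAMPLE_SMART = """\
critical warning              : no warning
available spare               : 100%
available spare threshold     : 10%
percentage used               : 0%
"""


def parse_percent(smart_text, key):
    """Return the integer percentage printed for the field named `key`."""
    for line in smart_text.splitlines():
        name, sep, value = line.partition(":")
        # Exact name match, so 'available spare' does not pick up
        # 'available spare threshold'.
        if sep and name.strip().lower() == key:
            return int(value.strip().rstrip("%"))
    raise KeyError(key)


def spare_below_threshold(smart_text):
    """True when available spare has fallen below its threshold."""
    spare = parse_percent(smart_text, "available spare")
    threshold = parse_percent(smart_text, "available spare threshold")
    return spare < threshold


if __name__ == "__main__":
    if spare_below_threshold(SAMPLE_SMART):
        print("Stop using the disk and back up data immediately.")
    else:
        print("Collect logs and contact technical support.")
```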
The service life exceeds the threshold.
  1. Check whether the value of percentage used in the SMART information has reached 100%. If the value is greater than or equal to 100%, stop using the disk and back up data immediately. Otherwise, go to 2.
  2. Obtain disk logs by following instructions in One-Click Log Collection and contact Huawei technical support.
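
The service-life check can be sketched the same way. This example assumes the percentage used line has the format shown in the SMART output above; the function names and the limit parameter are illustrative only.

```python
import re

# Line format copied from the SMART output shown in the example.
SAMPLE_SMART = """\
available spare               : 100%
available spare threshold     : 10%
percentage used               : 0%
"""

_PERCENT_USED = re.compile(
    r"^\s*percentage used\s*:\s*(\d+)\s*%", re.MULTILINE | re.IGNORECASE
)


def percentage_used(smart_text):
    """Extract the 'percentage used' value as an integer."""
    match = _PERCENT_USED.search(smart_text)
    if match is None:
        raise ValueError("percentage used not found in SMART output")
    return int(match.group(1))


def end_of_life_reached(smart_text, limit=100):
    """True when percentage used is greater than or equal to `limit`."""
    return percentage_used(smart_text) >= limit


if __name__ == "__main__":
    print(percentage_used(SAMPLE_SMART), end_of_life_reached(SAMPLE_SMART))
```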
The temperature exceeds the threshold.
  1. Check that the server where the disk resides provides proper cooling. If the server reports no temperature alarm and the fan modules are operating properly, go to 2.
  2. Obtain disk logs by following instructions in One-Click Log Collection and contact Huawei technical support.
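
Before checking the server cooling, it can help to confirm the disk temperature itself. The sketch below reads composite temperature from captured SMART output and compares it with the 0°C to 78°C range given under the typical fault causes. It assumes the line format shown in the example output and the 273 offset implied there (308 Kelvin shown as 35 Celsius); the function names are hypothetical.

```python
import re

# Lines copied from the SMART output shown in the example.
SAMPLE_SMART = """\
critical warning              : no warning
composite temperature         : 308 degrees Kelvin (35 degrees Celsius)
warning composite temperature time: 0 min
critical composite temperature time: 0 min
"""

# Anchored at line start so the two '... composite temperature time' lines
# are not matched.
_TEMPERATURE = re.compile(
    r"^\s*composite temperature\s*:\s*(\d+)\s+degrees Kelvin",
    re.MULTILINE | re.IGNORECASE,
)

MIN_CELSIUS = 0   # lower limit stated in the typical fault causes
MAX_CELSIUS = 78  # upper limit stated in the typical fault causes


def composite_celsius(smart_text):
    """Return the composite temperature in degrees Celsius."""
    match = _TEMPERATURE.search(smart_text)
    if match is None:
        raise ValueError("composite temperature not found in SMART output")
    return int(match.group(1)) - 273


def temperature_out_of_range(smart_text):
    """True when the temperature is below 0 or above 78 degrees Celsius."""
    celsius = composite_celsius(smart_text)
    return celsius < MIN_CELSIUS or celsius > MAX_CELSIUS


if __name__ == "__main__":
    print(composite_celsius(SAMPLE_SMART), temperature_out_of_range(SAMPLE_SMART))
```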
One of the following faults occurs:
  • An internal error degrades the NVM subsystem reliability.
  • The media is in read only mode.
  • The volatile memory backup device has failed.
  1. Obtain disk logs by following instructions in One-Click Log Collection and contact Huawei technical support.
Updated: 2019-03-12

Document ID: EDOC1000101091
