5288 V3 Server V100R003 User Guide 26

This document describes the features, appearance, and technical specifications of the 5288 V3 server, and how to install, power on, and power off the 5288 V3 server, connect cables, and install an operating system (OS) for it.
RAS Features

Table 2-10 describes the reliability, availability, and serviceability (RAS) features supported by the 5288 V3. You can configure these features to improve server RAS.

NOTE:

For details about how to configure RAS features, see HUAWEI Server Grantley Platform BIOS Parameter Reference.

Table 2-10 RAS features

| Module | Feature | Description |
|--------|---------|-------------|
| CPU | Corrected machine check interrupt (CMCI) | Raises an interrupt when the hardware corrects an error, so that the corrected error can be reported. |
| DIMM | Failed DIMM isolation | Identifies a faulty dual in-line memory module (DIMM) and isolates it from the other DIMMs until it is replaced. |
| DIMM | Memory thermal throttling | Automatically throttles memory access to lower DIMM temperatures and avoid damage due to overheating. |
| DIMM | Rank sparing | Reserves memory ranks as backup ranks to prevent the system from crashing due to uncorrectable errors. |
| DIMM | Memory address parity protection | Detects memory command and address errors. |
| DIMM | Memory demand and patrol scrubbing | Corrects correctable errors when they are found, either on access (demand scrubbing) or during periodic background scans of memory (patrol scrubbing). If these errors are not corrected promptly, uncorrectable errors may occur. |
| DIMM | Memory mirroring | Keeps a redundant copy of memory data to improve system reliability. |
| DIMM | Single device data correction (SDDC) | Provides a single-device, multi-bit error correction capability to improve memory reliability. |
| DIMM | Device tagging | Degrades and rectifies DIMM device faults to improve DIMM availability. |
| DIMM | Data scrambling | Optimizes data stream distribution and reduces the probability of errors, improving the reliability of data streams in memory and the ability to detect address errors. |
| PCIe | PCIe advanced error reporting | Reports PCIe errors in detail to improve server serviceability. |
| QPI | Intel QPI link level retry | Retries transmission when an error occurs to improve QPI reliability. |
| QPI | Intel QPI protocol protection via CRC | Provides cyclic redundancy check (CRC) protection for QPI packets to improve system reliability. |
| OS | Core disable for fault resilient boot (FRB) | Isolates a faulty CPU core during startup to improve system reliability and availability. |
| OS | Corrupt data containment mode | Identifies the memory storage unit that contains corrupted data to minimize the impact on running programs and improve system reliability. |
| OS | Socket disable for FRB | Isolates a faulty socket during startup to improve system reliability. |
| OS | Architected error records | With the eMCA feature, the basic input/output system (BIOS) collects error information recorded in hardware registers in compliance with UEFI specifications, sends it to the OS over the Advanced Configuration and Power Interface (ACPI) Platform Error Interface (APEI), and locates the faulty unit, improving system availability. |
| OS | Error injection support | Injects errors to verify various RAS features. |
| OS | Machine check architecture (MCA) | Provides software recovery for uncorrectable errors, which improves system availability. |
| OS | Enhanced MCA (eMCA): Gen2 | Improves system availability. |
| OS | OOB access to MCA registers | The out-of-band (OOB) system accesses MCA registers by using the Platform Environment Control Interface (PECI). If a fatal error occurs in the system, the out-of-band system collects onsite data to help analyze and locate the error, improving system serviceability. |
| OS | BIOS abstraction layer for error handling | The BIOS processes errors and reports error information to the OS and iBMC in compliance with specifications to improve system serviceability. |
| OS | BIOS-based predictive failure analysis (PFA) | The BIOS provides physical unit information for DIMM errors. The OS tracks and predicts errors and isolates faulty memory pages. |
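
The division of work described for BIOS-based PFA (the BIOS reports where a DIMM error occurred, and the OS tracks errors and isolates memory pages) can be illustrated with a minimal sketch. This is not the 5288 V3 BIOS or any OS implementation. The class name `PageErrorTracker`, the 4 KB page size, and the threshold of three corrected errors are hypothetical values chosen only for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096   # assumed page size in bytes (illustrative only)
CE_THRESHOLD = 3   # assumed corrected-error count that triggers isolation


class PageErrorTracker:
    """Hypothetical OS-side tracker: counts corrected errors per memory page."""

    def __init__(self, threshold=CE_THRESHOLD, page_size=PAGE_SIZE):
        self.threshold = threshold
        self.page_size = page_size
        self.counts = defaultdict(int)  # page number -> corrected errors seen
        self.isolated = set()           # pages already taken out of use

    def report_corrected_error(self, physical_address):
        """Record one corrected error reported for a physical address.

        Returns True if this report causes the page to be isolated.
        """
        page = physical_address // self.page_size
        if page in self.isolated:
            return False
        self.counts[page] += 1
        if self.counts[page] >= self.threshold:
            # Repeated corrected errors on one page are treated as a
            # prediction of failure, so the page is isolated early.
            self.isolated.add(page)
            return True
        return False


# Four reported error addresses; three of them fall in page 1.
tracker = PageErrorTracker()
events = [0x1000, 0x1008, 0x2000, 0x1FF0]
newly_isolated = [tracker.report_corrected_error(addr) for addr in events]
```

In this sketch, page 1 is isolated on the fourth report because it has then accumulated three corrected errors, while page 2 stays in use with a single error. A real system would choose its threshold and time window according to its own policy.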

Updated: 2018-11-26

Document ID: EDOC1000080031
