FusionServer Pro XH321 V5 Server Node V100R005 Maintenance and Service Guide 07

RAS Features

The server supports a range of reliability, availability, and serviceability (RAS) features. You can configure these features in the BIOS to suit your RAS requirements.

For details about how to configure RAS features, see Huawei Server Purley Platform BIOS Parameter Reference.

Table A-6 RAS features

| Module | Feature | Description |
|--------|---------|-------------|
| CPU | Corrected Machine Check Interrupt (CMCI) | Generates an interrupt when a corrected error is logged, so that software is notified without having to poll. |
| DIMM | Failed DIMM Isolation | Identifies a faulty DIMM and isolates it from the other DIMMs until it is replaced. |
| DIMM | Memory Thermal Throttling | Automatically throttles memory traffic to keep DIMM temperatures within limits and avoid damage due to overheating. |
| DIMM | Rank Sparing | Reserves some memory ranks as backup ranks to prevent the system from crashing due to uncorrectable errors. |
| DIMM | Memory Address Parity Protection | Detects memory command and address errors. |
| DIMM | Memory Demand and Patrol Scrubbing | Corrects correctable errors when they are detected, either on access (demand scrubbing) or during periodic background scans (patrol scrubbing), before they accumulate into uncorrectable errors. |
| DIMM | Memory Mirroring | Maintains a redundant copy of memory data so that the system can keep running if one copy has an uncorrectable error, improving system reliability. |
| DIMM | Single Device Data Correction (SDDC) | Corrects multi-bit errors confined to a single DRAM device to improve memory reliability. |
| DIMM | Device Tagging | Tags a faulty DRAM device on a DIMM and operates in degraded mode to work around the fault, improving DIMM availability. |
| DIMM | Data Scrambling | Optimizes the distribution of data streams to reduce the probability of errors, improving the reliability of data in memory and the ability to detect address errors. |
| PCIe | PCIe Advanced Error Reporting (AER) | Reports detailed PCIe error information to improve server serviceability. |
| UPI | Intel UPI Link Level Retry | Provides a retry mechanism upon errors to improve UPI reliability. |
| UPI | Intel UPI Protocol Protection via CRC | Provides cyclic redundancy check (CRC) protection for UPI packets to improve system reliability. |
| System | Core Disable for Fault Resilient Boot (FRB) | Isolates a faulty CPU core during startup to improve system reliability and availability. |
| System | Corrupt Data Containment Mode | Identifies the memory storage unit that contains corrupted data to minimize the impact on running programs and improve system reliability. |
| System | Socket Disable for Fault Resilient Boot (FRB) | Isolates a faulty socket during the BIOS startup process to improve system reliability. |
| System | Architected Error Records | With the enhanced machine check architecture (eMCA) feature, the BIOS collects error information from hardware registers in compliance with UEFI specifications, sends it to the OS over the ACPI Platform Error Interface (APEI), and locates the faulty unit, improving system availability. |
| System | Error Injection Support | Injects errors to verify various RAS features. |
| System | Machine Check Architecture (MCA) | Provides software recovery for uncorrectable errors to improve system availability. |
| System | eMCA: Gen2 | Improves system availability. |
| System | OOB access to MCA registers | The out-of-band (OOB) system accesses MCA registers through the Platform Environment Control Interface (PECI). If a fatal error occurs in the system, the OOB system collects on-site data to facilitate fault analysis and locating, improving system serviceability. |
| System | BIOS Abstraction Layer for Error Handling | The BIOS processes errors and reports the error information to the OS and the server in compliance with specifications to improve system serviceability. |
| System | BIOS-based Predictive Failure Analysis (PFA) | The BIOS provides physical unit information for DIMM errors. The OS traces and predicts errors and isolates faulty memory pages. |
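
The BIOS-based PFA entry says that the OS traces and predicts errors and isolates memory pages, but it does not say how. As a rough illustration only, the following Python sketch shows one common approach: count corrected errors per memory page within a sliding time window and mark the page for isolation once a threshold is reached. The class name, the threshold, and the window length are assumptions made for this example; they are not taken from the BIOS, from any OS, or from this guide.

```python
from collections import defaultdict, deque


class CorrectedErrorTracker:
    """Hypothetical per-page corrected-error tracker (not a BIOS or OS API)."""

    def __init__(self, threshold=3, window_seconds=3600.0):
        self.threshold = threshold            # assumed: errors needed to isolate a page
        self.window_seconds = window_seconds  # assumed: sliding window length
        self._events = defaultdict(deque)     # page number -> error timestamps
        self.isolated_pages = set()

    def record(self, page, timestamp):
        """Record one corrected error; return True if the page is isolated."""
        if page in self.isolated_pages:
            return True
        events = self._events[page]
        events.append(timestamp)
        # Drop errors that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.threshold:
            # Enough recent errors: predict failure and isolate the page.
            self.isolated_pages.add(page)
            del self._events[page]
            return True
        return False


tracker = CorrectedErrorTracker(threshold=3, window_seconds=60.0)
for t in (0.0, 10.0, 100.0, 110.0, 120.0):
    tracker.record(0x1A2B, t)
print(sorted(tracker.isolated_pages))
```

In this run the first two errors age out of the 60-second window before a third arrives, so the page is isolated only after the three errors at 100, 110, and 120 seconds. A real implementation would take the page address from the physical unit information that the BIOS reports.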

Updated: 2019-12-25

Document ID: EDOC1000183891
