Overview
This section explains the basic concepts of reliability, availability, and serviceability (RAS).
Why Is RAS Necessary?
As information technology (IT) becomes more widely used, it plays an increasingly critical role in large-scale data centers and network centers such as stock exchanges, telecom equipment rooms, and bank database centers. It is increasingly important to keep systems operational for as long as possible and to make them fault tolerant. As a result, RAS design has become an integral part of system design.
What Is RAS?
The core idea of RAS is to keep customer services running properly for as long as possible, that is, to minimize the possibility of downtime. To achieve high availability, a single-node system must have a reliable underlying layer (including hardware and underlying software) and must support fault tolerance, quick recovery, and quick maintenance.
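The relationship between reliability, recovery speed, and downtime can be illustrated with the widely used steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only and is not taken from this document: the function names and the sample MTBF and MTTR figures are assumptions chosen for the example.

```python
# Minimal sketch: steady-state availability from MTBF and MTTR.
# All names and sample values here are hypothetical, for illustration only.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Return the expected downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Assumed figures: one failure every 10,000 hours on average.
    for mttr in (4.0, 2.0):
        a = availability(10_000.0, mttr)
        print(f"MTTR={mttr} h: availability={a:.6f}, "
              f"downtime={annual_downtime_hours(a):.2f} h/year")
```

Under these assumed figures, halving the repair time roughly halves the expected annual downtime, which is why quick recovery and quick maintenance matter alongside a reliable underlying layer.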
Why Is RAS Important?
Service continuity is essential to mission-critical servers. Even a very short period of downtime on a mission-critical server can cause significant service loss. As enterprises rely more and more heavily on IT systems, downtime becomes more and more costly.
At a minimum, RAS design should ensure component reliability, that is, minimize hardware faults.
To ensure that the most appropriate components are selected and used correctly, component-level reliability design must fully consider three factors: material reliability, product reliability, and production reliability. These factors are interrelated.