High-Availability Applications
The storage system has a highly reliable design, achieving a long mean time between failures (MTBF), and ensuring high availability of storage applications. It also incorporates a variety of data protection technologies, and protects data integrity and service continuity against catastrophic disasters.
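The link between MTBF and availability can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is only an illustration: the mean time to repair (MTTR) and all numeric values are assumptions chosen for the example, not figures published for this storage system.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected downtime per year implied by the availability figure."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical values: a fixed 4-hour repair time and increasing MTBF.
    for mtbf in (10_000, 100_000, 1_000_000):
        a = availability(mtbf, 4.0)
        d = expected_downtime_hours_per_year(mtbf, 4.0)
        print(f"MTBF={mtbf:>9} h  availability={a:.6f}  downtime={d:.3f} h/year")
```

With the repair time held constant, a longer MTBF raises availability and reduces expected downtime.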
In-Service Routine Maintenance
In traditional storage systems, routine maintenance tasks, such as component replacement and capacity expansion, must be performed offline. This storage system, by contrast, integrates technologies that allow routine maintenance while it remains in service:
- Turbo Module
Enables online replacement of components and requires no system restart.
- Online capacity expansion
Allows online addition of disks and expansion of storage pools.
Tolerance of Single Points of Failure
The storage system incorporates a hierarchical redundancy design to eliminate the impact of single points of failure:
- Hardware redundancy
All components are redundant and work in active-active mode. If one component fails, the other takes over its load and compensates so that the storage system can continue operating.
- Link redundancy
If there is only one link between the storage system and an application server, a disconnection of that link terminates their communication. To eliminate this single point of failure, the storage system uses two or more links to communicate with the application server. If one link goes down, the other links take over its services and data transmission continues.
- Application server clustering
If the storage system works with only one application server, the failure of that server interrupts services. Application server clustering addresses this issue. A cluster consists of two or more application servers that share loads. If one application server in the cluster fails, the other application servers take over its loads, and the whole process is transparent to users. The storage system supports application server clustering, which helps ensure business continuity.
Based on the preceding protection mechanisms, the storage system tolerates single points of failure, as shown in Figure 3-2.
In the example in Figure 3-2, application server A and controller A are faulty, and a link between the cluster and the storage system is down. In this situation, the redundant components and links compensate for the failed ones, and services are switched to application server B, which is running properly. This ensures nonstop system operation and greatly improves service availability.
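The failure scenario described above can be checked with a small reachability model. This is a minimal sketch under stated assumptions: the names `server_a`, `controller_b`, and so on are hypothetical, the topology is assumed to be a full mesh of links between two application servers and two controllers, and the choice of which link is down is illustrative, since the text does not specify it.

```python
from itertools import product


def service_available(links, failed):
    """Return True if at least one healthy server reaches a healthy
    controller over a healthy link.

    links:  iterable of (server, controller) pairs
    failed: set containing failed server names, controller names,
            and/or failed (server, controller) link pairs
    """
    for server, controller in links:
        if server in failed or controller in failed:
            continue  # an endpoint of this link is down
        if (server, controller) in failed:
            continue  # the link itself is down
        return True
    return False


if __name__ == "__main__":
    servers = ["server_a", "server_b"]
    controllers = ["controller_a", "controller_b"]
    # Assumption: every server has a link to every controller.
    links = list(product(servers, controllers))

    # Server A and controller A are faulty, and one link is down
    # (which link is an assumption made for this example).
    failed = {"server_a", "controller_a", ("server_b", "controller_a")}
    print("redundant design available:", service_available(links, failed))

    # Contrast: one server, one controller, one link. Any single failure
    # interrupts the service.
    single = [("server_a", "controller_a")]
    print("non-redundant design available:", service_available(single, {"server_a"}))
```

In this model the service survives only while some server, link, and controller remain healthy along a single path, which is why redundancy is needed at all three levels and not at just one.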
Resilience Against Disasters
The storage system provides various data protection methods for backup and disaster recovery. These methods guard against unexpected downtime and data loss caused by natural disasters, serious device failures, or human error.
The supported data protection methods include:
- Backup
The storage system processes a huge amount of data, and the loss of any data can have disastrous consequences. Therefore, enterprises typically back up their critical data periodically. The following backup technologies are the most commonly used because they complete data backup without interrupting services:
- HyperSnap: locally generates a virtual duplicate of a source LUN at a specified point in time. The duplicate is immediately usable, and access to it has no impact on the source LUN data.
- HyperClone: locally generates a complete copy of a source LUN at a specified point in time. After the clone task is complete, the destination LUN stores the same data as the source LUN, and the relationship between the two can be split. After the split, access to the destination LUN has no impact on the source LUN data.
- HyperCopy: replicates data from the source LUN to the destination LUN at block level. A LUN copy task can be performed within a storage system or among storage systems (even if they are heterogeneous).
- HyperMirror: backs up data in real time. If the source data becomes unavailable, applications can automatically use the data copy, ensuring data security and application continuity.
- HyperMetro: synchronizes and replicates data between storage arrays, monitors service operating status, and performs failovers. In addition, it can switch over services and implement service load sharing when storage arrays are running.
- Disaster recovery
Disaster recovery is essential for critical applications that must continue operating even during catastrophic disasters. Disaster recovery involves many aspects, such as storage systems, application servers, application software, and technicians. On the storage system side, remote replication is used for disaster recovery because it can back up data in real time.
Remote replication duplicates backup data in real time across sites and relies on the long distance between sites so that a single disaster is unlikely to destroy every copy. This ensures that data is readily available at another site if one site is destroyed.
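The idea of real-time cross-site replication can be sketched as follows. This is an illustrative model only, not the product's implementation: it assumes synchronous replication (a write completes only after both sites have stored it), whereas the text does not state which replication mode is used, and the `Site` and `ReplicatedVolume` classes are hypothetical names introduced for the example.

```python
class Site:
    """A toy storage site that keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, lba, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        self.blocks[lba] = data

    def read(self, lba):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.blocks[lba]


class ReplicatedVolume:
    """A volume whose writes are mirrored to a remote site (assumed synchronous)."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote

    def write(self, lba, data):
        # The write returns only after both sites have stored the block,
        # so the remote copy never lags behind the primary.
        self.primary.write(lba, data)
        self.remote.write(lba, data)

    def read(self, lba):
        # Serve from the primary site; fall back to the remote site.
        for site in (self.primary, self.remote):
            if site.online:
                return site.read(lba)
        raise ConnectionError("no site is available")


if __name__ == "__main__":
    volume = ReplicatedVolume(Site("site_1"), Site("site_2"))
    volume.write(0, b"critical data")
    volume.primary.online = False  # simulate the loss of the primary site
    print(volume.read(0))          # data is still served from the remote site
```

Because every acknowledged write already exists at the remote site, losing the primary site in this model loses no acknowledged data. The cost is that each write waits for the remote site.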