Application Scenarios
Leveraging RoCEv2 over Ethernet, AI Fabric converges the service, compute, and storage networks of a data center onto a single network. This resolves a long-standing problem of traditional DCNs, which, across the IP, FC, and InfiniBand eras, have had to deploy multiple technologies and multiple separate networks for a single data center. AI Fabric also delivers zero packet loss, low latency, and high throughput for data center compute and storage services. Currently, AI Fabric is mainly used in the following service scenarios:
- Distributed (cloud) storage and centralized storage
A distributed storage system stores data on multiple independent devices and adopts a scalable system architecture: multiple storage servers share the storage load, and location servers keep track of where data is stored. This improves system scalability, reliability, availability, and access efficiency. As distributed storage becomes increasingly popular, performance-hungry applications, such as the databases of financial systems, are also adopting it. (A minimal data-placement sketch is given after this scenario list.)
A centralized storage system is a single storage system consisting of multiple devices. Enterprises often deploy their storage devices in a centralized storage system; for example, the storage system of company EMX may occupy several cabinets. In terms of technical architecture, centralized storage can be classified into SAN and NAS, and SAN can be further classified into FC SAN, IP SAN, and FCoE SAN. AI Fabric is dedicated to replacing FC SAN with RoCE networks.
- AI GPU
AI is a field of computer technology concerned with making machines work intelligently, in a way similar to the human mind. AI applications include robotics, voice recognition, image recognition, autonomous driving, and intelligent recommendation. Deep learning, which relies on compute-intensive iterative floating-point operations, is crucial to AI. A deep learning algorithm extracts features from a large number of samples through multi-layer neural networks, repeatedly adjusting the network parameters during training and then using the trained model for inference. To increase the available compute capacity, AI training is distributed across multiple nodes, and because AI places stringent requirements on compute performance, GPUs are mainly used for the computation. (A minimal single-node training step is sketched after this scenario list.)
- HPC
An HPC system is built to raise computing speed to the level of tera operations per second (TOPS). It is mainly used for large-scale scientific computing and the processing of huge volumes of data, for example, in scientific research, weather forecasting, computational simulation, military research, biopharmaceutical research, gene sequencing, and image processing.
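For the distributed storage scenario above, the following is a minimal sketch of how storage servers can share the load while a location function maps each object to its server. It assumes a simple consistent-hashing placement scheme; the server names and key format are hypothetical, and real distributed storage products may use very different placement and location mechanisms.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring mapping object keys to storage servers."""

    def __init__(self, servers, vnodes=100):
        # Each server owns many virtual points on the ring to balance load.
        self._ring = []
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def locate(self, object_key):
        """Return the server responsible for storing object_key."""
        h = self._hash(object_key)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[idx][1]

# Hypothetical servers and object key, for illustration only.
ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.locate("volume-42/block-0007"))
```

Because keys are spread roughly evenly over the ring, adding or removing a server only moves the objects adjacent to its ring positions, which is one reason hash-based placement scales well.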
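For the deep learning training described in the AI GPU scenario, the sketch below shows one hundred gradient-descent updates of a tiny two-layer neural network on a single node, using NumPy. It only illustrates the iterative, floating-point nature of training with random toy data; real AI training runs such updates on GPUs, at far larger scale, and typically across many distributed nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 64 samples, 16 input features, 1 regression target.
X = rng.standard_normal((64, 16))
y = rng.standard_normal((64, 1))

# Two-layer network: 16 -> 32 (ReLU) -> 1.
W1 = rng.standard_normal((16, 32)) * 0.1
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 1)) * 0.1
b2 = np.zeros(1)
lr = 0.01  # learning rate

for step in range(100):
    # Forward pass: compute predictions and mean-squared-error loss.
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)           # ReLU activation
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients of the loss w.r.t. each parameter.
    d_yhat = 2.0 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    dz1 = (d_yhat @ W2.T) * (z1 > 0)  # ReLU gradient
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Parameter update: one gradient-descent step.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```

In distributed training, many nodes each compute such gradients on their own data shard and then exchange them over the network, which is why lossless, low-latency fabrics matter for AI workloads.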