Software Deployment Solution
The Ascend software supports four deployment scenarios: three for the running environment (physical machine (PM) mode, VM mode, and container mode) and one for the development environment.
Based on application scenarios, the Atlas solution combines the CANN software and system tools into the following software packages:
- Neural Network Runtime (NNRT): used for offline inference and contains components such as AscendCL, GE, and Runtime.
- Neural Network Acceleration Engine (NNAE): used for online training and online inference and contains the AscendCL, GE, HCCL, and operator library components.
- Toolkit (development kit): includes all components required for CANN software development, such as AscendCL, GE, Runtime, HCCL, operator library, and profiling tool.
- Framework Adapter: contains TFPlugin and is used to connect to the TensorFlow framework.
- Toolbox (system management tool set): includes Ascend-DMI, Ascend-SDK-Manager, and MSInstaller.
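The package list above can be read as a small lookup table. The following is a minimal sketch in Python, not part of any Ascend tool: the dictionary layout, the use-case labels, and the helper `packages_for` are hypothetical names chosen for illustration. Only the package and component names come from the list, and because several entries are introduced with "such as", the component lists may not be exhaustive.

```python
# Illustrative lookup table built from the package list above.
# The structure and the use-case labels are assumptions made for this sketch.
PACKAGES = {
    "NNRT": {
        "use": {"offline inference"},
        "components": ["AscendCL", "GE", "Runtime"],
    },
    "NNAE": {
        "use": {"online training", "online inference"},
        "components": ["AscendCL", "GE", "HCCL", "operator library"],
    },
    "Toolkit": {
        "use": {"development"},
        "components": ["AscendCL", "GE", "Runtime", "HCCL",
                       "operator library", "profiling tool"],
    },
    "Framework Adapter": {
        "use": {"TensorFlow integration"},
        "components": ["TFPlugin"],
    },
    "Toolbox": {
        "use": {"system management"},
        "components": ["Ascend-DMI", "Ascend-SDK-Manager", "MSInstaller"],
    },
}


def packages_for(use_case):
    """Return the names of the packages whose listed purpose matches use_case."""
    return [name for name, info in PACKAGES.items() if use_case in info["use"]]


if __name__ == "__main__":
    for case in ("offline inference", "online training", "development"):
        print(case, "->", packages_for(case))
```

Laid out this way, the split between NNRT (offline inference) and NNAE (online training and online inference) is easy to see at a glance, as is the fact that the list places HCCL in NNAE and Toolkit but not in NNRT.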
In the following scenarios, Atlas server refers to an Atlas 800 AI training server (model 9000), Atlas 800 AI training server (model 9010), Atlas 900 AI cluster (model 9000), or Atlas 300T AI training card (model 9000).
PM-based Deployment Scheme
In the PM-based deployment scenario, the CANN software (NNRT or NNAE), AI applications (including the AI framework), and NPU driver (including the hardware management tools) are deployed in the same OS environment. Toolbox can be installed for hardware health management, performance testing, and software deployment. In the training scenario, the Framework Adapter must also be deployed to connect to the AI framework.
VM Deployment Scheme
The VM deployment scenario is largely the same as the PM-based deployment scenario. The difference is that the NPU driver must be installed on the host machine to support firmware upgrades and hardware health management.
Container-based Deployment Scheme
Containers can be deployed directly on host machines or on VMs. In the container-based scenario, the Framework Adapter, NNRT/NNAE, and Toolbox are deployed in containers and are isolated from the NPU driver, which is deployed in the host OS. In addition, the Ascend container plugin (Ascend Docker Runtime) and the K8s cluster device plugin (K8s Device Plugin) are deployed in the host OS.
Containers are generally upgraded independently. In container-based scenarios, the NPU driver can therefore be decoupled from the CANN software stack (such as NNRT and NNAE).
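The difference between the PM-based and container-based layouts can be sketched as a placement table. This is an illustrative Python sketch only: the `PLACEMENT` table, the layer labels `host_os` and `container`, and both helper functions are hypothetical and are not part of any Ascend tool. In the PM entry, Toolbox is optional and the Framework Adapter applies to the training scenario only, as described above. The VM scenario is left out because the text does not spell out its full layout.

```python
# Where each component lives in two of the deployment scenarios described above.
# Layer names and table layout are assumptions made for this sketch.
PLACEMENT = {
    "pm": {
        # Everything shares one OS environment.
        "host_os": ["NPU driver", "NNRT/NNAE", "AI application",
                    "Toolbox", "Framework Adapter"],
    },
    "container": {
        "host_os": ["NPU driver", "Ascend Docker Runtime", "K8s Device Plugin"],
        "container": ["Framework Adapter", "NNRT/NNAE", "Toolbox"],
    },
}


def layer_of(scenario, component):
    """Return the layer ('host_os' or 'container') that holds the component."""
    for layer, components in PLACEMENT[scenario].items():
        if component in components:
            return layer
    raise KeyError(f"{component!r} is not placed in scenario {scenario!r}")


def decoupled_from_driver(scenario, component="NNRT/NNAE"):
    """True if the component sits in a different layer than the NPU driver."""
    return layer_of(scenario, component) != layer_of(scenario, "NPU driver")


if __name__ == "__main__":
    for scenario in PLACEMENT:
        print(scenario, "CANN decoupled from driver:",
              decoupled_from_driver(scenario))
```

In this model the CANN packages share a layer with the NPU driver in the PM scenario and sit in a separate layer in the container scenario, which is what allows the two to be upgraded independently.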
Development Environment Deployment Scheme
You can install the development suite on the Atlas server on its own. Through the command-line interface (CLI), the development suite supports software compilation and debugging.
When MindStudio is used for development, the development environment is logically divided into a UI host and an AI host. The UI host can be a general-purpose server with no Atlas cards (NPU cards or training cards) installed. With the Toolkit development suite installed, the UI host supports software development, compilation, model conversion, and operator development. The AI host, which can be an Atlas server with the Toolkit development suite installed, supports software debugging.
Develop Machine refers to the hardware system on which the development environment is deployed. It can be any general-purpose server that runs a standard commercial OS.