Atlas 800 Inference Server (Model 3000) 23.0.0 Ascend Software Installation Guide 01

Installation Description

This document describes how to quickly install the Ascend NPU driver and firmware and the Compute Architecture for Neural Networks (CANN, an AI heterogeneous computing architecture) software on an Atlas 800 inference server (model 3000) configured with the Atlas 300 inference card. Table 2-1 describes the software.

Table 2-1 Ascend software


Ascend NPU firmware

The firmware contains the OS, power component, and power consumption management control software that are loaded to the Ascend AI Processor. It is used for model computation, processor startup control, and power consumption control.

Ascend NPU driver

The driver is deployed on an Ascend server. It manages and queries the Ascend AI Processor and provides processor control and resource allocation interfaces for the upper-layer CANN software.

CANN

CANN is deployed on an Ascend server. It includes the Runtime, operator package (OPP), graph engine, and media data processing components, and exposes its functions through Ascend Computing Language (AscendCL) APIs, including device management, context management, stream management, memory management, model loading and execution, operator loading and execution, and media data processing. This helps developers develop and run AI services on Ascend software and hardware platforms.

The CANN packages include the Toolkit (development kit), Kernels (binary OPP), NNAE (deep learning engine), NNRT (offline inference engine), and TFPlugin (TensorFlow framework plugin). The functions of each package are as follows:
  • Toolkit: supports training and inference services, model conversion, and operator/application/model development and build.
  • Kernels: the binary operator package. It depends on Toolkit or NNAE and saves operator build time. It is required for networks with dynamic shapes and for single-operator APIs (such as the aclnn APIs).
  • NNAE: supports training and inference services.
  • NNRT: supports offline inference only.
  • TFPlugin: interconnects with the TensorFlow framework so that the framework can call the underlying CANN APIs to run training services.
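
For reference, the device, context, stream, and memory management functions listed above are exposed to applications through AscendCL after the Toolkit or NNRT package is installed. The following minimal C sketch only illustrates that call sequence; it assumes a default CANN installation (header acl/acl.h, library libascendcl) and omits the model and operator execution steps that a real inference program would add.

/* Minimal AscendCL resource-management sketch (not a complete inference program). */
#include <stdint.h>
#include <stdio.h>
#include "acl/acl.h"

int main(void)
{
    /* Initialize AscendCL; NULL means no acl.json configuration file is used. */
    if (aclInit(NULL) != ACL_SUCCESS) {
        fprintf(stderr, "aclInit failed\n");
        return 1;
    }

    int32_t deviceId = 0;
    aclrtContext context = NULL;
    aclrtStream stream = NULL;
    void *devBuf = NULL;

    /* Device, context, and stream management. */
    (void)aclrtSetDevice(deviceId);
    (void)aclrtCreateContext(&context, deviceId);
    (void)aclrtCreateStream(&stream);

    /* Memory management: allocate 1 MB of device memory. */
    (void)aclrtMalloc(&devBuf, 1024 * 1024, ACL_MEM_MALLOC_HUGE_FIRST);

    /* Model and operator loading/execution would be added here,
       for example with aclmdlLoadFromFile and aclmdlExecute or the aclnn single-operator APIs. */

    /* Release resources in reverse order of creation. */
    (void)aclrtFree(devBuf);
    (void)aclrtDestroyStream(stream);
    (void)aclrtDestroyContext(context);
    (void)aclrtResetDevice(deviceId);
    (void)aclFinalize();
    return 0;
}

Such a program is typically built against the AscendCL headers and library delivered with the Toolkit or NNRT package (for example, linking with -lascendcl); the exact include and library paths depend on the installation directory.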

Ascend Docker

Ascend Docker (the container engine plugin) is essentially a Docker runtime implemented in compliance with the Open Container Initiative (OCI) standard. It does not modify the Docker engine; instead, it adapts Docker to Ascend NPUs as a plugin, so that AI jobs can run smoothly on Ascend devices as Docker containers.
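
As an illustration of how the plugin integrates with Docker, installing Ascend Docker typically registers an additional OCI runtime in the Docker configuration file /etc/docker/daemon.json, roughly as sketched below. The runtime name ("ascend") and binary path shown here are examples only and depend on the actual installation directory; refer to the container engine plugin installation steps for the exact values.

{
    "runtimes": {
        "ascend": {
            "path": "/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "ascend"
}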

Installation Scenarios

Figure 2-1 shows the deployment architecture on physical machines (PMs), virtual machines (VMs), and containers. Toolkit and Kernels are used as an example of the CANN software. For containers, Ascend Docker (the container engine plugin) must be deployed.

In the provided solution, native Docker is used; it is not customized by Huawei.

Figure 2-1 Installation scenarios

Hardware Mapping and Supported OSs

Table 2-2 to Table 2-5 list the hardware models and supported OSs in each installation scenario. The system architecture is Arm.

In foundation model inference scenarios, you are advised to use the Atlas 300I Duo inference card with 96 GB memory to ensure that the memory meets the parameter requirements of foundation models.

Table 2-2 PM

  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300I Pro inference card
    OS: CentOS 7.6, Ubuntu 18.04.1, Ubuntu 18.04.5, Ubuntu 20.04, CUlinux 3.0, Kylin V10 SP1, Kylin V10 SP2, openEuler 20.03 LTS, openEuler 22.03 LTS
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V Pro video analysis card
    OS: CentOS 7.6, Ubuntu 18.04.1, Ubuntu 18.04.5, Ubuntu 20.04, UOS20 1050e, Kylin V10 SP1, openEuler 20.03 LTS, openEuler 22.03 LTS
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V video analysis card
    OS: Ubuntu 20.04, openEuler 22.03 LTS
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300I Duo inference card
    OS: Ubuntu 20.04

Table 2-3 Container

  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300I Pro inference card
    PM OS: CentOS 7.6, Ubuntu 18.04.1, Ubuntu 18.04.5, Ubuntu 20.04, CUlinux 3.0, Kylin V10 SP1, openEuler 22.03 LTS
    Container OS: CentOS 7.6, Ubuntu 18.04.5
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V Pro video analysis card
    PM OS: CentOS 7.6, Ubuntu 18.04.1, Ubuntu 18.04.5, Ubuntu 20.04, Kylin V10 SP1, openEuler 22.03 LTS
    Container OS: CentOS 7.6, Ubuntu 18.04.5
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V video analysis card
    PM OS: Ubuntu 20.04, openEuler 22.03 LTS
    Container OS: CentOS 7.6, Ubuntu 18.04.5
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300I Duo inference card
    PM OS: Ubuntu 20.04
    Container OS: CentOS 7.6, Ubuntu 18.04.5

Table 2-4 VM (NPUs passed through to VMs)

  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300I Pro inference card
    PM OS: CUlinux 3.0
    VM OS: CUlinux 3.0
  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V Pro video analysis card or Atlas 300V video analysis card
    PM OS: Ubuntu 20.04, openEuler 20.03 LTS
    VM OS: Ubuntu 20.04, openEuler 20.03 LTS, CentOS 8.2, Kylin V10 SP2

Table 2-5 VM (NPUs passed through to VMs after computing power allocation)

  • Hardware model: Atlas 800 inference server (model 3000) + Atlas 300V Pro video analysis card
    PM OS: UOS20 1050e
    VM OS: UOS20 1050e