Installation Scenarios
Development Environment
The development environment is used to develop and commission operators.
If you also need to train network models in the development environment, deploy the development environment on an Ascend AI device (such as a training server or training card).
In a pure development environment, only the development kit needs to be installed; the Ascend AI device is not required.
In the model training scenario, the driver, firmware, and deep learning framework must also be installed.
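As a minimal sketch of the pure development case, the development kit is typically delivered as a self-extracting run package. The package name below is illustrative; substitute the actual file name and version downloaded for your hardware and OS.

```shell
# Pure development environment: install only the development kit.
# (Example package name; no Ascend AI device needs to be present.)
./Ascend-cann-toolkit_<version>_linux-<arch>.run --install
```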
Operating Environment
The operating environment is the actual environment for model training. Ascend AI devices (such as training servers or training cards) must be configured in the installation environment.
The operating environment supports the following installation modes:
- Physical machine scenario:
- Install the Ascend AI Processor driver and firmware.
- Install the training software, including the deep learning acceleration engine package, framework plug-in package, and toolbox package.
- Install the deep learning framework. After the training software is installed, install TensorFlow before developing and verifying operators or training services.
- Install protobuf Python. If the training script depends on the Python version of protobuf to store data in a serialized structure (for example, the serialization interfaces of TensorFlow), install protobuf Python.
- Configure the IP address of the NPU card: If the AI training device is deployed in a cluster, configure the NPU IP address after installing the driver, firmware, and training software. This IP address is used to transmit network model parameters between servers during training, keeping the model parameters synchronized across servers.
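The physical machine steps above can be sketched as the following command sequence. All package names, versions, and the sample IP address are illustrative assumptions; use the packages matching your device, and note that `hccn_tool` is the utility commonly used to assign NPU IP addresses on Ascend training devices.

```shell
# 1. Driver and firmware for the Ascend AI Processor
#    (example package names; a reboot may be required between steps).
./Ascend-hdk-<chip>-npu-driver_<version>_linux-<arch>.run --full
./Ascend-hdk-<chip>-npu-firmware_<version>.run --full

# 2. Training software: deep learning acceleration engine,
#    framework plug-in, and toolbox packages.
./Ascend-cann-nnae_<version>_linux-<arch>.run --install
./Ascend-cann-tfplugin_<version>_linux-<arch>.run --install
./Ascend-mindx-toolbox_<version>_linux-<arch>.run --install

# 3. Deep learning framework and protobuf Python.
pip3 install tensorflow protobuf

# 4. Cluster deployments only: assign each NPU an IP address used for
#    parameter synchronization between servers (address is an example).
hccn_tool -i 0 -ip -s address 192.168.100.101 netmask 255.255.255.0
```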
- Container scenario (Docker must be installed); the procedure is as follows:
- Install the Ascend AI Processor driver and firmware on the host.
- Install the toolbox on the host.
- Use the Dockerfile to create a container image.
- Deploy the image: Upload the packaged image file to the operating environment and deploy it there.
- Configure the IP address of the NPU card: If the AI training device is deployed in a cluster, configure the NPU IP address after installing the driver, firmware, and training software. This IP address is used to transmit network model parameters between servers during training, keeping the model parameters synchronized across servers.
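The container steps above can be sketched as follows. The base image, package names, and image tag are assumptions for illustration; the driver and firmware stay on the host and are mounted into the container at run time, so only the training software is baked into the image.

```dockerfile
# Illustrative Dockerfile sketch (base image and package name are assumptions).
FROM ubuntu:18.04
COPY Ascend-cann-nnae_<version>_linux-<arch>.run /tmp/
RUN /tmp/Ascend-cann-nnae_<version>_linux-<arch>.run --install --quiet && \
    rm -f /tmp/*.run
```

Building, packing, and deploying the image might then look like:

```shell
docker build -t ascend-train:latest .          # build from the Dockerfile
docker save -o ascend-train.tar ascend-train:latest   # pack the image file
# On the operating environment:
docker load -i ascend-train.tar
```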