Configuring Environment Variables
Training requires several startup parameters. You are advised to put them in a bash startup script and upload the script to the operating environment; subsequent training runs can then be started simply by executing the script.
The startup script configures the environment variables that the training process depends on and then starts the training script.
Currently, running multiple training processes on the same device is not supported.
Configuration Example of Startup Script
- The following is an example startup script for training on a single device:
```bash
# Set the environment variables for the installation paths of the training component dependencies as follows:
# Method 1: Install Ascend-CANN-Toolkit for training on an Ascend AI device in the development environment.
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest   # Ascend-CANN-Toolkit installation path. Replace it as required.
# Method 2: Install Ascend-CANN-NNAE for training on an Ascend AI device.
export install_path=/home/HwHiAiUser/Ascend/nnae/latest             # Ascend-CANN-NNAE installation path. Replace it as required.
# The following environment variables are required:
# Driver dependency
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
# Required only in the container-based training scenario
export LD_LIBRARY_PATH=/usr/local/Ascend/add-ons:$LD_LIBRARY_PATH
# FwkACLlib dependency
export LD_LIBRARY_PATH=${install_path}/fwkacllib/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=${install_path}/fwkacllib/python/site-packages:${install_path}/fwkacllib/python/site-packages/auto_tune.egg/auto_tune:${install_path}/fwkacllib/python/site-packages/schedule_search.egg:$PYTHONPATH
export PATH=${install_path}/fwkacllib/ccec_compiler/bin:${install_path}/fwkacllib/bin:$PATH
# TFPlugin dependency
export PYTHONPATH=/home/HwHiAiUser/Ascend/tfplugin/latest/tfplugin/python/site-packages:$PYTHONPATH   # Replace the TFPlugin installation path with the actual one.
# OPP dependency
export ASCEND_OPP_PATH=${install_path}/opp
# Path of the startup script
export PYTHONPATH=/home/test:$PYTHONPATH
export JOB_ID=10087
export ASCEND_DEVICE_ID=0
export RANK_TABLE_FILE=/home/test/rank_table.json   # Replace it with the actual path.
export RANK_ID=0
export RANK_SIZE=8
# Specify the dataset path.
DATA_URL=/home/test/data/train/
EVAL_DATA_URL=/home/test/data/eval/
MODEL_NAME=resnet_v1_50
# Start the training script.
python3.7 /home/test/xxx.py
```
- To perform distributed training on multiple devices, use the startup script to start all training processes in sequence. Before starting each training process, set ASCEND_DEVICE_ID and RANK_ID for that process, for example:
```bash
export ASCEND_DEVICE_ID=1
export RANK_ID=1
```
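For illustration only, the per-process variables can also be set by a small launcher rather than by editing the exports by hand. The following is a minimal sketch, assuming a single server with 8 devices, a one-to-one mapping between RANK_ID and ASCEND_DEVICE_ID, and the placeholder training script path /home/test/xxx.py from the example above; adapt it to your ranktable file and script.

```python
# Minimal multi-process launch sketch (assumptions: 8 devices on one server,
# RANK_ID mapped 1:1 to ASCEND_DEVICE_ID, training script at /home/test/xxx.py).
import os
import subprocess

RANK_SIZE = 8
processes = []
for rank in range(RANK_SIZE):
    env = os.environ.copy()              # inherit the exports set by the startup script
    env["ASCEND_DEVICE_ID"] = str(rank)  # logical device ID for this process
    env["RANK_ID"] = str(rank)           # rank ID; must match the ranktable file
    env["RANK_SIZE"] = str(RANK_SIZE)
    processes.append(subprocess.Popen(["python3.7", "/home/test/xxx.py"], env=env))

# Wait for all training processes to finish.
for p in processes:
    p.wait()
```

The same effect can be achieved by looping inside the bash startup script itself; the key point is that each process sees its own ASCEND_DEVICE_ID and RANK_ID, and that the RANK_ID values match the ranktable file.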
Configuring Environment Variables
Variable | Description | Required/Optional
---|---|---
LD_LIBRARY_PATH | Dynamic library path. Set this variable by referring to the preceding example. CAUTION: If GCC in the training environment (such as CentOS, Debian, and BClinux) needs to be upgraded, also add ${install_path}/lib64 (replace install_path with the GCC installation path) to this variable. For details, see 5. | Required
PYTHONPATH | Python path. Set this variable by referring to the preceding example. | Required
PATH | Executable program path. Set this variable by referring to the preceding example. | Required
ASCEND_OPP_PATH | Operator package (OPP) root directory. Set this variable by referring to the preceding example. | Required
JOB_ID | Training job ID, which is user-defined. The value can contain letters, digits, hyphens (-), and underscores (_). You are not advised to set JOB_ID to digits starting with 0. | Required
DEVICE_ID | NOTICE: This environment variable is supported by the current version but will be discarded in later versions; for new versions, you are advised to use ASCEND_DEVICE_ID instead. In the training scenario, or when the Auto Tune tool is used for optimization, this variable specifies the physical ID of a processor, that is, the serial number of the device on the server corresponding to the HDC channel ID. To view the available device IDs, run ls -l /dev \| grep davinci on the host (physical machine or VM) as the root user; the trailing digits of the davinci device files (davinci0, davinci1, davinci2, ...) are the device IDs. When both DEVICE_ID and ASCEND_DEVICE_ID are supported in the current version, the system applies its own processing logic to decide which one takes effect. | Optional
ASCEND_DEVICE_ID | NOTICE: This environment variable will replace DEVICE_ID in later versions, so you are advised to use ASCEND_DEVICE_ID for newly installed versions. In the training scenario, or when the Auto Tune tool is enabled, this variable specifies the logical ID of a processor. The value range is [0, N-1], where N is the number of devices on the physical machine, VM, or container. The default value is 0. When both DEVICE_ID and ASCEND_DEVICE_ID are supported in the current version, the system applies its own processing logic to decide which one takes effect. | Optional
RANK_TABLE_FILE | Processor resource information for distributed training. Set this variable to the path of the ranktable file, including both the directory and the file name. NOTE: For details about the ranktable file, see Configuring Processor Resources. | Required
RANK_ID | Rank ID of the current training process. When ranktable template 1 is used, set it to the corresponding rank_id in the ranktable file; when ranktable template 2 is used, set it to the corresponding pod_name. | Required
RANK_SIZE | Number of devices in the cluster. | Required
GE_USE_STATIC_MEMORY | Whether to enable static memory allocation for network execution. Set this variable to 1 to enable static memory allocation. This is especially useful when the network has a very large number of layers, for example, the BERT24 network, whose intermediate data volume in feature map computation can reach 25 GB; in this case, static memory allocation improves the collaboration efficiency between the communication DIMMs in multi-device scenarios. For networks other than BERT24, you are advised to retain the default value (dynamic memory allocation). In static memory allocation mode, 31 GB is allocated by default, as determined by the sum of graph_memory_max_size and variable_memory_max_size. In dynamic memory allocation mode, memory is allocated within the sum of graph_memory_max_size and variable_memory_max_size. | Optional
TE_PARALLEL_COMPILER | Maximum number of parallel operator build processes. Defaults to 8. When the value is greater than 0, parallel build is enabled, which is especially useful for large networks. The maximum value is calculated as: maximum value = (80% x number of CPU cores) / number of Ascend AI Processors. For example, on a host with 96 CPU cores and 8 Ascend AI Processors, the maximum value is 0.8 x 96 / 8, that is, 9 after rounding down. | Optional
PROFILING_MODE | Whether to enable profiling. | Optional
PROFILING_OPTIONS | Profiling option to be traced, or multiple options separated by colons. | Optional
FP_POINT | Start point of forward propagation in iteration traces, used to record the start timestamp of forward propagation. This variable is required if training_trace is selected. Set the value to the name of the first operator in forward propagation. To obtain the name, save the graph as a .pbtxt file by calling tf.io.write_graph in the training script (a minimal sketch is shown after this table), then search from the top of the graph: the first node that is not a data or storage node is the fp_point operator. Data and storage nodes can be identified by the name or op field; generally, operators whose op is Const, VariableV2, IteratorV2, Identity, Reshape, or Cast, or whose name contains step, Dataset, seed, or kernel, should be excluded. The operator may have been fused or renamed. In that case, look up the operator name in the ge_proto_xxxxx_Build.txt graph generated by GE: if an exact match is found, the operator name can be used directly; if only a fuzzy match is found (generally with a _1 suffix), use the operator name from the GE graph. | Optional
BP_POINT | End point of backward propagation in iteration traces, used to record the end timestamp of backward propagation. FP_POINT and BP_POINT together are used to compute the time spent in forward and backward propagation. This variable is required if training_trace is selected. Set the value to the name of the last operator in backward propagation. To obtain the name, save the graph as a .pbtxt file by calling tf.io.write_graph in the training script, then search from the bottom of the graph: the first node containing gradient is the bp_point operator. As with FP_POINT, the operator may have been fused or renamed; look up the operator name in the ge_proto_xxxxx_Build.txt graph generated by GE, and if only a fuzzy match is found (generally with a _1 suffix), use the operator name from the GE graph. | Optional
AICPU_PROFILING_MODE | Whether to collect the profiling data of AI CPU data augmentation. | Optional
HCCL_INTRA_PCIE_ENABLE | Whether to use the PCIe path for communication between the Ascend AI Processors within a server in the Atlas 300T training card (model: 9000) scenario. Use this variable in conjunction with HCCL_INTRA_ROCE_ENABLE: the two variables control only the communication mode between the Ascend AI Processors within a server in this scenario, while communication between servers always uses the RoCE path. NOTE: HCCL_INTRA_PCIE_ENABLE and HCCL_INTRA_ROCE_ENABLE cannot both be set to 1. | Optional
HCCL_INTRA_ROCE_ENABLE | Whether to use the RoCE path for communication between the Ascend AI Processors within a server in the Atlas 300T training card (model: 9000) scenario. | Optional
SKT_ENABLE | Whether to enable the superkernel feature. If enabled, operator tasks are fused into a single task for delivery, which accelerates task scheduling and network execution. | Optional
OP_NO_REUSE_MEM | Operator (or operators, separated by commas) to be excluded from memory reuse. Memory reuse is enabled by default; the specified operators use exclusively allocated memory instead. | Optional
ENABLE_NETWORK_ANALYSIS_DEBUG | Whether to ignore graph build failures. If this environment variable is set (to any value), GE always returns success even if the graph build fails, so that the adapter can still deliver the graph to GE. | Optional
DUMP_GE_GRAPH | Graph dump mode. If this environment variable is not set, the generated built graph is dumped by default. | Optional
DUMP_GRAPH_LEVEL | Which graphs to dump. This environment variable takes effect only when DUMP_GE_GRAPH is enabled. | Optional
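As referenced in the FP_POINT and BP_POINT descriptions above, the graph can be saved as a .pbtxt file with tf.io.write_graph so that the operator names can be looked up. The following is a minimal sketch, assuming a TensorFlow 1.x-style, session-based training script; the output directory and file name are placeholders.

```python
# Sketch: dump the graph as .pbtxt to look up the fp_point/bp_point operator names.
# Assumptions: TF 1.x-style session-based script; the output path is a placeholder.
import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # ... build the training graph here ...
    tf.io.write_graph(sess.graph_def, "/home/test/graph",
                      "train_graph.pbtxt", as_text=True)
```

Open the generated .pbtxt file and search from the top for the fp_point operator and from the bottom for the bp_point operator, as described in the table above.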