Profiling
Overview
Profile data can be collected during training and then analyzed with the Profiling tool to evaluate performance. The following tracing options are currently supported:
- training_trace: iteration tracing. Collects software profile data from the training job and the AI Software Stack. Focuses on data augmentation, forward and backward propagation, and gradient aggregation and update.
- task_trace: task tracing. Collects the HWTS and AI Core hardware information of the Ascend AI Processor and the start and end of each task.
- op_trace: single-operator tracing. To use this option, construct a single-operator network and train it using a training script. This option is mutually exclusive with training_trace and task_trace.
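The constraint between these options can be checked before a job is launched. The following is a minimal sketch, not part of the product API: the helper name build_profiling_options is hypothetical, and it assumes the colon-separated format used for profiling_options and PROFILING_OPTIONS elsewhere in this section.

```python
# Option names come from the list above; the helper itself is hypothetical.
KNOWN_OPTIONS = ("training_trace", "task_trace", "op_trace")


def build_profiling_options(options):
    """Validate tracing options and join them into a colon-separated string."""
    unknown = [opt for opt in options if opt not in KNOWN_OPTIONS]
    if unknown:
        raise ValueError(f"unknown tracing option(s): {unknown}")
    # op_trace cannot be combined with training_trace or task_trace.
    if "op_trace" in options and len(set(options)) > 1:
        raise ValueError(
            "op_trace cannot be combined with training_trace or task_trace")
    # Drop duplicates while keeping the caller's order.
    return ":".join(dict.fromkeys(options))
```

For example, build_profiling_options(["task_trace", "training_trace"]) produces a string that could be passed as the profiling_options value, while combining op_trace with another option raises ValueError.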
By default, profile data is not collected during training. To collect profile data, modify the training script as follows.
Enabling Tracing with Estimator
In Estimator mode, use profiling_config, an NPURunConfig configuration option, to enable tracing.
import tensorflow as tf
from npu_bridge.estimator.npu.npu_config import NPURunConfig
from npu_bridge.estimator.npu.npu_config import ProfilingConfig

# Tracing options to enable.
profiling_options = ['task_trace', 'training_trace']
# fp_point and bp_point name operators in the training graph; the values here are examples.
profiling_config = ProfilingConfig(
    enable_profiling=True,
    enable_options=profiling_options,
    fp_point="resnet_v1_50_1/conv1/Conv2D",
    bp_point="add_1")
session_config = tf.ConfigProto()
config = NPURunConfig(profiling_config=profiling_config, session_config=session_config)
To collect the AI CPU augmentation profile data, set AICPU_PROFILING_MODE to true.
export AICPU_PROFILING_MODE=true
Enabling Tracing with sess.run()
In sess.run() mode, use profiling_mode and profiling_options, session configuration options, to enable tracing.
import tensorflow as tf
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
custom_op.parameter_map["profiling_mode"].b = True
# Multiple tracing options are separated by colons.
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes("task_trace:training_trace")
custom_op.parameter_map["fp_point"].s = tf.compat.as_bytes("resnet_v1_50_1/conv1/Conv2D")
custom_op.parameter_map["bp_point"].s = tf.compat.as_bytes("add_1")
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # Disable remapping.
with tf.Session(config=config) as sess:
    sess.run()  # Pass the operations of your training step here.
To collect the AI CPU augmentation profile data, set AICPU_PROFILING_MODE to true.
export AICPU_PROFILING_MODE=true
More Methods
In addition to the preceding two methods, you can modify the environment variables in the startup script to enable Profiling.
export PROFILING_MODE=true
export PROFILING_OPTIONS=training_trace:task_trace
export FP_POINT=resnet_v1_50_1/conv1/Conv2D
export BP_POINT=add_1
To collect the AI CPU augmentation profile data, set AICPU_PROFILING_MODE to true.
export AICPU_PROFILING_MODE=true
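If the training job is started from a Python launcher rather than a shell script, the same variables can be passed through the process environment. The sketch below is illustrative only: the function profiling_env and its parameters are hypothetical, and only the variable names and values are taken from the export commands above.

```python
import os


def profiling_env(options, fp_point, bp_point, aicpu=False, base_env=None):
    """Return a copy of the environment with the profiling variables set."""
    # Copy so the caller's environment (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PROFILING_MODE": "true",
        "PROFILING_OPTIONS": ":".join(options),
        "FP_POINT": fp_point,
        "BP_POINT": bp_point,
    })
    if aicpu:
        # Also collect the AI CPU augmentation profile data.
        env["AICPU_PROFILING_MODE"] = "true"
    return env
```

The returned dictionary could then be handed to the standard library call subprocess.run(..., env=env) together with the command that starts your own training script.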
Viewing and Analyzing Profile Data
After training is complete, check whether a folder prefixed with JOB has been generated in the /var/log/npu/profiling/ directory. Then analyze the profile data by referring to Profiling Tool Instructions.
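The check for generated folders can also be scripted. This is a minimal sketch using only the Python standard library; the function name find_job_dirs is hypothetical, and the default path is the directory named above.

```python
from pathlib import Path


def find_job_dirs(root="/var/log/npu/profiling"):
    """Return the folders under root whose names start with 'JOB', sorted."""
    root_path = Path(root)
    if not root_path.is_dir():
        # The directory does not exist, so no profile data was written there.
        return []
    return sorted(
        p for p in root_path.iterdir()
        if p.is_dir() and p.name.startswith("JOB"))
```

An empty result suggests that profiling was not enabled for the run or that the data was written elsewhere.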