Single-Operator Profiling
Collect profile data for a single operator with the Profiling script and view the profiling results in the CLI. The results help you identify opportunities for performance improvement by locating where a single operator slows down and analyzing the generated code.
This section describes how to use the Profiling tool. HwHiAiUser is used as the installation user and /home/HwHiAiUser/Ascend/ascend-toolkit/latest is used as the default installation path. Replace them as required.
Before starting Profiling, build and run the operator project to generate an executable file of the project.
Collecting Profile Data
Perform the following steps to collect profile data:
- Log in to the system as the HwHiAiUser user created during installation.
- Copy the executable file and model files (all files are in the out directory) of the operator project to the operating environment. Ensure that the owner of the copied files is HwHiAiUser.
scp -r /home/HwHiAiUser/AscendProjects/MyOperator/out HwHiAiUser@x.x.x.x:/home/HwHiAiUser/HIAI_PROJECTS/MyOperator
In this section, paths and file names in the commands are examples only. Replace them with the actual paths and file names.
- /home/HwHiAiUser/AscendProjects/MyOperator/out: path of the generated executable file after building. It is advised to copy the entire operator project folder to avoid missing some reference files.
- /home/HwHiAiUser/HIAI_PROJECTS/MyOperator: path of the executable file in the operating environment. Replace MyOperator with the actual project name.
- x.x.x.x: IP address of the host in the Ascend EP scenario, or of the board environment in the Ascend RC scenario.
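After copying, you can verify (and, with sufficient privileges, fix) the file ownership on the operating environment. The sketch below uses a scratch directory and the current user so it runs anywhere; on the target machine, substitute the real project path and HwHiAiUser:

```shell
# Stand-in for /home/HwHiAiUser/HIAI_PROJECTS/MyOperator on the target machine.
dest=$(mktemp -d)
touch "$dest/main"

# Every entry should list the profiling user (HwHiAiUser on the target):
ls -l "$dest"

# Reassign ownership if any file is owned by another user (run with
# sufficient privileges; on the target use HwHiAiUser:HwHiAiUser):
chown -R "$(id -un)":"$(id -gn)" "$dest"
```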
- Go to the directory of the hiprof.pyc script, for example, /home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/tools/profiler/profiler_tool/analysis/command.
- Set the environment variable for the hiprof command in the development environment.
export LD_LIBRARY_PATH=/home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/lib64:${LD_LIBRARY_PATH}
If .so files such as libgflags.so.2.2 cannot be found when the single-operator is running, fix the error by referring to Profiling Fails Due to App Execution Error.
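As a quick sanity check before invoking hiprof, you can confirm that the toolkit library directory is actually on the search path (adjust the path to your installation):

```shell
# Prepend the toolkit library directory, then print the matching entry.
export LD_LIBRARY_PATH=/home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/lib64:${LD_LIBRARY_PATH}
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep 'toolkit/lib64'
```

If the grep prints nothing, the export did not take effect in the current shell session.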
- Collect profile data over software and hardware modules. For details about the command-line options, see Table 7-2.
- Example of executing a single-operator without arguments (with app_dir)
python3.7.5 hiprof.pyc --ip_address=x.x.x.x:port --result_dir=/home/HwHiAiUser/tools/out/ --profiling_options=op_trace --app_dir=/home/HwHiAiUser/HIAI_PROJECTS/MyOperator/out --app=main
- Example of executing an application project without arguments (without app_dir)
python3.7.5 hiprof.pyc --ip_address=x.x.x.x:port --result_dir=/home/HwHiAiUser/tools/out/ --profiling_options=op_trace --app=/home/HwHiAiUser/HIAI_PROJECTS/MyAppname/out/main
- Example of executing a single-operator with arguments
In this case, the executable file name and arguments must be enclosed in double quotation marks (""). In the following example, benchmark is the single-operator executable file name, and --om and --dataDir are the additional arguments. The argument format must match what the single-operator executable file expects.
python3.7.5 hiprof.pyc --ip_address=x.x.x.x:port --result_dir=/home/HwHiAiUser/tools/out/ --profiling_options=op_trace --app_dir=/home/HwHiAiUser/HIAI_PROJECTS/MyOperator/out --app="benchmark --om model/resnet50_aipp_b8_fp16_output_FP32.om --dataDir datasets/ImageNet2012-1024/"
- Command-line Profiling supports prefix matching: any correct, unambiguous prefix of an option name is accepted and triggers proper execution.
For example, --profiling_option is matched as --profiling_options.
- Enter the command using an English input method and make sure that all spaces are standard ASCII spaces. Otherwise, the command execution may fail.
- Every option in the command must be assigned a value. Otherwise, an exception is reported; this behavior comes from native Python argument parsing.
- The value of the --app option cannot contain the following special characters in the double quotation marks: [';*?`!#$%^&+=<>{}]|"
If a custom option contains the preceding special characters, write the corresponding execution statements to the executable file, start the application project using the executable file, and use the name and path of the executable file as the values of --app and --app_dir.
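The wrapper approach described above can be sketched as follows (the file names are illustrative): put the full invocation, special characters and all, inside an executable script, then pass that script to --app.

```shell
# Create a wrapper script holding the full command line, including
# characters that --app cannot accept directly.
cat > run_op.sh <<'EOF'
#!/bin/sh
./benchmark --om model/resnet50_aipp_b8_fp16_output_FP32.om --dataDir "datasets/ImageNet2012-1024/"
EOF
chmod +x run_op.sh

# Place run_op.sh in the project's out directory, then profile with:
#   --app_dir=/home/HwHiAiUser/HIAI_PROJECTS/MyOperator/out --app=run_op.sh
```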
- If you press Ctrl+C to stop profiling after it has started, wait about 10 seconds before running the profiling command again. Otherwise, the execution may fail.
- If the error message Data folder is locked is reported during command execution, the previous profiling command may have exited abnormally. Delete the files in the output result folder specified by result_dir and run the command again.
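Clearing the stale result directory can be done as sketched below. The sketch uses a scratch directory so it is safe to try anywhere; on the real machine, substitute the path you passed to --result_dir (for example /home/HwHiAiUser/tools/out/):

```shell
# Stand-in for the directory passed to --result_dir.
result_dir=$(mktemp -d)
touch "$result_dir/leftover.data"

# Empty the directory but keep the directory itself.
rm -rf "$result_dir"/*

# Prints nothing once the leftover lock files are gone.
ls -A "$result_dir"
```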
- Tip: as the HwHiAiUser user, create an alias for the hiprof.pyc script with the command alias hiprof='python3.7.5 /home/HwHiAiUser/Ascend/ascend-toolkit/latest/toolkit/tools/profiler/profiler_tool/analysis/command/hiprof.pyc'. You can then start profiling from any directory with the hiprof shortcut.
Table 7-2 Command-line options
--result_dir
(Required) Directory for the profiling result files, specified as an absolute path. If the directory already exists, a .old suffix is appended to its name; if it does not exist, it is created.
The application running user must have read and write permissions on the path specified by result_dir. Set the path under the user's home directory, for example, /home/HwHiAiUser/tools/out.
Note: This path must not be one occupied by another application. Otherwise, the folder will be renamed and the other application's execution will be affected.
--ip_address
(Required) IP address and port number of the operating environment. The port number is optional and defaults to 22118.
Specify the IP address of the host in the Ascend EP scenario, or of the board environment for the Atlas 200 DK.
--profiling_options
(Required) Option to trace, fixed at op_trace.
--app
(Required) Application executable file in the operating environment. Set to a file name or a full path. For example:
- Set to a file name: --app=main.
- Set to a full path: --app=/home/HwHiAiUser/HIAI_PROJECTS/MyApp2019/out/main.
If the application takes arguments, enclose the file name and arguments in double quotation marks (""), for example, --app="main parameters1 parameters2 parameters3".
If --app is set to a file name rather than a full path, --app_dir is required.
--app_dir
(Optional) Path for storing the application executable file in the operating environment.
Example: --app_dir=/home/HwHiAiUser/HIAI_PROJECTS/MyApp2019/out
NOTE:
- If you change the default value of WORK_PATH in the ide_daemon.cfg file of the ADA module, do not use ~ to indicate the home directory of the ADA running user in the operating environment. Write the absolute path of the application executable file instead.
- Ensure that the ADA running user in the operating environment has read and write permissions on the path of the application executable file.
--devices
(Optional) Device ID, or multiple device IDs separated by commas. Defaults to 0.
--app_location
(Optional) Execution target of an application project or a single-operator, either host (default) or device.
NOTE: If this option is set to device, the --devices option must be set to the ID of the device that stores the application project or single-operator.
--ai_core_profiling
(Optional) AI Core profiling switch, either on (default) or off.
--ai_core_profiling_mode
(Optional) AI Core profiling mode, either task-based (default) or sample-based.
In task-based mode, profile data is collected task by task; in sample-based mode, profile data is collected at a fixed interval.
To collect AI Core profile data, set --ai_core_profiling to on.
--aicore_sampling_interval
(Optional) AI Core sampling interval in ms. Defaults to 10; the value range is [10, 1000].
--ai_core_metrics
(Optional) AI Core metrics: aicoreArithmeticThroughput, aicorePipeline, aicoreSynchronization, aicoreMemoryBandwidth, aicoreInternalMemoryBandwidth, aicorePipelineStall, and aicoreMetricsAll. For details about the AI Core metrics, see AI Core Metrics.
- aicoreArithmeticThroughput: percentages of arithmetic throughput.
- aicorePipeline (default): percentages of time taken by the compute units and MTEs.
- aicoreSynchronization: percentages of synchronization instructions.
- aicoreMemoryBandwidth: percentages of external memory read/write instructions.
- aicoreInternalMemoryBandwidth: percentages of internal memory read/write instructions.
- aicorePipelineStall: percentages of pipeline stall instructions.
- aicoreMetricsAll: all the preceding metrics, including aicoreArithmeticThroughput, aicorePipeline, aicoreSynchronization, aicoreMemoryBandwidth, aicoreInternalMemoryBandwidth, and aicorePipelineStall. This option is available only when --profiling_options is set to op_trace.
--app_env
(Optional) A custom environment variable required in the operating environment during profiling.
Enclose the value in double quotation marks (""). Separate multiple variables with semicolons (;).
Example: --app_env="LD_LIBRARY_PATH=/home/HwHiAiUser/Ascend/ascend-toolkit/latest/acllib/lib64"
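Putting the Table 7-2 options together, a sample-based AI Core collection might look like the sketch below. All addresses, paths, and values are placeholders; the leading `echo` prints the command for review rather than executing it, so remove the `echo` to actually start profiling:

```shell
# Illustrative combined invocation: sample-based AI Core collection on
# device 0 with a 100 ms sampling interval. Placeholder values throughout.
echo python3.7.5 hiprof.pyc \
  --ip_address=x.x.x.x:22118 \
  --result_dir=/home/HwHiAiUser/tools/out/ \
  --profiling_options=op_trace \
  --devices=0 \
  --ai_core_profiling=on \
  --ai_core_profiling_mode=sample-based \
  --aicore_sampling_interval=100 \
  --ai_core_metrics=aicorePipeline \
  --app_dir=/home/HwHiAiUser/HIAI_PROJECTS/MyOperator/out \
  --app=main
```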
Viewing the Profiling Results
- Runtime API calls
The information in the figure is described as follows:
- Name: API name
- Stream ID: stream ID of an API
- Time (%): percentage of time taken by an API
- Time (ns): time taken by an API
- Calls: number of API calls
- Avg, Min, Max: average, minimum, and maximum time taken by API calls
In Figure 7-27, N/A in the Stream ID column indicates that the API is called directly and does not belong to any stream.
- Task Scheduler (Figure 7-28 Task Scheduler analysis result)
The information in the figure is described as follows:
- Time(%): percentage of time taken by a task
- Time(ns): time taken by a task
- Count: number of times the task was executed
- Avg, Min, Max: average, minimum, and maximum time
- Waiting: total wait time of a task
- Running: total run time of a task. An unusually long run time may indicate a problem in the operator implementation.
- Pending: total pending time of a task
- Type: task type
- API: API name
- Task ID: task ID
- Op Name: operator name
- Stream ID: stream ID
- AI Core (--ai_core_profiling_mode and --ai_core_metrics are both set to default values, as shown in Figure 7-29.)
The analysis result of AI Core metrics is described as follows:
- Task ID: task ID
- Stream ID: stream ID
- Op Name: operator name
- aicore_time: time taken to execute all instructions
- total_cycles: number of cycles taken to execute all instructions
- vec_time: time taken to execute Vector instructions
- vec_ratio: percentage of cycles taken to execute Vector instructions
- mac_time: time taken to execute Cube instructions
- mac_ratio: percentage of cycles taken to execute Cube instructions
- scalar_time: time taken to execute Scalar instructions
- scalar_ratio: percentage of cycles taken to execute Scalar instructions
- mte1_time: time taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
- mte1_ratio: percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
- mte2_time: time taken to execute MTE2 instructions (DDR-to-AI Core movement)
- mte2_ratio: percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement)
- mte3_time: time taken to execute MTE3 instructions (AI Core-to-DDR movement)
- mte3_ratio: percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement)
- icache_miss_rate: I-Cache miss rate
- memory_bound: indicates whether a memory bottleneck exists while the AI Core executes the operator, calculated as mte2_ratio/max(mac_ratio, vec_ratio). A value less than 1 means no memory bottleneck; a value of 1 or greater means a memory bottleneck exists.
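The memory_bound formula is easy to evaluate by hand from a profile row. The ratios below are made-up example values for a hypothetical row, not output from a real run:

```shell
# Hypothetical ratios from one AI Core profile row.
mte2_ratio=0.62
mac_ratio=0.41
vec_ratio=0.18

# memory_bound = mte2_ratio / max(mac_ratio, vec_ratio); a result of 1 or
# more flags a memory bottleneck (here 0.62 / 0.41, so memory bound).
awk -v m="$mte2_ratio" -v a="$mac_ratio" -v v="$vec_ratio" \
  'BEGIN { d = (a > v) ? a : v; printf "memory_bound = %.2f\n", m / d }'
```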
- AscendCL Module, operators, and Runtime API (Figure 7-30 AscendCL Module analysis result)
- Name: AscendCL API name
- Type: AscendCL API type
- Start Time: AscendCL API start time
- Duration: time taken to run an AscendCL API
- Process ID: process ID of an AscendCL API
- Thread ID: thread ID of an AscendCL API
- Time taken by each module in the inference workflow (Figure 7-31 Top-down information)
The information in the figure is described as follows:
- Infer ID: inference iteration ID
- Module Name: module name
- API: API name
- Start Time: start time
- Duration: total time
Suggestions for Effective Profiling Results
Among the large amount of profile data collected and analyzed by the Profiling tool, the following metrics deserve particular attention.
aicorePipeline analysis result:
- mac_ratio: percentage of cycles taken to execute Cube instructions
- vec_ratio: percentage of cycles taken to execute Vector instructions
- scalar_ratio: percentage of cycles taken to execute Scalar instructions
- mte1_ratio: percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
- mte2_ratio: percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement)
- mte3_ratio: percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement)
aicoreMemoryBandwidth analysis result:
- main_mem_read_bw: main memory read bandwidth (GB/s)
- main_mem_write_bw: main memory write bandwidth (GB/s)
Based on the preceding analysis and on feedback from other users, the following performance optimization suggestions are provided for reference.