Job Profiling
- Delivering a Training Job
- Obtaining Training Traces
- Obtaining Task Traces
- HWTS Result of a Device (Chrome Trace)
- Task-Based HWTS Result of a Device
- AI Core Result of a Device
- HCCL Result of a Device
- Operator Result of a Device
- L2 Cache Result of a Device
- Degree of Parallelism of AI Core, AI CPU, and All Reduce
- AI Core Operator Count Table of a Device
- AI Core Operator Summary Result of a Device
- Profile Data Path of a Training Job
- Data Augmentation Result of a Device
Delivering a Training Job
Cloud Scenario
Before you can view the job profiling result, you must deliver a training job.
For Profiling-HUAWEI CLOUD interconnection, configure and enable the related parameters when delivering a training job from ModelArts. For details, see the HUAWEI CLOUD documentation.
Non-Cloud Scenario
In non-cloud scenarios, enable Profiling in the training script by referring to Environment Configuration in Non-Cloud Scenarios. The profile data and profiling result of a training job vary according to the options to be traced.
After the training script is executed, the collected profile data is output to the /var/log/npu/profiling directory.
To analyze the profile data with the hiprof.sh script, copy the profile data folder to the /usr/local/Ascend/toolkit/tools/profiler/profiler_data/result_dir directory and save it as Host_IP/profile_data_file. Then, refer to "Command Line Operations."
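The copy step can be scripted. A hedged Python sketch; the directory layout follows the paths in the text, and the helper name and sample IP are ours:

```python
import shutil
from pathlib import Path

def stage_profile_data(profiling_dir, result_dir, host_ip):
    """Copy collected profile data to result_dir/<host_ip> so that
    hiprof.sh can find it for analysis."""
    dest = Path(result_dir) / host_ip
    shutil.copytree(profiling_dir, dest, dirs_exist_ok=True)
    return dest

# Default locations named in the text above (requires write access):
# stage_profile_data("/var/log/npu/profiling",
#                    "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/result_dir",
#                    "192.168.0.10")
```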
Obtaining Training Traces
Iteration Count of a Training Job
Function: Outputs the iteration count of a training job.

Syntax: bash hiprof.sh --export_trace_data --total_count --job_id %s

Example: bash hiprof.sh --export_trace_data --total_count --job_id=1234567890

This command outputs the iteration count of training job 1234567890.

Options:

Option | Required or Not | Description
---|---|---
--export_trace_data | Yes | Exports the trace data. It is the first argument.
--total_count | Yes | Exports trace data of the iteration count.
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "20", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Query result | Number of iterations | None
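Every hiprof.sh command returns a JSON string of this shape, so the status check and payload extraction can be scripted once. A minimal Python sketch (the sample string is the output example above; the wrapper function name is ours):

```python
import json

def parse_hiprof_output(raw: str):
    """Parse a hiprof.sh JSON result and return the data payload.

    Raises RuntimeError with the reported cause when status is 1.
    """
    result = json.loads(raw)
    if result["status"] != 0:
        raise RuntimeError(f"hiprof failed: {result['info']}")
    return result["data"]

# Output example from the table above: the iteration count comes back
# as a string, so convert it explicitly.
raw = '{"data": "20", "status": 0, "info": ""}'
iteration_count = int(parse_hiprof_output(raw))
print(iteration_count)  # 20
```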
Iteration, FP/BP Propagation, Data Augmentation Bound, or Gradient Refresh Bound Elapsed Time of a Training Job
The profiling result of a training job is calculated from five timestamps: FP start, BP end, iteration end, all reduce start, and all reduce end.
If you use a single device, the allReduces field in the output is empty.
Function: Outputs the elapsed time of the iteration, data augmentation bound, FP/BP propagation, or gradient refresh bound of a training job.

Syntax: bash hiprof.sh --export_trace_data --type %d --page_index %d --page_length %d --job_id %s --sort_type %s

Example: bash hiprof.sh --export_trace_data --type=0 --page_index=0 --page_length=20 --job_id=1234567890 --sort_type=asc

This command exports the iteration elapsed time trace data (type=0) of training job 1234567890 to pages of 20 data records in ascending order and outputs the data records on page 0.

Options:

Option | Required or Not | Description
---|---|---
--export_trace_data | Yes | Exports the trace data. It is the first argument.
--type | Yes | 0: iteration elapsed time. 1: data augmentation bound elapsed time. 2: FP/BP propagation elapsed time. 3: gradient refresh bound elapsed time.
--page_index, --page_length | Yes | Exports trace data to pages of --page_length (an integer greater than 1) data records and outputs the data records on page --page_index (an integer starting at 0).
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--sort_type | Yes | Sets the output sorting: asc (ascending) or desc (descending).

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/training_trace_1234567890_0_asc.zip", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Query result | Result | None
The training_trace_{job id}_{type id}_{sort_type}.zip file contains the output result in CSV and JSON formats.
Example of a dump file in JSON format and the key fields:
{"data": [{"hostId": "10.10.10.10", "FPStart": "1571037746512755832", "allReduces": [{"start": "1571037746549145332", "end": "1571037746553878832"}, {"start": "1571037746555040952", "end": "1571037746555340042"}], "BPEnd": "1571037746549004032", "deviceId": 0, "iterationEnd": "1571037746555656782", "targetValue": "42900950", "iterationId": 0}, {"hostId": "10.10.10.10", "FPStart": "1571037754090332812", "allReduces": [{"start": "1571037754126808852", "end": "1571037754131543382"}, {"start": "1571037754132693932", "end": "1571037754132995332"}], "BPEnd": "1571037754126669322", "deviceId": 0, "iterationEnd": "1571037754133283492", "targetValue": "42950680", "iterationId": 1}, {"hostId": "10.10.10.10", "FPStart": "1571037754331836732", "allReduces": [{"start": "1571037754368224552", "end": "1571037754373125292"}, {"start": "1571037754374262022", "end": "1571037754374565812"}], "BPEnd": "1571037754368084722", "deviceId": 0, "iterationEnd": "1571037754374851432", "targetValue": "43014700", "iterationId": 5}, {"hostId": "10.10.10.10", "FPStart": "1571037754392209412", "allReduces": [{"start": "1571037754428589632", "end": "1571037754433690622"}, {"start": "1571037754434829222", "end": "1571037754435127842"}], "BPEnd": "1571037754428446832", "deviceId": 0, "iterationEnd": "1571037754435415822", "targetValue": "43206410", "iterationId": 6}, {"hostId": "10.10.10.10", "FPStart": "1571037754493080402", "allReduces": [{"start": "1571037754529933212", "end": "1571037754534666012"}, {"start": "1571037754535863562", "end": "1571037754536109322"}], "BPEnd": "1571037754529790512", "deviceId": 1, "iterationEnd": "1571037754536414132", "targetValue": "43333730", "iterationId": 7}], "status": 0, "info": ""}
- hostId: IP address of the host
- deviceId: device ID
- iterationId: iteration ID
- targetValue: value of the target type (the elapsed time selected by --type)
- FPStart: FP start time
- BPEnd: BP end time
- iterationEnd: iteration end time
- allReduces: collective communication time. start and end indicate the start time and end time of each all reduce operation.
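The elapsed-time metrics described later (FP_BP Time, Grad_refresh Bound, and the Reduce intervals) can be recomputed directly from these timestamps. A sketch using the first record of the JSON dump above; treating the raw timestamps as nanoseconds is our assumption:

```python
# First record of the JSON dump example above (only the timestamp fields).
record = {
    "FPStart": "1571037746512755832",
    "BPEnd": "1571037746549004032",
    "iterationEnd": "1571037746555656782",
    "allReduces": [
        {"start": "1571037746549145332", "end": "1571037746553878832"},
        {"start": "1571037746555040952", "end": "1571037746555340042"},
    ],
}

fp_start = int(record["FPStart"])
bp_end = int(record["BPEnd"])
iteration_end = int(record["iterationEnd"])

# FP_BP Time = BP End - FP Start
fp_bp_time = bp_end - fp_start
# Grad_refresh Bound = Iteration End - BP End
grad_refresh_bound = iteration_end - bp_end
# Total collective communication time in this iteration
reduce_time = sum(int(r["end"]) - int(r["start"]) for r in record["allReduces"])

print(fp_bp_time, grad_refresh_bound, reduce_time)
```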
Iteration Traces of a Device
Function: Outputs the iteration traces of a device in an AI Server to a file.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d

Example: bash hiprof.sh --save_file --type=trace_one --job_id=1234567890 --ip_address=192.168.0.10 --device_id=0

This command outputs the iteration traces of training job 1234567890 on device 0 in AI Server 192.168.0.10 to a file.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the trace data to a file. It is the first argument.
--type | Yes | Selects the iteration trace type. Value: trace_one
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/training_trace_1234567890_192.168.0.10_0.zip", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
The training_trace_{job id}_{host_ip}_{device_id}.zip file contains the output result in CSV and log formats.
- Example of a log dump file and the key fields:{"name": "Iteration Time 998", "cat": "Iteration Time", "ph": "B", "ts": 115011035497, "pid": 998, "tid": 0, "args": {"Iteration ID": 998, "FP Start": 11501103549760}}, {"ph": "E", "ts": 115011054542, "pid": 998, "tid": 0, "args": {"Iteration End": 11501105454232, "Iteration Time(10ns)": 1904472}}, {"name": "FP_BP Time 998", "cat": "FP_BP Time", "ph": "B", "ts": 115011035497, "pid": 998, "tid": 1, "args": {"Iteration ID": 998, "FP Start": 11501103549760}}, {"ph": "E", "ts": 115011054031, "pid": 998, "tid": 1, "args": {"BP End": 11501105403183, "FP_BP Time(10ns)": 1853423}}, {"name": "Grad_refresh Bound 998", "cat": "Grad_refresh Bound", "ph": "B", "ts": 115011054031, "pid": 998, "tid": 1, "args": {"Iteration ID": 998, "BP End": 11501105454232}}, {"ph": "E", "ts": 115011054542, "pid": 998, "tid": 1, "args": {"Iteration End": 11501105454232, "Grad_refresh Bound(10ns)": 51049}}, {"name": "Data_aug Bound 998", "cat": "Data_aug Bound", "ph": "s", "ts": 115011054542, "pid": 998, "tid": 0, "id": "Data_aug Bound 998", "args": {"Iteration ID": 998}}, {"name": "Data_aug Bound 998", "cat": "Data_aug Bound", "ph": "t", "ts": 115011055714, "pid": 999, "tid": 0, "id": "Data_aug Bound 998", "args": {"Data_aug Bound(10ns)": 117207}}, {"name": "Reduce_998_0", "cat": "Reduce", "ph": "B", "ts": 115011046801, "pid": 998, "tid": 2, "args": {"Iteration ID": 998, "reduce_start 0": 11501104680173}}, {"ph": "E", "ts": 115011050849, "pid": 998, "tid": 2, "args": {"Reduce End 0": 11501105084981}}, {"name": "Reduce_998_1", "cat": "Reduce", "ph": "B", "ts": 115011054049, "pid": 998, "tid": 2, "args": {"Iteration ID": 998, "reduce_start 1": 11501105404903}}, {"ph": "E", "ts": 115011054357, "pid": 998, "tid": 2, "args": {"Reduce End 1": 11501105435727}},
- Iteration ID: iteration ID
- FP Start: FP start time
- BP End: BP end time
- Iteration End: end time of the last gradient aggregation of an iteration
- Iteration Time: iteration elapsed time (= current Iteration End – previous Iteration End). For iteration 0, where no previous Iteration End exists, the elapsed time is calculated as current Iteration End – current FP Start.
- FP_BP Time: FP/BP propagation elapsed time (= BP End – FP Start)
- Grad_refresh Bound: gradient refresh bound elapsed time (= Iteration End – BP End)
- Data_aug Bound: data augmentation bound elapsed time (= current FP Start – previous Iteration End). For iteration 0, where no previous Iteration End exists, the value defaults to N/A.
- Reduce: collective communication elapsed time (an iteration may contain several Reduce groups). ph: B indicates the start time of an event, and ph: E indicates the end time. If there is only one device, no Reduce data is output.
- name: module name+ID
- cat: module name
- ph: event type, either B (start event) or E (end event).
- ts: start time (B event) or end time (E event).
- pid: iteration ID
- tid: lane (row) that groups the data within a process
- To view the iteration traces in the dump file *.log, open chrome://tracing in the Chrome browser and drag the file into the blank space.
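chrome://tracing pairs each B event with the next E event in the same pid/tid lane to draw a bar; the same pairing can be done by hand to extract durations. A sketch over two event pairs copied from the log example above:

```python
# Pair chrome-trace B (begin) / E (end) events per (pid, tid) lane and
# compute durations in ts units. Events copied from the dump example above.
events = [
    {"name": "Iteration Time 998", "cat": "Iteration Time", "ph": "B",
     "ts": 115011035497, "pid": 998, "tid": 0},
    {"ph": "E", "ts": 115011054542, "pid": 998, "tid": 0},
    {"name": "FP_BP Time 998", "cat": "FP_BP Time", "ph": "B",
     "ts": 115011035497, "pid": 998, "tid": 1},
    {"ph": "E", "ts": 115011054031, "pid": 998, "tid": 1},
]

open_events = {}   # (pid, tid) -> pending B event
durations = {}     # event name -> elapsed ts units
for ev in events:
    key = (ev["pid"], ev["tid"])
    if ev["ph"] == "B":
        open_events[key] = ev
    elif ev["ph"] == "E":
        begin = open_events.pop(key)
        durations[begin["name"]] = ev["ts"] - begin["ts"]

print(durations)
```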
Iteration Traces of a Training Job
If a training job runs on multiple devices, the iteration traces with the same iteration ID on all devices are exported.
Function: Outputs all iteration traces of a specified training job to a file.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s

Example: bash hiprof.sh --save_file --type=trace_batch --job_id=1234567890

This command outputs all iteration traces of training job 1234567890 to a file.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the trace data to a file. It is the first argument.
--type | Yes | Selects the iteration trace type. Value: trace_batch
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/training_trace_1234567890.zip", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
The training_trace_{job id}.zip file contains the iteration data files of the specified job ID on all devices. Each device corresponds to a CSV file and a log file. The file names contain the device ID.
- Example of a log file and the key fields:
{"name": "Iteration Time 0", "cat": "Iteration Time", "ph": "B", "ts": 416197863625, "pid": 0, "tid": 0, "args": {"Iteration ID": 0, "FP Start": 41619786362524}}, {"ph": "E", "ts": 416197925880, "pid": 0, "tid": 0, "args": {"Iteration End": 41619792588012, "Iteration Time(10ns)": 6225488}}, {"name": "FP_BP Time 0", "cat": "FP_BP Time", "ph": "B", "ts": 416197863625, "pid": 0, "tid": 1, "args": {"Iteration ID": 0, "FP Start": 41619786362524}}, {"ph": "E", "ts": 416197880779, "pid": 0, "tid": 1, "args": {"BP End": 41619788077960, "FP_BP Time(10ns)": 1715436}}, {"name": "Grad_refresh Bound 0", "cat": "Grad_refresh Bound", "ph": "B", "ts": 416197880779, "pid": 0, "tid": 1, "args": {"Iteration ID": 0, "BP End": 41619792588012}}, {"ph": "E", "ts": 416197925880, "pid": 0, "tid": 1, "args": {"Iteration End": 41619792588012, "Grad_refresh Bound(10ns)": 4510052}}, {"name": "Data_aug Bound 0", "cat": "Data_aug Bound", "ph": "s", "ts": 416197925880, "pid": 0, "tid": 0, "id": "Data_aug Bound 0", "args": {"Iteration ID": 0}}, {"name": "Data_aug Bound 0", "cat": "Data_aug Bound", "ph": "t", "ts": 416197927148, "pid": 1, "tid": 0, "id": "Data_aug Bound 0", "args": {"Data_aug Bound(10ns)": 126846}}, {"name": "Reduce_0_0", "cat": "Reduce", "ph": "B", "ts": 416197875063, "pid": 0, "tid": 2, "args": {"Iteration ID": 0, "reduce_start 0": 41619787506300}}, {"ph": "E", "ts": 416197921871, "pid": 0, "tid": 2, "args": {"Reduce End 0": 41619792187182}}, {"name": "Reduce_0_1", "cat": "Reduce", "ph": "B", "ts": 416197921888, "pid": 0, "tid": 2, "args": {"Iteration ID": 0, "reduce_start 1": 41619792188808}}, {"ph": "E", "ts": 416197925614, "pid": 0, "tid": 2, "args": {"Reduce End 1": 41619792561444}}
- Iteration ID: iteration ID
- FP Start: FP start time
- BP End: BP end time
- Iteration End: end time of the last gradient aggregation of an iteration
- Iteration Time: iteration elapsed time (= current Iteration End – previous Iteration End). For iteration 0, where no previous Iteration End exists, the elapsed time is calculated as current Iteration End – current FP Start.
- FP_BP Time: FP/BP propagation elapsed time (= BP End – FP Start)
- Grad_refresh Bound: gradient refresh bound elapsed time (= Iteration End – BP End)
- Data_aug Bound: data augmentation bound elapsed time (= current FP Start – previous Iteration End). For iteration 0, where no previous Iteration End exists, the value defaults to N/A.
- Reduce: collective communication elapsed time (an iteration may contain several Reduce groups). ph: B indicates the start time of an event, and ph: E indicates the end time. If there is only one device, no Reduce data is output.
- name: module name+ID
- cat: module name
- ph: event type, either B (start event) or E (end event).
- ts: start time (B event) or end time (E event).
- pid: iteration ID
- tid: lane (row) that groups the data within a process
- To view the iteration traces in the dump file *.log, open chrome://tracing in the Chrome browser and drag the file into the blank space.
Obtaining Task Traces
HWTS Result of a Device (Chrome Trace)
Function: Outputs the HWTS result of a training job in an iteration on a device in an AI Server.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d

Example: bash hiprof.sh --save_file --type=hwts_one_iter --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0

This command outputs the HWTS result of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1 to a file.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the HWTS data type. Value: hwts_one_iter
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--iteration_id | Yes | Sets the iteration ID. The value is an integer in the range [0, iteration count - 1]; for example, if ten iterations are delivered, the value range is [0, 9].
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/hwts_log_1234567890_192.168.0.10_0_0.log", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
- Description of key fields in dump file hwts_log_{job id}_{ai_server_ip}_{device_id}_{iteration_id}.log:
For example: {"device_id": 0, "traceEvents": [{"pid": 514, "ts": "76399450382", "dur": "35", "ph": "X", "name": "35258", "args": {"ms": "0.03511"}}, ......}
- pid: stream ID
- ts: start time of the current task
- dur: duration of the current task (in timestamp units)
- ph: event type, fixed to X (complete event)
- name: ID of the current task
- args: time taken by the current task, in ms
- To view the HWTS dump file, open chrome://tracing in the Chrome browser and drag the file into the blank space.
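Because each HWTS event is a complete (ph: X) event that carries its duration in args.ms, per-stream totals reduce to a simple aggregation. A hedged sketch; the first event is the example above, the second is hypothetical and only added to show the grouping:

```python
from collections import defaultdict

# HWTS events are complete (ph: X) events; pid is the stream ID and
# args.ms is the task duration in milliseconds (values are strings).
trace_events = [
    {"pid": 514, "ts": "76399450382", "dur": "35", "ph": "X",
     "name": "35258", "args": {"ms": "0.03511"}},
    # Hypothetical second task on the same stream.
    {"pid": 514, "ts": "76399450500", "dur": "20", "ph": "X",
     "name": "35259", "args": {"ms": "0.02000"}},
]

ms_per_stream = defaultdict(float)
for ev in trace_events:
    ms_per_stream[ev["pid"]] += float(ev["args"]["ms"])

print(dict(ms_per_stream))
```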
Task-Based HWTS Result of a Device
Function: Outputs the task-based HWTS result of a training job in an iteration on a device in an AI Server to a table.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d

Example: bash hiprof.sh --save_file --type=get_task_total --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0

This command outputs the task-based HWTS result of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1 to a table.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the task-based HWTS data type. Value: get_task_total
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}
--iteration_id | Yes | Sets the iteration ID. The value is an integer in the range [0, iteration count - 1]; for example, if ten iterations are delivered, the value range is [0, 9].

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/1234567890_192.168.0.10_0_0_task_info.csv", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
The columns of the task_info.csv file are described as follows:
- kernel_name: kernel name
- kernel_type: kernel type
- stream_id: stream ID
- task_id: task ID
- task_time: time taken to execute a task
- task_start: task start time
- task_stop: task end time
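Once saved, the task table can be ranked to surface the slowest tasks. A sketch with Python's csv module; the column names come from the list above, but both data rows are hypothetical:

```python
import csv
import io

# Hypothetical sample with the documented columns; real files come from
# the --save_file --type=get_task_total command.
sample = """kernel_name,kernel_type,stream_id,task_id,task_time,task_start,task_stop
conv2d_1,AI_CORE,514,35258,120,1000,1120
relu_1,AI_CORE,514,35259,30,1120,1150
"""

rows = list(csv.DictReader(io.StringIO(sample)))
rows.sort(key=lambda r: int(r["task_time"]), reverse=True)
slowest = rows[0]
print(slowest["kernel_name"], slowest["task_time"])
```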
AI Core Result of a Device
Function: Outputs the AI Core result of a training job on a device in an AI Server.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d

Example: bash hiprof.sh --save_file --type=get_ai_core_data --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0

This command outputs the AI Core result of training job 1234567890 on device 0 in AI Server 192.168.1.1.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the AI Core data type. Value: get_ai_core_data
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/1234567890_192.168.1.1_0_ai_core_slice0.csv", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
- The profiling result varies according to the AI Core metrics. For details about related parameters, see AI Core Metrics.
- By default, Profiling does not collect AI Core profile data. If you need the data, set ai_core_profiling to on in the profile.cfg file in the /var/log/npu/conf/profiling path.
HCCL Result of a Device
Function: Outputs the HCCL result of a training job on a device in an AI Server to a file.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d

Example: bash hiprof.sh --save_file --type=get_hccl_data --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0

This command outputs the HCCL result of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1 to a file.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the HCCL data type. Value: get_hccl_data
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}
--iteration_id | Yes | Sets the iteration ID. The value is an integer in the range [0, iteration count - 1]; for example, if ten iterations are delivered, the value range is [0, 9].

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/HCCL_1234567890_192.168.1.1_0.log", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
- If you use a single device, HCCL profile data is not collected.
- Key fields in the dump file HCCL_{job id}_{ai_server_ip}_{device_id}.log:
For example:
{"device_id": 0, "traceEvents": [{"name": "Iteration 1", "cat": "Iteration", "ph": "B", "ts": 10397064891, "pid": "Iteration", "tid": "Iteration", "args": {"Iteration ID": 1, "FP Start": 1039706489178}}, ......}
- name: module name, which may include Iteration+ID, Reduce+ID, Step+ID, Stage+ID and Task+ID.
- cat: module name, which may include Iteration, Reduce, Stage, Step, and Task.
- ph: event type, either B (start event) or E (end event).
- ts: start time (B event) or end time (E event).
- pid: process name, which may include Iteration, Reduce, Stage, Step, and Task.
- tid: task ID for a Task process, process name for an Iteration or Reduce process, or process count for a Stage or Step process.
- args: the fields displayed vary according to the process type. The preceding example displays the iteration ID, module start time, module end time, and time taken by the module (unit: 10 ns).
- To view the HCCL dump file, open chrome://tracing in the Chrome browser and drag the file into the blank space.
Operator Result of a Device
Function: Outputs the basic operator result of a training job on a device in an AI Server to a table.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d

Example: bash hiprof.sh --save_file --type=get_ge_basic_data --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0

This command outputs the basic operator result of training job 1234567890 on device 0 in AI Server 192.168.1.1 to a table.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the basic operator data type. Value: get_ge_basic_data
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/ge_basic_1234567890_0.csv", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
The columns in the result file are described as follows:
- Model Name: model name. This field may be empty if the model name is not available in the basic data.
- Op Name: operator name
- Op Type: operator type
- Task ID: task ID
- Block Dim: number of cores to execute a task
- Stream ID: stream task ID
- Input Count: number of inputs
- Input Formats: input formats
- Input Shapes: input shapes
- Input Data Types: input data types
- Output Count: number of outputs
- Output Formats: output formats
- Output Shapes: output shapes
- Output Data Types: output data types
L2 Cache Result of a Device
Function: Outputs the L2 cache result of a training job on a device in an AI Server to a table.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d

Example: bash hiprof.sh --save_file --type=get_l2_cache_data --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0

This command outputs the L2 cache result of training job 1234567890 on device 0 in AI Server 192.168.1.1 to a table.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the L2 cache data type. Value: get_l2_cache_data
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/1234567890_192.168.1.1_0_l2_cache.csv", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Dump path, including the dump file name | Result | None
By default, Profiling does not collect L2 cache data. If you need the data, set l2CacheTaskProfiling to on in the profile.cfg file in the /var/log/npu/conf/profiling path.
The following L2 cache profiling events can be collected (contact the Huawei development team for more):
- 0x5B: number of AI Core L2 hits received by DHA
- 0x59: number of AI Core requests received by DHA
- 0x5D: number of L2 hits received by DHA
- 0x7A: number of last read requests from AI Core
- 0x7B: number of invalid read requests from AI Core
- 0x7C: number of no writeback requests from AI Core
- 0x7D: number of read requests from AI Core
- 0x7E: number of write requests from AI Core
- 0x71: number of allocate requests from AI Core
- 0x5C: number of AI Core victim caches received by DHA (L2 replacement)
- 0x77: number of victim caches not generated by AI Core
- 0x78: number of L2 entries that AI Core replaces cnt=0
- 0x79: number of L2 entries that AI Core replaces cnt=1
0x5B, 0x59, 0x5C, 0x7D, 0x7E, 0x71, 0x79, and 0x7C are the default profiling metrics.
By default, the profiling result of L2 cache is exported to a file. Some column names in the example file are described as follows:
- job_id: training job ID
- host_id: ID of the host running the training job
- device_id: device ID
- task_type: task type
- stream_id: stream ID
- task_id: task ID
- hit_rate: ratio of AI Core L2 hits to AI Core requests, as a percentage
- victim_rate: ratio of AI Core victim caches to AI Core requests, as a percentage
- op_name: operator name
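hit_rate and victim_rate are both ratios over the AI Core request counter (event 0x59). A sketch of the arithmetic with hypothetical counter values:

```python
# Hypothetical raw counter values for one task; the event meanings are
# taken from the list above (0x5B: AI Core L2 hits, 0x59: AI Core
# requests, 0x5C: AI Core victim caches received by DHA).
ai_core_l2_hits = 8_500      # event 0x5B
ai_core_requests = 10_000    # event 0x59
ai_core_victims = 300        # event 0x5C

hit_rate = 100.0 * ai_core_l2_hits / ai_core_requests
victim_rate = 100.0 * ai_core_victims / ai_core_requests
print(f"hit_rate={hit_rate:.1f}% victim_rate={victim_rate:.1f}%")
```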
Degree of Parallelism of AI Core, AI CPU, and All Reduce
Function: Outputs the degree of parallelism of AI Core, AI CPU, and AllReduce of a training job in an iteration on a device in an AI Server.

Syntax: bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d

Example: bash hiprof.sh --save_file --type=get_core_cpu_reduce --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0

This command outputs the degree of parallelism of AI Core, AI CPU, and AllReduce of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1.

Options:

Option | Required or Not | Description
---|---|---
--save_file | Yes | Saves the result to a file. It is the first argument.
--type | Yes | Selects the AI Core, AI CPU, and AllReduce parallelism data type. Value: get_core_cpu_reduce
--job_id | Yes | Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed.
--ip_address | Yes | Sets the IP address of the AI Server.
--device_id | Yes | Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7}
--iteration_id | Yes | Sets the iteration ID. The value is an integer in the range [0, iteration count - 1]; for example, if ten iterations are delivered, the value range is [0, 9].

Output Syntax: JSON string: {"data": [], "status": 0, "info": ""}

Output Example: {"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/core_cpu_reduce_1234567890_0_0.log", "status": 0, "info": ""}

Output Fields:

Field | Description | On Success | On Failure
---|---|---|---
status | Command execution result | 0 | 1
info | Command execution description | None | Cause of failure
data | Query result | Result | None
To view the dump file, open chrome://tracing in the Chrome browser and drag the file into the blank space.
The format of a dump file is as follows:
{"device_id": 0, "traceEvents": [{"name": " atomic_addr_clean0_131", "pid": "aicore", "tid": 550, "ts": 358989884.48, "dur": 0.01, "ph": "X"}, ...]}
- name: event name, selected from the operator name, AI CPU stage name, and AllReduce name
- ph: event type. X indicates a duration event
- ts: start time of the event
- dur: duration of the event from start to end
- pid: process name, selected from aicore, aicpu, and all_reduce
- tid: stream ID when pid is aicore. For aicpu and all_reduce, tid is the same as pid
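For quick post-processing outside Chrome, a dump in this format can be summarized with a short script. The following Python sketch sums event durations per process name; the sample events are hypothetical, mirroring the format shown above:

```python
import json
from collections import defaultdict

def total_duration_by_pid(trace):
    """Sum event durations (dur) per process name (pid) for duration events."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # complete (duration) events only
            totals[event["pid"]] += event.get("dur", 0.0)
    return dict(totals)

# Hypothetical sample mirroring the dump-file format above.
sample = {
    "device_id": 0,
    "traceEvents": [
        {"name": "atomic_addr_clean0_131", "pid": "aicore", "tid": 550,
         "ts": 358989884.48, "dur": 0.01, "ph": "X"},
        {"name": "conv2d_fwd", "pid": "aicore", "tid": 550,
         "ts": 358989884.50, "dur": 1.20, "ph": "X"},
        {"name": "hccl_all_reduce", "pid": "all_reduce", "tid": "all_reduce",
         "ts": 358989885.00, "dur": 0.75, "ph": "X"},
    ],
}

print(total_duration_by_pid(sample))
```

The same approach extends to grouping by tid (stream) when pid is aicore.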
AI Core Operator Count Table of a Device
Function |
Outputs the AI Core operator count table of a training job in an iteration on a device in an AI Server. |
|||
---|---|---|---|---|
Syntax |
bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d |
|||
Example |
bash hiprof.sh --save_file --type=op_counter --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0 This command outputs the AI Core operator count table of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1. |
|||
Options |
Option |
Required or Not |
Description |
|
--save_file |
Yes |
Saves the result to a file. It is the first argument. |
||
--type |
Yes |
Selects the AI Core operator count table data type. Value: op_counter |
||
--job_id |
Yes |
Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
||
--ip_address |
Yes |
Sets the IP address of the AI Server. |
||
--device_id |
Yes |
Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7} |
||
--iteration_id |
Yes |
Sets the iteration ID. The value is an integer less than the iteration count. For example, if ten iterations are delivered, the value range is [0, 9]. |
||
Output Syntax |
JSON string: {"data": [], "status": 0, "info": ""} |
|||
Output Example |
{"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/1234567890_192.168.1.1_0_0_ai_core_op_statistic.csv", "status": 0, "info": ""} |
|||
Output Fields |
Field |
Description |
On Success |
On Failure |
status |
Command execution result |
0 |
1 |
|
info |
Command execution description |
None |
Cause of failure |
|
data |
Query result |
Result |
None |
The columns in the AI Core operator count table are as follows:
- Model Name: model name. This field may be empty if the model name is missing from the basic data.
- Op Type: operator type
- Core Type: core type
- Count: number of times that an operator is called
- Total Time: time taken to call an operator
- Avg Time, Min Time, and Max Time: average, minimum, and maximum time taken to call an operator
- Ratio: percentage of time taken by an operator to the corresponding model
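A count table in CSV form with these columns can be ranked directly. The Python sketch below (the sample rows are hypothetical, using only the columns described above) returns the operator types with the largest Total Time:

```python
import csv
import io

def top_ops_by_total_time(csv_text, n=3):
    """Return the n operator types with the largest Total Time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["Total Time"]), reverse=True)
    return [(r["Op Type"], float(r["Total Time"])) for r in rows[:n]]

# Hypothetical sample rows with the columns described above.
sample_csv = """Model Name,Op Type,Core Type,Count,Total Time,Avg Time,Min Time,Max Time,Ratio
resnet50,Conv2D,AI Core,53,1200.5,22.65,10.1,40.2,0.62
resnet50,MatMul,AI Core,2,300.0,150.0,140.0,160.0,0.16
resnet50,Relu,AI Core,49,80.3,1.64,1.2,2.1,0.04
"""

print(top_ops_by_total_time(sample_csv, n=2))
```

In practice the CSV text would be read from the path returned in the data field of the command output.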
AI Core Operator Summary Result of a Device
Function |
Outputs the AI Core operator summary result of a training job in an iteration on a device in an AI Server. |
|||
---|---|---|---|---|
Syntax |
bash hiprof.sh --save_file --type %s --job_id %s --ip_address %s --device_id %d --iteration_id %d |
|||
Example |
bash hiprof.sh --save_file --type=ai_core_op_summary --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0 This command outputs the AI Core operator summary result of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1. |
|||
Options |
Option |
Required or Not |
Description |
|
--save_file |
Yes |
Saves the result to a file. It is the first argument. |
||
--type |
Yes |
Selects the AI Core operator summary data type. Value: ai_core_op_summary |
||
--job_id |
Yes |
Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
||
--ip_address |
Yes |
Sets the IP address of the AI Server. |
||
--device_id |
Yes |
Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7} |
||
--iteration_id |
Yes |
Sets the iteration ID. The value is an integer less than or equal to the iteration number. If ten iterations are delivered, the value range is [0, 9]. |
||
Output Syntax |
JSON string: {"data": [], "status": 0, "info": ""} |
|||
Output Example |
{"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/1234567890_192.168.1.1_0_0_ai_core_op_summary.csv", "status": 0, "info": ""} |
|||
Output Fields |
Field |
Description |
On Success |
On Failure |
status |
Command execution result |
0 |
1 |
|
info |
Command execution description |
None |
Cause of failure |
|
data |
Query result |
Result |
None |
The columns in the AI Core operator summary file are as follows:
- Model Name: model name. This field may be empty if the model name is missing from the basic data.
- Task ID: task ID
- Stream ID: stream ID of a task
- Infer ID: sequence number of an inference
- Op Name: operator name
- Op Type: operator type
- Task Type: task type
- Task Start Time: time when a task starts
- Task Duration: time taken by a task
- Task Wait Time: interval between two tasks
- Block Dim: number of cores to execute a task
- Input Shapes: input shapes
- Input Data Types: input data types
- Input Formats: input formats
- Output Shapes: output shapes
- Output Data Types: output data types
- Output Formats: output formats
- aicore_time: time taken to execute all instructions
- total_cycles: number of cycles taken to execute all instructions
- mac_fp16_ratio: percentage of cycles taken to execute Cube fp16 instructions
- vec_ratio: percentage of cycles taken to execute Vector instructions
- scalar_ratio: percentage of cycles taken to execute Scalar instructions
- mte1_ratio: percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
- mte2_ratio: percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement)
- mte3_ratio: percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement)
- l2_read_bw: L2 read bandwidth (GB/s)
- l2_write_bw: L2 write bandwidth (GB/s)
The profiling result varies according to the AI Core metrics. For details about related parameters, see AI Core Metrics.
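One common use of these ratios is spotting memory-bound operators. The Python sketch below (hypothetical sample rows, using only a subset of the columns listed above) flags operators whose mte2_ratio suggests they are dominated by DDR-to-AI Core data movement:

```python
import csv
import io

def memory_bound_ops(csv_text, threshold=0.5):
    """List ops whose mte2_ratio (DDR-to-AI Core movement) exceeds threshold."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r["Op Name"] for r in rows if float(r["mte2_ratio"]) > threshold]

# Hypothetical sample with a subset of the summary columns.
sample_csv = """Op Name,Task Duration,mac_fp16_ratio,vec_ratio,mte2_ratio
conv1,120.5,0.70,0.05,0.10
gather1,15.2,0.01,0.10,0.82
transpose1,8.4,0.00,0.20,0.61
"""

print(memory_bound_ops(sample_csv))
```

The threshold of 0.5 is an illustrative choice, not a value prescribed by the tool; which ratio columns are present depends on the selected AI Core metrics.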
Profile Data Path of a Training Job
Function |
Outputs the profile data path of a training job. |
|||
---|---|---|---|---|
Syntax |
bash hiprof.sh --save_file --get_data_path --job_id %s |
|||
Example |
bash hiprof.sh --save_file --get_data_path --job_id=1234567890 This command outputs all profile data paths of training job 1234567890. |
|||
Options |
Option |
Required or Not |
Description |
|
--save_file |
Yes |
Saves the result to a file. It is the first argument. |
||
--get_data_path |
Yes |
Exports the data path to a file. |
||
--job_id |
Yes |
Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
||
Output Syntax |
JSON string: {"data": [], "status": 0, "info": ""} |
|||
Output Example |
{"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/raw_data_path_1234567890.log", "status": 0, "info": ""} |
|||
Output Fields |
Field |
Description |
On Success |
On Failure |
status |
Command execution result |
0 |
1 |
|
info |
Command execution description |
None |
Cause of failure |
|
data |
Query result |
Result |
None |
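All hiprof.sh queries in this section return the same JSON envelope, so a caller can check status before using data. A minimal Python sketch (the sample strings are illustrative):

```python
import json

def extract_result_path(response_text):
    """Parse a hiprof.sh JSON response; return the data path or raise on failure."""
    response = json.loads(response_text)
    if response.get("status") != 0:  # 0 on success, 1 on failure
        raise RuntimeError(f"hiprof query failed: {response.get('info')}")
    return response["data"]

ok = ('{"data": "/usr/local/Ascend/toolkit/tools/profiler/profiler_data/json/'
      'raw_data_path_1234567890.log", "status": 0, "info": ""}')
print(extract_result_path(ok))
```

On failure the info field carries the cause, which the sketch surfaces in the raised error.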
Data Augmentation Result of a Device
Function |
Outputs the data augmentation result of a training job on a device in an AI Server. |
|||
---|---|---|---|---|
Syntax |
bash hiprof.sh --export_dp --job_id %s --ip_address %s --device_id %d --iteration_id %d --dp_type %s --batch_size %d |
|||
Example |
bash hiprof.sh --export_dp --job_id=1234567890 --ip_address=192.168.1.1 --device_id=0 --iteration_id=0 --dp_type=aicpu This command outputs the AI CPU data augmentation result of training job 1234567890 in iteration 0 on device 0 in AI Server 192.168.1.1. |
|||
Options |
Option |
Required or Not |
Description |
|
--export_dp |
Yes |
Exports the result. It is the first argument. |
||
--job_id |
Yes |
Sets the training job ID. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
||
--ip_address |
Yes |
Sets the IP address of the AI Server. |
||
--device_id |
Yes |
Sets the device ID. Value range: {0, 1, 2, 3, 4, 5, 6, 7} |
||
--iteration_id |
Yes |
Sets the iteration ID. The value is an integer less than the iteration count. For example, if ten iterations are delivered, the value range is [0, 9]. |
||
--dp_type |
Yes |
Selects the module whose data augmentation result is reported. The value can be aicpu, tdt, or dp. |
||
--batch_size |
No |
Sets the batch size. Required when --dp_type is set to tdt; otherwise, leave it unset. The value is a positive integer. |
||
Output Syntax |
Table |
|||
Output Example |
bash hiprof.sh --export_dp --job_id=1000000001 --ip_address=10.10.10.10 --iteration_id=2 --dp_type=dp --device_id 0
Timestamp    Action              Source            Cached Buffer Size
-----------  ------------------  ----------------  --------------------
3845137112   Last queue dequeue  iterator_default  133

bash hiprof.sh --export_dp --job_id=1000000001 --ip_address=10.10.10.10 --iteration_id=2 --dp_type=tdt --device_id 0 --batch_size 32
Timestamp    Action              Source    Cached Buffer Size
-----------  ------------------  --------  --------------------
3827655079   Enqueue data to dp  train     102
3827655097   Enqueue data to dp  train     101
3827656224   Enqueue data to dp  train     128
3827656357   Enqueue data to dp  train     128
3827656448   Enqueue data to dp  train     128
3827656515   Enqueue data to dp  train     128
3827656579   Enqueue data to dp  train     128
3827656642   Enqueue data to dp  train     128
3827656717   Enqueue data to dp  train     128
3827656804   Enqueue data to dp  train     128
3827656923   Enqueue data to dp  train     128
3827657017   Enqueue data to dp  train     128
3827657105   Enqueue data to dp  train     128
3827657377   Enqueue data to dp  train     128
3827657438   Enqueue data to dp  train     128

bash hiprof.sh --export_dp --job_id=1000000001 --ip_address=10.10.10.10 --iteration_id=2 --dp_type=aicpu --device_id 0
Timestamp    Node                       Compute_time(ms)  Memcpy_time(ms)  Task_time(ms)  Dispatch_time(ms)  Total_time(ms)
-----------  -------------------------  ----------------  ---------------  -------------  -----------------  --------------
3845137027   IteratorV2202008122203120  0.159             0.860            1.054          0.036              1.226 |
Description of the key fields:
- Timestamp: timestamp of an event
- Action: action of an event
- Source: event source
- Cached Buffer Size: buffer cache size occupied by an event
- Node: node name of a task
- Compute_time(ms): compute time
- Memcpy_time(ms): memory copy time
- Task_time(ms): time taken to execute a task
- Dispatch_time(ms): time taken to dispatch a task
- Total_time(ms): total time
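The dash-underlined tables printed by --export_dp can be parsed programmatically. The Python sketch below takes column extents from the separator line, so multi-word headers such as Cached Buffer Size are handled correctly; the sample table is hypothetical, mirroring the dp_type=dp output format above:

```python
import re

def parse_dp_table(text):
    """Parse a dash-underlined --export_dp table into a list of dict rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, separator, *data = lines
    # Each run of dashes in the separator marks one column's span.
    spans = [m.span() for m in re.finditer(r"-+", separator)]
    names = [header[a:b].strip() for a, b in spans]
    return [
        {name: line[a:b].strip() for name, (a, b) in zip(names, spans)}
        for line in data
    ]

# Hypothetical sample mirroring the dp_type=dp output format above.
sample = (
    "Timestamp    Action              Source            Cached Buffer Size\n"
    "-----------  ------------------  ----------------  --------------------\n"
    "3845137112   Last queue dequeue  iterator_default  133"
)

print(parse_dp_table(sample))
```

The same function works for the tdt and aicpu variants, since all three share the header-plus-separator layout.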