Training Jobs Management
Obtaining an ID List
Function
Outputs the ID list of training jobs.

Syntax
bash hiprof.sh --get_job_list --search_list %s --display_type %s

Example
bash hiprof.sh --get_job_list --search_list=all --display_type=table
This command outputs the IDs of all training jobs and displays the IDs in a table.
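
Options

| Option | Required or Not | Description |
|---|---|---|
| --get_job_list | Yes | Outputs the ID list of training jobs. It must be the first argument. |
| --search_list | Yes | Sets the type of training jobs to include in the ID list. The value all (the default) includes all training jobs. |
| --display_type | Yes | Sets the display format of the ID list: a JSON string or a table. The value table (the default) displays the IDs in a table. |
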
Output Syntax
JSON string: {"status": 0, "info": "", "data": []}
In a table:

Output Example
JSON string: {"status": 0, "info": "", "data": "System Profiling: {'10.154.203.82': [f64bc1c8-daa9-11e9-8646-286ed488d904]} ,Job Profiling: {'10.154.203.82': ['1000000080', '1000000078']}"}
In a table:
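
In the JSON example above, the data value is a single string rather than a nested JSON object, so it has to be parsed as text. The following Python sketch assumes the exact layout shown in that example: section names System Profiling and Job Profiling, each followed by one 'IP': [IDs] entry per AI Server. parse_job_list is a hypothetical helper, not part of HiProf, and the regular expressions may need adjusting if real output differs.

```python
import re

# Hypothetical helper: assumes the "data" layout shown in the Output Example.
SECTION_RE = re.compile(r"(System Profiling|Job Profiling):\s*\{(.*?)\}")
# One "'<ip>': [<ids>]" entry per AI Server inside a section.
ENTRY_RE = re.compile(r"'([^']+)':\s*\[([^\]]*)\]")


def parse_job_list(data: str) -> dict:
    """Return {section: {ip: [job IDs]}} parsed from the --get_job_list data string."""
    result = {}
    for section, body in SECTION_RE.findall(data):
        per_ip = {}
        for ip, ids in ENTRY_RE.findall(body):
            # IDs may be quoted ('1000000078') or bare (the system profiling UUID).
            per_ip[ip] = [i.strip().strip("'\"") for i in ids.split(",") if i.strip()]
        result[section] = per_ip
    return result
```

For example, parse_job_list(reply["data"])["Job Profiling"] would give the job profiling IDs per AI Server, where reply is the decoded JSON output.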

Output Fields (JSON Format)

| Field | Description | On Success | On Failure |
|---|---|---|---|
| status | Command execution result | 0 | 1 |
| info | Command execution description | None | Cause of failure |
| data | Query result | Result | None |

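
Because each command returns the same status/info/data structure, a caller can check the result in one place. This is a minimal Python sketch, assuming the command's standard output is exactly one JSON object as in the Output Syntax. HiprofError and check_response are hypothetical names.

```python
import json


class HiprofError(RuntimeError):
    """Hypothetical exception for a hiprof.sh command that reports status 1."""


def check_response(raw: str):
    """Parse one hiprof.sh JSON reply and return its data field.

    status 0 means success. Any other value is treated as a failure, and the
    info field (the cause of failure) becomes the exception message.
    """
    reply = json.loads(raw)
    if reply.get("status") != 0:
        raise HiprofError(reply.get("info") or "unknown failure")
    # Some replies (for example, the stop output) have no data field, so this returns None.
    return reply.get("data")
```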
- Although --search_list and --display_type are listed as required, both can be omitted together. In that case, they default to all and table, respectively.
- If a training job fails, alarm information similar to that in the following figure will be displayed.
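
The argument rules for this command (--get_job_list first, and --search_list and --display_type either both given or both omitted) can be enforced when the command line is built programmatically. This is a minimal Python sketch. build_get_job_list is a hypothetical helper, and rejecting a single option is a conservative assumption, because the note above only describes omitting both together.

```python
def build_get_job_list(search_list=None, display_type=None, script="hiprof.sh"):
    """Build an argv list for 'bash hiprof.sh --get_job_list ...' (hypothetical helper)."""
    # --get_job_list must be the first argument to the script.
    argv = ["bash", script, "--get_job_list"]
    if search_list is None and display_type is None:
        # Both omitted: the script falls back to its defaults (all and table).
        return argv
    if search_list is None or display_type is None:
        # Assumption: only omitting both at once is documented, so refuse a half-specified call.
        raise ValueError("give both search_list and display_type, or neither")
    argv += [f"--search_list={search_list}", f"--display_type={display_type}"]
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True). Its stdout then holds the JSON output described above.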
Deleting the Profiling Result of a Job
Function
Deletes the profiling result of a job, identified by its job ID.

Syntax
bash hiprof.sh --remove_job_data --job_id %s

Example
bash hiprof.sh --remove_job_data --job_id=1000000078
This command deletes the profiling result of job 1000000078.

Options

| Option | Required or Not | Description |
|---|---|---|
| --remove_job_data | Yes | Deletes the profiling result of a training job. It must be the first argument. |
| --job_id | Yes | Sets the ID of the job whose profiling result is to be deleted. The ID must be a system profiling ID or a job profiling ID (for example, f64bc1c8-daa9-11e9-8646-286ed488d904 or 1000000078). |

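
The two job ID formats can be told apart by using the values in the Output Example of Obtaining an ID List. There, the system profiling ID looks like a UUID (f64bc1c8-daa9-11e9-8646-286ed488d904), and job profiling IDs are digit strings (1000000078). The Python sketch below classifies an ID on that basis before --remove_job_data is called. The rule is inferred from those examples only, and job_id_kind is a hypothetical helper.

```python
import re
import uuid


def job_id_kind(job_id: str) -> str:
    """Return "job" or "system" for a job ID (format rule inferred from examples)."""
    # Job profiling IDs in the examples are plain digit strings.
    if re.fullmatch(r"[0-9]+", job_id):
        return "job"
    # The system profiling ID in the examples is a UUID; uuid.UUID rejects malformed strings.
    try:
        uuid.UUID(job_id)
    except ValueError:
        raise ValueError(f"unrecognized job ID format: {job_id!r}") from None
    return "system"
```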
Output Syntax
JSON string: {"status": 0, "info": "", "data": []}

Output Example
{"status": 0, "info": "Remove the data of 1000000078"}

Output Fields

| Field | Description | On Success | On Failure |
|---|---|---|---|
| status | Command execution result | 0 | 1 |
| info | Command execution description | None | Cause of failure |
| data | Query result | Result | None |

Before running the --remove_job_data command, stop the profiling process. For details, see Profiling Command Reference.
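
The stop-before-delete requirement can be enforced in a wrapper. The Python sketch below makes several assumptions: the --stop command described under Stopping Profiling a Training Job is the way to stop profiling, hiprof.sh is in the current directory, and each call prints one JSON reply. stop_then_remove and run_hiprof are hypothetical names. The runner can be replaced, so the ordering logic can be tried without HiProf installed.

```python
import json
import subprocess
from typing import Callable

# A runner takes an argv list and returns the command's standard output.
Runner = Callable[[list], str]


def run_hiprof(argv: list) -> str:
    """Run hiprof.sh and return stdout (assumes the script is in the working directory)."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


def _call(run: Runner, argv: list) -> dict:
    # Parse the JSON reply and fail fast unless status is 0.
    reply = json.loads(run(argv))
    if reply.get("status") != 0:
        raise RuntimeError(f"{argv[2]} failed: {reply.get('info')}")
    return reply


def stop_then_remove(job_id: str, ip_address: str, run: Runner = run_hiprof) -> str:
    """Stop profiling job_id on ip_address, then delete its profiling result."""
    # Step 1: stop profiling first, as the note requires.
    _call(run, ["bash", "hiprof.sh", "--stop",
                f"--job_id={job_id}", f"--ip_address={ip_address}"])
    # Step 2: runs only if the stop succeeded.
    reply = _call(run, ["bash", "hiprof.sh", "--remove_job_data", f"--job_id={job_id}"])
    return reply.get("info", "")
```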
Stopping Profiling a Training Job
Function
Stops profiling a training job.

Syntax
bash hiprof.sh --stop --job_id %s --ip_address %s

Example
bash hiprof.sh --stop --job_id=1000000078 --ip_address=192.168.1.1
This command stops profiling training job 1000000078 on AI Server 192.168.1.1.

Options

| Option | Required or Not | Description |
|---|---|---|
| --stop | Yes | Stops profiling a training job. |
| --ip_address | Yes | Sets the IP address of the AI Server. |
| --job_id | Yes | Sets the ID of the job whose profiling is to be stopped. The ID must be a system profiling ID or a job profiling ID. |

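
Because --ip_address must identify the AI Server, a malformed address can be rejected before the script is called. This is a minimal Python sketch that uses the standard ipaddress module. build_stop is a hypothetical helper. Accepting IPv6 as well as IPv4 is an assumption, because the document only shows IPv4 addresses.

```python
import ipaddress


def build_stop(job_id: str, ip_address: str, script: str = "hiprof.sh") -> list:
    """Build argv for 'bash hiprof.sh --stop ...', validating inputs first (hypothetical)."""
    if not job_id:
        raise ValueError("job_id is required")
    # ip_address() raises ValueError for anything that is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(ip_address)
    return ["bash", script, "--stop", f"--job_id={job_id}", f"--ip_address={ip_address}"]
```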
Output Syntax
JSON string: {"status": 0, "info": ""}

Output Example
{"status": 0, "info": "Successfully stopped 1000000078"}

Output Fields

| Field | Description | On Success | On Failure |
|---|---|---|---|
| status | Command execution result | 0 | 1 |
| info | Command execution description | None | Cause of failure |