Analyzing AI Core Errors
Setting Environment Variables
The tool depends on the ADC and CCE. You need to configure the following environment variables on the server where you execute model conversion:
- ADC environment variable
Add the ADC installation path under the Toolkit installation path to the PATH variable:
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest # Replace it with the actual installation path.
export PATH=${install_path}/toolkit/bin:$PATH
- CCE environment variable
Add the CCE installation path under the ATC installation path to the PATH variable:
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest # Replace it with the actual installation path.
export PATH=${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:$PATH
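As a quick sanity check after setting both variables, the sketch below prepends the three directories named above in one step and reports whether the tools can be resolved through PATH. The helper name check_on_path is hypothetical, and the executable names adc and ccec are assumptions that are not stated in this section; adjust them to match your installation.

```shell
# Example installation path from the text above; replace it with the actual one.
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest
export PATH="${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:${install_path}/toolkit/bin:$PATH"

# Hypothetical helper: report whether a command can be resolved through PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# 'adc' and 'ccec' are assumed executable names, not confirmed by this document.
for tool in adc ccec; do
    check_on_path "$tool" || true
done
```

A "missing" line usually means install_path does not point at the actual installation directory.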
Starting AI Core Error Analyzer
Option | Short Form | Required/Optional | Description |
---|---|---|---|
--remote_host | -host | Required for remote training | IP address and port number of the remote host in the remote training scenario. The default port number is 22118. |
--compile_path | -c | Required | ATC command execution path |
--output | -out | Optional | Output path. The AI Core Error report is generated in this path. If not specified, the current path is used. |
- In the remote inference scenario, the path given by compile_path is looked up locally first and then on the remote host.
- Replace the xx.xx.xx.xx argument of the remote_host option with the actual IP address.
The AI Core Error Analyzer can help you locate AI Core errors locally or remotely. Start it by running the startup script from the command line.
Go to the script directory {Toolkit installation path}/toolkit/tools/msaicerr (for example, /usr/local/Ascend/toolkit/tools/msaicerr) and run the command for your scenario:
- Local scenario:
$ python3 msaicerr.pyc --compile_path /home/bl/Project/aicerror_data/compile_path_infer --output local_infer
- Remote scenario:
$ python3 msaicerr.pyc --remote_host xx.xx.xx.xx:22118 --compile_path /home/gzj/app/model_convert
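The two invocations above differ only in their options, so a small wrapper can assemble the command line. The following is a minimal sketch, not part of the tool: the helper names with_default_port and msaicerr_cmd are hypothetical, and it only prints the command (a dry run) rather than executing it. It uses only the options listed in the table and appends the documented default port 22118 when no port is given.

```shell
# Hypothetical helper: append the documented default port when none is given.
# Assumes an IPv4 address or host name (no IPv6 literals).
with_default_port() {
    case $1 in
        *:*) printf '%s\n' "$1" ;;
        *)   printf '%s:22118\n' "$1" ;;
    esac
}

# Hypothetical helper: print the msaicerr command line for the given options.
# Usage: msaicerr_cmd COMPILE_PATH [REMOTE_HOST] [OUTPUT]
# Arguments containing spaces are not handled in this sketch.
msaicerr_cmd() {
    cmd="python3 msaicerr.pyc"
    if [ -n "${2:-}" ]; then
        cmd="$cmd --remote_host $(with_default_port "$2")"
    fi
    cmd="$cmd --compile_path $1"
    if [ -n "${3:-}" ]; then
        cmd="$cmd --output $3"
    fi
    printf '%s\n' "$cmd"
}

# Dry run: print the commands instead of executing them.
msaicerr_cmd /home/bl/Project/aicerror_data/compile_path_infer "" local_infer
msaicerr_cmd /home/gzj/app/model_convert xx.xx.xx.xx
```

Printing the command first makes it easy to confirm the options before running the analyzer from the script directory.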
Viewing Analysis Result
The outputs of the AI Core Error Analyzer are generated in the info_xxxx directory specified by --output.
├── aicerror_xxxxx                 //AI Core Error Analyzer outputs
│   ├──info.txt                    //AI Core Error Analyzer analysis result summary
│   ├──te_transdata_xxxx.o
│   ├──te_transdata_xxxx.o.txt     //Decompilation file
├── collection                     //Error operator files
│   ├──compile
│   ├──kernel_meta
│   ├──CCE code file
│   ├──JSON code file
│   ├──loc.json file
│   ├──.o file
│   ├──hisi_logs                   //Black box errors
│   ├──slog
├──error.log                       //ERROR-level log messages in the log directory
├──imas.log                        //IMAS log messages of GE
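To read the analysis summary without browsing the tree by hand, a sketch like the one below locates every info.txt that sits under an aicerror_* directory and prints it. The function name show_aicerr_summary is hypothetical, and the assumption that info.txt lives somewhere below the output path inside an aicerror_* directory is taken from the tree above rather than from the tool itself.

```shell
# Hypothetical helper: print each info.txt found under an aicerror_* directory.
# Usage: show_aicerr_summary OUTPUT_DIR
show_aicerr_summary() {
    out_dir=$1
    summaries=$(find "$out_dir" -type f -name info.txt -path '*aicerror_*' 2>/dev/null)
    if [ -z "$summaries" ]; then
        echo "no info.txt found under $out_dir" >&2
        return 1
    fi
    printf '%s\n' "$summaries" | while IFS= read -r f; do
        echo "== $f"
        cat "$f"
    done
}

# 'local_infer' is the --output value from the local scenario example.
if [ -d local_infer ]; then
    show_aicerr_summary local_infer
fi
```

The function returns a non-zero status when no summary is found, so it can also serve as a check that the analyzer produced output.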