Locating AI Core Errors
You can locate AI Core errors from the command-line output or from the info.txt file. The following is an example of the info.txt file.
********************Root cause conclusion*********************
# Gives the root cause if the error matches known error patterns.
***********************1. Basic information********************
# Gives basic information about the device on which the AI Core error occurred.
# kernel name: operator kernel name
# op address: address of the operator code in the DDR
# args address: address of the operator arguments in the DDR
error time : 2020-08-26-11:24:07
device id : 0
core id : 0
task id : 60
stream id : 517
node name : trans_TransData_167
kernel name : te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb
op address : 0x101000120000
args address : 0X101000053000
***********************2. AICERROR code***********************
# Gives the AI Core error code and description.
code : 0x10
CCU_ERR_INFO: 0xb166486200070074
ccu_err_addr bit[22:8]=000011100000000 meaning:CCU Error Address [17:3] approximate:0x3800
***********************3. Instructions************************
# Gives the error instructions.
start pc : 0x101000120000
current pc : 0x1010001201e0
Error occured most likely at line: 1d0
/{--output path}/aicerror_xxxx/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.o.txt:1d0
/{--output path}/collection/compile/kernel_meta/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.cce:32 //CCE code line number of the error operator
/{Python script path}/nz_2_nd.py:4486 //Python code line number of the error operator
related instructions (error occured before the mark *):
1bc: <not available>
1c0: <not available>
1c4: <not available>
1c8: <not available>
1cc: <not available>
1d0: <not available>
1d4: <not available>
1d8: <not available>
1dc: <not available>
* 1e0: <not available>
For complete instructions, please view /{--output path}/aicerror_xxxx/te_transdata_16b6e15e2a5cc7f70_33e5fb7ae8478ddb.o.txt
****************4. Input and output of node*******************
# Gives the input and output information.
# The input and output addresses are parsed from the IMAS log of GE, and the sizes are parsed from the build graph.
# In the case of memory zero copy, the new address (new addr) can also be parsed from the log.
# If an address is not within a range recorded in the RTS allocation log, an OVERFLOW flag is displayed.
# If device memory data is collected, NaN and INF checks are also performed. The collected data is accurate only when the device is suspended.
# If the detected input and output counts are inconsistent with those defined in the kernel function, a WARNING is displayed. This most likely means that the arguments provided by GE are misaligned with those processed by the operator.
input[0] addr: 0x100801126600 size: 32288
output[0] addr: 0x100801157c00 size: 2048
***********************5. Op in graph*************************
# Gives information about the error operator.
# The operator information is taken from the build graph for viewing convenience.
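The "approximate" address in section 2 of the example can be reproduced by hand from CCU_ERR_INFO. The minimal Python sketch below assumes only the field layout printed in the example: ccu_err_addr is bits [22:8] of the register, and it holds bits [17:3] of the CCU error address, so shifting the field left by 3 gives the approximate address. The function names extract_bits and ccu_error_address are hypothetical helpers for illustration, not part of the tool.

```python
# Sketch: recover the approximate CCU error address from CCU_ERR_INFO.
# Assumption (taken from the example output): ccu_err_addr = CCU_ERR_INFO[22:8]
# and it represents bits [17:3] of the CCU error address.


def extract_bits(value: int, high: int, low: int) -> int:
    """Return bits [high:low] (inclusive) of value, shifted down to bit 0."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def ccu_error_address(ccu_err_info: int) -> int:
    """Map the ccu_err_addr field onto address bits [17:3]."""
    ccu_err_addr = extract_bits(ccu_err_info, 22, 8)
    # Bit 0 of the field corresponds to address bit 3, so shift left by 3.
    return ccu_err_addr << 3


if __name__ == "__main__":
    info = 0xB166486200070074  # CCU_ERR_INFO value from the example
    field = extract_bits(info, 22, 8)
    print(f"ccu_err_addr bit[22:8]={field:015b}")  # 000011100000000
    print(f"approximate:{ccu_error_address(info):#x}")  # 0x3800
```

For the example register value, the field is 0x700, and 0x700 shifted left by 3 is 0x3800, which matches the approximate address the tool prints.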
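In section 3 of the example, the instruction labels in the .o.txt file (such as 1d0 and 1e0) correspond to offsets from start pc, and the "related instructions" list shows ten entries at a 4-byte stride ending at current pc. The sketch below reproduces that window under two assumptions inferred from the example rather than from a documented rule: the offset is current pc minus start pc, and each instruction occupies 4 bytes. It does not reproduce how the tool chooses the "most likely" line (1d0 in the example); pc_offset and instruction_window are hypothetical names.

```python
# Sketch: relate start pc / current pc to instruction labels in the .o.txt file.
# Assumptions (inferred from the example output, not a documented rule):
#   - labels are offsets from start pc, printed in lowercase hex without "0x"
#   - instructions are listed at a 4-byte stride

INSTR_BYTES = 4  # assumed stride, matching 1bc, 1c0, 1c4, ... in the example


def pc_offset(start_pc: int, current_pc: int) -> int:
    """Offset of current_pc relative to the start of the kernel code."""
    if current_pc < start_pc:
        raise ValueError("current pc precedes start pc")
    return current_pc - start_pc


def instruction_window(offset: int, count: int = 10) -> list[str]:
    """Hex labels of the `count` instructions ending at `offset`."""
    first = max(0, offset - (count - 1) * INSTR_BYTES)
    return [f"{off:x}" for off in range(first, offset + 1, INSTR_BYTES)]


if __name__ == "__main__":
    off = pc_offset(0x101000120000, 0x1010001201E0)
    print(f"{off:x}")  # 1e0
    for label in instruction_window(off):
        # Mark the current pc the same way the example output does.
        marker = "* " if int(label, 16) == off else ""
        print(f"{marker}{label}")
```

With the example values, the window runs from 1bc to 1e0, the same ten labels listed under "related instructions".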
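The OVERFLOW flag in section 4 rests on a simple containment test: a tensor is suspicious when [addr, addr + size) does not fall entirely inside a memory block that RTS allocated. The sketch below illustrates that idea only. The allocation list is hypothetical (a single made-up 1 MiB block), as is the third tensor used to trigger the flag; in practice the ranges would come from the RTS allocation log, whose format is not shown here. Block and within_allocation are illustrative names, not the tool's API.

```python
# Sketch of the OVERFLOW idea: flag a tensor whose [addr, addr + size) range is
# not fully contained in any allocated memory block.
# Assumption: the allocation list below is hypothetical, not real RTS log data.

from typing import NamedTuple


class Block(NamedTuple):
    base: int  # start address of an allocated block
    size: int  # block size in bytes


def within_allocation(addr: int, size: int, blocks: list[Block]) -> bool:
    """True if [addr, addr + size) lies entirely inside one allocated block."""
    end = addr + size
    return any(b.base <= addr and end <= b.base + b.size for b in blocks)


if __name__ == "__main__":
    # Hypothetical allocation: one 1 MiB block starting at 0x100801100000.
    allocations = [Block(base=0x100801100000, size=0x100000)]
    tensors = {
        "input[0]": (0x100801126600, 32288),  # from the example
        "output[0]": (0x100801157C00, 2048),  # from the example
        "hypothetical[0]": (0x1008011FFC00, 2048),  # crosses the block end
    }
    for name, (addr, size) in tensors.items():
        flag = "" if within_allocation(addr, size, allocations) else " OVERFLOW"
        print(f"{name} addr: {addr:#x} size: {size}{flag}")
```

Under this made-up allocation, the two tensors from the example fit inside the block and only the hypothetical tensor, which extends past the block end, is flagged.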