AI Core Metrics
The meaning of the AI Core metrics depends on the --ai_core_profiling_mode option. This section assumes that --ai_core_profiling_mode is set to task-based.
- task-based: collects profile data per task; ratio metrics are expressed as a percentage of cycles.
- sample-based: collects profile data at a fixed sampling interval; ratio metrics are expressed as a percentage of time.
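To illustrate the difference between the two units, the following Python sketch normalizes a busy counter both ways. The function names and inputs (`busy_cycles`, `total_cycles`, `busy_time_us`, `interval_us`) are hypothetical and only illustrate the two denominators. They are not the profiler's documented internal computation.

```python
def task_based_ratio(busy_cycles: int, total_cycles: int) -> float:
    """Percentage of a task's cycles spent in one pipeline (task-based mode)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * busy_cycles / total_cycles


def sample_based_ratio(busy_time_us: float, interval_us: float) -> float:
    """Percentage of one sampling interval spent in one pipeline (sample-based mode)."""
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    return 100.0 * busy_time_us / interval_us


# Hypothetical task: 20,000 cycles in total, 5,000 of them spent on Cube instructions.
print(task_based_ratio(5_000, 20_000))  # 25.0
# Hypothetical sample: 12.5 us of Vector activity inside a 50 us sampling window.
print(sample_based_ratio(12.5, 50.0))   # 25.0
```

Both calls print 25.0. The numbers agree here by construction. In practice, the same percentage means different things in the two modes because the denominators differ: cycles of one task versus wall-clock time of one sampling window.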
The AI Core metrics are described as follows:
- aicoreArithmeticThroughput
  - mac_fp16_ratio: percentage of cycles taken to execute Cube fp16 instructions
  - mac_int8_ratio: percentage of cycles taken to execute Cube int8 instructions
  - vec_fp32_ratio: percentage of cycles taken to execute Vector fp32 instructions
  - vec_fp16_ratio: percentage of cycles taken to execute Vector fp16 instructions
  - vec_int32_ratio: percentage of cycles taken to execute Vector int32 instructions
  - vec_misc_ratio: percentage of cycles taken to execute Vector misc instructions
- aicorePipeline
  - vec_time: time taken to execute Vector instructions
  - vec_ratio: percentage of cycles taken to execute Vector instructions
  - mac_time: time taken to execute Cube instructions
  - mac_ratio: percentage of cycles taken to execute Cube instructions
  - scalar_time: time taken to execute Scalar instructions
  - scalar_ratio: percentage of cycles taken to execute Scalar instructions
  - mte1_time: time taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
  - mte1_ratio: percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement)
  - mte2_time: time taken to execute MTE2 instructions (DDR-to-AI Core movement)
  - mte2_ratio: percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement)
  - mte3_time: time taken to execute MTE3 instructions (AI Core-to-DDR movement)
  - mte3_ratio: percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement)
  - icache_miss_rate: I-Cache miss rate
  - memory_bound: indicates whether operator execution on the AI Core is limited by memory transfers, calculated as mte2_ratio/max(mac_ratio, vec_ratio). A value less than 1 means no memory bottleneck exists; a value of 1 or greater means a memory bottleneck exists.
- aicoreSynchronization
  - scalar_waitflag_ratio: percentage of cycles spent waiting between Scalar instructions
  - cube_waitflag_ratio: percentage of cycles spent waiting between Cube instructions
  - vector_waitflag_ratio: percentage of cycles spent waiting between Vector instructions
  - mte1_waitflag_ratio: percentage of cycles spent waiting between MTE1 instructions
  - mte2_waitflag_ratio: percentage of cycles spent waiting between MTE2 instructions
  - mte3_waitflag_ratio: percentage of cycles spent waiting between MTE3 instructions
- aicoreMemoryBandwidth
  - ub_read_bw: UB read bandwidth (GB/s)
  - ub_write_bw: UB write bandwidth (GB/s)
  - l1_read_bw: L1 read bandwidth (GB/s)
  - l1_write_bw: L1 write bandwidth (GB/s)
  - l2_read_bw: L2 read bandwidth (GB/s)
  - l2_write_bw: L2 write bandwidth (GB/s)
  - main_mem_read_bw: main memory read bandwidth (GB/s)
  - main_mem_write_bw: main memory write bandwidth (GB/s)
- aicoreInternalMemoryBandwidth
  - scalar_ld_ratio: percentage of cycles taken to execute Scalar-read-UB instructions
  - scalar_st_ratio: percentage of cycles taken to execute Scalar-write-UB instructions
  - l0A_read_bw: L0A read bandwidth (GB/s)
  - l0A_write_bw: L0A write bandwidth (GB/s)
  - l0B_read_bw: L0B read bandwidth (GB/s)
  - l0B_write_bw: L0B write bandwidth (GB/s)
  - l0C_read_bw: L0C read bandwidth (GB/s)
  - l0C_write_bw: L0C write bandwidth (GB/s)
- aicorePipelineStall
  - vec_bankgroup_cflt_ratio: percentage of cycles in which Vector instructions stall because of bank group conflicts (vec_bankgroup_stall_cycles)
  - vec_bank_cflt_ratio: percentage of cycles in which Vector instructions stall because of bank conflicts (vec_bank_stall_cycles)
  - vec_resc_cflt_ratio: percentage of cycles in which Vector instructions stall because of resource conflicts
  - mte1_iq_full_ratio: percentage of cycles in which the MTE1 instruction queue is full (mte1_iq_full_cycles)
  - mte2_iq_full_ratio: percentage of cycles in which the MTE2 instruction queue is full (mte2_iq_full_cycles)
  - mte3_iq_full_ratio: percentage of cycles in which the MTE3 instruction queue is full (mte3_iq_full_cycles)
  - cube_iq_full_ratio: percentage of cycles in which the Cube instruction queue is full (cube_iq_full_cycles)
  - vec_iq_full_ratio: percentage of cycles in which the Vector instruction queue is full
  - iq_full_ratio: aggregate percentage of stall cycles covering vec_resc_cflt_ratio, mte1_iq_full_ratio, mte2_iq_full_ratio, mte3_iq_full_ratio, cube_iq_full_ratio, and vec_iq_full_ratio
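The memory_bound metric in aicorePipeline is defined as mte2_ratio/max(mac_ratio, vec_ratio), with a value of 1 or greater indicating a memory bottleneck. The following Python sketch expresses that rule directly, taking the three ratios as inputs (for example, copied from a profiling report). The function names are hypothetical. The handling of the case where both mac_ratio and vec_ratio are 0 is also an assumption, because the definition does not cover it.

```python
def memory_bound(mte2_ratio: float, mac_ratio: float, vec_ratio: float) -> float:
    """Compute mte2_ratio / max(mac_ratio, vec_ratio)."""
    compute_ratio = max(mac_ratio, vec_ratio)
    if compute_ratio == 0:
        # Assumption: with no Cube or Vector activity, treat any MTE2 traffic as
        # unbounded memory pressure and no traffic at all as 0.
        return float("inf") if mte2_ratio > 0 else 0.0
    return mte2_ratio / compute_ratio


def has_memory_bottleneck(mte2_ratio: float, mac_ratio: float, vec_ratio: float) -> bool:
    """Less than 1 means no memory bottleneck; 1 or greater means a bottleneck."""
    return memory_bound(mte2_ratio, mac_ratio, vec_ratio) >= 1.0


# Hypothetical ratios (percentages) for two operators.
print(memory_bound(60.0, 40.0, 10.0), has_memory_bottleneck(60.0, 40.0, 10.0))  # 1.5 True
print(memory_bound(20.0, 50.0, 30.0), has_memory_bottleneck(20.0, 50.0, 30.0))  # 0.4 False
```

Using max(mac_ratio, vec_ratio) as the denominator compares data movement against whichever compute unit is busier. An operator is flagged only when DDR-to-AI Core transfers take at least as many cycles as its dominant compute pipeline.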
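The aicoreMemoryBandwidth and aicoreInternalMemoryBandwidth metrics are reported in GB/s. To sanity-check such figures against a known transfer size, the following Python sketch converts a byte count and a duration into GB/s. It assumes 1 GB = 10^9 bytes and a duration in microseconds. The function and its parameters are hypothetical and are not part of the profiler's output.

```python
def bandwidth_gb_per_s(num_bytes: int, duration_us: float, bytes_per_gb: float = 1e9) -> float:
    """Average bandwidth in GB/s for num_bytes moved over duration_us microseconds.

    Assumes decimal gigabytes (1 GB = 1e9 bytes) unless bytes_per_gb is overridden,
    for example with 2**30 to obtain GiB/s instead.
    """
    if duration_us <= 0:
        raise ValueError("duration_us must be positive")
    bytes_per_us = num_bytes / duration_us
    bytes_per_s = bytes_per_us * 1e6  # 1 s = 1e6 us
    return bytes_per_s / bytes_per_gb


# Hypothetical transfer: 2,000,000 bytes read from main memory in 10 us.
print(bandwidth_gb_per_s(2_000_000, 10.0))  # 200.0
```

If the profiler's own definition of GB differs (decimal versus binary), values computed this way will differ by about 7%. Pass bytes_per_gb=2**30 to compare on a binary basis.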
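When reading the aicoreSynchronization or aicorePipelineStall groups, a common first step is to find which ratio dominates. The following Python sketch ranks a group of percentage metrics and returns the largest contributors. The helper name `top_contributors` and the sample values are hypothetical.

```python
from typing import Mapping


def top_contributors(ratios: Mapping[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n metrics with the highest percentages, largest first."""
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical aicoreSynchronization values (percentages) for one task.
sync = {
    "scalar_waitflag_ratio": 3.2,
    "cube_waitflag_ratio": 18.5,
    "vector_waitflag_ratio": 7.1,
    "mte1_waitflag_ratio": 2.4,
    "mte2_waitflag_ratio": 11.0,
    "mte3_waitflag_ratio": 0.9,
}

for name, value in top_contributors(sync, n=2):
    print(f"{name}: {value:.1f}%")
# cube_waitflag_ratio: 18.5%
# mte2_waitflag_ratio: 11.0%
```

The same helper can be applied unchanged to aicorePipelineStall values, for example to see whether bank conflicts or full instruction queues account for most stall cycles.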