TIK Introduction
TIK Overview
Tensor Iterator Kernel (TIK) is a Python-based dynamic programming framework. It is provided as a Python module and runs on the host CPU.
Developers call the TIK APIs to write custom operators in Python. The TIK compiler then compiles each operator into a binary file that runs on the Ascend AI Processor.
TIK Advantages
TIK provides a flexible operator development mode. It offers the following advantages in development efficiency and operator performance optimization:
- Automated memory allocation and data dependency planning: enables developers to write high-performance parallel computing operators using serial programming.
- Manual scheduling: controls data movement and computing processes more accurately, achieving higher performance and maximizing the capability of the Ascend AI Processor.
TIK Operator Development Flow
Figure 3-8 shows the general flow for writing a Python program based on the TIK APIs.
The major steps are described as follows:
- Import the Python module.
from te import tik
te.tik provides all TIK-related Python functions. For details, see python/site-packages/te.egg/te/tik in the ATC installation path.
- Construct a TIK DSL container.
from te import tik
tik_instance = tik.Tik()
- Define the input data and output data in the external and internal storage of the AI Core.
# Tensors in external storage (scope_gm)
data_A = tik_instance.Tensor("float16", (128,), name="data_A", scope=tik.scope_gm)
data_B = tik_instance.Tensor("float16", (128,), name="data_B", scope=tik.scope_gm)
data_C = tik_instance.Tensor("float16", (128,), name="data_C", scope=tik.scope_gm)
# Tensors in the Unified Buffer of the AI Core (scope_ubuf)
data_A_ub = tik_instance.Tensor("float16", (128,), name="data_A_ub", scope=tik.scope_ubuf)
data_B_ub = tik_instance.Tensor("float16", (128,), name="data_B_ub", scope=tik.scope_ubuf)
data_C_ub = tik_instance.Tensor("float16", (128,), name="data_C_ub", scope=tik.scope_ubuf)
- Move the data from the external storage to the internal storage (such as the Unified Buffer) of the AI Core.
tik_instance.data_move(data_A_ub, data_A, 0, 1, 128 // 16, 0, 0)
tik_instance.data_move(data_B_ub, data_B, 0, 1, 128 // 16, 0, 0)
- Perform computation.
repeat = tik_instance.Scalar('int32')
repeat.set_as(1)
# Element-wise add of the two inputs; vec_add takes two source operands (vec_abs takes only one)
tik_instance.vec_add(128, data_C_ub[0], data_A_ub[0], data_B_ub[0], repeat, 8, 8, 8)
- Move the data out to the external storage.
tik_instance.data_move(data_C, data_C_ub, 0, 1, 128 // 16, 0, 0)
- Compile the statements in the TIK DSL container into the code that can be executed by the Ascend AI Processor.
# kernel_name determines the names of the operator binary files that can be executed on the Ascend AI Processor.
# inputs indicates the data loaded from the external storage.
# outputs indicates the data moved to the external storage after computation.
tik_instance.BuildCCE(kernel_name="simple_add", inputs=[data_A, data_B], outputs=[data_C])
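The numeric arguments in the steps above (128 // 16 for the move length, 128 for the mask, 8 for the repeat strides) can be cross-checked on the host with a small reference model. The following is a minimal sketch in plain Python that does not call TIK. The helper names (blocks_needed, to_fp16, reference_vec_add) are hypothetical. The sketch assumes that move lengths and repeat strides are counted in 32-byte blocks, that the mask selects a contiguous run of elements, that the computation is an element-wise addition (as the kernel name simple_add suggests), and that float16 results are rounded to nearest.

```python
import struct

BLOCK_BYTES = 32      # assumed unit for move lengths and repeat strides
FP16_BYTES = 2        # size of one float16 element
FP16_PER_BLOCK = BLOCK_BYTES // FP16_BYTES  # 16 float16 elements per block


def blocks_needed(num_elements, bytes_per_element):
    """Return how many 32-byte blocks cover the given elements (rounded up)."""
    total_bytes = num_elements * bytes_per_element
    return -(-total_bytes // BLOCK_BYTES)


def to_fp16(value):
    """Round a Python float to the nearest float16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def reference_vec_add(src0, src1, mask, repeat, rep_stride_blocks):
    """Host-side model of a float16 element-wise add.

    Each repeat processes the first `mask` elements of its window, and
    consecutive repeats start `rep_stride_blocks` blocks apart.
    """
    step = rep_stride_blocks * FP16_PER_BLOCK
    dst = [0.0] * len(src0)
    for r in range(repeat):
        base = r * step
        for i in range(mask):
            dst[base + i] = to_fp16(src0[base + i] + src1[base + i])
    return dst


# Illustrative input data (not from the original example)
a = [to_fp16(0.5 * i) for i in range(128)]
b = [to_fp16(1.0) for _ in range(128)]

burst = blocks_needed(128, FP16_BYTES)  # matches 128 // 16 in the data_move calls
c = reference_vec_add(a, b, mask=128, repeat=1, rep_stride_blocks=8)
```

With 128 float16 elements, one repeat with a mask of 128 covers the whole tensor, and a stride of 8 blocks (8 x 16 elements) is exactly the distance to where a second repeat would begin. A list such as c can serve as the expected result when comparing against the output of the compiled operator.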