Operator Calling
The operator calling process is outlined below. For details about the operators supported by the system, see CANN Operator List (Inference).
- Load and build an operator.
- For an operator with a fixed shape, call the ACL API to load the operator.
- In advance, build the single-operator definition file (.json) into an offline model (.om file) adapted to the Ascend AI Processor by referring to ATC Tool Instructions.
- A single-operator model file can be loaded using the following APIs:
Call acl.op.set_model_dir to set the directory from which the model files are loaded. The single-operator model files (.om files) are stored in this directory.
Call acl.op.load to load single-operator model data from memory. The memory is managed by the user. Single-operator model data is the content of an .om file, built from a single operator, that has been read into memory.
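The directory-based loading path can be sketched as follows. This is a minimal illustration, not the reference implementation: find_om_files and load_single_op_models are hypothetical helper names, the acl module is passed in as a parameter so the helper can be exercised without an Ascend device, and acl.op.set_model_dir is assumed to return 0 on success.

```python
import os


def find_om_files(model_dir):
    """Return the sorted names of the .om files found directly in model_dir."""
    return sorted(
        name for name in os.listdir(model_dir)
        if name.endswith(".om") and os.path.isfile(os.path.join(model_dir, name))
    )


def load_single_op_models(acl, model_dir):
    """Point ACL at a directory of single-operator .om files.

    `acl` is the pyACL module (or a stand-in with the same interface).
    Assumption: acl.op.set_model_dir returns 0 on success.
    """
    om_files = find_om_files(model_dir)
    if not om_files:
        # Fail early: an empty directory would only surface later as a
        # "no matching model" error when the operator is executed.
        raise FileNotFoundError(f"no .om files found in {model_dir!r}")
    ret = acl.op.set_model_dir(model_dir)
    if ret != 0:
        raise RuntimeError(f"acl.op.set_model_dir failed, ret={ret}")
    return om_files
```

On a device, the helper would be called with the real module after ACL initialization, for example load_single_op_models(acl, "op_models"), where "op_models" is a placeholder directory name.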
- For an operator with dynamic shape, register the custom operator in advance.
- Call acl.op.register_compile_func to register the operator selector (that is, the function that selects the tiling policy). Different tiling policies are used for different shapes when the operator is executed.
The operator selector needs to be defined and implemented in advance. For details about the implementation example of the operator selector, see Special Topics > TIK Custom Operator with Dynamic Shape in TBE Custom Operator Development Guide.
- Prototype:
def call_back_func(num_inputs, input_desc, num_outputs, output_desc, op_attr, aclop_kernel_desc): pass
- Function implementation:
Implement the logic that selects a tiling policy and generates the tiling parameters, then call acl.op.set_kernel_args to set the tiling arguments and the number of blocks for concurrent execution.
- Call acl.op.create_kernel to register the operator with the system so that its code implementation can be found when the operator is executed.
- Call acl.op.update_params to build the operator and trigger the calling logic of the operator selector.
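The kind of decision an operator selector makes can be sketched in plain Python. This is an illustration only: select_tiling, the kernel names, and the max_blocks and tile_elems defaults are hypothetical, and a real selector would read the shapes from the tensor descriptions it receives and pass the result to acl.op.set_kernel_args rather than return it.

```python
def select_tiling(shape, max_blocks=8, tile_elems=1024):
    """Pick a tiling strategy for a tensor of the given shape.

    Returns (kernel_id, block_dim, tiling_args). All names and limits
    here are illustrative assumptions, not values defined by ACL.
    """
    total = 1
    for dim in shape:
        total *= dim

    if total <= tile_elems:
        # Small input: a single block processes everything in one tile.
        return "single_tile", 1, {"elems_per_block": total, "tiles_per_block": 1}

    # Larger input: spread the work over several blocks. Ceiling division
    # (-(-a // b)) makes sure no trailing elements are dropped.
    block_dim = min(max_blocks, -(-total // tile_elems))
    elems_per_block = -(-total // block_dim)
    tiles_per_block = -(-elems_per_block // tile_elems)
    return "multi_tile", block_dim, {
        "elems_per_block": elems_per_block,
        "tiles_per_block": tiles_per_block,
    }
```

With these defaults, a 16 x 16 input stays on one block, while a 1000 x 100 input is split across all 8 blocks with 13 tiles per block.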
- Call acl.rt.malloc to allocate memory on the device to store the input and output data of the operator.
Call acl.rt.memcpy (synchronous mode) or acl.rt.memcpy_async (asynchronous mode) to implement data transfer from the host to the device through memory copy.
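The allocation and host-to-device copy can be sketched as follows. The helper name copy_to_device is hypothetical, and the sketch rests on several assumptions: acl.rt.malloc returns a (device_ptr, ret) pair, acl.rt.memcpy returns ret, a ret of 0 means success, and the two constants carry the values shown. The acl module is passed in as a parameter so the logic can be checked without a device.

```python
ACL_MEM_MALLOC_HUGE_FIRST = 0   # assumed value of the allocation policy constant
ACL_MEMCPY_HOST_TO_DEVICE = 1   # assumed value of the copy direction constant


def copy_to_device(acl, host_ptr, size):
    """Allocate `size` bytes on the device and copy the host buffer into it.

    `acl` is the pyACL module (or a stand-in with the same interface);
    `host_ptr` is the address of a host buffer holding at least `size` bytes.
    Returns the device pointer.
    """
    dev_ptr, ret = acl.rt.malloc(size, ACL_MEM_MALLOC_HUGE_FIRST)
    if ret != 0:
        raise RuntimeError(f"acl.rt.malloc failed, ret={ret}")

    # Synchronous copy: returns only after the data is on the device.
    ret = acl.rt.memcpy(dev_ptr, size, host_ptr, size, ACL_MEMCPY_HOST_TO_DEVICE)
    if ret != 0:
        acl.rt.free(dev_ptr)  # do not leak the buffer if the copy fails
        raise RuntimeError(f"acl.rt.memcpy failed, ret={ret}")
    return dev_ptr
```

The asynchronous variant, acl.rt.memcpy_async, additionally takes a stream, and the copy is only guaranteed to be complete after that stream has been synchronized.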
- Run the operator.
- If the operator is Gemm, which is built into the system and encapsulated into an ACL API, call the CBLAS API directly to run the operator.
- If the operator is built in the system but is not encapsulated into an ACL API, the operator can be executed in either of the following ways:
- Construct the operator description information (such as the input and output tensor description and operator attributes), allocate memory for storing the input and output data of the operator, and call acl.op.execute to load and execute the operator.
In this mode, the system matches the model in the memory based on the operator description in every acl.op.execute call to execute the operator.
- Construct the operator description information (such as the input and output tensor description and operator attributes), allocate memory for storing the input and output data of the operator, call acl.op.create_handle to create a handle, and call acl.op.execute_with_handle to load and execute an operator.
In this mode, when acl.op.create_handle is called, the system matches the model in the memory based on the operator description and caches the match in the handle. The handle improves efficiency in scenarios where the same operator is executed multiple times through acl.op.execute_with_handle. Call acl.op.destroy_handle to free the handle when it is no longer needed.
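The difference between the two execution modes can be illustrated with a toy model in plain Python. This is an analogy, not ACL code: ToyOpRegistry and its methods are hypothetical stand-ins that only count how often a description-to-model match is performed.

```python
class ToyOpRegistry:
    """Stand-in for the single-operator models loaded in memory (not an ACL API)."""

    def __init__(self):
        self._models = {}
        self.match_count = 0  # number of description-to-model lookups performed

    def load(self, op_type, shape, fn):
        """Register a callable as the 'model' for an operator description."""
        self._models[(op_type, tuple(shape))] = fn

    def match(self, op_type, shape):
        """Find the model that fits the operator description."""
        self.match_count += 1
        return self._models[(op_type, tuple(shape))]

    def execute(self, op_type, shape, *inputs):
        """Mode 1: match the model again on every call, then run it."""
        return self.match(op_type, shape)(*inputs)

    def create_handle(self, op_type, shape):
        """Mode 2: match once and return a handle that caches the result."""
        return self.match(op_type, shape)
```

Running the same operator three times through execute performs three matches, whereas creating a handle once and calling it three times performs only one. That saving is what the handle-based mode offers for repeated execution.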
- If the operator is not a built-in operator, you need to develop the operator by referring to TBE Custom Operator Development Guide and then run the operator by referring to the description above.
- Call acl.rt.synchronize_stream to block application execution until all tasks in the specified stream are complete.
- Call acl.rt.memcpy (synchronous mode) or acl.rt.memcpy_async (asynchronous mode) to copy the operator output data from the device to the host.
- Call acl.rt.free to free the device memory once the data has been copied back.
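The final steps can be sketched as follows. The copy back to the host is issued before acl.rt.free, because the device buffer can no longer be read once it has been released. The helper name fetch_result_and_free is hypothetical, and the sketch assumes that each call returns 0 on success and that the copy direction constant has the value shown. The acl module is passed in as a parameter so the ordering can be checked without a device.

```python
ACL_MEMCPY_DEVICE_TO_HOST = 2   # assumed value of the copy direction constant


def fetch_result_and_free(acl, stream, dev_ptr, host_ptr, size):
    """Wait for the stream, copy `size` bytes of output to the host, free the device buffer.

    `acl` is the pyACL module (or a stand-in with the same interface).
    The device buffer is freed even if synchronization or the copy fails.
    """
    try:
        # Block until every task queued on the stream has finished.
        ret = acl.rt.synchronize_stream(stream)
        if ret != 0:
            raise RuntimeError(f"acl.rt.synchronize_stream failed, ret={ret}")

        # Copy the operator output from device memory to the host buffer.
        ret = acl.rt.memcpy(host_ptr, size, dev_ptr, size, ACL_MEMCPY_DEVICE_TO_HOST)
        if ret != 0:
            raise RuntimeError(f"acl.rt.memcpy failed, ret={ret}")
    finally:
        acl.rt.free(dev_ptr)
```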