Overview
To facilitate custom operator development, TBE provides a set of compute APIs that developers can assemble to build the computation logic of operators. More than 70% of operators can be developed based on these APIs, greatly simplifying custom operator development. The compute APIs provided by TBE are collectively called the domain-specific language (DSL). Operators developed with the DSL can directly use the Auto Schedule mechanism provided by TBE to complete the scheduling process automatically, eliminating the most complex part of the build flow.
Figure 3-5 shows the functional framework of the operator developed based on the TBE DSL.
- Developers call the DSL APIs provided by TBE to describe the compute logic, which specifies the computation method and procedure of the operator.
- After the compute logic is developed, developers call the Auto Schedule API to start automatic scheduling. During automatic scheduling, TBE selects a proper scheduling template based on the computation pattern and partitions the data blocks and data flows so that the hardware executes efficiently.
- After automatic scheduling is complete, a TVM-style intermediate representation (IR) is generated.
- The Pass module performs build optimizations on the generated IR. The optimization techniques include double buffering, pipeline synchronization, memory allocation management, instruction mapping, and tiling to adapt to the Cube Unit.
- After the operator passes through the Pass module, the CodeGen module generates a temporary C-style code file, which the compiler uses to generate the operator implementation file, or which can be loaded and executed directly by a network model.
A code sample is provided as follows:
```python
# Initialize the input tensors: configure a placeholder for each input tensor.
data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)
# Call the compute API to implement data_x + data_y.
res = te.lang.cce.vadd(data_x, data_y)
# Call the auto_schedule API to perform automatic scheduling.
with tvm.target.cce():
    schedule = topi.generic.auto_schedule(res)
# Configure the build parameters and perform the build.
config = {"print_ir": False, "name": kernel_name, "tensor_list": (data_x, data_y, res)}
te.lang.cce.cce_build_code(schedule, config)
```
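The `te.lang.cce.vadd` call above describes an elementwise addition of the two input tensors. As a rough reference for the semantics only (not the TBE implementation, which compiles to hardware kernels), the same computation can be sketched with NumPy; the shapes and dtype below are hypothetical stand-ins for `shape_x`, `shape_y`, and `input_data_type`:

```python
import numpy as np

# Hypothetical stand-ins for shape_x / shape_y / input_data_type
# in the TBE sample above, for illustration only.
shape_x = (2, 3)
shape_y = (2, 3)
input_data_type = np.float16

data_x = np.ones(shape_x, dtype=input_data_type)
data_y = np.full(shape_y, 2.0, dtype=input_data_type)

# te.lang.cce.vadd(data_x, data_y) computes an elementwise sum;
# in NumPy terms this is simply:
res = data_x + data_y

print(res.dtype, res.shape)  # float16 (2, 3)
```

This also illustrates why the DSL approach scales: the developer only states *what* is computed (an elementwise sum), while tiling, buffering, and instruction mapping are handled by Auto Schedule and the Pass module.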