TBE Operator Development Architecture
TVM Overview
As deep learning becomes ubiquitous and indispensable, more deep learning frameworks and hardware backends keep emerging. Most existing neural network models are optimized for a narrow range of hardware backends and require significant effort to be deployed on other platforms. The Tensor Virtual Machine (TVM) is an open deep learning compiler stack that compiles models from different frameworks for CPUs, GPUs, or specialized accelerators through a unified intermediate representation (IR) and scheduling structure.
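The core idea TBE inherits from TVM is the separation of an operator's compute definition from its schedule. As a minimal sketch using the upstream TVM `te` API (present in TVM releases that still ship the `te` schedule; newer releases favor TensorIR), a vector addition can be declared and compiled as follows:

```python
import tvm
from tvm import te

# Compute definition: declare WHAT to calculate as a tensor expression.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule: declare HOW to calculate it (loop order, tiling, parallelism).
s = te.create_schedule(C.op)

# Lower through the unified IR and build for a concrete backend.
fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")
```

Retargeting the same compute definition to another backend only requires a different schedule and target string, which is what lets one operator description serve CPUs, GPUs, and specialized accelerators.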
For details about TVM, visit https://tvm.apache.org/.
TBE Overview
Tensor Boost Engine (TBE) enables custom operator development based on TVM. You can develop neural network operators using TBE APIs on a dedicated GUI.
Figure 3-1 shows the logical architecture of TBE.
TBE supports layered operator development. You can select an appropriate operator development mode based on your proficiency in the hardware and leverage the optimization and code generation capabilities of TBE to generate high-performance operators executable on the Ascend AI Processor.
- Frontend framework: includes the third-party open-source frameworks TensorFlow (Google's machine learning framework) and Caffe (Convolutional Architecture for Fast Feature Embedding).
- Graph Engine (GE): a unified IR interface provided by Huawei on top of the Ascend AI Processor software stack for interfacing with different machine learning frameworks, such as TensorFlow and Caffe. GE implements the preparation, splitting, optimization, building, loading, execution, and management of the network model topology, or graph.
- Fusion Engine (FE): interconnects GE with the TBE operators and supports loading and managing the operator information library, managing fusion rules, fusing the source graph, and optimizing subgraphs. GE transfers subgraphs to FE for subgraph optimization. FE prebuilds the subgraphs based on the operator information library and fusion optimization, which includes modifying data types, inserting Cast operators, and more, and then returns the subgraphs to GE for subgraph fusion and optimization.
- TBE: infers the necessary operator information from the IR-defined GE graphs and, based on the operator information library and fusion patterns, provides FE with subgraph optimization information and TBE operator calling information. The binary files generated by TBE are used to generate the tasks that run on the Ascend AI Processor.
TBE Development Deliverables and Build Flow
Figure 3-2 shows the TBE operator execution flow on a hardware platform powered by the Ascend AI Processor.
In the preceding figure, the files in yellow are the deliverables that developers need to implement for custom operator development.
Deliverable | Description
--- | ---
Operator implementation | Python file implementing the operator, including the operator computation implementation and the schedule implementation (see the DSL sketch after this table).
Operator plug-in | When developing a custom operator for a third-party framework (such as TensorFlow or Caffe), you also need to develop an adaptation plug-in that maps the third-party operator to an operator supported by the Ascend AI Processor and registers the operator information with GE. When a network trained in a third-party framework runs, GE loads and calls the plug-in to parse and map the operators on the network to operators supported by the Ascend AI Processor.
Operator prototype library | The operator prototype definition specifies the constraints on an operator that runs on the Ascend AI Processor. It captures the mathematical meaning of the operator by defining its inputs, outputs, attributes, and value ranges, and it provides argument verification and shape inference. During network execution, GE calls the verification API of the operator prototype library to verify the operator arguments. If the verification passes, GE infers the output shape and dtype of each node by calling the inference function of the prototype library and allocates static memory for the result tensors.
Operator information library | The operator information library describes the restrictions on the physical implementation of an operator on the Ascend AI Processor, including the supported input and output data types, formats, and input shapes. During network execution, FE uses this information to perform basic verification, insert conversion nodes for the operator as required, and locate the operator implementation code used to build the operator binary file.
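To make the operator implementation deliverable concrete, here is a minimal sketch of an elementwise add in the TBE DSL style. It follows the classic publicly documented pattern; module paths such as `te.lang.cce` and `topi.generic` match older CANN releases and may differ in newer ones, so treat this as illustrative:

```python
import te.lang.cce
from te import tvm
from topi import generic

def add_custom(input_x, input_y, output_z, kernel_name="add_custom"):
    """Elementwise add: computation implementation plus schedule implementation."""
    shape = input_x.get("shape")
    dtype = input_x.get("dtype").lower()

    # Computation implementation: describe the math with DSL compute APIs.
    data_x = tvm.placeholder(shape, name="data_x", dtype=dtype)
    data_y = tvm.placeholder(shape, name="data_y", dtype=dtype)
    res = te.lang.cce.vadd(data_x, data_y)

    # Schedule implementation: let TBE derive the tiling and data flow.
    with tvm.target.cce():
        schedule = generic.auto_schedule(res)

    # Build the operator kernel files (.o and .json) for the Ascend AI Processor.
    config = {"name": kernel_name, "tensor_list": [data_x, data_y, res]}
    te.lang.cce.cce_build_code(schedule, config)
```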
The following figure shows the process of loading a TBE operator for model conversion.
- Deliver the original third-party network model (TensorFlow/Caffe) to GE.
The topology of a network model is referred to as a graph.
- GE calls the operator plug-in to map each operator in the TensorFlow/Caffe network model to an operator supported by the Ascend AI Processor, so that the original TensorFlow/Caffe graph is parsed into a graph supported by the Ascend AI Processor.
- GE calls the verification API of the operator prototype library to verify operator arguments. If the verification passes, GE infers the output shape and dtype of each node by calling the inference function of the operator prototype library and allocates static memory for the result tensor.
- GE sends a graph optimization request to FE and sends the graph to FE. Then, FE fuses the operators according to the fusion patterns and selects the operators with the highest priority based on the configuration in fe.ini. By default, custom operators have the highest priority. Finally, an optimized graph is returned to GE.
- GE splits the graph into subgraphs and sends them to FE. FE inserts Cast operators into the subgraphs, prebuilds the TBE operators based on the subgraph engines, performs UB fusion on the TBE operators, finds the operator implementations in the operator information library, builds them into operator kernel files (.o and .json), and returns the optimized subgraphs to GE.
- GE builds the graph (including memory and stream resource allocation) and sends a task generation request to FE, which returns the task information (taskinfo) of each operator to GE. After the graph build completes, an .om offline model file adapted to the Ascend AI Processor is generated.
Build tools (such as objdump, ld, and clang) are required at service execution time. For example, when a TBE operator is built from the network framework, a compiler is needed for online building. The workflow is the one described above: GE transfers the subgraphs to FE for subgraph optimization, FE prebuilds them based on the operator information library and fusion optimization (modifying data types, inserting Cast operators, and more), and the subgraphs are then returned to GE for subgraph fusion and optimization before GE builds the network.
TBE Architecture
TBE consists of the domain-specific language (DSL) module, schedule module, intermediate representation (IR) module, build optimization (Pass) module, and code generation (CodeGen) module, as shown in Figure 3-4.
- DSL module: provides developers with APIs (compute APIs) for coding the operator logic.
- Schedule module: describes, using scheduling primitives, how an operator is tiled into the shapes required to run on the Ascend AI Processor. Different tiling policies are used for Cube operators, Vector operators, and other operator types.
- IR module: provides functions such as IR transformation and abstract syntax tree (AST) maintenance, and is based on the IR module of the TVM community (see the sketch after this list).
- Pass module: performs build optimization on the generated IR. The optimization techniques include double buffering, pipeline synchronization, memory allocation management, instruction mapping, and tiling for adapting to the Cube Unit.
- CodeGen module: generates a temporary C-style code file. This temporary file can be used by the compiler to generate the operator implementation file or directly loaded and executed by a network model.
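Because TBE's IR module is based on the TVM community IR, the upstream TVM API can illustrate what the Schedule, IR, and Pass modules operate on. This sketch (plain TVM, not TBE itself) applies a tiling primitive and prints the lowered IR that subsequent passes would transform:

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
# Scheduling primitive: split the loop into tiles of 32 elements.
outer, inner = s[B].split(B.op.axis[0], factor=32)

# Inspect the TVM-style IR; build passes run over this representation.
print(tvm.lower(s, [A, B], simple_mode=True))
```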
TBE Operator Development Flow
TBE operator development consists of two parts: computing logic coding and scheduling development.
- Both the operator computation and the schedule can be implemented in DSL mode or TIK mode (a TIK sketch follows this list). The computation describes the operator's mathematical operations and steps, while the schedule describes data tiling and data-flow planning. Operator computation is shape-specific, so the data must be tiled for the operator to run on the different compute units of the Ascend AI Processor; for example, operators running on the Cube Unit, the Vector Unit, and the AI CPU require different input data shapes.
- After defining the basic implementation of an operator, you call the Tiling submodule to tile the operator data according to the schedule description and to specify the data movement, so that the hardware executes optimally. After shape tiling, the Fusion submodule performs operator fusion and optimization.
- Once the operator is built, the IR module generates the operator's IR in a TVM-style format. The generated IR is then optimized by the Pass module in aspects including double buffering, pipeline synchronization, memory allocation management, instruction mapping, and tiling for the Cube Unit.
- After the IR has passed through the Pass module, the CodeGen module generates a temporary C-style code file, which the compiler uses to generate the operator implementation file, or which can be directly loaded and executed by a network model.
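As a counterpart to the DSL mode shown earlier, the TIK mode makes tiling and data movement explicit. The following is a rough single-core sketch of an elementwise add in the TIK style; exact call signatures such as `data_move` and `vec_add` vary across CANN releases, so this is illustrative rather than definitive:

```python
from te import tik

def add_tik(kernel_name="add_tik"):
    tik_instance = tik.Tik()

    # Kernel inputs/outputs live in Global Memory.
    x = tik_instance.Tensor("float16", (128,), name="x", scope=tik.scope_gm)
    y = tik_instance.Tensor("float16", (128,), name="y", scope=tik.scope_gm)
    z = tik_instance.Tensor("float16", (128,), name="z", scope=tik.scope_gm)

    # Staging buffers in the on-chip Unified Buffer.
    x_ub = tik_instance.Tensor("float16", (128,), name="x_ub", scope=tik.scope_ubuf)
    y_ub = tik_instance.Tensor("float16", (128,), name="y_ub", scope=tik.scope_ubuf)
    z_ub = tik_instance.Tensor("float16", (128,), name="z_ub", scope=tik.scope_ubuf)

    # Explicit data movement: 128 float16 values = 8 blocks of 32 bytes.
    tik_instance.data_move(x_ub, x, 0, 1, 8, 0, 0)
    tik_instance.data_move(y_ub, y, 0, 1, 8, 0, 0)

    # One Vector Unit instruction covering all 128 elements (mask = 128).
    tik_instance.vec_add(128, z_ub, x_ub, y_ub, 1, 8, 8, 8)

    tik_instance.data_move(z, z_ub, 0, 1, 8, 0, 0)
    tik_instance.BuildCCE(kernel_name=kernel_name, inputs=[x, y], outputs=[z])
    return tik_instance
```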
In conclusion, a custom operator is developed through the submodules of TBE: the DSL and TIK modes provide the operator compute logic and schedule description as the operator prototype; the Schedule module performs data tiling and operator fusion; the IR module produces the IR of the generated operator; the Pass module then performs build optimizations, such as memory allocation, on that IR; and finally the CodeGen module generates C-style code that the compiler can build directly. During operator development, TBE not only implements the operator but also optimizes it in all of these aspects, boosting operator execution performance.