Add Operator (TBE DSL)
Function Description
This sample describes how to implement the Add operator in TBE DSL mode and how to verify it both as a single operator and within an operator network.
The Add operator adds two input tensors element-wise and returns the result.
Operator Analysis
Before developing an operator through the TBE DSL, you need to determine the operator function, input, output, development mode, operator type (OpType), implementation function name, and more.
- Specify the operator function and mathematical expression.
The mathematical expression of the Add operator is as follows:
z=x+y
Add the two input parameters to obtain the final result z and return it.
- Specify the inputs and output.
- The Add operator has two inputs, x and y, and outputs the result z.
- The supported input data types include float16, float32, and int32. The output has the same data type as the inputs.
- The operator inputs support all shapes. The output has the same shape as the inputs.
- The operator inputs support the following formats: NCHW, NC1HWC0, NHWC, and ND.
- Determine the operator development mode and the compute API.
- The computation involves only the addition operation. A preliminary analysis of the TBE DSL APIs shows that the te.lang.cce.vadd(lhs, rhs) API can be used to implement "x + y".
- The te.lang.cce.vadd(lhs, rhs) API requires that the two input tensors have the same shape. Therefore, you need to obtain the larger shape of the two input tensors, and then call the te.lang.cce.broadcast(var, shape, output_dtype=None) API to broadcast the input tensors to the specified shape.
- Specify the names of the operator implementation file, operator implementation function, and OpType.
The naming rules are as follows:
- The operator type must be named in upper camel case to distinguish different semantics.
- You can name the operator file and operator function in either of the following ways:
- Configure opFile.value and opInterface.value in the operator information definition.
- If opFile.value and opInterface.value are not configured in the operator information definition, the FE derives the operator file name and operator function name from OpType using the following conversion rules (a code sketch of these rules follows this list):
- Convert the first uppercase letter to a lowercase letter.
Example: Abc -> abc
- Convert the uppercase letter after a lowercase letter to a lowercase letter with an underscore (_) prefix.
Example: AbcDef -> abc_def
- A run of uppercase letters following a digit or another uppercase letter is regarded as one semantic string. If a lowercase letter follows this string, convert the last uppercase letter of the string into an underscore (_) plus its lowercase form and convert the other uppercase letters to lowercase; if no lowercase letter follows, convert the whole string to lowercase.
Examples: ABCDef -> abc_def; Abc2DEf -> abc2d_ef; Abc2DEF -> abc2def; ABC2dEF -> abc2d_ef
In this example, OpType of the operator is defined as Add, so the operator implementation file and implementation function are named add.
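For illustration, the conversion rules above can be expressed as a short Python sketch. This helper is hypothetical and not part of the sample; the actual conversion is performed by the FE.

```python
# Hypothetical illustration of the OpType-to-file/function-name conversion rules.
# The FE performs the real conversion; this sketch only mirrors the rules above.
def op_type_to_name(op_type):
    chars = list(op_type)
    n = len(chars)
    parts = []
    i = 0
    while i < n:
        c = chars[i]
        if not c.isupper():
            parts.append(c)                      # Digits and lowercase letters are kept as-is.
            i += 1
        elif i == 0 or chars[i - 1].islower():
            # First uppercase letter, or an uppercase letter after a lowercase letter.
            parts.append(c.lower() if i == 0 else "_" + c.lower())
            i += 1
        else:
            # Run of uppercase letters after a digit or an uppercase letter: one semantic string.
            j = i
            while j < n and chars[j].isupper():
                j += 1
            run = chars[i:j]
            if j < n and chars[j].islower():
                # A lowercase letter follows: the last letter of the run gets an underscore.
                parts.append("".join(ch.lower() for ch in run[:-1]) + "_" + run[-1].lower())
            else:
                parts.append("".join(ch.lower() for ch in run))
            i = j
    return "".join(parts)

assert op_type_to_name("Add") == "add"
assert op_type_to_name("AbcDef") == "abc_def"
assert op_type_to_name("ABCDef") == "abc_def"
assert op_type_to_name("Abc2DEf") == "abc2d_ef"
assert op_type_to_name("Abc2DEF") == "abc2def"
assert op_type_to_name("ABC2dEF") == "abc2d_ef"
```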
Based on the preceding analysis, the design specifications of the Add operator are as follows:
Table 12-1 Add operator design specifications

| Item | Specification |
| --- | --- |
| OpType | Add |
| Operator input | Name: x; shape: all; data type: float16, float32, int32; format: NCHW, NC1HWC0, NHWC, ND |
| Operator input | Name: y; shape: all; data type: float16, float32, int32; format: NCHW, NC1HWC0, NHWC, ND |
| Operator output | Name: z; shape: all; data type: float16, float32, int32; format: NCHW, NC1HWC0, NHWC, ND |
| Main DSL APIs for operator implementation | te.lang.cce.broadcast(var, shape, output_dtype=None); te.lang.cce.vadd(lhs, rhs) |
| Operator file/function name | add |
Operator Implementation
This section describes the key points of the operator implementation in the sample.
Operator Code Implementation
The Add operator supports only three data types: float16, float32, and int32, so the input data type must be verified. The two inputs may have different shapes; the Add operator supports this scenario, but the compute API te.lang.cce.vadd() does not, so the two input shapes must be verified and broadcast to a common shape. The operator implementation code is as follows:
tbe/impl/add.py
```python
# Imports required by the implementation (as used by the TBE DSL samples).
import te.lang.cce
from te import tvm
from te.platform.fusion_manager import fusion_manager
from topi import generic
from functools import reduce

SHAPE_SIZE_LIMIT = 2147483648

# Compare each dimension of the two input shapes and take the larger value of each
# dimension to generate out_shape.
def _produce_shapes(shape1, shape2):
    shape1 = list(shape1)
    shape2 = list(shape2)
    flag = 0
    if len(shape1) < len(shape2):
        shape1, shape2 = shape2, shape1
        flag = 1
    output_shape_len = len(shape1)
    dec = output_shape_len - len(shape2)
    for i in range(dec):
        shape2 = [1] + shape2
    out_shape = []
    for i in range(output_shape_len):
        if (shape1[i] != shape2[i]) and (shape1[i] != 1) and (shape2[i] != 1):
            raise RuntimeError("input shapes not match!")
        out_shape.append(shape1[i] if shape1[i] > shape2[i] else shape2[i])
    if flag == 1:
        shape1, shape2 = shape2, shape1
    return shape1, shape2, out_shape

# Convert the shape to a list.
def _shape_to_list(shape):
    result = []
    for i in shape:
        if isinstance(i, tvm.expr.Var):
            result.append(i)
        else:
            result.append(i.value)
    return result

# Implement the computation logic of the Add operator.
@fusion_manager.register("add")
def add_compute(input_x, input_y, output_z, kernel_name="add"):
    shape_x = _shape_to_list(input_x.shape)
    shape_y = _shape_to_list(input_y.shape)
    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = _produce_shapes(shape_x, shape_y)
    shape_size = reduce(lambda x, y: x * y, shape_max[:])
    if shape_size > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")
    input_x = te.lang.cce.broadcast(input_x, shape_max)  # Broadcast the shape of input_x to shape_max.
    input_y = te.lang.cce.broadcast(input_y, shape_max)  # Broadcast the shape of input_y to shape_max.
    res = te.lang.cce.vadd(input_x, input_y)             # Compute input_x + input_y.
    return res                                           # Return the output tensor.

# Operator definition function
def add(input_x, input_y, output_z, kernel_name="add"):
    # Obtain the shape and data type of the operator input tensors.
    shape_x = input_x.get("shape")
    shape_y = input_y.get("shape")
    check_tuple = ("float16", "float32", "int32")
    input_data_type = input_x.get("dtype").lower()
    if input_data_type not in check_tuple:
        raise RuntimeError("only support %s while dtype is %s"
                           % (",".join(check_tuple), input_data_type))

    # Broadcast shape_x and shape_y as a preparation for the tensor placeholders.
    shape_x, shape_y, shape_max = _produce_shapes(shape_x, shape_y)
    if shape_x[-1] == 1 and shape_y[-1] == 1 and shape_max[-1] == 1:
        # If the shape length is 1, keep the shape as is. Otherwise, drop the trailing
        # dimension of size 1: a shape with a trailing 1 is equivalent to the shape
        # without it (for example, 2*3*1 == 2*3), and removing it improves the
        # subsequent scheduling efficiency.
        shape_x = shape_x if len(shape_x) == 1 else shape_x[:-1]
        shape_y = shape_y if len(shape_y) == 1 else shape_y[:-1]
        shape_max = shape_max if len(shape_max) == 1 else shape_max[:-1]

    # Call the TVM placeholder API to declare the first input tensor and return a tensor object.
    data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
    # Call the TVM placeholder API to declare the second input tensor and return a tensor object.
    data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)

    # Call the compute implementation function.
    res = add_compute(data_x, data_y, output_z, kernel_name)

    # Auto scheduling
    with tvm.target.cce():
        schedule = generic.auto_schedule(res)

    # Build configuration
    config = {"name": kernel_name,
              "tensor_list": (data_x, data_y, res)}
    te.lang.cce.cce_build_code(schedule, config)
```
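A minimal single-operator invocation of the definition function might look as follows. This is illustrative only: the shapes, dtype, and kernel_name are example values, and a configured TBE build environment is assumed.

```python
# Example values only; requires a configured TBE build environment.
input_x = {"shape": (16, 32), "dtype": "float16"}
input_y = {"shape": (32,), "dtype": "float16"}      # Broadcast against input_x.
output_z = {"shape": (16, 32), "dtype": "float16"}
add(input_x, input_y, output_z, kernel_name="add_16_32_float16")  # Builds the kernel via cce_build_code.
```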
Operator Plug-In Implementation
The plug-in parses the Add operator defined under the TensorFlow framework and maps it to the Add operator supported by the Ascend AI Processor. The operator attribute mapping can be implemented by calling AutoMappingFn(). For the complete code, see the framework/tf_plugin/add_plugin.cpp file in the sample.
Operator Prototype Definition
The key point of prototype definition is to infer the shape of the output tensor and verify the internal association of the operator inputs.
The output shape is inferred as follows: obtain the two input shapes and broadcast them to a common shape by taking the larger size of each dimension. The implementation code is as follows:
op_proto/add.cpp
```cpp
bool InferShapeAndTypeAdd(Operator& op, const string& input_name1,
                          const string& input_name2, const string& output_name) {
  // vOutputDesc.push_back(op.GetInputDesc(0));
  TensorDesc vOutputDesc = op.GetOutputDesc(output_name);
  DataType input_dtype = op.GetInputDesc(input_name1).GetDataType();
  Format input_format = op.GetInputDesc(input_name1).GetFormat();

  // Exchange the shape dimensions.
  ge::Shape shapeX = op.GetInputDesc(input_name1).GetShape();
  ge::Shape shapeY = op.GetInputDesc(input_name2).GetShape();
  std::vector<int64_t> dimsX = shapeX.GetDims();
  std::vector<int64_t> dimsY = shapeY.GetDims();
  if (dimsX.size() < dimsY.size()) {
    std::vector<int64_t> dimsTmp = dimsX;
    dimsX = dimsY;
    dimsY = dimsTmp;
  }

  // Pad the smaller shape dimension with 1.
  if (dimsX.size() != dimsY.size()) {
    int dec = dimsX.size() - dimsY.size();
    for (int i = 0; i < dec; i++) {
      dimsY.insert(dimsY.begin(), (int64_t)1);
    }
  }

  // Use the larger value of each dimension in the two input shapes to form the output shape dimensions.
  std::vector<int64_t> dimVec;
  for (size_t i = 0; i < dimsX.size(); i++) {
    if ((dimsX[i] != dimsY[i]) && (dimsX[i] != 1) && (dimsY[i] != 1)) {
      return false;
    }
    int64_t dims = dimsX[i] > dimsY[i] ? dimsX[i] : dimsY[i];
    dimVec.push_back(dims);
  }
  ge::Shape outputShape = ge::Shape(dimVec);

  vOutputDesc.SetShape(outputShape);
  vOutputDesc.SetDataType(input_dtype);
  vOutputDesc.SetFormat(input_format);
  op.UpdateOutputDesc(output_name, vOutputDesc);
  return true;
}
```
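Note that this broadcast rule mirrors the _produce_shapes helper in the Python implementation: the shorter shape is padded with leading 1s, and the output takes the larger size of each aligned dimension, so the inferred output shape always matches the shape computed at kernel build time.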
Operator Information Definition
For details about the information definition file of the Add operator, see tbe/op_info_cfg/ai_core/<soc_version>/add.ini.
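As an orientation, an operator information definition file for Add typically looks like the following sketch. The field names and values here are indicative only; the add.ini shipped with the sample is authoritative.

```ini
[Add]
input0.name=x
input0.paramType=required
input0.dtype=float16,float,int32
input0.format=NCHW,NC1HWC0,NHWC,ND
input1.name=y
input1.paramType=required
input1.dtype=float16,float,int32
input1.format=NCHW,NC1HWC0,NHWC,ND
output0.name=z
output0.paramType=required
output0.dtype=float16,float,int32
output0.format=NCHW,NC1HWC0,NHWC,ND
opFile.value=add
opInterface.value=add
```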
Test Cases for Operator Verification on Network
For the implementation code of the TensorFlow-based operator network test case, see tbe/testcases/tf_test/add/tf_add.py; the code is described in Test Case Implementation for Network Verification.
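The core idea of such a test case can be sketched as follows. This is a simplified illustration only: the NPU-specific session configuration used in tf_add.py is omitted, and the shapes, dtype, and tolerances are example values.

```python
# Simplified illustration of the network verification idea: compare the result of
# tf.add (which maps to the custom Add operator when run on the Ascend AI Processor)
# against a NumPy golden result. Example shapes/dtype; NPU session options omitted.
import numpy as np
import tensorflow as tf

def verify_add(shape_x=(1, 16), shape_y=(1, 16), dtype=np.float16):
    x = np.random.uniform(-2, 2, size=shape_x).astype(dtype)
    y = np.random.uniform(-2, 2, size=shape_y).astype(dtype)
    expected = x + y                                   # Golden result computed with NumPy.

    tf.compat.v1.disable_eager_execution()
    px = tf.compat.v1.placeholder(dtype, shape=shape_x)
    py = tf.compat.v1.placeholder(dtype, shape=shape_y)
    out = tf.add(px, py)
    with tf.compat.v1.Session() as sess:               # Pass the NPU session config here on the device.
        result = sess.run(out, feed_dict={px: x, py: y})

    assert np.allclose(result, expected, rtol=1e-3, atol=1e-3)

if __name__ == "__main__":
    verify_add()
```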