# Quick Start

When developing TBE operators in DSL mode, you need to focus only on the compute logic of the operator, without paying attention to the scheduling strategy, which makes this development mode simple and convenient.

## Objectives

This section describes the method of writing the code implementation of a TBE operator in DSL mode by using the Add operator as an example.

The Add operator returns the sum of its operands, as shown in the following figure.

## Operator Analysis

Before developing an Add operator using the TBE DSL, you need to determine the operator function, input, output, development mode, operator type (*OpType*), implementation function name, and more.

- Specify the operator function and mathematical expression.

  The mathematical expression of the Add operator is as follows:

  z = x + y

  The Add operator adds two inputs and returns the result.

- Specify the inputs and output.
  - The Add operator has two inputs, **x** and **y**, and one output, **z**.
  - The supported input data types are float16, float32, and int32. The output has the same data type as the inputs.
  - The operator inputs support all shapes. The output has the same shape as the inputs.
  - The operator inputs support the following formats: NCHW, NC1HWC0, NHWC, and ND.
- Determine the operator development mode and the compute API.
  - The compute process involves only the addition operation. For details, see TBE DSL APIs. Preliminary analysis shows that the **te.lang.cce.vadd(lhs, rhs)** API can be used to implement "x + y".
  - The **te.lang.cce.vadd(lhs, rhs)** API requires that the two input tensors have the same shape. Therefore, you need to obtain the larger shape of the two input tensors, and then call the **te.lang.cce.broadcast(var, shape, output_dtype=None)** API to broadcast the input tensors to that shape.
- Specify the names of the operator implementation file, operator implementation function, and *OpType*.
  - The operator type must be named in upper camel case to distinguish different semantics.
  - You can name the operator file and operator function by either of the following rules:
    - To use user-defined names, configure **opFile.value** and **opInterface.value** in the operator information definition.
    - If **opFile.value** and **opInterface.value** are not configured, the FE converts *OpType* to obtain the operator file name and operator function name as follows:
      - Convert the first uppercase letter to a lowercase letter.

        Example: Abc -> abc
      - Convert an uppercase letter that follows a lowercase letter to an underscore (_) plus the lowercase letter.

        Example: AbcDef -> abc_def
      - Uppercase letters following a digit or an uppercase letter are regarded as one semantic string. If a lowercase letter follows this string, convert the last uppercase letter in the string into an underscore (_) plus the lowercase letter, and convert the other uppercase letters into lowercase letters. If no lowercase letter follows the string, convert the whole string into lowercase letters.

        Examples: ABCDef -> abc_def; Abc2DEf -> abc2d_ef; Abc2DEF -> abc2def; ABC2dEF -> abc2d_ef

In this example, the *OpType* of the operator is defined as **Add**. Only the first letter needs to be converted to lowercase, so both the operator implementation file name and the implementation function name are **add**.

Based on the preceding analysis, the design specifications of the Add operator are as follows:
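The conversion rules above can be approximated with two regular-expression passes. The sketch below is illustrative only; the helper name `optype_to_funcname` is hypothetical and not part of the TBE toolchain:

```python
import re

def optype_to_funcname(op_type):
    """Approximate the OpType -> operator file/function name conversion."""
    # Rules 1 and 3: within a run of uppercase letters that is followed by
    # lowercase letters, split before the last uppercase letter of the run.
    s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', op_type)
    # Rule 2: an uppercase letter right after a lowercase letter starts
    # a new word.
    s = re.sub(r'([a-z])([A-Z])', r'\1_\2', s)
    return s.lower()
```

With this sketch, `optype_to_funcname("Add")` returns `"add"`, and the documented examples (`AbcDef -> abc_def`, `Abc2DEF -> abc2def`, and so on) convert as listed above.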

Table 6-1 Design specifications

| Item | Value |
| --- | --- |
| OpType | Add |
| Operator input | Name: **x**<br>Shape: all<br>Data type: float16, float32, int32<br>Format: NCHW, NC1HWC0, NHWC, ND |
| Operator input | Name: **y**<br>Shape: all<br>Data type: float16, float32, int32<br>Format: NCHW, NC1HWC0, NHWC, ND |
| Operator output | Name: **z**<br>Shape: all<br>Data type: float16, float32, int32<br>Format: NCHW, NC1HWC0, NHWC, ND |
| Main DSL APIs for operator implementation | te.lang.cce.broadcast(var, shape, output_dtype=None)<br>te.lang.cce.vadd(lhs, rhs) |
| Operator file/function name | add |
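Before moving on to the implementation, the "larger shape" rule identified during operator analysis can be sketched in plain Python. This is only an illustration of the per-dimension broadcast rule; it does not use the TBE APIs, and the helper name `broadcast_shapes` is hypothetical:

```python
def broadcast_shapes(shape1, shape2):
    """Per-dimension broadcast rule: pad the shorter shape with leading 1s,
    then take the larger extent in each dimension. Extents must be equal,
    or one of them must be 1."""
    s1, s2 = list(shape1), list(shape2)
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    # Pad the shorter shape with leading 1s so both have the same rank.
    s2 = [1] * (len(s1) - len(s2)) + s2
    out = []
    for a, b in zip(s1, s2):
        if a != b and a != 1 and b != 1:
            raise RuntimeError("input shapes not match!")
        out.append(max(a, b))
    return out
```

For example, `broadcast_shapes([2, 3], [3])` yields `[2, 3]`, and `broadcast_shapes([4, 1, 5], [1, 3, 1])` yields `[4, 3, 5]`; incompatible shapes such as `[2, 3]` and `[4, 3]` raise an error.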

## Operator Code Implementation

The Add operator supports only three data types: float16, float32, and int32, so the input data type needs to be verified. The two inputs may have different shapes. This scenario is supported by the Add operator but not by the compute API **te.lang.cce.vadd()**, so the two input shapes need to be verified and broadcast. The operator implementation code is as follows:

```python
from functools import reduce

import te.lang.cce
from te import tvm
from te.platform.fusion_manager import fusion_manager
from topi import generic

SHAPE_SIZE_LIMIT = 2147483648

# Compare each dimension of the two input shapes and assign the larger
# value of each dimension to generate out_shape.
def _produce_shapes(shape1, shape2):
    shape1 = list(shape1)
    shape2 = list(shape2)
    flag = 0
    if len(shape1) < len(shape2):
        shape1, shape2 = shape2, shape1
        flag = 1

    output_shape_len = len(shape1)
    dec = output_shape_len - len(shape2)
    for i in range(dec):
        shape2 = [1] + shape2

    out_shape = []
    for i in range(output_shape_len):
        if (shape1[i] != shape2[i]) and (shape1[i] != 1) and (shape2[i] != 1):
            raise RuntimeError("input shapes not match!")
        out_shape.append(shape1[i] if shape1[i] > shape2[i] else shape2[i])

    if flag == 1:
        shape1, shape2 = shape2, shape1

    return shape1, shape2, out_shape

# Convert the shape to a list.
def _shape_to_list(shape):
    result = []
    for i in shape:
        if isinstance(i, tvm.expr.Var):
            result.append(i)
        else:
            result.append(i.value)
    return result

# Implement the compute logic of the Add operator.
@fusion_manager.register("add")
def add_compute(input_x, input_y, output_z, kernel_name="add"):
    shape_x = _shape_to_list(input_x.shape)
    shape_y = _shape_to_list(input_y.shape)
    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = _produce_shapes(shape_x, shape_y)
    shape_size = reduce(lambda x, y: x * y, shape_max[:])
    if shape_size > SHAPE_SIZE_LIMIT:
        raise RuntimeError("the shape is too large to calculate")

    # Broadcast the shapes of input_x and input_y to shape_max.
    input_x = te.lang.cce.broadcast(input_x, shape_max)
    input_y = te.lang.cce.broadcast(input_y, shape_max)

    # Compute input_x + input_y.
    res = te.lang.cce.vadd(input_x, input_y)

    # Return the output tensor.
    return res

# Operator definition function
def add(input_x, input_y, output_z, kernel_name="add"):
    # Obtain the shape and data type of the operator input tensors.
    shape_x = input_x.get("shape")
    shape_y = input_y.get("shape")

    check_tuple = ("float16", "float32", "int32")
    input_data_type = input_x.get("dtype").lower()
    if input_data_type not in check_tuple:
        raise RuntimeError("only support %s while dtype is %s"
                           % (",".join(check_tuple), input_data_type))

    # Assign the larger value of each dimension of shape_x and shape_y to shape_max.
    shape_x, shape_y, shape_max = _produce_shapes(shape_x, shape_y)
    if shape_x[-1] == 1 and shape_y[-1] == 1 and shape_max[-1] == 1:
        # If the shape length is 1, keep the shape as is. Otherwise, if the
        # last dimension is 1, drop it: a trailing dimension of 1 carries no
        # data (for example, 2*3 = 2*3*1), and removing it improves the
        # subsequent scheduling efficiency.
        shape_x = shape_x if len(shape_x) == 1 else shape_x[:-1]
        shape_y = shape_y if len(shape_y) == 1 else shape_y[:-1]
        shape_max = shape_max if len(shape_max) == 1 else shape_max[:-1]

    # Call the placeholder API of TVM to place the first input tensor, returning a tensor object.
    data_x = tvm.placeholder(shape_x, name="data_1", dtype=input_data_type)
    # Call the placeholder API of TVM to place the second input tensor, returning a tensor object.
    data_y = tvm.placeholder(shape_y, name="data_2", dtype=input_data_type)

    # Call the compute implementation function.
    res = add_compute(data_x, data_y, output_z, kernel_name)

    # Auto scheduling
    with tvm.target.cce():
        schedule = generic.auto_schedule(res)

    # Build configuration
    config = {"name": kernel_name,
              "tensor_list": (data_x, data_y, res)}

    te.lang.cce.cce_build_code(schedule, config)
```
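For intuition, the broadcast-then-vadd sequence in `add_compute` corresponds, for a 2-D tensor and a row vector, to the following plain-Python computation. This is an illustration only, with no TBE dependency, and the helper name `broadcast_row_add` is hypothetical:

```python
def broadcast_row_add(x, y):
    """Add the length-n row y to every row of the m-by-n matrix x,
    mimicking te.lang.cce.broadcast followed by te.lang.cce.vadd."""
    if any(len(row) != len(y) for row in x):
        raise RuntimeError("input shapes not match!")
    # Broadcasting replicates y across the rows; vadd then adds elementwise.
    return [[a + b for a, b in zip(row, y)] for row in x]
```

For example, `broadcast_row_add([[1, 2], [3, 4]], [10, 20])` returns `[[11, 22], [13, 24]]`.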