LeakyRelu Operator (TBE DSL)
Function Description
This sample describes how to implement the LeakyRelu operator in TBE DSL mode.
The mathematical expression of the LeakyRelu operator is as follows, where α denotes the negative_slope attribute:
- When α = 0:
  f(x) = max(0, x)
- When α ≠ 0:
  f(x) = x when x ≥ 0; f(x) = α * x when x < 0
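For illustration, the following is a minimal NumPy reference implementation of this expression (NumPy and the function name are used here only for demonstration; they are not part of the TBE sample):

```python
import numpy as np

def leaky_relu_reference(x, negative_slope=0.0):
    # Reference semantics only (not the TBE implementation):
    # f(x) = max(0, x) when negative_slope == 0;
    # otherwise f(x) = x for x >= 0 and negative_slope * x for x < 0.
    x = np.asarray(x)
    return np.where(x >= 0, x, negative_slope * x)

# Example: negative_slope = 0.1 scales only the negative inputs.
print(leaky_relu_reference([-2.0, 0.0, 3.0], 0.1))  # [-0.2  0.   3. ]
```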
Operator Analysis
Before developing a LeakyRelu operator using the TBE DSL, you need to determine the operator function, input, output, development mode, operator type (OpType), implementation function name, and more.
- Specify the operator function and mathematical expression.
For details about the operator function, see Function Description.
- Specify the input and output.
- The LeakyRelu operator has one input (x), one output (y), and one attribute (negative_slope, the α in the operator expression).
- The supported input data types are float16, float32, int32, and int8. The output has the same data type as the input.
- The operator input supports all shapes. The output has the same shape as the input.
- The operator input supports the following formats: NCHW, NC1HWC0, and NHWC.
- Determine the operator development mode and the compute API.
- The compute process involves several operations: obtaining the maximum value, obtaining the minimum value, and performing multiplication. For details, see TBE DSL APIs. The corresponding DSL APIs are as follows:
te.lang.cce.vmuls() is used for multiplication.
te.lang.cce.vmin() is used to obtain the minimum value.
te.lang.cce.vmax() is used to obtain the maximum value.
- Some DSL APIs convert data types. Therefore, after the compute process, the result needs to be converted back to the original data type by using the te.lang.cce.cast_to API.
- When α = 0, the operation is ReLU, which can be implemented by using the te.lang.cce.vrelu() API. This API converts int8, int32, and float32 inputs to float16, so precision loss may occur if the data type is int32 or float32. In this case, use the te.lang.cce.vmax() API to obtain the maximum value instead.
- Specify the operator implementation file name, operator implementation function name, and OpType.
- The operator type must be named in upper camel case to distinguish different semantics.
- You can name the operator file and operator function according to either of the following rules:
- To create user-defined names, configure the opFile.value and opInterface.value in Operator Information Definition.
- If opFile.value and opInterface.value in the Operator Information Definition are not configured, the FE converts the OpType and matches the operator file name and operator function name as follows (illustrated by the sketch after this list):
- Convert the first uppercase letter to a lowercase letter.
Example: Abc -> abc
- Convert the uppercase letter after a lowercase letter to a lowercase letter with an underscore (_) prefix.
Example: AbcDef -> abc_def
- Uppercase letters following a digit or an uppercase letter are regarded as a semantic string. If there is a lowercase letter after this string, convert the last uppercase letter in this string into an underscore (_) and a lowercase letter, and convert the other uppercase letters into lowercase letters. If there is no lowercase letter after the string, directly convert the string into lowercase letters.
Examples: ABCDef -> abc_def; Abc2DEf -> abc2d_ef; Abc2DEF -> abc2def; ABC2dEF -> abc2d_ef
In this sample, the operator type is defined as LeakyReluDemo, and the implementation file name and implementation function name of the operator are defined as leaky_relu_demo, so that the built-in LeakyRelu operator will not be affected.
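The following plain-Python sketch illustrates the conversion rules above (the function name and the regular expressions are illustrative assumptions; this is not the FE implementation):

```python
import re

def op_type_to_file_name(op_type):
    # Split an uppercase run before its last uppercase letter when a
    # lowercase letter follows (rule 3), e.g. "ABCDef" -> "ABC_Def".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", op_type)
    # Split at a lowercase-to-uppercase boundary (rule 2),
    # e.g. "AbcDef" -> "Abc_Def".
    s = re.sub(r"([a-z])([A-Z])", r"\1_\2", s)
    # Lowercase everything, including the first letter (rule 1).
    return s.lower()

# Examples from the conversion rules above:
for name in ("Abc", "AbcDef", "ABCDef", "Abc2DEf", "Abc2DEF", "ABC2dEF",
             "LeakyReluDemo"):
    print(name, "->", op_type_to_file_name(name))
# Abc -> abc, AbcDef -> abc_def, ABCDef -> abc_def, Abc2DEf -> abc2d_ef,
# Abc2DEF -> abc2def, ABC2dEF -> abc2d_ef, LeakyReluDemo -> leaky_relu_demo
```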
Based on the preceding analysis, the design specifications of the LeakyRelu operator are as follows.
Table 14-2 Design specifications of the LeakyRelu operator

| Item | Specification |
| --- | --- |
| OpType | LeakyReluDemo |
| Operator Input | Name: x; Shape: all; Data type: float16, float32, int32, int8; Format: NCHW, NC1HWC0, NHWC |
| Operator Attribute | Name: negative_slope (α in Function Description); Shape: -; Data type: float32; Format: - |
| Operator Output | Name: y; Shape: all; Data type: float16, float32, int32, int8; Format: NCHW, NC1HWC0, NHWC |
| Main DSL APIs for Operator Implementation | te.lang.cce.vmuls(), te.lang.cce.vmin(), te.lang.cce.vmax() |
| Operator File/Function Name | leaky_relu_demo |
Operator Implementation
This section describes the key points of operator implementation.
Operator Code Implementation
- The LeakyRelu operator supports only four data types: float16, float32, int32, and int8. First, verify the input data type, create a placeholder for the input tensor, call the operator compute function, and perform auto scheduling.

```python
def leaky_relu_demo(x, y, negative_slope=0, kernel_name="leaky_relu"):
    # Check the input tensor shape.
    shape = x.get("shape")
    dtype = x.get("dtype")

    # Check the input tensor data type.
    check_list = ["float16", "float32", "int32", "int8"]
    if dtype.lower() not in check_list:
        raise RuntimeError(
            "leaky relu only support %s while dtype is %s"
            % (",".join(check_list), dtype))

    inp_dtype = dtype.lower()
    input_data_x = tvm.placeholder(shape, name="input_data_x", dtype=inp_dtype)

    with tvm.target.cce():
        res = leaky_relu_demo_compute(input_data_x, y, negative_slope,
                                      kernel_name)
        sch = generic.auto_schedule(res)
```
- Implement the operator compute function based on the following logic:
- When negative_slope = 0, the output y is the larger value between the input x and 0.
- If the input data type is float16 or int8, the te.lang.cce.vrelu(x) API can be called directly.
- If the input data type is float32 or int32, calling the te.lang.cce.vrelu(x) API directly may cause precision loss, because this API converts float32 and int32 to float16 for computation (a short numeric illustration follows this list). Therefore, you need to compare the input data with tensor_zero (a tensor with the same shape as x and all elements set to 0) by using the te.lang.cce.vmax() API.
- When negative_slope is not 0, you need to distinguish the scenarios where negative_slope is less than or equal to 1 and where it is greater than 1, according to Function Description: when negative_slope is less than or equal to 1, the larger value between x and x*negative_slope is used; when negative_slope is greater than 1, the smaller value between x and x*negative_slope is used.
- As required by the te.lang.cce.vmuls() API, the scalar must have the same data type as the input tensor. Therefore, you need to convert negative_slope to the data type of input x by using the tvm.const API.
- When the te.lang.cce.vmuls() API is used, the int8 and int32 data types are converted to float16. Therefore, after the te.lang.cce.vmuls() operation is performed, you need to convert the result back to the original data type by calling the te.lang.cce.cast_to() API, and then compare the result with x.
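To see why the float16 round trip matters, here is a small NumPy illustration (NumPy is used here only for demonstration; it is not part of the operator code):

```python
import numpy as np

# A float32 value can be rounded when passed through float16:
x32 = np.float32(1.0001)
print(np.float16(x32))        # 1.0 -- the fractional detail is lost

# An int32 value above the float16 range overflows entirely:
i32 = np.int32(70000)
print(np.float16(i32))        # inf -- float16 tops out at 65504
```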
The compute implementation is as follows:
```python
@fusion_manager.register("leaky_relu")
def leaky_relu_demo_compute(x, y, negative_slope=0, kernel_name="leaky_relu"):
    """compute for caffe_relu_layer_cce"""
    inp_dtype = x.dtype.lower()   # Convert all uppercase letters to lowercase letters.
    shape = x.shape               # Obtain the input shape.

    # The original relu logic remains unchanged.
    if negative_slope == 0:
        if inp_dtype in ("float32", "int32"):
            # Perform the vmax operation to avoid precision loss.
            # tensor_zero: a tensor of inp_dtype with all elements set to 0.
            tensor_zero = te.lang.cce.broadcast(tvm.const(0, inp_dtype), shape)
            # Obtain the larger value between x and tensor_zero.
            data_res = te.lang.cce.vmax(x, tensor_zero)
        else:
            # Perform the ReLU operation when the data type is float16 or int8.
            data_res = te.lang.cce.vrelu(x)

        # Convert the data type of data_res back to inp_dtype.
        data_res = te.lang.cce.cast_to(data_res, inp_dtype)
        return data_res

    # negative_slope != 0
    if inp_dtype in ("float16", "float32"):
        # Convert the constant negative_slope to the data type inp_dtype.
        slope_tmp = tvm.const(negative_slope, dtype=inp_dtype)
        tmp = te.lang.cce.vmuls(x, slope_tmp)   # Compute x * slope_tmp.
        if negative_slope <= 1:
            # Obtain the larger value between x and tmp when negative_slope <= 1.
            res = te.lang.cce.vmax(x, tmp)
        else:
            # Obtain the smaller value between x and tmp when negative_slope > 1.
            res = te.lang.cce.vmin(x, tmp)
    else:
        # inp_dtype in ("int32", "int8")
        slope_tmp = tvm.const(negative_slope, dtype=inp_dtype)
        # The vmuls operation converts int8 and int32 to float16.
        tmp = te.lang.cce.vmuls(x, slope_tmp)
        # Perform the cast operation to convert tmp back to the original data type.
        tmp_oritype = te.lang.cce.cast_to(tmp, inp_dtype)
        if negative_slope <= 1:
            res = te.lang.cce.vmax(x, tmp_oritype)
        else:
            res = te.lang.cce.vmin(x, tmp_oritype)
        res = te.lang.cce.cast_to(res, inp_dtype)

    return res
```
- Call the te.lang.cce.cce_build_code() API to perform the build.
```python
config = {"name": kernel_name,
          "tensor_list": [input_data_x, res]}
te.lang.cce.cce_build_code(sch, config)
```
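For reference, a hypothetical invocation of the finished operator entry function might look as follows (the shapes, data type, and kernel name here are made up for illustration):

```python
# Hypothetical call; x and y are the dict-style tensor descriptions
# expected by leaky_relu_demo (shape/dtype/format chosen arbitrarily here).
x = {"shape": (2, 3, 16, 16), "dtype": "float16", "format": "NCHW"}
y = {"shape": (2, 3, 16, 16), "dtype": "float16", "format": "NCHW"}
leaky_relu_demo(x, y, negative_slope=0.1, kernel_name="leaky_relu_demo_kernel")
```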
Operator Plug-in Implementation
You need to customize the ParseParamsLeakyRelu function to implement the attribute mapping from the Caffe LeakyReLUDemo operator to the LeakyReluDemo operator adapted to the Ascend AI Processor.
The ParseParamsLeakyRelu function is implemented as follows:
```cpp
Status ParseParamsLeakyRelu(const ge::Operator& op_src, ge::Operator& op_dest)
{
    // trans op_src to op_dest
    float negative_slope;
    if (ge::GRAPH_SUCCESS == op_src.GetAttr("negative_slope", negative_slope)) {
        // Obtain the value of the negative_slope attribute from the original
        // model and assign it to the negative_slope attribute of op_dest.
        op_dest.SetAttr("negative_slope", negative_slope);
    } else {
        op_dest.SetAttr("negative_slope", float(0));
    }
    return SUCCESS;
}
```
Operator Prototype Definition
The key point of prototype definition is to infer the shape and dtype of the output tensor. The LeakyReluDemo operator directly assigns the shape and dtype of the input tensor to the output tensor, as shown in the following:
```cpp
IMPLEMT_INFERFUNC(LeakyReluDemo, LeakyReluDemoInferShape)
{
    auto x_shape = op.GetInputDesc("x").GetShape().GetDims();
    DataType x_dtype = op.GetInputDesc("x").GetDataType();

    // The output has the same shape and dtype as the input.
    TensorDesc y_desc = op.GetOutputDesc("y");
    y_desc.SetShape(ge::Shape(x_shape));
    y_desc.SetDataType(x_dtype);
    (void)op.UpdateOutputDesc("y", y_desc);
    return GRAPH_SUCCESS;
}
```
Operator Information Definition
For details about the information definition file of the LeakyReluDemo operator, see tbe/op_info_cfg/ai_core/<soc_version>/leaky_relu.ini. The name of the Python file of the operator implementation code (opFile.value) and the name of the operator definition function (opInterface.value) are not configured in the information definition file. Therefore, the FE converts the uppercase letters of the operator type into lowercase letters separated by underscores to match the operator implementation file name and operator definition function name. For details about the conversion rules, see Operator Analysis.
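As a rough sketch, the entries might look like the following (the key names follow the common TBE operator information definition format, but they are assumptions here; check the actual leaky_relu.ini shipped with the sample). Note that opFile.value and opInterface.value are deliberately absent:

```ini
[LeakyReluDemo]
input0.name=x
input0.dtype=float16,float32,int32,int8
input0.format=NCHW,NC1HWC0,NHWC
input0.paramType=required
output0.name=y
output0.dtype=float16,float32,int32,int8
output0.format=NCHW,NC1HWC0,NHWC
output0.paramType=required
attr.list=negative_slope
attr.negative_slope.paramType=optional
attr.negative_slope.type=float
; opFile.value and opInterface.value are omitted, so the FE derives
; leaky_relu_demo from the OpType LeakyReluDemo.
```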