Upsample Operator (TIK)
Function Description
Implemented by using TIK APIs, the Upsample operator is used to scale up the feature map in a neural network.
Operator Analysis
Before developing an Upsample operator using the TIK API, you need to determine the operator function, inputs, outputs, development mode, operator type, implementation function name, and more.
- Specify the operator function.
The Upsample operator is used to scale up the feature map in a neural network by using an interpolation method.
- Specify the input and output.
- The Upsample operator has one input x, one output y, and three attributes.
- Both the operator input and output data types are float16 and float32.
- The operator input supports all shapes.
- The supported input format is NC1HWC0.
- The three attributes are scale, stride_h, and stride_w.
- Determine the operator development mode and the compute API.
- The operator needs to compute different elements in different dimensions of a tensor, which is not supported by TBE DSL APIs. Therefore, the operator is implemented by using TIK.
- The core computation process is as follows:
- Call the data_move() API to read data into the Unified Buffer.
- Call the vec_muls() API to multiply the input with a scaling coefficient.
- Call the data_move() API to move data from the Unified Buffer to the Global Memory.
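The three steps above amount to a scaled nearest-neighbor upsampling. The following pure-Python sketch (written for this guide, not the TIK implementation) illustrates what the kernel computes on a single H x W plane, ignoring tiling and on-chip buffers:

```python
def upsample_plane(plane, scale, stride_h, stride_w):
    """Illustrative sketch only: multiply each element by `scale`
    and replicate it stride_h x stride_w times (nearest neighbor)."""
    out = []
    for row in plane:
        out_row = []
        for v in row:
            out_row.extend([v * scale] * stride_w)   # replicate along W
        out.extend([out_row[:] for _ in range(stride_h)])  # replicate along H
    return out

# Example: a 1x2 plane with scale 2.0 and stride 2 becomes a 2x4 plane.
result = upsample_plane([[1.0, 3.0]], 2.0, 2, 2)
```

In the real operator the same multiply-and-replicate pattern is expressed with data_move() and vec_muls() over NC1HWC0 tiles.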
- Specify the operator implementation file name, operator implementation function name, and OpType.
- The operator type must be named in upper camel case to distinguish different semantics.
- You can name the operator file and operator function according to either of the following rules:
- To create user-defined names, configure the opFile.value and opInterface.value in Operator Information Definition.
- If opFile.value and opInterface.value in the operator information definition are not configured, the FE converts OpType and matches the operator file name and operator function name according to the following rules:
- Convert the first uppercase letter to a lowercase letter.
Example: Abc -> abc
- Convert the uppercase letter after a lowercase letter to a lowercase letter with an underscore (_) prefix.
Example: AbcDef -> abc_def
- Uppercase letters following a digit or an uppercase letter are regarded as a semantic string. If there is a lowercase letter after this string, convert the last uppercase letter in this string into an underscore (_) and a lowercase letter, and convert the other uppercase letters into lowercase letters. If there is no lowercase letter after the string, directly convert the string into lowercase letters.
Examples: ABCDef -> abc_def; Abc2DEf -> abc2d_ef; Abc2DEF -> abc2def; ABC2dEF -> abc2d_ef
In this example, OpType of the operator is defined as Upsample, so the operator implementation file and implementation function are named upsample.
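The conversion rules above can be expressed as a short Python function. This is a sketch written for this guide to make the rules concrete; it is not the FE's actual implementation:

```python
def optype_to_name(op_type):
    """Convert an OpType in upper camel case to the operator
    file/function name, following the conversion rules above."""
    out = []
    for i, ch in enumerate(op_type):
        if not ch.isupper():
            out.append(ch)
            continue
        prev = op_type[i - 1] if i > 0 else ""
        nxt = op_type[i + 1] if i + 1 < len(op_type) else ""
        if i == 0:
            out.append(ch.lower())           # first uppercase letter
        elif prev.islower():
            out.append("_" + ch.lower())     # uppercase after a lowercase
        elif nxt.islower():
            out.append("_" + ch.lower())     # last uppercase of a run
        else:
            out.append(ch.lower())           # inside an uppercase/digit run
    return "".join(out)

# Examples from the rules above:
# Abc -> abc, AbcDef -> abc_def, ABCDef -> abc_def,
# Abc2DEf -> abc2d_ef, Abc2DEF -> abc2def, ABC2dEF -> abc2d_ef
```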
Based on the preceding analysis, the design specifications of the UpsampleTik operator are as follows:
Table 14-8 UpsampleTik operator specifications

- OpType: UpsampleTik
- Operator Input
  - Name: x; Shape: all; Data types: float16, float32; Format: NC1HWC0
- Operator Attributes
  - Name: scale; Data type: float32
  - Name: stride_h; Data type: int
  - Name: stride_w; Data type: int
- Operator Output
  - Name: y; Shape: all; Data types: float16, float32; Format: NC1HWC0
- Main TIK APIs for Operator Implementation: data_move(), vec_muls()
- Operator File/Function Name: upsample_tik
Operator Implementation
This section describes the key points of operator implementation.
Operator Code Implementation
- In this sample, the UpsampleTik operator supports two data types: float16 and float32. Verify the input shape, data type, and format, check the attributes, and then call the operator compute function.
def check_shape_dtype_format(input_shape, input_dtype, input_format):
    """
    input_shape: input dict shape
    input_dtype: input dtype
    input_format: input format, NC1HWC0
    The common check rule for tensor shape, just for 5HD
    """
    tik_name = tik.Dprofile().get_product_name()
    if tik_name == "hisi-es":
        check_list = ["float16"]
    else:
        check_list = ["float16", "float32"]
    if input_dtype not in check_list:
        raise RuntimeError("upsample only support %s while dtype is %s"
                           % (",".join(check_list), input_dtype))
    util.check_shape_rule(input_shape)
    if len(input_shape) != DIM_5HD:
        raise RuntimeError(
            "The dim of tensor must be %d, actual dim is %d"
            % (DIM_5HD, len(input_shape)))
    n, c1, h, w, c0 = input_shape
    shape_c0 = C0
    if input_shape[DIM_5HD - 1] != shape_c0:
        raise RuntimeError("The value of C0 must be 16")
    if input_format != "NC1HWC0":
        raise RuntimeError(
            "The format must be NC1HWC0, while actual format is %s"
            % (input_format))


def upsample_check(dic, stride_h, stride_w, kernel_name="upsample"):
    """
    check the input parameters

    Parameters
    ----------
    dic : dict, includes shape, dtype, and format
    stride_h : the stride on the H axis
    stride_w : the stride on the W axis
    kernel_name : str
        kernel_name

    Returns
    -------
    None
    """
    input_shape = dic.get("shape")
    input_format = dic.get("format")
    input_dtype = dic.get("dtype").lower()
    if stride_h <= 0 or stride_w <= 0:
        raise RuntimeError("The stride must be greater than 0")
    check_shape_dtype_format(input_shape, input_dtype, input_format)
    util.check_kernel_name(kernel_name)
- Implement the operator computation with the following logic:
- Define the Upsample class and initialize the parameters used for subsequent computation in the initialization function. Compute the shape and placeholder for the input and output tensors based on parameters.
class Upsample:
    def __init__(self, input_dict, stride_h, stride_w):
        self.dbprofile = tik.Dprofile()
        self.tik_instance = tik.Tik(self.dbprofile)
        self.ub_size = self.dbprofile.get_unified_buffer_size()
        self.dtype = input_dict.get("x").get("dtype").lower()
        self.x_shape = input_dict.get("x").get("shape")
        self.dsize = get_data_size(self.dtype)
        self.y_shape = call_out_shape(self.x_shape, stride_h, stride_w)
        self.x_gm = self.tik_instance.Tensor(self.dtype, self.x_shape,
                                             name="x_gm", scope=tik.scope_gm)
        self.y_gm = self.tik_instance.Tensor(self.dtype, self.y_shape,
                                             name="y_gm", scope=tik.scope_gm)
- Compute tiling parameters based on the input and output shape.
ub_size_in_byte = self.ub_size - RESERVE_SIZE
c0_size_in_ub = ub_size_in_byte // self.dsize // self.x_shape[-1] // 2
out_loop, in_loop, axis, x_shape_in_ub, y_shape_in_ub = cal_tilling(
    self.x_shape, self.y_shape, c0_size_in_ub)
x_axis_num = get_axis_size_shape(self.x_shape, axis)
y_axis_num = get_axis_size_shape(self.y_shape, axis)
if self.dtype == "float16":
    mask = 128
else:
    mask = 64
n, c1, y_h, y_w, c0 = self.y_shape
c0_stride = c0 * self.dsize // BLOCK_SIZE
block_size = BLOCK_SIZE // self.dsize
if in_loop * out_loop > 1:
    thread_num = 2
else:
    thread_num = 1
with self.tik_instance.for_range(0, n, block_num=n) as blockid:
    c1_id = self.tik_instance.Scalar(
        "uint32", name="c1_id", init_value=0)
    c1_size = self.tik_instance.Scalar(
        "uint32", name="c1_size", init_value=x_shape_in_ub[1])
    x_h_id = self.tik_instance.Scalar(
        "uint32", name="x_h_id", init_value=0)
    x_h_size = self.tik_instance.Scalar(
        "uint32", name="x_h_size", init_value=x_shape_in_ub[2])
    x_w_id = self.tik_instance.Scalar(
        "uint32", name="x_w_id", init_value=0)
    x_w_size = self.tik_instance.Scalar(
        "uint32", name="x_w_size", init_value=x_shape_in_ub[3])
    x_axis_size = self.tik_instance.Scalar(
        "uint32", name="x_axis_size")
    y_axis_size = self.tik_instance.Scalar(
        "uint32", name="y_axis_size")
    repeats = self.tik_instance.Scalar("uint32", name="repeats")
    loop = self.tik_instance.Scalar("uint32", name="loop")
    with self.tik_instance.for_range(
            0, out_loop * in_loop, thread_num=thread_num) as out_loopid:
        x_in_ub = self.tik_instance.Tensor(
            self.dtype, x_shape_in_ub, scope=tik.scope_ubuf,
            name="x_in_ub")
        y_in_ub = self.tik_instance.Tensor(
            self.dtype, y_shape_in_ub, scope=tik.scope_ubuf,
            name="y_in_ub")
        if axis == 1:
            # store the C1 start offset of x
            c1_id.set_as(out_loopid * x_shape_in_ub[1])
            c1_size.set_as(x_shape_in_ub[1])
            with self.tik_instance.if_scope(
                    c1_id + c1_size > self.x_shape[axis]):
                c1_size.set_as(self.x_shape[axis] - c1_id)
            x_axis_size.set_as(c1_size)
            y_axis_size.set_as(
                c1_size * y_shape_in_ub[axis] // x_shape_in_ub[axis])
            x_h_id.set_as(0)
            x_w_id.set_as(0)
        elif axis == 2:
            c1_id.set_as(out_loopid // in_loop)
            x_h_id.set_as(out_loopid % in_loop * x_shape_in_ub[axis])
            x_w_id.set_as(0)
            c1_size.set_as(1)
            x_h_size.set_as(x_shape_in_ub[2])
            with self.tik_instance.if_scope(
                    x_h_size + x_h_id > self.x_shape[axis]):
                x_h_size.set_as(self.x_shape[axis] - x_h_id)
            x_axis_size.set_as(x_h_size)
            y_axis_size.set_as(
                x_axis_size * y_shape_in_ub[axis] // x_shape_in_ub[axis])
        else:
            c1_id.set_as(out_loopid // in_loop // self.x_shape[2]
                         * x_shape_in_ub[1])
            x_h_id.set_as(out_loopid // in_loop % self.x_shape[2]
                          * x_shape_in_ub[2])
            x_w_id.set_as(out_loopid % in_loop * x_shape_in_ub[3])
            c1_size.set_as(1)
            x_h_size.set_as(1)
            x_w_size.set_as(x_shape_in_ub[3])
            with self.tik_instance.if_scope(
                    x_w_size + x_w_id > self.x_shape[3]):
                x_w_size.set_as(self.x_shape[3] - x_w_id)
            x_axis_size.set_as(x_w_size)
            y_axis_size.set_as(
                x_w_size * y_shape_in_ub[axis] // x_shape_in_ub[axis])
        self.tik_instance.data_move(
            x_in_ub[0, 0, 0, 0, 0],
            self.x_gm[blockid, c1_id, x_h_id, x_w_id, 0],
            0, 1, x_axis_size * x_axis_num * self.dsize // BLOCK_SIZE,
            0, 0)
        loop.set_as(x_w_size * block_size // mask // MAX_REPEAT)
        repeats.set_as(x_w_size * block_size // mask % MAX_REPEAT)
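The mask selection in the tiling code (128 for float16, 64 for float32) follows from the width of one vector instruction repeat on the AI Core, which processes 256 bytes of data. A quick sketch of this relationship, assuming the standard 256-byte repeat width:

```python
VECTOR_BYTES_PER_REPEAT = 256  # bytes processed by one vector repeat

def vector_mask(dtype):
    """Number of elements one vec_muls repeat covers for a dtype."""
    dsize = {"float16": 2, "float32": 4}[dtype]
    return VECTOR_BYTES_PER_REPEAT // dsize
```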
- Based on the tiling parameters, move data to the Unified Buffer, multiply it by the attribute scale, and then move the result back to the Global Memory.
with self.tik_instance.for_range(0, c1_size) as c1_index:
    if axis == 3 or (axis == 2 and y_shape_in_ub[2] == x_shape_in_ub[2]):
        for i in range(0, stride_h):
            with self.tik_instance.for_range(0, x_h_size) as x_h_index:
                with self.tik_instance.for_range(
                        0, stride_w) as stride_w_index:
                    with self.tik_instance.for_range(0, loop) as loop_w_id:
                        self.tik_instance.vec_muls(
                            mask,
                            y_in_ub[0, c1_index, x_h_index,
                                    MAX_REPEAT * loop_w_id * stride_w
                                    + stride_w_index, 0],
                            x_in_ub[0, c1_index,
                                    (x_h_index + i * x_h_size) // stride_h,
                                    MAX_REPEAT * loop_w_id, 0],
                            scale, MAX_REPEAT,
                            c0_stride * stride_w, c0_stride)
                    with self.tik_instance.if_scope(repeats > 0):
                        self.tik_instance.vec_muls(
                            mask,
                            y_in_ub[0, c1_index, x_h_index,
                                    MAX_REPEAT * loop * stride_w
                                    + stride_w_index, 0],
                            x_in_ub[0, c1_index,
                                    (x_h_index + i * x_h_size) // stride_h,
                                    (MAX_REPEAT * mask * loop) // block_size,
                                    0],
                            scale, repeats,
                            c0_stride * stride_w, c0_stride)
            self.tik_instance.data_move(
                self.y_gm[blockid, c1_id,
                          x_h_id * stride_h + i * x_h_size,
                          x_w_id * stride_w, 0],
                y_in_ub[0, 0, 0, 0, 0],
                0, 1,
                y_axis_size * y_axis_num * self.dsize // BLOCK_SIZE, 0, 0)
    else:
        with self.tik_instance.for_range(
                0, x_h_size * stride_h) as y_h_index:
            with self.tik_instance.for_range(
                    0, stride_w) as stride_w_index:
                with self.tik_instance.for_range(0, loop) as loop_w_id:
                    self.tik_instance.vec_muls(
                        mask,
                        y_in_ub[0, c1_index, y_h_index,
                                MAX_REPEAT * loop_w_id * stride_w
                                + stride_w_index, 0],
                        x_in_ub[0, c1_index, y_h_index // stride_h,
                                MAX_REPEAT * loop_w_id, 0],
                        scale, MAX_REPEAT,
                        c0_stride * stride_w, c0_stride)
                with self.tik_instance.if_scope(repeats > 0):
                    self.tik_instance.vec_muls(
                        mask,
                        y_in_ub[0, c1_index, y_h_index,
                                MAX_REPEAT * loop * stride_w
                                + stride_w_index, 0],
                        x_in_ub[0, c1_index, y_h_index // stride_h,
                                MAX_REPEAT * loop, 0],
                        scale, repeats,
                        c0_stride * stride_w, c0_stride)
        with self.tik_instance.if_scope(c1_size == 1):
            self.tik_instance.data_move(
                self.y_gm[blockid, c1_id, x_h_id * stride_h,
                          x_w_id * stride_w, 0],
                y_in_ub, 0, 1,
                y_axis_size * y_axis_num * self.dsize // BLOCK_SIZE, 1, 1)
        with self.tik_instance.if_scope(c1_size > 1):
            self.tik_instance.data_move(
                self.y_gm[blockid, c1_id, x_h_id * stride_h,
                          x_w_id * stride_w, 0],
                y_in_ub, 0, 1,
                y_axis_size * y_axis_num * self.dsize // BLOCK_SIZE, 1, 1)
- Call BuildCCE() to perform building.
upsample_instance.tik_instance.BuildCCE(kernel_name=kernel_name,
                                        inputs=upsample_instance.x_gm,
                                        outputs=upsample_instance.y_gm)
Operator Plug-in Implementation
You need to customize the ParseParams_Upsample function to map the attributes of the Upsample operator developed under the Caffe framework to the UpsampleTik operator that adapts to the Ascend AI Processor.
The ParseParam_Upsample function is implemented as follows:
Status ParseParams_Upsample(const ge::Operator& op_src, ge::Operator& op_dest)
{
    // translate op_src to op_dest
    float scale;
    if (ge::GRAPH_SUCCESS == op_src.GetAttr("scale", scale)) {
        op_dest.SetAttr("scale", scale);
    }
    int stride;
    int stride_h;
    int stride_w;
    if (ge::GRAPH_SUCCESS == op_src.GetAttr("stride", stride)) {
        op_dest.SetAttr("stride_h", stride);
        op_dest.SetAttr("stride_w", stride);
    } else {
        op_src.GetAttr("stride_h", stride_h);
        op_src.GetAttr("stride_w", stride_w);
        op_dest.SetAttr("stride_h", stride_h);
        op_dest.SetAttr("stride_w", stride_w);
    }
    return SUCCESS;
}
Operator Prototype Definition
upsample.h defines the prototype of the UpsampleTik operator.
namespace ge {
REG_OP(UpsampleTik)
    .INPUT(x, TensorType({DT_FLOAT16, DT_FLOAT}))
    .OUTPUT(y, TensorType({DT_FLOAT16, DT_FLOAT}))
    .ATTR(scale, Float, 1)
    .ATTR(stride_h, Int, 2)
    .ATTR(stride_w, Int, 2)
    .OP_END_FACTORY_REG(UpsampleTik)
}  // namespace ge
#endif  // GE_OP_UPSAMPLE_H
The key task in the prototype definition is inferring the shape and dtype of the output tensor, as shown in the following code.
namespace ge {
// ----------------Upsample Op Begin-------------------
IMPLEMT_VERIFIER(UpsampleTik, UpsampleVerify)
{
    return GRAPH_SUCCESS;
}

IMPLEMT_INFERFUNC(UpsampleTik, UpsampleInferShape)
{
    TensorDesc tensordesc_output = op.GetInputDesc("x");
    uint32_t stride_h = 2;
    uint32_t stride_w = 2;
    if (op.GetAttr("stride_h", stride_h) != ge::GRAPH_SUCCESS) {
        stride_h = 2;
    }
    if (op.GetAttr("stride_w", stride_w) != ge::GRAPH_SUCCESS) {
        stride_w = 2;
    }
    ge::Shape shape = tensordesc_output.GetShape();
    std::vector<int64_t> dims_input = shape.GetDims();
    std::vector<int64_t> dimVector;
    for (size_t i = 0; i < dims_input.size(); i++) {
        if (i == 2) {
            int64_t dims = dims_input[i] * stride_h;
            dimVector.push_back(dims);
        } else if (i == 3) {
            int64_t dims = dims_input[i] * stride_w;
            dimVector.push_back(dims);
        } else {
            int64_t dims = dims_input[i];
            dimVector.push_back(dims);
        }
    }
    Shape outputMaxShape(dimVector);
    tensordesc_output.SetShape(outputMaxShape);
    (void)op.UpdateOutputDesc("y", tensordesc_output);
    return GRAPH_SUCCESS;
}

INFER_FUNC_REG(UpsampleTik, UpsampleInferShape);
VERIFY_FUNC_REG(UpsampleTik, UpsampleVerify);
// ----------------Upsample Op End-----------------
}  // namespace ge
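The shape inference above multiplies the H dimension (index 2) and W dimension (index 3) of the NC1HWC0 input by stride_h and stride_w, respectively. The same rule can be stated in a few lines of Python (mirroring the C++ loop, for illustration only):

```python
def infer_upsample_shape(input_shape, stride_h=2, stride_w=2):
    """Output shape of UpsampleTik for an NC1HWC0 input:
    H (index 2) scales by stride_h, W (index 3) by stride_w;
    N, C1, and C0 are unchanged."""
    out = list(input_shape)
    out[2] *= stride_h
    out[3] *= stride_w
    return tuple(out)
```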
Operator Information Definition
For details about the information definition file of the UpsampleTik operator, see tbe/op_info_cfg/ai_core/<soc_version>/upsample_tik.ini.