# TBE DSL APIs

## Introduction

### TBE Overview

Tensor Boost Engine (TBE) is a framework for developing custom operators based on the Tensor Virtual Machine (TVM), an open-source community project. The TVM aims to further abstract operator generation rules by breaking operators into operation primitives and combining them as needed. Based on the definition of an operator's computation process, the TVM uses the Schedule and CodeGen technologies to generate operators for the specified hardware.

Schedule describes the computation process for implementing an operator on hardware, which requires profound hardware knowledge. To make operator development easier, TBE simplifies the writing of Schedule on top of the TVM. Guided by the concept of "Auto_Schedule", a collection of TBE APIs is provided for composing operator computations. By combining these APIs, you define only the computation process of an operator and hand the schedule over to "Auto_Schedule". This document describes the TBE domain-specific language (DSL) APIs defined based on the TVM. You can use these APIs to develop operators.

The TBE DSL APIs mainly cover vector operations, including element-wise operation APIs, reduction APIs, broadcast APIs, index operation APIs, concat APIs, convolution APIs, 4D to 5D conversion APIs, and matrix computation APIs.

### Version Query

You can view the version number of the current TBE DSL in the **python/site-packages/te/te/version.py** file in the ATC installation path.

### API Usage

Before calling TBE DSL APIs, declare the environment variable **PYTHONPATH**.

```shell
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest
export PYTHONPATH=${install_path}/atc/python/site-packages/te:${install_path}/atc/python/site-packages/topi
```

Where **install_path** indicates the ATC installation path.

## Element-Wise Compute APIs

Element-wise compute APIs are used to compute input data element by element. The output usually has the same shape as the input.

### vadd

#### Description

Adds two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vadd(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vadd(data1, data2)
```

### vsub

#### Description

Subtracts two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vsub(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vsub(data1, data2)
```

### vmul

#### Description

Multiplies two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmul(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmul(data1, data2)
```

### vdiv

#### Description

Divides two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vdiv(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vdiv(data1, data2)
```

### vmod

#### Description

Performs modulo operations on two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmod(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmod(data1, data2)
```

### vmin

#### Description

Returns the min of two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmin(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmin(data1, data2)
```

### vmax

#### Description

Returns the max of two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmax(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmax(data1, data2)
```

### vor

#### Description

Performs the bitwise OR operation on two tensors.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vor(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vor(data1, data2)
```

### vand

#### Description

Performs the bitwise AND operation on two tensors.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vand(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vand(data1, data2)
```

### vadds

#### Description

Adds a scalar to a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vadds(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be added to **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vadds(data, scalar)
```

### vmins

#### Description

Compares a raw_tensor with a scalar element-wise and chooses the smaller one.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmins(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be compared with **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmins(data, scalar)
```

### vmaxs

#### Description

Compares a raw_tensor with a scalar element-wise and chooses the larger one.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmaxs(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be compared with **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmaxs(data, scalar)
```

### vmuls

#### Description

Multiplies a raw_tensor and a scalar element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmuls(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient by which **raw_tensor** is multiplied element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmuls(data, scalar)
```

### vcmp

#### Description

Compares **lhs** with **rhs** element-wise based on **operation**. The operations specified by **operation** include **eq**, **ne**, **lt**, **gt**, **le**, and **ge**, which indicate **==**, **!=**, **<**, **>**, **<=**, and **>=**, respectively. If the expression is true, **True** is returned when the mode is **bool**, and **1** is returned when the mode is **bit**. If the expression is false, **False** is returned when the mode is **bool**, and **0** is returned when the mode is **bit**.

The following describes the meaning of each operation by using an expression. Parameter **x** indicates an element in **lhs**, parameter **y** indicates an element in **rhs**, parameter **z** indicates an element of the result tensor, and parameter **n** (the value ranging from 0 to 7) indicates the bit index of an element of the result tensor. The expressions are as follows:

- mode == 'bool':
  - lt: z = True (x < y) or False (x >= y)
  - gt: z = True (x > y) or False (x <= y)
  - le: z = True (x <= y) or False (x > y)
  - ge: z = True (x >= y) or False (x < y)
  - eq: z = True (x == y) or False (x != y)
  - ne: z = True (x != y) or False (x == y)
- mode == 'bit':
  - lt: z[n] = 1 (x < y) or 0 (x >= y)
  - gt: z[n] = 1 (x > y) or 0 (x <= y)
  - le: z[n] = 1 (x <= y) or 0 (x > y)
  - ge: z[n] = 1 (x >= y) or 0 (x < y)
  - eq: z[n] = 1 (x == y) or 0 (x != y)
  - ne: z[n] = 1 (x != y) or 0 (x == y)

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- The left and right operands for comparison must have the same data type.
- When **mode** is set to **bool** and the **te.lang.cce.cce_build_code** API is called for compilation, **bool_storage_as_1bit** in the passed configuration must be set to **False**. Otherwise, an unexpected output shape will be obtained. **bool_storage_as_1bit** defaults to **True**, indicating that bool data is stored as 1-bit data. The following gives a build configuration template:

```python
with tvm.target.cce():
    schedule = generic.auto_schedule(res)
config = {"name": kernel_name,
          "tensor_list": [data_x, data_y, res],
          "bool_storage_as_1bit": False}
te.lang.cce.cce_build_code(schedule, config)
```

- When **mode** is set to **bit**, the last dimension of the shape of the left operand must be divisible by 8.
- If the right operand is also a tensor, the two tensors must have the same shape.
- Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vcmp(lhs, rhs, operation='lt', mode='bool')

#### Arguments

- **lhs**: a tvm.tensor for the left operand
- **rhs**: a tvm.tensor or scalar for the right operand
- **operation**: operation type selected from **eq**, **ne**, **lt**, **gt**, **ge**, or **le**. Defaults to **lt**.
- **mode**: mode selected from **bool** or **bit**. Defaults to **bool**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor. If **mode** is set to **bool**, the data type is **bool**. If **mode** is set to **bit**, the data type is **uint8**.

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vcmp(data1, data2, 'lt', 'bit')
```
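
In **bit** mode, each group of eight comparison results is packed into one **uint8** element, with bit index **n** holding the result for the n-th element of the group (hence the requirement that the last dimension be divisible by 8). The packing can be sketched in plain Python. This is only an illustration of the expressions above, not the TBE API; the LSB-first bit order within a byte is an assumption here.

```python
def pack_bits(flags):
    """Pack eight boolean comparison results into one byte (assumed LSB-first)."""
    assert len(flags) == 8
    byte = 0
    for n, flag in enumerate(flags):
        if flag:
            byte |= 1 << n  # bit n holds the result for element n of the group
    return byte

# Element-wise 'lt' comparison of two 8-element rows, then packing.
lhs = [1.0, 5.0, 2.0, 9.0, 0.0, 3.0, 7.0, 4.0]
rhs = [2.0, 5.0, 3.0, 1.0, 4.0, 2.0, 8.0, 4.0]
flags = [x < y for x, y in zip(lhs, rhs)]  # lt: z[n] = 1 (x < y) or 0 (x >= y)
packed = pack_bits(flags)                  # one uint8 value per 8 comparisons
```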

### vlogic

#### Description

Performs the logical AND/OR operation on two tensors element-wise, or performs the logical NOT operation on one tensor.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The supported data type is **bool**. The elements of the two tensors must have the same data type.

#### Prototype

te.lang.cce.vlogic(lhs, rhs=None, operation='logic_and')

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor
- **operation**: operation type selected from **logic_and**, **logic_or**, or **logic_not**. Defaults to **logic_and**. To perform the logic_not operation, set **rhs** to **None**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "bool"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vlogic(data1, data2, 'logic_and')
```

### vsel

#### Description

Compares the **condition** element with **True** or **1** based on the data type of **condition**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.

- If the data type of **condition** is **bool**, each **condition** element is compared with **True**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.
- If the data type of **condition** is **uint8**, each bit of the **condition** element is compared with **1**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.

The following expressions explain the comparison. **i** indicates an element in **condition**, **x'** indicates an element of **x** (or the scalar **x**), **y'** indicates an element of **y** (or the scalar **y**), **z** indicates an element of the result tensor, and **n** (value range: 0–7) indicates a bit index of the **condition** element:

- When the data type of **condition** is **bool**: z = (i == True) ? x' : y'
- When the data type of **condition** is **uint8**: z = (i[n] == 1) ? x' : y'

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- **x** and **y** must have the same data type.
- When **condition** is of type uint8, the last dimension of the shape of **x** and **y** must be a multiple of 8.
- When **condition** is of type bool and the **te.lang.cce.cce_build_code** API is called for compilation, **bool_storage_as_1bit** in the passed configuration must be set to **False**.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vsel(condition, x, y)

#### Arguments

- **condition**: a tvm.tensor for the condition tensor, of type bool or uint8
- **x**: a tvm.tensor or scalar for the value returned when the condition is true
- **y**: a tvm.tensor or scalar for the value returned when the condition is false

#### Returns

**res_tensor**: a tvm.tensor for the result tensor
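
The bool-mode selection semantics above, z = (i == True) ? x' : y', can be illustrated in plain Python. This is an illustration only, not the TBE API; the helper `vsel_bool` is hypothetical and merely mirrors the argument names above, including the rule that **x** and **y** may be tensors or scalars.

```python
def vsel_bool(condition, x, y):
    """Element-wise select: take the x value where condition is True, else the y value.
    x and y may be lists (standing in for tensors) or scalars."""
    pick = lambda v, i: v[i] if isinstance(v, list) else v
    return [pick(x, i) if c else pick(y, i) for i, c in enumerate(condition)]

cond = [True, False, True, False]
res = vsel_bool(cond, [1, 2, 3, 4], 0)  # tensor x, scalar y
```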

### vcmpsel

#### Description

Compares **lhs** with **rhs** element-wise based on **operation**. The operations specified by **operation** include **eq**, **ne**, **lt**, **gt**, **le**, and **ge**, which indicate **==**, **!=**, **<**, **>**, **<=**, and **>=**, respectively. If the expression is true, the value of **slhs** is returned. Otherwise, the value of **srhs** is returned.

In the following expressions, parameter **a** indicates an element in **lhs**, **b** an element in **rhs**, **c** an element in **slhs**, **d** an element in **srhs**, and **res** an element of the result tensor:

- lt: res = c (a < b) or d (a >= b)
- gt: res = c (a > b) or d (a <= b)
- le: res = c (a <= b) or d (a > b)
- ge: res = c (a >= b) or d (a < b)
- eq: res = c (a == b) or d (a != b)
- ne: res = c (a != b) or d (a == b)

- If **rhs** is **None**, the elements in **lhs** are compared with the floating-point number **2.0**.
- If **slhs** is **None** and the expression is true, the value of **lhs** is returned.
- If **srhs** is **None** and **rhs** is a tensor, the value of **rhs** is returned when the expression is false.
- If **srhs** is **None** and **rhs** is a scalar, the floating-point number **0.0** is returned when the expression is false.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The arguments must have the same data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vcmpsel(lhs, rhs=None, operation='lt', slhs=None, srhs=None)

#### Arguments

- **lhs**: a tvm.tensor for the left operand
- **rhs**: a tvm.tensor or scalar for the right operand. Defaults to **None**.
- **slhs**: a tvm.tensor or scalar for the value returned when the comparison expression is true. Defaults to **None**.
- **srhs**: a tvm.tensor or scalar for the value returned when the comparison expression is false. Defaults to **None**.
- **operation**: operation type selected from **eq**, **ne**, **lt**, **gt**, **ge**, or **le**. Defaults to **lt**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
data4 = tvm.placeholder(shape, name="data4", dtype=input_dtype)
res = te.lang.cce.vcmpsel(data1, data2, 'gt', data3, data4)
```
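
With all optional arguments left at their **None** defaults, the rules above reduce to: compare each **lhs** element with **2.0**; return the element itself when the expression is true, and **0.0** otherwise. A plain-Python sketch of this default behavior (an illustration, not the TBE API; the helper name is hypothetical):

```python
def vcmpsel_lt_defaults(lhs):
    """vcmpsel(lhs) with rhs=slhs=srhs=None and operation='lt':
    res = a if a < 2.0 else 0.0, per the documented None defaults."""
    return [a if a < 2.0 else 0.0 for a in lhs]

res = vcmpsel_lt_defaults([1.0, 3.0, 2.0])  # 2.0 < 2.0 is false, so it maps to 0.0
```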

### vlog

#### Description

Performs the natural logarithm operation ln(x) on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vlog(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vlog(data)
```

### vexp

#### Description

Performs the natural exponential operation e^x on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vexp(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vexp(data)
```

### vabs

#### Description

Performs the absolute value operation |x| on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vabs(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vabs(data)
```

### vrec

#### Description

Performs the reciprocal operation 1/x on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vrec(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrec(data)
```

### vrelu

#### Description

Performs the ReLU operation on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16

#### Prototype

te.lang.cce.vrelu(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrelu(data)
```

### vnot

#### Description

Performs bitwise NOT on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vnot(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vnot(data)
```

### vsqrt

#### Description

Computes the square root for a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vsqrt(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vsqrt(data)
```

### vrsqrt

#### Description

Calculates the reciprocal of the square root for a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vrsqrt(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrsqrt(data)
```

### vaxpy

#### Description

Multiplies **lhs** by a scalar and adds **rhs** to the result element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- **lhs** and **rhs** must have the same data type and shape.
- If the data type of the scalar differs from that of the tensors, the scalar is converted to the tensor data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vaxpy(lhs, rhs, scalar)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor
- **scalar**: a scalar for the coefficient by which **lhs** is multiplied

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vaxpy(data1, data2, scalar)
```

### vmla

#### Description

Multiplies **tensor_0** by **tensor_1** and adds **tensor_2** to the result element-wise. The corresponding computation formula is tensor_0 * tensor_1 + tensor_2.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmla(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmla(data1, data2, data3)
```

### vmadd

#### Description

Multiplies **tensor_0** by **tensor_2** and adds **tensor_1** to the result element-wise. The corresponding computation formula is tensor_0 * tensor_2 + tensor_1.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmadd(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmadd(data1, data2, data3)
```
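
Note that **vmadd** differs from **vmla** only in operand order: vmla computes tensor_0 * tensor_1 + tensor_2, while vmadd computes tensor_0 * tensor_2 + tensor_1. In plain Python terms (an illustration of the two formulas, not the TBE APIs):

```python
def vmla(t0, t1, t2):
    # vmla: tensor_0 * tensor_1 + tensor_2, element-wise
    return [a * b + c for a, b, c in zip(t0, t1, t2)]

def vmadd(t0, t1, t2):
    # vmadd: tensor_0 * tensor_2 + tensor_1, element-wise
    return [a * c + b for a, b, c in zip(t0, t1, t2)]

t0, t1, t2 = [2.0, 3.0], [10.0, 10.0], [1.0, 1.0]
```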

### vmaddrelu

#### Description

Multiplies **tensor_0** by **tensor_2** and adds **tensor_1** to the result element-wise. Then, performs ReLU. The corresponding computation formula is relu(tensor_0 * tensor_2 + tensor_1).

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmaddrelu(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmaddrelu(data1, data2, data3)
```

## Reduction APIs

Reduction APIs reduce a tensor along a dimension, performing operations such as accumulation or multiplication in the specified direction. The output has one less dimension than the input.

### General Restrictions

Due to data arrangement restrictions on the computing platform, the data produced by a reduction operation must be rearranged before it can be used in subsequent operations. Therefore, regardless of which APIs are used, no vector operation can be performed after a reduction operation.

### sum

#### Description

Computes the sum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.sum(raw_tensor, axis, keepdims=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is **(10, 10, 10)** and **keepdims** is **False**, the shape after reduction is **(10, 10)**. If set to **True**, the reduced axis is retained with length **1**, so the shape after reduction is **(10, 10, 1)**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.sum(data, axis=1)

### reduce_min

#### Description

Computes the minimum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_min(raw_tensor, axis, keepdims=False, priority_flag=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).
- **priority_flag**: selects high-accuracy or high-performance mode. **True**: high-accuracy mode; **False**: high-performance mode.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_min(data, axis=1)

### reduce_max

#### Description

Computes the maximum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_max(raw_tensor, axis, keepdims=False, priority_flag=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).
- **priority_flag**: selects high-accuracy or high-performance mode. **True**: high-accuracy mode; **False**: high-performance mode.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_max(data, axis=1)

### reduce_prod

#### Description

Computes the product along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_prod(raw_tensor, axis, keepdims=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_prod(data, axis=1)

## Broadcast API

Broadcast APIs are used to process two tensors with different shapes: a lower-dimensional operand is broadcast according to a higher-dimensional operand so that the two operands have the same dimensions, and the operands are then computed element-wise.

### broadcast

#### Description

Broadcasts **var** to a tensor with the target **shape** (the second parameter). The data type of the result is specified by **output_dtype**.

As shown in the following example, the shape of A is (2, 1), that is, two rows and one column. It is broadcast to the target shape (2, 3), that is, two rows and three columns. In this case, the original column is broadcast to three same columns.

For example, if the shape of **var** is (2, 1, 64) and the target shape is (2, 128, 64), the shape of the result tensor after calling the broadcast API is (2, 128, 64).
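These rules follow standard broadcasting semantics, the same as NumPy's; a small NumPy sketch (an analogy only, not part of the TBE API):

```python
import numpy as np

a = np.array([[1.0], [2.0]])     # shape (2, 1): two rows, one column
b = np.broadcast_to(a, (2, 3))   # the size-1 column axis is replicated
print(b.tolist())                # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```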

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/broadcast_compute.py** in the ATC installation path.

#### Restrictions

The shape of **var** must have the same number of dimensions as the second parameter **shape**. Each dimension of **var** is either the same as the corresponding dimension of **shape** or equals **1**. If a dimension is **1**, it is broadcast to match the target **shape**.

The supported types are float16, float32, and int32.

#### Prototype

te.lang.cce.broadcast(var, shape, output_dtype=None)

#### Arguments

- **var**: a scalar or tensor for the data to be broadcast
- **shape**: target shape for the broadcast operation
- **output_dtype**: output data type. Defaults to **var.dtype**.

#### Returns

**res_tensor**: tensor obtained after **var** is broadcast. Its shape is specified by the **shape** parameter, and its data type is **output_dtype**.

#### Example

As shown in the following code, the tensor with shape (1024, 1) is broadcast to shape (1024, 1024) by calling the broadcast API.

    outshape = (1024, 1024)
    shape = (1024, 1)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.broadcast(data, outshape)

## Segment Compute APIs

Segment compute APIs are used to compute the sum, mean, product, maximum, or minimum along the segments of a tensor.

### unsorted_segment_sum

#### Description

Computes the sum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = sum(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value. For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.
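For scalar rows, the semantics above (including **init_value** and negative ids) can be sketched in plain Python; this helper is for illustration only and is not part of the TBE API:

```python
def unsorted_segment_sum_ref(data, segment_ids, num_segments, init_value=0):
    """Pure-Python reference for the segment-sum semantics described above."""
    output = [init_value] * num_segments
    for j, seg in enumerate(segment_ids):
        if seg < 0:            # negative ids: the element is discarded
            continue
        output[seg] += data[j]
    return output

print(unsorted_segment_sum_ref([3, 4, 5, 6], [0, 0, 2, -1], 3))  # [7, 0, 5]
```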

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_sum(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_sum(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = data[0] + data[1]
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = data[3] + data[4]

### unsorted_segment_mean

#### Description

Computes the mean along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = (1/len(j...)) * sum(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_mean(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 5, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_mean(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = (data[0] + data[1]) / 2
    # res[2] = 0
    # res[3] = 0
    # res[4] = 0
    # res[5] = (data[2] + data[3] + data[4]) / 3

### unsorted_segment_prod

#### Description

Computes the product along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = product(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

Here, **product** means that all elements in **data[j...]** are multiplied together.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_prod(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_prod(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = data[0] * data[1]
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = data[3] * data[4]

### unsorted_segment_min

#### Description

Computes the minimum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = min(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_min(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_min(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = min(data[0], data[1])
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = min(data[3], data[4])

### unsorted_segment_max

#### Description

Computes the maximum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = max(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_max(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_max(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = max(data[0], data[1])
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = max(data[3], data[4])

## Inplace Compute APIs

Inplace compute APIs are used to compute tensors based on rows, such as **add**, **sub**, and **update**.

### inplace_add

#### Description

Adds **lhs** and **rhs** based on a specified row.

For example:

    res = lhs
    res[ids, :] += rhs
    return res
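Note that when **inplace_ids** contains duplicate ids, the contributions accumulate (see the example later in this section). In NumPy terms this corresponds to `np.add.at` rather than a plain fancy-indexed `+=`; a sketch for illustration only, not part of the TBE API:

```python
import numpy as np

lhs = np.zeros((6, 4), dtype=np.float32)
rhs = np.ones((5, 4), dtype=np.float32)
ids = [1, 1, 4, 2, 2]

res = lhs.copy()
np.add.at(res, ids, rhs)  # duplicate ids accumulate, matching the semantics above
print(res[1, 0])          # 2.0 (rows 0 and 1 of rhs both land on row 1)
```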

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_add(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_add(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] + dataB[0] + dataB[1]
    # res[2] = dataA[2] + dataB[3] + dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] + dataB[2]
    # res[5] = dataA[5]

### inplace_sub

#### Description

Subtracts **rhs** from the specified rows of **lhs**.

For example:

    res = lhs
    res[ids, :] -= rhs
    return res

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_sub(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_sub(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] - dataB[0] - dataB[1]
    # res[2] = dataA[2] - dataB[3] - dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] - dataB[2]
    # res[5] = dataA[5]

### inplace_update

#### Description

Replaces the specified rows of **lhs** with **rhs**.

For example:

    res = lhs
    res[ids, :] = rhs
    return res

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_update(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_update(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataB[1]
    # res[2] = dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataB[2]
    # res[5] = dataA[5]

## Cast Compute APIs

Cast compute APIs are used to round the input tensor element-wise based on certain rules.

### ceil

#### Description

Rounds up a raw_tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.ceil(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.ceil(data)

### floor

#### Description

Rounds down a raw_tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.floor(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.floor(data)

### round

#### Description

Performs banker's rounding (round half to even) on a raw_tensor element-wise. For example, 1.5 rounds up to 2, and 2.5 rounds down to 2.
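Python's built-in `round` uses the same round-half-to-even rule, so it can illustrate the tie-breaking behavior:

```python
# Round half to even: exact ties go to the nearest even integer.
print(round(1.5))  # 2
print(round(2.5))  # 2
print(round(3.5))  # 4
```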

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.round(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.round(data)

### trunc

#### Description

Rounds a raw_tensor towards 0 element-wise. For example, –1.9 rounds up to –1, and 1.9 rounds down to 1.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.trunc(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.trunc(data)

## Concat Compute API

### concat

#### Description

Concatenates multiple input tensors along a specified axis.

**raw_tensors** indicates multiple input tensors, which have the same data type.

If raw_tensors[i].shape = [D0, D1, ... Daxis(i), ...Dn], the shape of the output after concatenation based on **axis** is [D0, D1, ... Raxis, ...Dn].

Where, Raxis = sum(Daxis(i)).

For example:

    t1 = [[1, 2, 3], [4, 5, 6]]
    t2 = [[7, 8, 9], [10, 11, 12]]
    concat([t1, t2], 0)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    concat([t1, t2], 1)  # [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

    # The shape of tensor t1 is [2, 3].
    # The shape of tensor t2 is [2, 3].
    concat([t1, t2], 0).shape  # [4, 3]
    concat([t1, t2], 1).shape  # [2, 6]

The parameter **axis** can also be a negative number, in which case it indicates axis **axis + len(shape)**, that is, the axis is counted from the end of the dimensions.

For example:

    t1 = [[[1, 2], [2, 3]], [[4, 4], [5, 3]]]
    t2 = [[[7, 4], [8, 4]], [[2, 10], [15, 11]]]
    concat([t1, t2], -1)

The output is as follows:

    [[[ 1,  2,  7,  4],
      [ 2,  3,  8,  4]],

     [[ 4,  4,  2, 10],
      [ 5,  3, 15, 11]]]
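The same result can be checked with NumPy's `np.concatenate` (an analogy only, not the TBE API):

```python
import numpy as np

t1 = np.array([[[1, 2], [2, 3]], [[4, 4], [5, 3]]])
t2 = np.array([[[7, 4], [8, 4]], [[2, 10], [15, 11]]])
out = np.concatenate([t1, t2], axis=-1)  # last axis: 2 + 2 = 4
print(out.shape)  # (2, 2, 4)
```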

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/concat_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

For input tensors, the axes except **axis** must have the same dimensions.

The supported data types are as follows: **int8**, **uint8**, **int16**, **int32**, **float16**, and **float32**.

#### Prototype

te.lang.cce.concat(raw_tensors, axis)

#### Arguments

- **raw_tensors**: a tensor list (**list** type). Each element is a tvm.tensor, and the last dimension of each tensor shape must be 32-byte aligned.
- **axis**: axis along which the concatenation is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of a raw_tensor.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    import tvm
    import te.lang.cce

    shape1 = (64, 128)
    shape2 = (64, 128)
    input_dtype = "float16"
    data1 = tvm.placeholder(shape1, name="data1", dtype=input_dtype)
    data2 = tvm.placeholder(shape2, name="data2", dtype=input_dtype)
    data = [data1, data2]
    res = te.lang.cce.concat(data, 0)
    # res.shape = (128, 128)

## Convolution Compute API

### conv

#### Description

Computes the 2D convolution of the float16 type with the given 5HD data and FracZ weight.

The shape of the data tensor is 5HD, that is, (N, C1, H, W, C0). The shape of the weight tensor is FracZ, that is, (C1 x KH x KW, Cout//C0_out, C0_out, C0).

This API supports bias.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/conv_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.conv(data, weight, para_dict, optim_dict=None, dsl_flag=True)

#### Arguments

- **data**: feature map for 2D convolution; a tensor in 5HD format. Currently, the **float16** type is supported.
- **weight**: weight for 2D convolution; a tensor in FracZ format. Currently, the **float16** type is supported.
- **para_dict**: a dictionary of parameters. Currently, the following parameters must be passed in **para_dict**:
    - **pad_h**: padding height of the feature map for 2D convolution (**int** type)
    - **pad_w**: padding width of the feature map for 2D convolution (**int** type)
    - **stride_h**: stride in the height direction of the feature map for 2D convolution (**int** type)
    - **stride_w**: stride in the width direction of the feature map for 2D convolution (**int** type)
    - **filter_h**: filter height for 2D convolution (**int** type)
    - **filter_w**: filter width for 2D convolution (**int** type)

  Currently, the following optional parameter is supported in **para_dict**:

    - **bias_tensor**: a tensor. It is supported in 2D convolution of the **float16** type. If this argument exists, bias is included in the convolutional computation; otherwise, it is not. The bias tensor must have the same size as **cout**.
- **optim_dict**: a dictionary of optimization features to enable. The values are of type bool: **True** means the feature is enabled, **False** means it is disabled. Defaults to **optim_dict = {"c0_optim_flg": False}**, indicating that the C0 = 4 feature is disabled. Note that only **c0_optim_flg** is currently configurable.
- **dsl_flag**: whether to support automatic UB fusion. **True**: yes; **False**: no.

#### Returns

**res_tensor**: output tensor for convolutional computation

#### Example

Take a 2D convolution of the **float16** type as an example. The feature map is (Batch = 1, C = 32, H = 16, W = 8), **Cout** is **32**, and the kernel size is KernelH = KernelW = 1.

Then, the shape of the 5HD feature map tensor is (Batch, C/16, H, W, 16) = (1, 2, 16, 8, 16).

The shape of the FracZ weight tensor is (KernelH * KernelW * C/16, Cout/16, 16, 16)= (2, 2, 16, 16).

The shape of the bias tensor is (Cout,) = (32,).

    import te
    from te import tvm

    shape_in = (1, 2, 16, 8, 16)
    shape_w = (2, 2, 16, 16)
    pad_h = 0
    pad_w = 0
    stride_h = 1
    stride_w = 1
    filter_h = 1
    filter_w = 1
    Data = tvm.placeholder(shape_in, name='FmapW', dtype="float16")
    Weight = tvm.placeholder(shape_w, name='FilterW', dtype="float16")
    bias_tensor = tvm.placeholder((shape_w[1] * shape_w[2],), name='Bias', dtype="float16")
    res_tensor = te.lang.cce.conv(Data, Weight,
                                  {"bias_tensor": bias_tensor,
                                   "pad_h": pad_h, "pad_w": pad_w,
                                   "stride_h": stride_h, "stride_w": stride_w,
                                   "filter_h": filter_h, "filter_w": filter_w,
                                   "offset_a": 0})

## 4D/5D Conversion APIs

### compute_four2five

#### Description

Converts the 4D data format **NCHW** to the 5D data format **NC1HWC0**.
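The target 5D shape follows NC1HWC0 with C1 = ceil(C / C0), where C0 = 16 for float16. A small helper sketching this shape computation (hypothetical, for illustration only, not part of the TBE API):

```python
def nc1hwc0_shape(nchw, c0=16):
    """Compute the NC1HWC0 shape for an NCHW input (illustrative helper)."""
    n, c, h, w = nchw
    c1 = (c + c0 - 1) // c0   # C is padded up to a multiple of C0
    return (n, c1, h, w, c0)

print(nc1hwc0_shape((2, 32, 16, 128)))  # (2, 2, 16, 128, 16)
```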

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/dim_conv.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.compute_four2five(input, raw_shape_4D)

#### Arguments

- **input**: a 4D tvm.tensor for the input tensor (N, C, H, W)
- **raw_shape_4D**: the original 4D shape of the input tensor

#### Returns

**res_tensor**: a 5D tvm.tensor for the result tensor (N,C1,H,W,C0)

#### Example

    import tvm
    import te.lang.cce

    raw_shape = (2, 32, 16, 128)
    in_dtype = "float16"
    input = tvm.placeholder(raw_shape, name='input', dtype=in_dtype)
    res = te.lang.cce.compute_four2five(input, raw_shape)
    # res.shape = (2, (32 + 15) // 16, 16, 128, 16)

### compute_five2four

#### Description

Converts the 5D data format **NC1HWC0** to the 4D data format **NCHW**.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/dim_conv.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.compute_five2four(input, raw_shape_4D)

#### Arguments

- **input**: a 5D tvm.tensor for the input tensor (N, C1, H, W, C0)
- **raw_shape_4D**: 4D shape of the result tensor

#### Returns

**res_tensor**: a 4D tvm.tensor for the result tensor (N, C, H, W)
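The inverse conversion can be sketched with NumPy (`five2four_ref` is an illustrative reference, not the TBE kernel):

```python
import numpy as np

def five2four_ref(x, raw_shape_4d):
    """NC1HWC0 -> NCHW: move C0 next to C1, merge them into C,
    and drop the channel padding."""
    n, c, h, w = raw_shape_4d
    n5, c1, h5, w5, c0 = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n5, c1 * c0, h5, w5)[:, :c, :, :]

x = np.ones((2, 2, 16, 128, 16), dtype=np.float16)
print(five2four_ref(x, (2, 32, 16, 128)).shape)  # (2, 32, 16, 128)
```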

#### Example

```python
import te.lang.cce
from te import tvm

raw_shape = (2, 32, 16, 128)
input_shape = (2, (32 + 15)//16, 16, 128, 16)
in_dtype = "float16"
input = tvm.placeholder(input_shape, name='input', dtype=in_dtype)
res = te.lang.cce.compute_five2four(input, raw_shape)
# res.shape = (2, 32, 16, 128)
```

## Matmul Compute API

### matmul

#### Description

Multiplies two matrices. The formula is: tensor_c = trans_a(tensor_a) * trans_b(tensor_b) + tensor_bias.

For **tensor_a** and **tensor_b**, the last two dimensions of **shape** (after transposition) must meet the following matrix multiplication condition: (M, K) * (K, N) = (M, N). Multiple dimensions are supported. If **is_fractal** is set to **True**, the data layout of **tensor_a** must meet the fractal structure of L0A, and the data layout of **tensor_b** must meet the fractal structure of L0B. If **is_fractal** is set to **False**, both **tensor_a** and **tensor_b** use the ND layout.
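The shape condition above can be sketched as follows (`matmul_out_shape` is a hypothetical helper used only for illustration):

```python
def matmul_out_shape(shape_a, shape_b, trans_a=False, trans_b=False):
    """Check the (M, K) x (K, N) = (M, N) condition on the last two dimensions."""
    m, ka = shape_a[-2:][::-1] if trans_a else shape_a[-2:]
    kb, n = shape_b[-2:][::-1] if trans_b else shape_b[-2:]
    if ka != kb:
        raise ValueError("inner dimensions differ: %d != %d" % (ka, kb))
    return tuple(shape_a[:-2]) + (m, n)

print(matmul_out_shape((1024, 256), (256, 512)))  # (1024, 512)
```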

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/mmad_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The input supports **float16**, and the output supports **float16** and **float32**.

#### Prototype

te.lang.cce.matmul(tensor_a, tensor_b, trans_a=False, trans_b=False, format_a="ND", format_b="ND", alpha_num=1.0, beta_num=0.0, dst_dtype="float16", tensor_bias=None, quantize_params=None)

#### Arguments

- **tensor_a**: a tvm.tensor for matrix A
- **tensor_b**: a tvm.tensor for matrix B
- **trans_a**: a bool specifying whether to transpose matrix A
- **trans_b**: a bool specifying whether to transpose matrix B
- **format_a**: format of matrix A, either **ND** or **fractal**
- **format_b**: format of matrix B, either **ND** or **fractal**
- **alpha_num**: a broadcast parameter, which is not used currently. Defaults to **1.0**.
- **beta_num**: a broadcast parameter, which is not used currently. Defaults to **0.0**.
- **dst_dtype**: output data type, either **float16** or **float32**
- **tensor_bias**: defaults to **None**. If not empty, **tensor_bias** is added to the result of multiplying matrix A by matrix B. The shape of **tensor_bias** supports broadcasting, and its data type must be the same as **dst_dtype**.
- **quantize_params**: quantization parameters, in the dictionary format. If **quantize_params** is **None**, quantization is disabled; otherwise, quantization is enabled. The keys are as follows:
  - **quantize_alg**: quantization mode. The value can be **NON_OFFSET** (default) or **HALF_OFFSET_A**.
  - **scale_mode_a**: reserved
  - **scale_mode_b**: reserved
  - **scale_mode_out**: value type of the output dequantization parameter. The value can be **SCALAR** (default) or **VECTOR**.
  - **sqrt_mode_a**: reserved
  - **sqrt_mode_b**: reserved
  - **sqrt_mode_out**: whether the square root of **scale_drq** is extracted. The value can be **NON_SQRT** (default) or **SQRT**.
  - **scale_q_a**: reserved
  - **offset_q_a**: reserved
  - **scale_q_b**: reserved
  - **offset_q_b**: reserved
  - **scale_drq**: placeholder of the output dequantization or requantization weight parameter. Defaults to **None**.
  - **offset_drq**: reserved

The quantization modes are as follows:

- Input quantization: quantization from input data to intermediate data. Generally, **fp16** data is quantized to **int8** or **uint8** data.
- Output quantization: quantization from intermediate data to output data. The following two modes are available:
  - Requantization: quantizes **int32** to **int8**.
  - Dequantization: quantizes **int32** to **fp16**.

#### Returns

**tensor_c**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

a_shape = (1024, 256)
b_shape = (256, 512)
bias_shape = (512, )
in_dtype = "float16"
dst_dtype = "float32"
tensor_a = tvm.placeholder(a_shape, name='tensor_a', dtype=in_dtype)
tensor_b = tvm.placeholder(b_shape, name='tensor_b', dtype=in_dtype)
tensor_bias = tvm.placeholder(bias_shape, name='tensor_bias', dtype=dst_dtype)
res = te.lang.cce.matmul(tensor_a, tensor_b, False, False,
                         dst_dtype=dst_dtype, tensor_bias=tensor_bias)
```

## Pooling2d Compute API

### pooling2d

#### Description

Samples signals in different sliding windows of **tensor_in** in different pooling modes.

The pooling mode can be **MAX**, **AVG**, **GMP**, or **GAP**.

- **MAX**: max pooling, computes the maximum of the elements covered by each sliding window
- **AVG**: average pooling, computes the average of the elements covered by each sliding window
- **GMP**: global max pooling, a special mode of max pooling whose window size is the same as the feature map size
- **GAP**: global average pooling, a special mode of avg pooling whose window size is the same as the feature map size

When **pooling_mode** is set to **MAX** and **padding_mode** is set to **SAME** for **tensor_in**, the pooling result is as follows:

*(Figure: max pooling result with SAME padding; not reproduced here.)*

where:

- **input_w**: width of **tensor_in**
- **input_h**: height of **tensor_in**
- **kernel_w**: width of the window
- **kernel_h**: height of the window
- **pad_top**: number of top padding rows in the H direction of **tensor_in**. The value is **1** in the figure.
- **pad_bottom**: number of bottom padding rows in the H direction of **tensor_in**. The value is **1** in the figure.
- **pad_left**: number of left padding columns in the W direction of **tensor_in**. The value is **1** in the figure.
- **pad_right**: number of right padding columns in the W direction of **tensor_in**. The value is **1** in the figure.
- **stride_w**: stride in the W direction
- **stride_h**: stride in the H direction

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/pooling2d_compute.py** in the ATC installation path.

This API supports the basic pooling functions as well as the output quantization function. The quantization function is disabled by default. To enable the quantization function, set **quantize_params** based on the quantization algorithm requirements. For details, see the parameter description.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

- The supported input data type is **float16**. **tensor_in** is a 5D tensor of format NC1HWC0.
- The last dimension C0 of **tensor_in** must be **16**.
- The window must have two dimensions, each a positive integer within the range [1, 32768].
- The stride must have two dimensions, each a positive integer. The width and height of the stride must be within the range [1, 63].
- If **pad** is input, **pad** must have four dimensions, each greater than or equal to **0**.
- The dilation must have two dimensions, each a positive integer within the range [1, 255].
- When **pooling_mode** is set to **MAX** or **AVG** in **VALID** mode, the following condition must be met: out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 < ub_size
- When **pooling_mode** is set to **AVG** in **SAME** mode, the following condition must be met: out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 < ub_size
- When **pooling_mode** is set to **MAX** or **AVG**, the following conditions must be met: stride_h ≤ 2 x window_h and stride_w ≤ 2 x window_w
- When **pooling_mode** is set to **MAX** or **AVG**, the following condition must be met: window_w x window_h < 256
- When **pooling_mode** is set to **MAX** or **AVG**, **tensor_in**, **pad**, and **window** must meet the following conditions: stride_h <= in_size_h + pad_top + pad_bottom - window_h and stride_w <= in_size_w + pad_left + pad_right - window_w
- When **pooling_mode** is set to **GAP** or **GMP**, the following conditions must be met: window_h = in_size_h and window_w = in_size_w
- When **pooling_mode** is set to **GAP** or **GMP**, the following condition must be met: padding_mode = "VALID"

where:

- **ub_size**: available size of the unified buffer (UB)
- **out_w**: width of the output tensor
- **window_h**: height of the window
- **window_w**: width of the window
- **C0**: C0 dimension of **tensor_in**
- **SIZE_OF_FP16**: size of the **float16** type, in bytes

#### Prototype

te.lang.cce.pooling2d(tensor_in, window, stride, pooling_mode, padding_mode="SAME", pad = (0,0,0,0), dilation = (1,1), data_mode=1, ceil_mode=0)

#### Arguments

- **tensor_in**: input feature map of the tvm.tensor type. A 5D tensor of format NC1HWC0.
- **window**: size of the input sliding window, list or tuple type. **window[0]** indicates the width of the window, and **window[1]** indicates the height of the window.
- **stride**: stride of the sliding window, list or tuple type. **stride[0]** indicates the stride of the window in the W direction of the feature map, and **stride[1]** indicates the stride in the H direction.
- **pooling_mode**: pooling mode selected from **MAX**, **AVG**, **GMP**, and **GAP**.
  - **MAX**: max pooling, computes the maximum of the elements covered by each sliding window.
  - **AVG**: average pooling, computes the average of the elements covered by each sliding window.
  - **GMP**: global max pooling, a special mode of max pooling. The feature map size is the same as the window size, and the maximum value of the feature map elements is the output.
  - **GAP**: global average pooling, a special mode of avg pooling. The feature map size is the same as the window size, and the average value of the feature map elements is the output.
- **padding_mode**: padding mode selected from **VALID** (padding disabled) and **SAME** (padding enabled).
  - In **VALID** mode, when the window movement in the W or H direction covers only part of the feature map, the data that does not fill a complete window is discarded, that is, it is not involved in the computation. **MAX**, **AVG**, **GMP**, and **GAP** all support the **VALID** mode.
  - In **SAME** mode, when the window movement in the W or H direction covers only part of the feature map, zeros are padded so that a complete window is covered, that is, this data is involved in the computation. **MAX** and **AVG** support the **SAME** mode, while **GMP** and **GAP** do not.
- **pad**: a list or tuple of padding sizes. An optional argument used for compatibility with Caffe pooling. **pad[0]**, **pad[1]**, **pad[2]**, and **pad[3]** indicate the top, bottom, left, and right padding, respectively. Defaults to (0, 0, 0, 0).
- **dilation**: a list or tuple of dilation factors. An optional argument. **dilation[0]** and **dilation[1]** indicate the dilation factors of the window in the H and W directions, respectively. Defaults to (1, 1).
- **data_mode**: template type. **0**: CAFFE_DATA_MODE; **1**: TENSORFLOW_DATA_MODE.
- **ceil_mode**: corresponding to **round_mode** in Caffe. **0**: ceiling (default); **1**: floor.

#### Returns

**res_tensor**: a 5D tvm.tensor for the result tensor (NC1HWC0)

Assume that the **shape** of **tensor_in** is [N, C1, H, W, C0=16], the **shape** of the window is [F, F], and the stride is [S, S].

In **VALID** mode and **SAME** mode of **MAX** pooling and **AVG** pooling, the **shape** of the output tensor is computed as follows:

- In **VALID** mode:
  - The N and C dimensions remain unchanged.
  - Hout = [(H - F + 1) / S], Wout = [(W - F + 1) / S]
- In **SAME** mode:
  - The N and C dimensions remain unchanged.
  - Hout = [H / S], Wout = [W / S]

where **H** and **W** are the input sizes, **F** is the filter size, **S** is the stride, and **[]** is the round-up symbol.

In **VALID** mode of **GMP** pooling and **GAP** pooling, the **shape** of the output tensor is computed as follows:

- The N and C dimensions remain unchanged.
- Hout = Wout = 1
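Under TENSORFLOW_DATA_MODE, the output sizes above can be sketched as follows (`pool_out_hw` is an illustrative helper, not a TBE API):

```python
import math

def pool_out_hw(in_h, in_w, window, stride, padding_mode="SAME"):
    """VALID: ceil((in - F + 1) / S); SAME: ceil(in / S)."""
    f_h, f_w = window
    s_h, s_w = stride
    if padding_mode == "VALID":
        return (math.ceil((in_h - f_h + 1) / s_h),
                math.ceil((in_w - f_w + 1) / s_w))
    return (math.ceil(in_h / s_h), math.ceil(in_w / s_w))

# 416 x 416 feature map, 3 x 3 window, stride 2, SAME padding.
print(pool_out_hw(416, 416, (3, 3), (2, 2), "SAME"))  # (208, 208)
```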

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1, 2, 416, 416, 16)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.pooling2d(data, (3, 3), (2, 2), "AVG", "SAME")
# res.shape = (1, 2, 208, 208, 16)
```

## Common Compute APIs

### round_to

#### Description

Clamps **data** to the range [min_value, max_value] by comparing each element of **data** with **min_value** and **max_value**. If an element is between **min_value** and **max_value**, its value is kept. If an element is less than **min_value** or greater than **max_value**, it is replaced by **min_value** or **max_value**, respectively.
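The clamp semantics can be sketched element-wise in plain Python (`round_to_ref` is a reference sketch, not the TBE implementation):

```python
def round_to_ref(data, max_value, min_value):
    """Clamp each element of data to [min_value, max_value]."""
    return [min(max(x, min_value), max_value) for x in data]

print(round_to_ref([-5.0, 0.5, 9.0], max_value=3.0, min_value=-2.0))  # [-2.0, 0.5, 3.0]
```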

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/common.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, **max_value** and **min_value** will be converted into the same data type as **data** during computation.

The supported data types are **float16**, **float32**, **int8**, **uint8**, and **int32**. However, **int8**, **uint8**, and **int32** will be converted to **float16**.

#### Prototype

te.lang.cce.round_to(data, max_value, min_value)

#### Arguments

- **data**: a tvm.tensor for the input tensor
- **max_value**: a scalar for the maximum value of the target range
- **min_value**: a scalar for the minimum value of the target range

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
max_value = tvm.const(3, dtype=input_dtype)
min_value = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.round_to(data, max_value, min_value)
```

### cast_to

#### Description

Converts the data type of a tensor, specifically, from **data** to **dtype**.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/common.py** in the ATC installation path.

#### Restrictions

The following type conversions are supported:

| Source Data Type | Destination Data Type |
|---|---|
| float32 | float16 |
| float32 | int8 |
| float32 | uint8 |
| float16 | float32 |
| float16 | int8 |
| float16 | uint8 |
| float16 | int32 |
| int8 | float16 |
| int8 | uint8 |
| int32 | float16 |
| int32 | int8 |
| int32 | uint8 |

#### Prototype

te.lang.cce.cast_to(data, dtype, f1628IntegerFlag=True)

#### Arguments

- **data**: a tvm.tensor for the input tensor
- **dtype**: destination data type, string type
- **f1628IntegerFlag**: defaults to **True**. If the decimal part of the data before conversion is 0, set **f1628IntegerFlag** to **True**; otherwise, set it to **False**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.cast_to(data, "float32")
```

## Build APIs

### auto_schedule

#### Description

Generates a **schedule** object based on the defined computation process.

The API is defined in **python/site-packages/topi/topi/generic/cce.py** in the ATC installation path.

#### Prototype

topi.generic.auto_schedule(outs, option=None)

#### Arguments

- **outs**: a list of the output tensors of the operator. A single output or multiple outputs are supported.
- **option**: configuration parameter used for RL operator search. Does not need to be configured for auto_schedule.

#### Returns

**schedule**: computation schedule of the operator

#### Example

```python
import te.lang.cce
from te import tvm
import topi.generic

shape = (28, 28)
dtype = "float16"
# Define input.
data = tvm.placeholder(shape, name="data", dtype=dtype)
# Describe the computation process of the operator.
res = te.lang.cce.vabs(data)
with tvm.target.cce():
    # Generate a schedule object.
    sch = topi.generic.auto_schedule(res)
```

### cce_build_code

#### Description

Builds the **schedule** object to generate an operator binary file and an operator description file.

The API is defined in **python/site-packages/te/te/lang/cce/te_schedule/cce_schedule.py** in the ATC installation path.

#### Prototype

te.lang.cce.cce_build_code(sch, config_map=None)

#### Arguments

- **sch**: tvm.schedule, the schedule to build or to print lower code for
- **config_map**: a dictionary of build configuration. Defaults to **None**. The keys include:
  - **print_ir**: whether to print the lower IR code. Defaults to **True**.
  - **need_build**: whether the build is performed. Defaults to **True**.
  - **name**: operator name. Defaults to **cce_op**. The value can contain only uppercase letters, lowercase letters, digits, and underscores (_), must start with a letter or underscore (_), and cannot exceed 200 characters.
  - **tensor_list**: a list of input and output tensors. The input is the tensor object returned by the **placeholder** API; the output is the computed tensor object. This key is mandatory; otherwise, an error is reported. The list also determines the order of the parameters of the generated operator's kernel function, which is the same as the order of the inputs and outputs in the list.
  - **bool_storage_as_1bit**: whether to store bools by 1 bit. **True** (default): 1-bit storage; **False**: 8-bit storage.

#### Returns

None

#### Example

```python
import te.lang.cce
from te import tvm
from topi import generic

shape = (28, 28)
dtype = "float16"
# Define the input placeholder.
data = tvm.placeholder(shape, name="data", dtype=dtype)
with tvm.target.cce():
    # Describe the computation process of the operator.
    res = te.lang.cce.vabs(data)
    # Generate a schedule object.
    sch = generic.auto_schedule(res)
# Define build parameters.
config = {"print_ir": True,
          "need_build": True,
          "name": "abs_28_28_float16",
          "tensor_list": [data, res]}
# Build the operator.
te.lang.cce.cce_build_code(sch, config)
```

## Instructions

### Usage Example

The following example shows how to concatenate the preceding APIs. It implements a simple operator that supports the **float16** type to obtain an absolute value.

```python
import te.lang.cce
from te import tvm
import topi.generic

shape = (28, 28)
dtype = "float16"
# Define input.
data = tvm.placeholder(shape, name="data", dtype=dtype)
# Describe the computation process of the operator.
res = te.lang.cce.vabs(data)
with tvm.target.cce():
    # Generate a schedule object.
    sch = topi.generic.auto_schedule(res)
# Define build parameters.
config = {"print_ir": True,
          "need_build": True,
          "name": "abs_28_28_float16",
          "tensor_list": [data, res]}
# Build the operator.
te.lang.cce.cce_build_code(sch, config)
```

### Exception Handling

If an exception occurs when an API is executed, the error is usually caused by incorrect input parameters. The following example shows the error information when **tensor_list** is incomplete.

Code example:

```python
data = tvm.placeholder(shape, name="data", dtype=inp_dtype)
with tvm.target.cce():
    res = te.lang.cce.vabs(data)
    sch = generic.auto_schedule(res)
config = {"print_ir": need_print,
          "need_build": need_build,
          "name": kernel_name,
          "tensor_list": [res]}  # Incomplete: the input tensor "data" is missing.
te.lang.cce.cce_build_code(sch, config)
```

The following error information is displayed:

```
Traceback (most recent call last):
  File "llt/tensor_engine/ut/testcase_python/tf_abs/test_tf_abs_cce.py", line 71, in test_cce_tf_abs_99991_fp16
    tf_abs_cce((99991,), dtype = "Float16", need_build = False, need_print = False, kernel_name = "cce_tf_abs")
  File "/home1/repotvm/tensor_engine/topi/python/topi/cce/tf_abs.py", line 68, in tf_abs_cce
    te.lang.cce.cce_build_code(sch, config)
  File "/home1/repotvm/tensor_engine/python/te/lang/cce/te_schedule/cce_schedule.py", line 381, in cce_build_code
    _build(sch, tensor_list, local_config_map["name"])
  File "/home1/repotvm/tensor_engine/python/te/lang/cce/te_schedule/cce_schedule.py", line 338, in _build
    mod = tvm.build(sch, tensor_list, device, name=name)
  File "/home1/repotvm/tensor_engine/python/te/tvm/build_module.py", line 432, in build
    binds=binds)
  File "/home1/repotvm/tensor_engine/python/te/tvm/build_module.py", line 353, in lower
    stmt = ir_pass.StorageFlatten(stmt, binds, 64)
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
TVMError: [17:12:02] /home1/repotvm/tensor_engine/src/pass/storage_flatten.cc:249: Check failed: it != buf_map_.end() Cannot find allocated buffer for placeholder(data, 0x27d7290)
```

The problem is solved after the parameter is modified as follows:

"tensor_list" : [data, res]