# TBE DSL APIs

## Introduction

### TBE Overview

Tensor Boost Engine (TBE) is a framework for developing custom operators based on the Tensor Virtual Machine (TVM), an open-source community project. The TVM aims to further abstract operator generation rules by breaking operators into operation primitives and combining them as needed. Based on the definition of an operator's computation process, the TVM uses the Schedule and CodeGen technologies to generate operators for the specified hardware.

Schedule describes the computation process for implementing an operator on hardware, which requires profound hardware knowledge. To make operator development easier, TBE simplifies the writing of Schedule on top of the TVM. Guided by the concept of "Auto_Schedule", a collection of TBE APIs is provided for composing operator computations. By combining these APIs, you define only the computation process of an operator and hand the schedule over to "Auto_Schedule". This document describes the TBE domain-specific language (DSL) APIs defined based on the TVM. You can use these APIs to develop operators.

The TBE DSL APIs mainly cover vector operations, including element-wise operation APIs, reduction APIs, broadcast APIs, index operation APIs, concat APIs, convolution APIs, 4D to 5D conversion APIs, and matrix computation APIs.

### Version Query

You can view the version number of the current TBE DSL in the **python/site-packages/te/te/version.py** file in the ATC installation path.

### API Usage

Before calling TBE DSL APIs, declare the environment variable **PYTHONPATH**.

```shell
export install_path=/home/HwHiAiUser/Ascend/ascend-toolkit/latest
export PYTHONPATH=${install_path}/atc/python/site-packages/te:${install_path}/atc/python/site-packages/topi
```

Where **install_path** indicates the ATC installation path.

## Element-Wise Compute APIs

Element-wise compute APIs are used to compute input data element by element. The output usually has the same shape as the input.

### vadd

#### Description

Adds two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vadd(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vadd(data1, data2)
```

### vsub

#### Description

Subtracts two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vsub(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vsub(data1, data2)
```

### vmul

#### Description

Multiplies two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmul(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmul(data1, data2)
```

### vdiv

#### Description

Divides two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vdiv(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vdiv(data1, data2)
```

### vmod

#### Description

Performs modulo operations on two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmod(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmod(data1, data2)
```

### vmin

#### Description

Returns the min of two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmin(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmin(data1, data2)
```

### vmax

#### Description

Returns the max of two tensors element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmax(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vmax(data1, data2)
```

### vor

#### Description

Performs the bitwise OR operation on two tensors.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vor(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vor(data1, data2)
```

### vand

#### Description

Performs the bitwise AND operation on two tensors.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vand(lhs, rhs)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vand(data1, data2)
```

### vadds

#### Description

Adds a scalar to a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vadds(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be added to **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vadds(data, scalar)
```

### vmins

#### Description

Compares a raw_tensor with a scalar element-wise and chooses the smaller one.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmins(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be compared with **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmins(data, scalar)
```

### vmaxs

#### Description

Compares a raw_tensor with a scalar element-wise and chooses the larger one.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmaxs(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient to be compared with **raw_tensor** element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmaxs(data, scalar)
```

### vmuls

#### Description

Multiplies a raw_tensor and a scalar element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, the scalar will be converted into the same data type as the raw_tensor during computation.

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.vmuls(raw_tensor, scalar)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **scalar**: a scalar for the coefficient by which **raw_tensor** is multiplied element-wise

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vmuls(data, scalar)
```

### vcmp

#### Description

Compares **lhs** with **rhs** element-wise based on **operation**. The operations specified by **operation** include **eq**, **ne**, **lt**, **gt**, **le**, and **ge**, which indicate **==**, **!=**, **<**, **>**, **<=**, and **>=**, respectively. If the expression is true, **True** is returned when the mode is **bool**, and **1** is returned when the mode is **bit**. If the expression is false, **False** is returned when the mode is **bool**, and **0** is returned when the mode is **bit**.

The following describes the meaning of each operation by using an expression. Parameter **x** indicates an element in **lhs**, parameter **y** indicates an element in **rhs**, parameter **z** indicates an element of the result tensor, and parameter **n** (the value ranging from 0 to 7) indicates the bit index of an element of the result tensor. The expressions are as follows:

- mode == 'bool':
  - lt: z = True (x < y) or False (x >= y)
  - gt: z = True (x > y) or False (x <= y)
  - le: z = True (x <= y) or False (x > y)
  - ge: z = True (x >= y) or False (x < y)
  - eq: z = True (x == y) or False (x != y)
  - ne: z = True (x != y) or False (x == y)
- mode == 'bit':
  - lt: z[n] = 1 (x < y) or 0 (x >= y)
  - gt: z[n] = 1 (x > y) or 0 (x <= y)
  - le: z[n] = 1 (x <= y) or 0 (x > y)
  - ge: z[n] = 1 (x >= y) or 0 (x < y)
  - eq: z[n] = 1 (x == y) or 0 (x != y)
  - ne: z[n] = 1 (x != y) or 0 (x == y)

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- The left and right operands for comparison must have the same data type.
- When **mode** is set to **bool** and the **te.lang.cce.cce_build_code** API is called for compilation, **bool_storage_as_1bit** in the passed configuration must be set to **False**. Otherwise, an unexpected output shape will be obtained. **bool_storage_as_1bit** defaults to **True**, indicating that bool data is stored as 1-bit data. The following gives a build configuration template:

```python
with tvm.target.cce():
    schedule = generic.auto_schedule(res)
config = {"name": kernel_name,
          "tensor_list": [data_x, data_y, res],
          "bool_storage_as_1bit": False}
te.lang.cce.cce_build_code(schedule, config)
```

- When **mode** is set to **bit**, the last dimension of the shape of the left operand must be divisible by 8.
- If the right operand is also a tensor, the two tensors must have the same shape.
- Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vcmp(lhs, rhs, operation='lt', mode='bool')

#### Arguments

- **lhs**: a tvm.tensor for the left operand
- **rhs**: a tvm.tensor or scalar for the right operand
- **operation**: operation type selected from **eq**, **ne**, **lt**, **gt**, **ge**, or **le**. Defaults to **lt**.
- **mode**: mode selected from **bool** or **bit**. Defaults to **bool**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor. If **mode** is set to **bool**, the data type is **bool**. If **mode** is set to **bit**, the data type is **uint8**.

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vcmp(data1, data2, 'lt', 'bit')
```
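
In **bit** mode, each group of eight comparison results is packed into one **uint8** element, with bit index **n** holding the result for the n-th element of the group (hence the requirement that the last dimension be divisible by 8). The packing can be sketched in plain Python. This is only an illustration of the expressions above, not the TBE API; the LSB-first bit order within a byte is an assumption here.

```python
def pack_bits(flags):
    """Pack eight boolean comparison results into one byte (assumed LSB-first)."""
    assert len(flags) == 8
    byte = 0
    for n, flag in enumerate(flags):
        if flag:
            byte |= 1 << n  # bit n holds the result for element n of the group
    return byte

# Element-wise 'lt' comparison of two 8-element rows, then packing.
lhs = [1.0, 5.0, 2.0, 9.0, 0.0, 3.0, 7.0, 4.0]
rhs = [2.0, 5.0, 3.0, 1.0, 4.0, 2.0, 8.0, 4.0]
flags = [x < y for x, y in zip(lhs, rhs)]  # lt: z[n] = 1 (x < y) or 0 (x >= y)
packed = pack_bits(flags)                  # one uint8 value per 8 comparisons
```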

### vlogic

#### Description

Performs the logical AND/OR operation on two tensors element-wise, or performs the logical NOT operation on one tensor.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The supported data type is **bool**. The elements of the two tensors must have the same data type.

#### Prototype

te.lang.cce.vlogic(lhs, rhs=None, operation='logic_and')

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor
- **operation**: operation type selected from **logic_and**, **logic_or**, or **logic_not**. Defaults to **logic_and**. To perform the logic_not operation, set **rhs** to **None**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "bool"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
res = te.lang.cce.vlogic(data1, data2, 'logic_and')
```

### vsel

#### Description

Compares the **condition** element with **True** or **1** based on the data type of **condition**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.

- If the data type of **condition** is **bool**, each **condition** element is compared with **True**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.
- If the data type of **condition** is **uint8**, each bit of the **condition** element is compared with **1**. If the expression is true, the value of **x** is returned. Otherwise, the value of **y** is returned.

The following expressions explain the comparison. **i** indicates an element in **condition**, **x'** indicates an element of **x** (or the scalar **x**), **y'** indicates an element of **y** (or the scalar **y**), **z** indicates an element of the result tensor, and **n** (value range: 0–7) indicates a bit index of the **condition** element:

- When the data type of **condition** is **bool**: z = (i == True) ? x' : y'
- When the data type of **condition** is **uint8**: z = (i[n] == 1) ? x' : y'

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- **x** and **y** must have the same data type.
- When **condition** is of type uint8, the last dimension of the shape of **x** and **y** must be a multiple of 8.
- When **condition** is of type bool and the **te.lang.cce.cce_build_code** API is called for compilation, **bool_storage_as_1bit** in the passed configuration must be set to **False**.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vsel(condition, x, y)

#### Arguments

- **condition**: a tvm.tensor for the condition tensor, of type bool or uint8
- **x**: a tvm.tensor or scalar for the value returned when the condition is true
- **y**: a tvm.tensor or scalar for the value returned when the condition is false

#### Returns

**res_tensor**: a tvm.tensor for the result tensor
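
The bool-mode selection semantics above, z = (i == True) ? x' : y', can be illustrated in plain Python. This is an illustration only, not the TBE API; the helper `vsel_bool` is hypothetical and merely mirrors the argument names above, including the rule that **x** and **y** may be tensors or scalars.

```python
def vsel_bool(condition, x, y):
    """Element-wise select: take the x value where condition is True, else the y value.
    x and y may be lists (standing in for tensors) or scalars."""
    pick = lambda v, i: v[i] if isinstance(v, list) else v
    return [pick(x, i) if c else pick(y, i) for i, c in enumerate(condition)]

cond = [True, False, True, False]
res = vsel_bool(cond, [1, 2, 3, 4], 0)  # tensor x, scalar y
```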

### vcmpsel

#### Description

Compares **lhs** with **rhs** element-wise based on **operation**. The operations specified by **operation** include **eq**, **ne**, **lt**, **gt**, **le**, and **ge**, which indicate **==**, **!=**, **<**, **>**, **<=**, and **>=**, respectively. If the expression is true, the value of **slhs** is returned. Otherwise, the value of **srhs** is returned.

In the following expressions, parameter **a** indicates an element in **lhs**, **b** an element in **rhs**, **c** an element in **slhs**, **d** an element in **srhs**, and **res** an element of the result tensor:

- lt: res = c (a < b) or d (a >= b)
- gt: res = c (a > b) or d (a <= b)
- le: res = c (a <= b) or d (a > b)
- ge: res = c (a >= b) or d (a < b)
- eq: res = c (a == b) or d (a != b)
- ne: res = c (a != b) or d (a == b)

- If **rhs** is **None**, the elements in **lhs** are compared with the floating-point number **2.0**.
- If **slhs** is **None** and the expression is true, the value of **lhs** is returned.
- If **srhs** is **None** and **rhs** is a tensor, the value of **rhs** is returned when the expression is false.
- If **srhs** is **None** and **rhs** is a scalar, the floating-point number **0.0** is returned when the expression is false.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The arguments must have the same data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vcmpsel(lhs, rhs=None, operation='lt', slhs=None, srhs=None)

#### Arguments

- **lhs**: a tvm.tensor for the left operand
- **rhs**: a tvm.tensor or scalar for the right operand. Defaults to **None**.
- **slhs**: a tvm.tensor or scalar for the value returned when the comparison expression is true. Defaults to **None**.
- **srhs**: a tvm.tensor or scalar for the value returned when the comparison expression is false. Defaults to **None**.
- **operation**: operation type selected from **eq**, **ne**, **lt**, **gt**, **ge**, or **le**. Defaults to **lt**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
data4 = tvm.placeholder(shape, name="data4", dtype=input_dtype)
res = te.lang.cce.vcmpsel(data1, data2, 'gt', data3, data4)
```
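
With all optional arguments left at their **None** defaults, the rules above reduce to: compare each **lhs** element with **2.0**; return the element itself when the expression is true, and **0.0** otherwise. A plain-Python sketch of this default behavior (an illustration, not the TBE API; the helper name is hypothetical):

```python
def vcmpsel_lt_defaults(lhs):
    """vcmpsel(lhs) with rhs=slhs=srhs=None and operation='lt':
    res = a if a < 2.0 else 0.0, per the documented None defaults."""
    return [a if a < 2.0 else 0.0 for a in lhs]

res = vcmpsel_lt_defaults([1.0, 3.0, 2.0])  # 2.0 < 2.0 is false, so it maps to 0.0
```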

### vlog

#### Description

Performs the natural logarithm operation ln(x) on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vlog(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vlog(data)
```

### vexp

#### Description

Performs the natural exponential operation e^x on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vexp(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vexp(data)
```

### vabs

#### Description

Performs the absolute value operation |x| on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vabs(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vabs(data)
```

### vrec

#### Description

Performs the reciprocal operation 1/x on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vrec(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrec(data)
```

### vrelu

#### Description

Performs the ReLU operation on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16

#### Prototype

te.lang.cce.vrelu(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrelu(data)
```

### vnot

#### Description

Performs bitwise NOT on a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The supported data types are **int16** and **uint16**.

#### Prototype

te.lang.cce.vnot(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "int16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vnot(data)
```

### vsqrt

#### Description

Computes the square root for a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vsqrt(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vsqrt(data)
```

### vrsqrt

#### Description

Calculates the reciprocal of the square root for a tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vrsqrt(raw_tensor, priority_flag=0)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **priority_flag**: a scalar for the priority flag. **1** indicates that precision takes priority, at the cost of performance because the computation process is more complex. **0** indicates that performance takes priority, at the cost of precision. Defaults to **0**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.vrsqrt(data)
```

### vaxpy

#### Description

Multiplies **lhs** by a scalar and adds **rhs** to the result element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

- **lhs** and **rhs** must have the same data type and shape.
- If the data type of the scalar differs from that of the tensors, the scalar is converted to the tensor data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vaxpy(lhs, rhs, scalar)

#### Arguments

- **lhs**: a tvm.tensor for the left tensor
- **rhs**: a tvm.tensor for the right tensor
- **scalar**: a scalar for the coefficient by which **lhs** is multiplied

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
scalar = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.vaxpy(data1, data2, scalar)
```

### vmla

#### Description

Multiplies **tensor_0** by **tensor_1** and adds **tensor_2** to the result element-wise. The corresponding computation formula is tensor_0 * tensor_1 + tensor_2.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmla(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmla(data1, data2, data3)
```

### vmadd

#### Description

Multiplies **tensor_0** by **tensor_2** and adds **tensor_1** to the result element-wise. The corresponding computation formula is tensor_0 * tensor_2 + tensor_1.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type and same shape.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmadd(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmadd(data1, data2, data3)
```
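
Note that **vmadd** differs from **vmla** only in operand order: vmla computes tensor_0 * tensor_1 + tensor_2, while vmadd computes tensor_0 * tensor_2 + tensor_1. In plain Python terms (an illustration of the two formulas, not the TBE APIs):

```python
def vmla(t0, t1, t2):
    # vmla: tensor_0 * tensor_1 + tensor_2, element-wise
    return [a * b + c for a, b, c in zip(t0, t1, t2)]

def vmadd(t0, t1, t2):
    # vmadd: tensor_0 * tensor_2 + tensor_1, element-wise
    return [a * c + b for a, b, c in zip(t0, t1, t2)]

t0, t1, t2 = [2.0, 3.0], [10.0, 10.0], [1.0, 1.0]
```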

### vmaddrelu

#### Description

Multiplies **tensor_0** by **tensor_2** and adds **tensor_1** to the result element-wise. Then, performs ReLU. The corresponding computation formula is relu(tensor_0 * tensor_2 + tensor_1).

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/elewise_compute.py** in the ATC installation path.

#### Restrictions

The tensors must have the same data type.

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.vmaddrelu(tensor_0, tensor_1, tensor_2)

#### Arguments

- **tensor_0**: a tvm.tensor for tensor 0
- **tensor_1**: a tvm.tensor for tensor 1
- **tensor_2**: a tvm.tensor for tensor 2

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
shape = (1024, 1024)
input_dtype = "float16"
data1 = tvm.placeholder(shape, name="data1", dtype=input_dtype)
data2 = tvm.placeholder(shape, name="data2", dtype=input_dtype)
data3 = tvm.placeholder(shape, name="data3", dtype=input_dtype)
res = te.lang.cce.vmaddrelu(data1, data2, data3)
```

## Reduction APIs

Reduction APIs reduce a tensor along a dimension, performing operations such as accumulation or multiplication in the specified direction. The output has one less dimension than the input.

### General Restrictions

Due to data arrangement restrictions on the computing platform, the data produced by a reduction operation must be rearranged before it can be used in subsequent operations. Therefore, regardless of which APIs are used, no vector operation can be performed after a reduction operation.

### sum

#### Description

Computes the sum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.sum(raw_tensor, axis, keepdims=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is **(10, 10, 10)** and **keepdims** is **False**, the shape after reduction is **(10, 10)**. If set to **True**, the reduced axis is retained with length **1**, so the shape after reduction is **(10, 10, 1)**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.sum(data, axis=1)

### reduce_min

#### Description

Computes the minimum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_min(raw_tensor, axis, keepdims=False, priority_flag=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).
- **priority_flag**: selects high-accuracy or high-performance mode. **True**: high-accuracy mode; **False**: high-performance mode.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_min(data, axis=1)

### reduce_max

#### Description

Computes the maximum along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_max(raw_tensor, axis, keepdims=False, priority_flag=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int or a list of ints for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).
- **priority_flag**: selects high-accuracy or high-performance mode. **True**: high-accuracy mode; **False**: high-performance mode.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_max(data, axis=1)

### reduce_prod

#### Description

Computes the product along an axis to reduce dimensions.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/reduction_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.reduce_prod(raw_tensor, axis, keepdims=False)

#### Arguments

- **raw_tensor**: a tvm.tensor for the input tensor
- **axis**: an int for the axis along which the reduction is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of **raw_tensor**.
- **keepdims**: Defaults to **False**, indicating that the reduced axis is removed. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **False**, the shape after reduction is (10, 10). If this parameter is set to **True**, the reduced axis is retained with length **1**. For example, if the original shape is (10, 10, 10) and **keepdims** is set to **True**, the shape after reduction is (10, 10, 1).

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.reduce_prod(data, axis=1)

## Broadcast API

Broadcast APIs are used to process two tensors with different shapes: a lower-dimensional operand is broadcast according to a higher-dimensional operand so that the two operands have the same dimensions, and the operands are then computed element-wise.

### broadcast

#### Description

Broadcasts **var** to a tensor with the target **shape** (the second parameter). The data type of the result is specified by **output_dtype**.

As shown in the following example, the shape of A is (2, 1), that is, two rows and one column. It is broadcast to the target shape (2, 3), that is, two rows and three columns. In this case, the original column is broadcast to three same columns.

For example, if the shape of **var** is (2, 1, 64) and the target shape is (2, 128, 64), the shape of the result tensor after calling the broadcast API is (2, 128, 64).
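These rules follow standard broadcasting semantics, the same as NumPy's; a small NumPy sketch (an analogy only, not part of the TBE API):

```python
import numpy as np

a = np.array([[1.0], [2.0]])     # shape (2, 1): two rows, one column
b = np.broadcast_to(a, (2, 3))   # the size-1 column axis is replicated
print(b.tolist())                # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```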

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/broadcast_compute.py** in the ATC installation path.

#### Restrictions

The shape of **var** must have the same number of dimensions as the second parameter **shape**. Each dimension of **var** is either the same as the corresponding dimension of **shape** or equals **1**. If a dimension is **1**, it is broadcast to match the target **shape**.

The supported types are float16, float32, and int32.

#### Prototype

te.lang.cce.broadcast(var, shape, output_dtype=None)

#### Arguments

- **var**: a scalar or tensor for the data to be broadcast
- **shape**: target shape for the broadcast operation
- **output_dtype**: output data type. Defaults to **var.dtype**.

#### Returns

**res_tensor**: tensor obtained after **var** is broadcast. Its shape is specified by the **shape** parameter, and its data type is **output_dtype**.

#### Example

As shown in the following code, the tensor with shape (1024, 1) is broadcast to shape (1024, 1024) by calling the broadcast API.

    outshape = (1024, 1024)
    shape = (1024, 1)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.broadcast(data, outshape)

## Segment Compute APIs

Segment compute APIs are used to compute the sum, mean, product, maximum, or minimum along the segments of a tensor.

### unsorted_segment_sum

#### Description

Computes the sum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = sum(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value. For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.
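For scalar rows, the semantics above (including **init_value** and negative ids) can be sketched in plain Python; this helper is for illustration only and is not part of the TBE API:

```python
def unsorted_segment_sum_ref(data, segment_ids, num_segments, init_value=0):
    """Pure-Python reference for the segment-sum semantics described above."""
    output = [init_value] * num_segments
    for j, seg in enumerate(segment_ids):
        if seg < 0:            # negative ids: the element is discarded
            continue
        output[seg] += data[j]
    return output

print(unsorted_segment_sum_ref([3, 4, 5, 6], [0, 0, 2, -1], 3))  # [7, 0, 5]
```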

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_sum(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_sum(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = data[0] + data[1]
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = data[3] + data[4]

### unsorted_segment_mean

#### Description

Computes the mean along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = (1/len(j...)) * sum(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_mean(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 5, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_mean(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = (data[0] + data[1]) / 2
    # res[2] = 0
    # res[3] = 0
    # res[4] = 0
    # res[5] = (data[2] + data[3] + data[4]) / 3

### unsorted_segment_prod

#### Description

Computes the product along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = product(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

Here, **product** means that all elements in **data[j...]** are multiplied together.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_prod(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_prod(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = data[0] * data[1]
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = data[3] * data[4]

### unsorted_segment_min

#### Description

Computes the minimum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = min(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_min(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_min(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = min(data[0], data[1])
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = min(data[3], data[4])

### unsorted_segment_max

#### Description

Computes the maximum along segments of a tensor by using the array **segment_ids**. Assuming that the input is **data** and the output is **output**, then output[i] = max(data[j...]), where **j...** is an array whose elements **j** meet the following requirement: segment_ids[j] == i.

If a subscript **i** does not appear in **segment_ids**, then output[i] = init_value (the default value is **0**). For example, if **1** does not appear in **segment_ids**, then output[1] = 0.

If a value in **segment_ids** is a negative number, the value of **data** in the corresponding position is discarded. For example, if segment_ids[3] = –1, the value of **data[3]** is discarded and is not involved in the calculation.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/segment_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

#### Prototype

te.lang.cce.unsorted_segment_max(tensor, segment_ids, num_segments, init_value=0)

#### Arguments

- **tensor**: input tensor
- **segment_ids**: a one-dimensional array used to segment the input tensor. Its length must be the same as the length of the first dimension of the input tensor. The array can be sorted or unsorted.
- **num_segments**: length of the first dimension of the output tensor. Its value must be greater than or equal to the maximum value in **segment_ids** plus 1.
- **init_value**: default value of the output when a subscript does not appear in **segment_ids**. It is determined based on the implementation of the operator. Defaults to **0**.

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    shape = (5, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data1", dtype=input_dtype)
    segment_ids = [1, 1, 4, 5, 5]
    num_segments = 6
    res = te.lang.cce.unsorted_segment_max(data, segment_ids, num_segments)
    # res.shape = (6, 1024)
    # res[0] = 0
    # res[1] = max(data[0], data[1])
    # res[2] = 0
    # res[3] = 0
    # res[4] = data[2]
    # res[5] = max(data[3], data[4])

## Inplace Compute APIs

Inplace compute APIs are used to compute tensors based on rows, such as **add**, **sub**, and **update**.

### inplace_add

#### Description

Adds **lhs** and **rhs** based on a specified row.

For example:

    res = lhs
    res[ids, :] += rhs
    return res
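Note that when **inplace_ids** contains duplicate ids, the contributions accumulate (see the example later in this section). In NumPy terms this corresponds to `np.add.at` rather than a plain fancy-indexed `+=`; a sketch for illustration only, not part of the TBE API:

```python
import numpy as np

lhs = np.zeros((6, 4), dtype=np.float32)
rhs = np.ones((5, 4), dtype=np.float32)
ids = [1, 1, 4, 2, 2]

res = lhs.copy()
np.add.at(res, ids, rhs)  # duplicate ids accumulate, matching the semantics above
print(res[1, 0])          # 2.0 (rows 0 and 1 of rhs both land on row 1)
```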

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_add(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_add(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] + dataB[0] + dataB[1]
    # res[2] = dataA[2] + dataB[3] + dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] + dataB[2]
    # res[5] = dataA[5]

### inplace_sub

#### Description

Subtracts **rhs** from the specified rows of **lhs**.

For example:

    res = lhs
    res[ids, :] -= rhs
    return res

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_sub(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_sub(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] - dataB[0] - dataB[1]
    # res[2] = dataA[2] - dataB[3] - dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] - dataB[2]
    # res[5] = dataA[5]

### inplace_update

#### Description

Replaces the specified rows of **lhs** with **rhs**.

For example:

    res = lhs
    res[ids, :] = rhs
    return res

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/inplace_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32, int32

- The maximum value of the first dimension of **rhs** is **7934**. A larger value cannot be processed.
- If the first dimension of **rhs** is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the **ulimit -s** command to increase the stack space, for example, from 8192 to 81920.

#### Prototype

te.lang.cce.inplace_update(lhs, inplace_ids, rhs)

#### Arguments

- **lhs**: input left tensor
- **inplace_ids**: an int or a list of ints. Each value must be an integer greater than or equal to 0 and less than the first dimension of **lhs**. The list length must be the same as the first dimension of **rhs**.
- **rhs**: right tensor or scalar. Its dimensions must be the same as those of **lhs**, except the first dimension. If **inplace_ids** is of integer type, **rhs** has one dimension less than **lhs**. For example: **lhs** is (10, 1024), **inplace_ids** is [5], and **rhs** is (1, 1024); or **lhs** is (10, 1024), **inplace_ids** is 5, and **rhs** is (1024,).

#### Returns

**res_tensor**: tensor after computation

#### Example

    import tvm
    import te.lang.cce

    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_update(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataB[1]
    # res[2] = dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataB[2]
    # res[5] = dataA[5]

## Cast Compute APIs

Cast compute APIs are used to round the input tensor element-wise based on certain rules.

### ceil

#### Description

Rounds up a raw_tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.ceil(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.ceil(data)

### floor

#### Description

Rounds down a raw_tensor element-wise.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.floor(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.floor(data)

### round

#### Description

Performs banker's rounding (round half to even) on a raw_tensor element-wise. For example, 1.5 rounds up to 2, and 2.5 rounds down to 2.
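Python's built-in `round` uses the same round-half-to-even rule, so it can illustrate the tie-breaking behavior:

```python
# Round half to even: exact ties go to the nearest even integer.
print(round(1.5))  # 2
print(round(2.5))  # 2
print(round(3.5))  # 4
```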

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.round(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.round(data)

### trunc

#### Description

Rounds a raw_tensor towards 0 element-wise. For example, –1.9 rounds up to –1, and 1.9 rounds down to 1.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/cast_compute.py** in the ATC installation path.

#### Restrictions

Ascend 910 AI Processor: float16, float32

#### Prototype

te.lang.cce.trunc(raw_tensor)

#### Arguments

**raw_tensor**: a tvm.tensor for the input tensor

#### Returns

**res_tensor**: a tvm.tensor for the result tensor of type int32

#### Example

    shape = (1024, 1024)
    input_dtype = "float16"
    data = tvm.placeholder(shape, name="data", dtype=input_dtype)
    res = te.lang.cce.trunc(data)

## Concat Compute API

### concat

#### Description

Concatenates multiple input tensors along a specified axis.

**raw_tensors** indicates multiple input tensors, which have the same data type.

If raw_tensors[i].shape = [D0, D1, ... Daxis(i), ...Dn], the shape of the output after concatenation based on **axis** is [D0, D1, ... Raxis, ...Dn].

Where, Raxis = sum(Daxis(i)).

For example:

    t1 = [[1, 2, 3], [4, 5, 6]]
    t2 = [[7, 8, 9], [10, 11, 12]]
    concat([t1, t2], 0)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    concat([t1, t2], 1)  # [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

    # The shape of tensor t1 is [2, 3].
    # The shape of tensor t2 is [2, 3].
    concat([t1, t2], 0).shape  # [4, 3]
    concat([t1, t2], 1).shape  # [2, 6]

The parameter **axis** can also be a negative number, in which case it indicates axis **axis + len(shape)**, that is, the axis is counted from the end of the dimensions.

For example:

    t1 = [[[1, 2], [2, 3]], [[4, 4], [5, 3]]]
    t2 = [[[7, 4], [8, 4]], [[2, 10], [15, 11]]]
    concat([t1, t2], -1)

The output is as follows:

    [[[ 1,  2,  7,  4],
      [ 2,  3,  8,  4]],

     [[ 4,  4,  2, 10],
      [ 5,  3, 15, 11]]]
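The same result can be checked with NumPy's `np.concatenate` (an analogy only, not the TBE API):

```python
import numpy as np

t1 = np.array([[[1, 2], [2, 3]], [[4, 4], [5, 3]]])
t2 = np.array([[[7, 4], [8, 4]], [[2, 10], [15, 11]]])
out = np.concatenate([t1, t2], axis=-1)  # last axis: 2 + 2 = 4
print(out.shape)  # (2, 2, 4)
```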

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/concat_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

For input tensors, the axes except **axis** must have the same dimensions.

The supported data types are as follows: **int8**, **uint8**, **int16**, **int32**, **float16**, and **float32**.

#### Prototype

te.lang.cce.concat(raw_tensors, axis)

#### Arguments

- **raw_tensors**: a tensor list (**list** type). Each element is a tvm.tensor, and the last dimension of each tensor shape must be 32-byte aligned.
- **axis**: axis along which the concatenation is performed. The value range is [–d, d – 1], where **d** indicates the dimension count of a raw_tensor.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

    import tvm
    import te.lang.cce

    shape1 = (64, 128)
    shape2 = (64, 128)
    input_dtype = "float16"
    data1 = tvm.placeholder(shape1, name="data1", dtype=input_dtype)
    data2 = tvm.placeholder(shape2, name="data2", dtype=input_dtype)
    data = [data1, data2]
    res = te.lang.cce.concat(data, 0)
    # res.shape = (128, 128)

## Convolution Compute API

### conv

#### Description

Computes the 2D convolution of the float16 type with the given 5HD data and FracZ weight.

The shape of the data tensor is 5HD, that is, (N, C1, H, W, C0). The shape of the weight tensor is FracZ, that is, (C1 x KH x KW, Cout//C0_out, C0_out, C0).

This API supports bias.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/conv_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.conv(data, weight, para_dict, optim_dict=None, dsl_flag=True)

#### Arguments

- **data**: feature map for 2D convolution; a tensor in 5HD format. Currently, the **float16** type is supported.
- **weight**: weight for 2D convolution; a tensor in FracZ format. Currently, the **float16** type is supported.
- **para_dict**: a dictionary of parameters. Currently, the following parameters must be passed in **para_dict**:
    - **pad_h**: padding height of the feature map for 2D convolution (**int** type)
    - **pad_w**: padding width of the feature map for 2D convolution (**int** type)
    - **stride_h**: stride in the height direction of the feature map for 2D convolution (**int** type)
    - **stride_w**: stride in the width direction of the feature map for 2D convolution (**int** type)
    - **filter_h**: filter height for 2D convolution (**int** type)
    - **filter_w**: filter width for 2D convolution (**int** type)

  Currently, the following optional parameter is supported in **para_dict**:

    - **bias_tensor**: a tensor. It is supported in 2D convolution of the **float16** type. If this argument exists, bias is included in the convolutional computation; otherwise, it is not. The bias tensor must have the same size as **cout**.
- **optim_dict**: a dictionary of optimization features to enable. The values are of type bool: **True** means the feature is enabled, **False** means it is disabled. Defaults to **optim_dict = {"c0_optim_flg": False}**, indicating that the C0 = 4 feature is disabled. Note that only **c0_optim_flg** is currently configurable.
- **dsl_flag**: whether to support automatic UB fusion. **True**: yes; **False**: no.

#### Returns

**res_tensor**: output tensor for convolutional computation

#### Example

Take a 2D convolution of the **float16** type as an example. The feature map is (Batch = 1, C = 32, H = 16, W = 8), **Cout** is **32**, and the kernel size is KernelH = KernelW = 1.

Then, the shape of the 5HD feature map tensor is (Batch, C/16, H, W, 16) = (1, 2, 16, 8, 16).

The shape of the FracZ weight tensor is (KernelH * KernelW * C/16, Cout/16, 16, 16)= (2, 2, 16, 16).

The shape of the bias tensor is (Cout,) = (32,).

    import te
    from te import tvm

    shape_in = (1, 2, 16, 8, 16)
    shape_w = (2, 2, 16, 16)
    pad_h = 0
    pad_w = 0
    stride_h = 1
    stride_w = 1
    filter_h = 1
    filter_w = 1
    Data = tvm.placeholder(shape_in, name='FmapW', dtype="float16")
    Weight = tvm.placeholder(shape_w, name='FilterW', dtype="float16")
    bias_tensor = tvm.placeholder((shape_w[1] * shape_w[2],), name='Bias', dtype="float16")
    res_tensor = te.lang.cce.conv(Data, Weight,
                                  {"bias_tensor": bias_tensor,
                                   "pad_h": pad_h, "pad_w": pad_w,
                                   "stride_h": stride_h, "stride_w": stride_w,
                                   "filter_h": filter_h, "filter_w": filter_w,
                                   "offset_a": 0})

## 4D/5D Conversion APIs

### compute_four2five

#### Description

Converts the 4D data format **NCHW** to the 5D data format **NC1HWC0**.
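The target 5D shape follows NC1HWC0 with C1 = ceil(C / C0), where C0 = 16 for float16. A small helper sketching this shape computation (hypothetical, for illustration only, not part of the TBE API):

```python
def nc1hwc0_shape(nchw, c0=16):
    """Compute the NC1HWC0 shape for an NCHW input (illustrative helper)."""
    n, c, h, w = nchw
    c1 = (c + c0 - 1) // c0   # C is padded up to a multiple of C0
    return (n, c1, h, w, c0)

print(nc1hwc0_shape((2, 32, 16, 128)))  # (2, 2, 16, 128, 16)
```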

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/dim_conv.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.compute_four2five(input, raw_shape_4D)

#### Arguments

- **input**: a 4D tvm.tensor for the input tensor (N, C, H, W)
- **raw_shape_4D**: the original 4D shape of the input tensor

#### Returns

**res_tensor**: a 5D tvm.tensor for the result tensor (N,C1,H,W,C0)

#### Example

    import tvm
    import te.lang.cce

    raw_shape = (2, 32, 16, 128)
    in_dtype = "float16"
    input = tvm.placeholder(raw_shape, name='input', dtype=in_dtype)
    res = te.lang.cce.compute_four2five(input, raw_shape)
    # res.shape = (2, (32 + 15) // 16, 16, 128, 16)

### compute_five2four

#### Description

Converts the 5D data format **NC1HWC0** to the 4D data format **NCHW**.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/dim_conv.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The supported data type is float16.

#### Prototype

te.lang.cce.compute_five2four(input, raw_shape_4D)

#### Arguments

- **input**: a 5D tvm.tensor for the input tensor (N, C1, H, W, C0)
- **raw_shape_4D**: 4D shape of the result tensor

#### Returns

**res_tensor**: a 4D tvm.tensor for the result tensor (N, C, H, W)
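The inverse conversion can be sketched with NumPy (`five2four_ref` is an illustrative reference, not the TBE kernel):

```python
import numpy as np

def five2four_ref(x, raw_shape_4d):
    """NC1HWC0 -> NCHW: move C0 next to C1, merge them into C,
    and drop the channel padding."""
    n, c, h, w = raw_shape_4d
    n5, c1, h5, w5, c0 = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n5, c1 * c0, h5, w5)[:, :c, :, :]

x = np.ones((2, 2, 16, 128, 16), dtype=np.float16)
print(five2four_ref(x, (2, 32, 16, 128)).shape)  # (2, 32, 16, 128)
```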

#### Example

```python
import te.lang.cce
from te import tvm

raw_shape = (2, 32, 16, 128)
input_shape = (2, (32 + 15)//16, 16, 128, 16)
in_dtype = "float16"
input = tvm.placeholder(input_shape, name='input', dtype=in_dtype)
res = te.lang.cce.compute_five2four(input, raw_shape)
# res.shape = (2, 32, 16, 128)
```

## Matmul Compute API

### matmul

#### Description

Multiplies two matrices. The formula is: tensor_c = trans_a(tensor_a) * trans_b(tensor_b) + tensor_bias.

For **tensor_a** and **tensor_b**, the last two dimensions of **shape** (after transposition) must meet the following matrix multiplication condition: (M, K) * (K, N) = (M, N). Multiple dimensions are supported. If **is_fractal** is set to **True**, the data layout of **tensor_a** must meet the fractal structure of L0A, and the data layout of **tensor_b** must meet the fractal structure of L0B. If **is_fractal** is set to **False**, both **tensor_a** and **tensor_b** use the ND layout.
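The shape condition above can be sketched as follows (`matmul_out_shape` is a hypothetical helper used only for illustration):

```python
def matmul_out_shape(shape_a, shape_b, trans_a=False, trans_b=False):
    """Check the (M, K) x (K, N) = (M, N) condition on the last two dimensions."""
    m, ka = shape_a[-2:][::-1] if trans_a else shape_a[-2:]
    kb, n = shape_b[-2:][::-1] if trans_b else shape_b[-2:]
    if ka != kb:
        raise ValueError("inner dimensions differ: %d != %d" % (ka, kb))
    return tuple(shape_a[:-2]) + (m, n)

print(matmul_out_shape((1024, 256), (256, 512)))  # (1024, 512)
```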

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/mmad_compute.py** in the ATC installation path.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

The input supports **float16**, and the output supports **float16** and **float32**.

#### Prototype

te.lang.cce.matmul(tensor_a, tensor_b, trans_a=False, trans_b=False, format_a="ND", format_b="ND", alpha_num=1.0, beta_num=0.0, dst_dtype="float16", tensor_bias=None, quantize_params=None)

#### Arguments

- **tensor_a**: a tvm.tensor for matrix A
- **tensor_b**: a tvm.tensor for matrix B
- **trans_a**: a bool specifying whether to transpose matrix A
- **trans_b**: a bool specifying whether to transpose matrix B
- **format_a**: format of matrix A, either **ND** or **fractal**
- **format_b**: format of matrix B, either **ND** or **fractal**
- **alpha_num**: a broadcast parameter, which is not used currently. Defaults to **1.0**.
- **beta_num**: a broadcast parameter, which is not used currently. Defaults to **0.0**.
- **dst_dtype**: output data type, either **float16** or **float32**
- **tensor_bias**: defaults to **None**. If not empty, **tensor_bias** is added to the result of multiplying matrix A by matrix B. The shape of **tensor_bias** supports broadcasting, and its data type must be the same as **dst_dtype**.
- **quantize_params**: quantization parameters, in the dictionary format. If **quantize_params** is **None**, quantization is disabled; otherwise, quantization is enabled. The keys are as follows:
  - **quantize_alg**: quantization mode. The value can be **NON_OFFSET** (default) or **HALF_OFFSET_A**.
  - **scale_mode_a**: reserved
  - **scale_mode_b**: reserved
  - **scale_mode_out**: value type of the output dequantization parameter. The value can be **SCALAR** (default) or **VECTOR**.
  - **sqrt_mode_a**: reserved
  - **sqrt_mode_b**: reserved
  - **sqrt_mode_out**: whether the square root of **scale_drq** is extracted. The value can be **NON_SQRT** (default) or **SQRT**.
  - **scale_q_a**: reserved
  - **offset_q_a**: reserved
  - **scale_q_b**: reserved
  - **offset_q_b**: reserved
  - **scale_drq**: placeholder of the output dequantization or requantization weight parameter. Defaults to **None**.
  - **offset_drq**: reserved

The quantization modes are as follows:

- Input quantization: quantization from input data to intermediate data. Generally, **fp16** data is quantized to **int8** or **uint8** data.
- Output quantization: quantization from intermediate data to output data. The following two modes are available:
  - Requantization: quantizes **int32** to **int8**.
  - Dequantization: quantizes **int32** to **fp16**.

#### Returns

**tensor_c**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

a_shape = (1024, 256)
b_shape = (256, 512)
bias_shape = (512, )
in_dtype = "float16"
dst_dtype = "float32"
tensor_a = tvm.placeholder(a_shape, name='tensor_a', dtype=in_dtype)
tensor_b = tvm.placeholder(b_shape, name='tensor_b', dtype=in_dtype)
tensor_bias = tvm.placeholder(bias_shape, name='tensor_bias', dtype=dst_dtype)
res = te.lang.cce.matmul(tensor_a, tensor_b, False, False,
                         dst_dtype=dst_dtype, tensor_bias=tensor_bias)
```

## Pooling2d Compute API

### pooling2d

#### Description

Samples signals in different sliding windows of **tensor_in** in different pooling modes.

The pooling mode can be **MAX**, **AVG**, **GMP**, or **GAP**.

- **MAX**: max pooling, computes the maximum of the elements covered by each sliding window
- **AVG**: average pooling, computes the average of the elements covered by each sliding window
- **GMP**: global max pooling, a special mode of max pooling whose window size is the same as the feature map size
- **GAP**: global average pooling, a special mode of avg pooling whose window size is the same as the feature map size

When **pooling_mode** is set to **MAX** and **padding_mode** is set to **SAME** for **tensor_in**, the pooling result is as follows:

*(Figure: max pooling result with SAME padding; not reproduced here.)*

where:

- **input_w**: width of **tensor_in**
- **input_h**: height of **tensor_in**
- **kernel_w**: width of the window
- **kernel_h**: height of the window
- **pad_top**: number of top padding rows in the H direction of **tensor_in**. The value is **1** in the figure.
- **pad_bottom**: number of bottom padding rows in the H direction of **tensor_in**. The value is **1** in the figure.
- **pad_left**: number of left padding columns in the W direction of **tensor_in**. The value is **1** in the figure.
- **pad_right**: number of right padding columns in the W direction of **tensor_in**. The value is **1** in the figure.
- **stride_w**: stride in the W direction
- **stride_h**: stride in the H direction

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/pooling2d_compute.py** in the ATC installation path.

This API supports the basic pooling functions as well as the output quantization function. The quantization function is disabled by default. To enable the quantization function, set **quantize_params** based on the quantization algorithm requirements. For details, see the parameter description.

#### Restrictions

This API cannot be used in conjunction with other TBE DSL APIs.

- The supported input data type is **float16**. **tensor_in** is a 5D tensor of format NC1HWC0.
- The last dimension C0 of **tensor_in** must be **16**.
- The window must have two dimensions, each a positive integer within the range [1, 32768].
- The stride must have two dimensions, each a positive integer. The width and height of the stride must be within the range [1, 63].
- If **pad** is input, **pad** must have four dimensions, each greater than or equal to **0**.
- The dilation must have two dimensions, each a positive integer within the range [1, 255].
- When **pooling_mode** is set to **MAX** or **AVG** in **VALID** mode, the following condition must be met: out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 < ub_size
- When **pooling_mode** is set to **AVG** in **SAME** mode, the following condition must be met: out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 < ub_size
- When **pooling_mode** is set to **MAX** or **AVG**, the following conditions must be met: stride_h ≤ 2 x window_h and stride_w ≤ 2 x window_w
- When **pooling_mode** is set to **MAX** or **AVG**, the following condition must be met: window_w x window_h < 256
- When **pooling_mode** is set to **MAX** or **AVG**, **tensor_in**, **pad**, and **window** must meet the following conditions: stride_h <= in_size_h + pad_top + pad_bottom - window_h and stride_w <= in_size_w + pad_left + pad_right - window_w
- When **pooling_mode** is set to **GAP** or **GMP**, the following conditions must be met: window_h = in_size_h and window_w = in_size_w
- When **pooling_mode** is set to **GAP** or **GMP**, the following condition must be met: padding_mode = "VALID"

where:

- **ub_size**: available size of the unified buffer (UB)
- **out_w**: width of the output tensor
- **window_h**: height of the window
- **window_w**: width of the window
- **C0**: C0 dimension of **tensor_in**
- **SIZE_OF_FP16**: size of the **float16** type, in bytes

#### Prototype

te.lang.cce.pooling2d(tensor_in, window, stride, pooling_mode, padding_mode="SAME", pad = (0,0,0,0), dilation = (1,1), data_mode=1, ceil_mode=0)

#### Arguments

- **tensor_in**: input feature map of the tvm.tensor type. A 5D tensor of format NC1HWC0.
- **window**: size of the input sliding window, list or tuple type. **window[0]** indicates the width of the window, and **window[1]** indicates the height of the window.
- **stride**: stride of the sliding window, list or tuple type. **stride[0]** indicates the stride of the window in the W direction of the feature map, and **stride[1]** indicates the stride in the H direction.
- **pooling_mode**: pooling mode selected from **MAX**, **AVG**, **GMP**, and **GAP**.
  - **MAX**: max pooling, computes the maximum of the elements covered by each sliding window.
  - **AVG**: average pooling, computes the average of the elements covered by each sliding window.
  - **GMP**: global max pooling, a special mode of max pooling. The feature map size is the same as the window size, and the maximum value of the feature map elements is the output.
  - **GAP**: global average pooling, a special mode of avg pooling. The feature map size is the same as the window size, and the average value of the feature map elements is the output.
- **padding_mode**: padding mode selected from **VALID** (padding disabled) and **SAME** (padding enabled).
  - In **VALID** mode, when the window movement in the W or H direction covers only part of the feature map, the data that does not fill a complete window is discarded, that is, it is not involved in the computation. **MAX**, **AVG**, **GMP**, and **GAP** all support the **VALID** mode.
  - In **SAME** mode, when the window movement in the W or H direction covers only part of the feature map, zeros are padded so that a complete window is covered, that is, this data is involved in the computation. **MAX** and **AVG** support the **SAME** mode, while **GMP** and **GAP** do not.
- **pad**: a list or tuple of padding sizes. An optional argument used for compatibility with Caffe pooling. **pad[0]**, **pad[1]**, **pad[2]**, and **pad[3]** indicate the top, bottom, left, and right padding, respectively. Defaults to (0, 0, 0, 0).
- **dilation**: a list or tuple of dilation factors. An optional argument. **dilation[0]** and **dilation[1]** indicate the dilation factors of the window in the H and W directions, respectively. Defaults to (1, 1).
- **data_mode**: template type. **0**: CAFFE_DATA_MODE; **1**: TENSORFLOW_DATA_MODE.
- **ceil_mode**: corresponding to **round_mode** in Caffe. **0**: ceiling (default); **1**: floor.

#### Returns

**res_tensor**: a 5D tvm.tensor for the result tensor (NC1HWC0)

Assume that the **shape** of **tensor_in** is [N, C1, H, W, C0=16], the **shape** of the window is [F, F], and the stride is [S, S].

In **VALID** mode and **SAME** mode of **MAX** pooling and **AVG** pooling, the **shape** of the output tensor is computed as follows:

- In **VALID** mode:
  - The N and C dimensions remain unchanged.
  - Hout = [(H - F + 1) / S], Wout = [(W - F + 1) / S]
- In **SAME** mode:
  - The N and C dimensions remain unchanged.
  - Hout = [H / S], Wout = [W / S]

where **H** and **W** are the input sizes, **F** is the filter size, **S** is the stride, and **[]** is the round-up symbol.

In **VALID** mode of **GMP** pooling and **GAP** pooling, the **shape** of the output tensor is computed as follows:

- The N and C dimensions remain unchanged.
- Hout = Wout = 1
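Under TENSORFLOW_DATA_MODE, the output sizes above can be sketched as follows (`pool_out_hw` is an illustrative helper, not a TBE API):

```python
import math

def pool_out_hw(in_h, in_w, window, stride, padding_mode="SAME"):
    """VALID: ceil((in - F + 1) / S); SAME: ceil(in / S)."""
    f_h, f_w = window
    s_h, s_w = stride
    if padding_mode == "VALID":
        return (math.ceil((in_h - f_h + 1) / s_h),
                math.ceil((in_w - f_w + 1) / s_w))
    return (math.ceil(in_h / s_h), math.ceil(in_w / s_w))

# 416 x 416 feature map, 3 x 3 window, stride 2, SAME padding.
print(pool_out_hw(416, 416, (3, 3), (2, 2), "SAME"))  # (208, 208)
```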

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1, 2, 416, 416, 16)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.pooling2d(data, (3, 3), (2, 2), "AVG", "SAME")
# res.shape = (1, 2, 208, 208, 16)
```

## Common Compute APIs

### round_to

#### Description

Clamps **data** to the range [min_value, max_value] by comparing each element of **data** with **min_value** and **max_value**. If an element is between **min_value** and **max_value**, its value is kept. If an element is less than **min_value** or greater than **max_value**, it is replaced by **min_value** or **max_value**, respectively.
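The clamp semantics can be sketched element-wise in plain Python (`round_to_ref` is a reference sketch, not the TBE implementation):

```python
def round_to_ref(data, max_value, min_value):
    """Clamp each element of data to [min_value, max_value]."""
    return [min(max(x, min_value), max_value) for x in data]

print(round_to_ref([-5.0, 0.5, 9.0], max_value=3.0, min_value=-2.0))  # [-2.0, 0.5, 3.0]
```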

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/common.py** in the ATC installation path.

#### Restrictions

In case of data type inconsistency, **max_value** and **min_value** will be converted into the same data type as **data** during computation.

The supported data types are **float16**, **float32**, **int8**, **uint8**, and **int32**. However, **int8**, **uint8**, and **int32** will be converted to **float16**.

#### Prototype

te.lang.cce.round_to(data, max_value, min_value)

#### Arguments

- **data**: a tvm.tensor for the input tensor
- **max_value**: a scalar for the maximum value of the target range
- **min_value**: a scalar for the minimum value of the target range

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
max_value = tvm.const(3, dtype=input_dtype)
min_value = tvm.const(2, dtype=input_dtype)
res = te.lang.cce.round_to(data, max_value, min_value)
```

### cast_to

#### Description

Converts the data type of a tensor, specifically, from **data** to **dtype**.

The API is defined in **python/site-packages/te/te/lang/cce/te_compute/common.py** in the ATC installation path.

#### Restrictions

The following type conversions are supported:

| Source Data Type | Destination Data Type |
|---|---|
| float32 | float16 |
| float32 | int8 |
| float32 | uint8 |
| float16 | float32 |
| float16 | int8 |
| float16 | uint8 |
| float16 | int32 |
| int8 | float16 |
| int8 | uint8 |
| int32 | float16 |
| int32 | int8 |
| int32 | uint8 |

#### Prototype

te.lang.cce.cast_to(data, dtype, f1628IntegerFlag=True)

#### Arguments

- **data**: a tvm.tensor for the input tensor
- **dtype**: destination data type, string type
- **f1628IntegerFlag**: defaults to **True**. If the decimal part of the data before conversion is 0, set **f1628IntegerFlag** to **True**; otherwise, set it to **False**.

#### Returns

**res_tensor**: a tvm.tensor for the result tensor

#### Example

```python
import te.lang.cce
from te import tvm

shape = (1024, 1024)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.cast_to(data, "float32")
```

## Build APIs

### auto_schedule

#### Description

Generates a **schedule** object based on the defined computation process.

The API is defined in **python/site-packages/topi/topi/generic/cce.py** in the ATC installation path.

#### Prototype

topi.generic.auto_schedule(outs, option=None)

#### Arguments

- **outs**: a list of the output tensors of the operator. A single output or multiple outputs are supported.
- **option**: configuration parameter used for RL operator search. Does not need to be configured for auto_schedule.

#### Returns

**schedule**: computation schedule of the operator

#### Example

```python
import te.lang.cce
from te import tvm
import topi.generic

shape = (28, 28)
dtype = "float16"
# Define input.
data = tvm.placeholder(shape, name="data", dtype=dtype)
# Describe the computation process of the operator.
res = te.lang.cce.vabs(data)
with tvm.target.cce():
    # Generate a schedule object.
    sch = topi.generic.auto_schedule(res)
```

### cce_build_code

#### Description

Builds the **schedule** object to generate an operator binary file and an operator description file.

The API is defined in **python/site-packages/te/te/lang/cce/te_schedule/cce_schedule.py** in the ATC installation path.

#### Prototype

te.lang.cce.cce_build_code(sch, config_map=None)

#### Arguments

- **sch**: tvm.schedule, the schedule to build or to print lower code for
- **config_map**: a dictionary of build configuration. Defaults to **None**. The keys include:
  - **print_ir**: whether to print the lower IR code. Defaults to **True**.
  - **need_build**: whether the build is performed. Defaults to **True**.
  - **name**: operator name. Defaults to **cce_op**. The value can contain only uppercase letters, lowercase letters, digits, and underscores (_), must start with a letter or underscore (_), and cannot exceed 200 characters.
  - **tensor_list**: a list of input and output tensors. The input is the tensor object returned by the **placeholder** API; the output is the computed tensor object. This key is mandatory; otherwise, an error is reported. The list also determines the order of the parameters of the generated operator's kernel function, which is the same as the order of the inputs and outputs in the list.
  - **bool_storage_as_1bit**: whether to store bools by 1 bit. **True** (default): 1-bit storage; **False**: 8-bit storage.

#### Returns

None

#### Example

```python
import te.lang.cce
from te import tvm
from topi import generic

shape = (28, 28)
dtype = "float16"
# Define the input placeholder.
data = tvm.placeholder(shape, name="data", dtype=dtype)
with tvm.target.cce():
    # Describe the computation process of the operator.
    res = te.lang.cce.vabs(data)
    # Generate a schedule object.
    sch = generic.auto_schedule(res)
# Define build parameters.
config = {"print_ir": True,
          "need_build": True,
          "name": "abs_28_28_float16",
          "tensor_list": [data, res]}
# Build the operator.
te.lang.cce.cce_build_code(sch, config)
```

## Instructions

### Usage Example

The following example shows how to concatenate the preceding APIs. It implements a simple operator that supports the **float16** type to obtain an absolute value.

```python
import te.lang.cce
from te import tvm
import topi.generic

shape = (28, 28)
dtype = "float16"
# Define input.
data = tvm.placeholder(shape, name="data", dtype=dtype)
# Describe the computation process of the operator.
res = te.lang.cce.vabs(data)
with tvm.target.cce():
    # Generate a schedule object.
    sch = topi.generic.auto_schedule(res)
# Define build parameters.
config = {"print_ir": True,
          "need_build": True,
          "name": "abs_28_28_float16",
          "tensor_list": [data, res]}
# Build the operator.
te.lang.cce.cce_build_code(sch, config)
```

### Exception Handling

If an exception occurs when an API is executed, the error is usually caused by incorrect input parameters. The following example shows the error information when **tensor_list** is incomplete.

Code example:

```python
data = tvm.placeholder(shape, name="data", dtype=inp_dtype)
with tvm.target.cce():
    res = te.lang.cce.vabs(data)
    sch = generic.auto_schedule(res)
config = {"print_ir": need_print,
          "need_build": need_build,
          "name": kernel_name,
          "tensor_list": [res]}  # Incomplete: the input tensor "data" is missing.
te.lang.cce.cce_build_code(sch, config)
```

The following error information is displayed:

```
Traceback (most recent call last):
  File "llt/tensor_engine/ut/testcase_python/tf_abs/test_tf_abs_cce.py", line 71, in test_cce_tf_abs_99991_fp16
    tf_abs_cce((99991,), dtype = "Float16", need_build = False, need_print = False, kernel_name = "cce_tf_abs")
  File "/home1/repotvm/tensor_engine/topi/python/topi/cce/tf_abs.py", line 68, in tf_abs_cce
    te.lang.cce.cce_build_code(sch, config)
  File "/home1/repotvm/tensor_engine/python/te/lang/cce/te_schedule/cce_schedule.py", line 381, in cce_build_code
    _build(sch, tensor_list, local_config_map["name"])
  File "/home1/repotvm/tensor_engine/python/te/lang/cce/te_schedule/cce_schedule.py", line 338, in _build
    mod = tvm.build(sch, tensor_list, device, name=name)
  File "/home1/repotvm/tensor_engine/python/te/tvm/build_module.py", line 432, in build
    binds=binds)
  File "/home1/repotvm/tensor_engine/python/te/tvm/build_module.py", line 353, in lower
    stmt = ir_pass.StorageFlatten(stmt, binds, 64)
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home1/repotvm/tensor_engine/python/te/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
TVMError: [17:12:02] /home1/repotvm/tensor_engine/src/pass/storage_flatten.cc:249: Check failed: it != buf_map_.end() Cannot find allocated buffer for placeholder(data, 0x27d7290)
```

The problem is solved after the parameter is modified as follows:

"tensor_list" : [data, res]