Inplace Compute APIs
Inplace compute APIs apply an operation (add, sub, or update) to selected rows of a tensor, where a row is a slice along the first dimension.
inplace_add
Description
Adds the rows of rhs to the rows of lhs selected by inplace_ids.
For example:
    res = lhs
    res[ids,:] += rhs
    return res
The API is defined in python/site-packages/te/lang/cce/te_compute/inplace_compute.py in the ATC installation path.
Restrictions
Supported data types on the Ascend 310 AI Processor: float16, float32, int32
- The maximum value of the first dimension of rhs is 7934. A value larger than 7934 cannot be processed.
- If the first dimension of rhs is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the ulimit -s command to increase the stack space, for example, from 8192 to 81920.
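The restrictions above can be checked before the operator is built. The following is a minimal sketch and not part of te.lang.cce: the helper name check_inplace_restrictions and the two constants are hypothetical, and their values are copied from the restrictions listed above.

```python
# Hypothetical pre-check helper; not part of te.lang.cce.
SUPPORTED_DTYPES = ("float16", "float32", "int32")  # from the restrictions above
MAX_RHS_FIRST_DIM = 7934                            # from the restrictions above


def check_inplace_restrictions(dtype, rhs_first_dim):
    """Raise ValueError if the dtype or the first dimension of rhs is unsupported."""
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError("unsupported dtype: %s" % dtype)
    if rhs_first_dim > MAX_RHS_FIRST_DIM:
        raise ValueError(
            "first dimension of rhs is %d, maximum is %d"
            % (rhs_first_dim, MAX_RHS_FIRST_DIM)
        )
    return True
```

Calling check_inplace_restrictions("float16", 5) passes, while a first dimension of 7935 or a dtype such as "int8" raises ValueError.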
Prototype
te.lang.cce.inplace_add(lhs, inplace_ids, rhs)
Parameters
- lhs: input left tensor
- inplace_ids: int or list of ints. Each value must be a valid row index of lhs, that is, greater than or equal to 0 and less than the first dimension of lhs. If a list is given, its length must equal the first dimension of rhs.
- rhs: right tensor or scalar. The dimensions must be the same as those of lhs, except the first dimension. If inplace_ids is of integer type, rhs has one dimension less than lhs. For example, lhs is (10, 1024), inplace_ids is [5], and rhs is (1, 1024); lhs is (10, 1024), inplace_ids is 5, and rhs is (1024,).
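The relationship between lhs, inplace_ids, and rhs described above can be expressed as a small shape helper. This is a sketch only: expected_rhs_shape is a hypothetical name, not a te.lang.cce function, and it assumes that each id is a row index of lhs in the range 0 to lhs.shape[0] - 1.

```python
# Hypothetical shape helper for the parameter rules above; not part of te.lang.cce.
def expected_rhs_shape(lhs_shape, inplace_ids):
    """Return the rhs shape implied by lhs_shape and inplace_ids."""
    lhs_shape = tuple(lhs_shape)
    if isinstance(inplace_ids, int):
        ids = [inplace_ids]
        rhs_shape = lhs_shape[1:]                # one dimension fewer than lhs
    else:
        ids = list(inplace_ids)
        rhs_shape = (len(ids),) + lhs_shape[1:]  # one rhs row per id
    for i in ids:
        # Assumption: ids are row indices of lhs, so they lie in [0, lhs_shape[0]).
        if not 0 <= i < lhs_shape[0]:
            raise ValueError("inplace id %d is out of range" % i)
    return rhs_shape


print(expected_rhs_shape((10, 1024), [5]))  # (1, 1024)
print(expected_rhs_shape((10, 1024), 5))    # (1024,)
```

The two printed shapes match the two cases given in the rhs parameter description.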
Returns
res_tensor: tensor after computation
Example
    import tvm
    import te.lang.cce
    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_add(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] + dataB[0] + dataB[1]
    # res[2] = dataA[2] + dataB[3] + dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] + dataB[2]
    # res[5] = dataA[5]
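The row mapping listed in the example comments, where a repeated id accumulates every matching row of rhs, can be reproduced on the host with NumPy. This is a reference model under that assumption, not the te.lang.cce implementation. The name inplace_add_reference is hypothetical, and the arrays use 4 columns instead of 1024 to keep the sketch small.

```python
import numpy as np


def inplace_add_reference(lhs, inplace_ids, rhs):
    """NumPy model of the documented result; not the te.lang.cce implementation."""
    res = np.array(lhs, copy=True)
    # np.add.at adds once for every occurrence of a repeated id, whereas
    # res[ids] += rhs would keep only one update per repeated id.
    np.add.at(res, inplace_ids, rhs)
    return res


data_a = np.arange(24, dtype=np.float32).reshape(6, 4)       # stands in for dataA
data_b = np.arange(20, dtype=np.float32).reshape(5, 4) + 1   # stands in for dataB
res = inplace_add_reference(data_a, [1, 1, 4, 2, 2], data_b)

assert np.array_equal(res[0], data_a[0])
assert np.array_equal(res[1], data_a[1] + data_b[0] + data_b[1])
assert np.array_equal(res[2], data_a[2] + data_b[3] + data_b[4])
assert np.array_equal(res[4], data_a[4] + data_b[2])
```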
inplace_sub
Description
Subtracts the rows of rhs from the rows of lhs selected by inplace_ids.
For example:
    res = lhs
    res[ids,:] -= rhs
    return res
The API is defined in python/site-packages/te/lang/cce/te_compute/inplace_compute.py in the ATC installation path.
Restrictions
Supported data types on the Ascend 310 AI Processor: float16, float32, int32
- The maximum value of the first dimension of rhs is 7934. A value larger than 7934 cannot be processed.
- If the first dimension of rhs is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the ulimit -s command to increase the stack space, for example, from 8192 to 81920.
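The stack limit reported by ulimit -s can also be inspected and raised from Python through the standard resource module (Unix only). This is a sketch with hypothetical helper names. The new limit applies to the calling process and to child processes started afterwards, and it is an assumption that this is enough to avoid the core dump in every setup. Running ulimit -s in the shell before starting Python remains the documented approach.

```python
import resource


def stack_soft_limit_kb():
    """Return the soft stack limit in KB (the unit of ulimit -s), or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft // 1024


def raise_stack_soft_limit_kb(target_kb):
    """Raise the soft stack limit to target_kb if it is lower and the hard limit allows it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = target_kb * 1024
    if soft == resource.RLIM_INFINITY or soft >= target:
        return False                      # already large enough, nothing changed
    if hard != resource.RLIM_INFINITY and target > hard:
        raise ValueError("target exceeds the hard stack limit")
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return True
```

For example, raise_stack_soft_limit_kb(81920) corresponds to the increase from 8192 to 81920 mentioned above.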
Prototype
te.lang.cce.inplace_sub(lhs, inplace_ids, rhs)
Parameters
- lhs: input left tensor
- inplace_ids: int or list of ints. Each value must be a valid row index of lhs, that is, greater than or equal to 0 and less than the first dimension of lhs. If a list is given, its length must equal the first dimension of rhs.
- rhs: right tensor or scalar. The dimensions must be the same as those of lhs, except the first dimension. If inplace_ids is of integer type, rhs has one dimension less than lhs. For example, lhs is (10, 1024), inplace_ids is [5], and rhs is (1, 1024); lhs is (10, 1024), inplace_ids is 5, and rhs is (1024,).
Returns
res_tensor: tensor after computation
Example
    import tvm
    import te.lang.cce
    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_sub(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataA[1] - dataB[0] - dataB[1]
    # res[2] = dataA[2] - dataB[3] - dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataA[4] - dataB[2]
    # res[5] = dataA[5]
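According to the example comments, a repeated id subtracts every matching row of rhs. A plain NumPy fancy-index subtraction does not behave this way, so a host-side reference needs np.subtract.at. The sketch below is a model of the documented result only. The name inplace_sub_reference is hypothetical, and 4 columns are used instead of 1024.

```python
import numpy as np


def inplace_sub_reference(lhs, inplace_ids, rhs):
    """NumPy model of the documented result; not the te.lang.cce implementation."""
    res = np.array(lhs, copy=True)
    np.subtract.at(res, inplace_ids, rhs)   # subtracts once per occurrence of an id
    return res


ids = [1, 1, 4, 2, 2]
data_a = np.full((6, 4), 100, dtype=np.int32)             # stands in for dataA
data_b = np.arange(1, 21, dtype=np.int32).reshape(5, 4)   # stands in for dataB
res = inplace_sub_reference(data_a, ids, data_b)

# Buffered fancy-index subtraction keeps only one update per repeated id.
naive = data_a.copy()
naive[ids] -= data_b

assert np.array_equal(res[1], data_a[1] - data_b[0] - data_b[1])
assert np.array_equal(res[2], data_a[2] - data_b[3] - data_b[4])
assert not np.array_equal(naive[1], res[1])
```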
inplace_update
Description
Replaces the rows of lhs selected by inplace_ids with the corresponding rows of rhs.
For example:
    res = lhs
    res[ids,:] = rhs
    return res
The API is defined in python/site-packages/te/lang/cce/te_compute/inplace_compute.py in the ATC installation path.
Restrictions
Supported data types on the Ascend 310 AI Processor: float16, float32, int32
- The maximum value of the first dimension of rhs is 7934. A value larger than 7934 cannot be processed.
- If the first dimension of rhs is large (for example, 5000), a core dump may occur due to OS stack overflow. In this case, you can run the ulimit -s command to increase the stack space, for example, from 8192 to 81920.
Prototype
te.lang.cce.inplace_update(lhs, inplace_ids, rhs)
Parameters
- lhs: input left tensor
- inplace_ids: int or list of ints. Each value must be a valid row index of lhs, that is, greater than or equal to 0 and less than the first dimension of lhs. If a list is given, its length must equal the first dimension of rhs.
- rhs: right tensor or scalar. The dimensions must be the same as those of lhs, except the first dimension. If inplace_ids is of integer type, rhs has one dimension less than lhs. For example, lhs is (10, 1024), inplace_ids is [5], and rhs is (1, 1024); lhs is (10, 1024), inplace_ids is 5, and rhs is (1024,).
Returns
res_tensor: tensor after computation
Example
    import tvm
    import te.lang.cce
    input_dtype = "float16"
    dataA = tvm.placeholder((6, 1024), name="dataA", dtype=input_dtype)
    dataB = tvm.placeholder((5, 1024), name="dataB", dtype=input_dtype)
    inplace_ids = [1, 1, 4, 2, 2]
    res = te.lang.cce.inplace_update(dataA, inplace_ids, dataB)
    # res.shape = (6, 1024)
    # res[0] = dataA[0]
    # res[1] = dataB[1]
    # res[2] = dataB[4]
    # res[3] = dataA[3]
    # res[4] = dataB[2]
    # res[5] = dataA[5]
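In the example comments, a repeated id keeps the last matching row of rhs (res[1] is dataB[1], not dataB[0]). A host-side model that makes this ordering explicit is sketched below. It is not the te.lang.cce implementation. The name inplace_update_reference is hypothetical, and 4 columns are used instead of 1024.

```python
import numpy as np


def inplace_update_reference(lhs, inplace_ids, rhs):
    """NumPy model of the documented result; not the te.lang.cce implementation."""
    res = np.array(lhs, copy=True)
    # Rows are written in list order, so the last occurrence of a repeated id wins.
    for row_id, row in zip(inplace_ids, rhs):
        res[row_id] = row
    return res


data_a = np.zeros((6, 4), dtype=np.float32)                  # stands in for dataA
data_b = np.arange(20, dtype=np.float32).reshape(5, 4) + 1   # stands in for dataB
res = inplace_update_reference(data_a, [1, 1, 4, 2, 2], data_b)

assert np.array_equal(res[1], data_b[1])
assert np.array_equal(res[2], data_b[4])
assert np.array_equal(res[4], data_b[2])
assert np.array_equal(res[0], data_a[0])
```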