Operator Basics
A deep learning algorithm consists of multiple compute units referred to as operators (Ops). In network models, an operator describes the compute logic of a layer: for example, the convolution layer performs convolution, and the Fully-Connected (FC) layer multiplies the input by a weight matrix.
The following introduces some basic terms about operators.
Operator Name
The name of an operator identifies the operator on a network and therefore must be unique on that network. For example, a network may have operators conv1, pool1, and conv2, where conv1 and conv2 are both of the type convolution and each represents a separate convolution operation.
Operator Type
Every operator is of a specific type, for example, convolution. A network can have different operators of the same type.
Tensor
Tensors are used to represent the input data and output data in TBE computations. TensorDesc (the tensor descriptor) describes the input data and output data. Table 2-1 describes the attributes of the TensorDesc struct.
| Attribute | Definition |
| --- | --- |
| name | Indexes a tensor; must be unique. |
| shape | Specifies the shape of a tensor, for example, (10,), (1024, 1024), or (2, 3, 4). For details, see Shape. Default value: none. Format: (i1, i2, ..., in), where i1 to in are positive integers. |
| dtype | Specifies the data type of a tensor. Default value: none. Value range: float16, float32, int8, int16, int32, uint8, uint16, bool. NOTE: Different operations support different data types. For details, see API Reference. |
| format | Specifies the data layout format. For details, see Format. |
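The attributes can be pictured as a plain Python dictionary (a hypothetical descriptor for illustration only; it mirrors the attribute names above rather than an actual TBE struct definition):

```python
# Hypothetical tensor descriptor mirroring the TensorDesc attributes above.
tensor_desc = {
    "name": "input_x",        # indexes the tensor; must be unique
    "shape": (2, 3, 4),       # (i1, i2, ..., in), positive integers
    "dtype": "float16",       # one of: float16, float32, int8, int16, ...
    "format": "NC1HWC0",      # data layout; see Format below
}
```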
- Shape
The shape of a tensor is described in the format (D0, D1, ..., Dn – 1), where D0 to Dn – 1 are positive integers.
For example, the shape (3, 4) indicates a 3 x 4 matrix, where the first dimension has three elements, and the second dimension has four elements.
The number of elements in the parentheses equals the dimension count of the tensor. The first element is the element count at the outermost level of square brackets, the second element is the element count at the next level of square brackets, and so on. See the following examples.
Table 2-2 Tensor shape examples

| Tensor | Shape |
| --- | --- |
| 1 | () |
| [1, 2, 3] | (3,) |
| [[1, 2],[3, 4]] | (2, 2) |
| [[[1, 2],[3, 4]], [[5, 6],[7, 8]]] | (2, 2, 2) |

A scalar has zero dimensions, so its shape is the empty tuple ().
The tensor shape has physical meaning. For example, a tensor with shape (4, 20, 20, 3) represents four (the 4 in the shape) images of 20 x 20 pixels (the two 20s in the shape), where each pixel contains red, green, and blue color components (the 3 in the shape).
Figure 2-2 Physical meaning of tensor shape

In programming, the shape can be understood as the nest of loops over each dimension of a tensor. For example, to operate on tensor A with shape (4, 20, 20, 3), the loop nest is as follows.
```
produce A {
  for (i, 0, 4) {
    for (j, 0, 20) {
      for (p, 0, 20) {
        for (q, 0, 3) {
          A[((((((i*20) + j)*20) + p)*3) + q)] = a_tensor[((((((i*20) + j)*20) + p)*3) + q)]
        }
      }
    }
  }
}
```
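The same linearized indexing can be checked with NumPy (a minimal sketch; a_tensor here is a stand-in for the input tensor in the loop nest above):

```python
import numpy as np

a_tensor = np.random.rand(4, 20, 20, 3)
flat = a_tensor.ravel()                     # row-major (C-order) linearization
i, j, p, q = 2, 5, 7, 1                     # an arbitrary element
offset = ((i * 20 + j) * 20 + p) * 3 + q    # index arithmetic from the loop nest
assert flat[offset] == a_tensor[i, j, p, q]
```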
- Format
In deep learning, n-dimensional data is stored by using an n-dimensional array. For example, a feature map of a convolutional neural network is stored by using a four-dimensional array, including the batch size (Batch, N), feature map height (Height, H), feature map width (Width, W), and number of feature map channels (Channels, C), respectively.
Because data can only be stored linearly in memory, the dimensions must be arranged in a fixed layout. Different deep learning frameworks store feature maps with different layouts. For example, Caffe uses the layout [Batch, Channels, Height, Width], that is, NCHW, while TensorFlow uses the layout [Batch, Height, Width, Channels], that is, NHWC.
As shown in Figure 2-3, for an RGB image, the pixel values of each channel are clustered in sequence as RRRGGGBBB with the NCHW layout. However, with the NHWC layout, the pixel values are interleaved as RGBRGBRGB.
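A small NumPy sketch makes the two layouts concrete (the pixel values are made up; the R, G, and B components are encoded as 1s, 10s, and 100s so they are easy to spot in the flattened output):

```python
import numpy as np

# One 2 x 2 "image" with 3 channels; R, G, B encoded as 1s, 10s, 100s.
nhwc = np.array([[[[1, 10, 100], [2, 20, 200]],
                  [[3, 30, 300], [4, 40, 400]]]])   # layout N, H, W, C
nchw = nhwc.transpose(0, 3, 1, 2)                   # layout N, C, H, W
print(nhwc.ravel())  # [1 10 100 2 20 200 ...] -> interleaved, RGBRGB...
print(nchw.ravel())  # [1 2 3 4 10 20 30 40 ...] -> clustered, RRRRGGGG...
```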
To improve data access efficiency, tensor data in the Ascend AI Processor is stored in the 5D format NC1HWC0. C0, which is closely related to the micro-architecture, equals the size of the Cube Unit in the AI Core: 16 for FP16 or 32 for INT8. The C0 dimension must be stored contiguously. C1 = C/C0, rounded up to the nearest integer if C is not a multiple of C0.
Steps of NHWC-to-NC1HWC0 conversion (a sketch in code follows the list):
- Split the NHWC data into C1 pieces of NHWC0 along the C dimension.
- Arrange the C1 pieces of NHWC0 in the memory contiguously, obtaining NC1HWC0.
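A minimal NumPy sketch of these two steps (not the AIPP/FE implementation; the function name and the zero-padding of the C tail up to C1 * C0 are assumptions for illustration):

```python
import numpy as np

def nhwc_to_nc1hwc0(x, c0=16):
    """Convert NHWC to NC1HWC0, zero-padding C up to a multiple of c0."""
    n, h, w, c = x.shape
    c1 = -(-c // c0)                                     # C1 = ceil(C / C0)
    x = np.pad(x, ((0, 0),) * 3 + ((0, c1 * c0 - c),))   # pad the C tail with zeros
    x = x.reshape(n, h, w, c1, c0)                       # split C into C1 pieces of C0
    return x.transpose(0, 3, 1, 2, 4)                    # arrange as N, C1, H, W, C0

x = np.random.rand(4, 20, 20, 3).astype(np.float16)
print(nhwc_to_nc1hwc0(x).shape)                          # (4, 1, 20, 20, 16)
```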
Applications of NHWC-to-NC1HWC0 conversion:
- Converting RGB images into the NC1HWC0 format at the first layer by using AIPP.
- Rearranging the NC1HWC0 feature maps output by intermediate layers during data movement.
For Conv3D, if the original data layout is NDHWC, the layout used in the Ascend AI Processor is NDC1HWC0. During model conversion or online network construction, FE automatically inserts cast operators (see Cast Operator) to convert the original NDHWC format into NDC1HWC0.
Axis
An axis is the index of a dimension of a tensor. For a two-dimensional tensor with five rows and six columns, that is, with shape (5, 6), axis 0 represents the first dimension, that is, the rows, and axis 1 represents the second dimension, that is, the columns.
For example, for tensor [[[1, 2],[3, 4]], [[5, 6],[7, 8]]] with shape (2, 2, 2):
- Axis 0 represents the first dimension, that is, the matrices [[1, 2],[3, 4]] and [[5, 6],[7, 8]].
- Axis 1 represents the second dimension, that is, the arrays [1, 2], [3, 4], [5, 6], and [7, 8].
- Axis 2 represents the third dimension, that is, the numbers 1, 2, 3, 4, 5, 6, 7, and 8.
A negative axis is interpreted as indexing from the end.
The axes of an n-dimensional tensor include 0, 1, 2, ..., and n – 1.
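The axis semantics can be tried out with NumPy (a quick sketch using the (2, 2, 2) tensor above):

```python
import numpy as np

t = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])    # shape (2, 2, 2)
print(t[0])        # index along axis 0 -> [[1 2] [3 4]]
print(t[:, 1])     # index along axis 1 -> [[3 4] [7 8]]
print(t[:, :, 0])  # index along axis 2 -> [[1 3] [5 7]]
# A negative axis counts from the end: axis -1 is the same as axis n - 1.
assert (t.sum(axis=-1) == t.sum(axis=2)).all()
```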
Weight
In the compute unit, the input data is multiplied by a weight value. For example, a two-input operator has an associated weight allocated to each input. Generally, more important data is assigned a greater weight, so a feature indicated by data with zero weight is effectively ignored.
As shown in Figure 2-5, in the compute unit, input X1 is multiplied by its associated weight W1, that is, X1 * W1.
Bias
A bias is another linear component applied to the input data, in addition to the weight: it is added to the product of the input and its weight.
As shown in Figure 2-6, in the compute unit, input X1 is multiplied by its associated weight W1, and then its associated bias B1 is added, that is, X1 * W1 + B1.
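A one-line arithmetic sketch of such a compute unit (the numeric values are made up for illustration):

```python
# y = X1*W1 + X2*W2 + B for a two-input compute unit (values are illustrative).
x1, x2 = 0.5, -1.0        # inputs
w1, w2 = 0.8, 0.2         # weights
b = 0.1                   # bias
y = x1 * w1 + x2 * w2 + b
print(y)                  # 0.4 - 0.2 + 0.1 ≈ 0.3
```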
Broadcast
Broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations. TBE requires that the size of each dimension of the array to be broadcast be either 1 or the same as that of the corresponding dimension of the target shape; TBE broadcasts dimensions of size 1 only.
For example, the array with shape (2, 1, 64) can be broadcast to shape (2, 128, 64).
The computation APIs of TBE do not support automatic broadcasting: the two input tensors must have the same shape. Therefore, you need to compute the target shape and broadcast the input tensors before arithmetic operations.
For example, before summing tensor A with shape (4, 3, 1, 5) and tensor B with shape (1, 1, 2, 1), you need to perform the following steps (a sketch in code follows the list):
- Compute the target shape (Figure 2-7 Example 1): use the larger of each pair of corresponding dimensions as the target dimensions, that is, (4, 3, 2, 5).
- Call the broadcast API to broadcast Tensor A and Tensor B to the target shape, respectively.
- Call the computation API to sum Tensor A and Tensor B.
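The three steps can be mimicked with NumPy as a stand-in for the TBE broadcast and computation APIs (note that NumPy's + would broadcast automatically, whereas the explicit np.broadcast_to calls mirror the TBE workflow):

```python
import numpy as np

a = np.random.rand(4, 3, 1, 5)
b = np.random.rand(1, 1, 2, 1)

# Step 1: compute the target shape from the larger of each dimension pair.
target = tuple(max(da, db) for da, db in zip(a.shape, b.shape))   # (4, 3, 2, 5)

# Step 2: broadcast both tensors to the target shape.
a_bc = np.broadcast_to(a, target)
b_bc = np.broadcast_to(b, target)

# Step 3: elementwise sum on tensors of identical shape.
res = a_bc + b_bc
print(res.shape)   # (4, 3, 2, 5)
```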
The size of each dimension of the tensor to be broadcast must be 1 or the same as that of the target shape; otherwise, broadcasting fails. For example, a tensor with shape (2, 3, 64) cannot be broadcast to (2, 128, 64), because the second dimension is 3, which is neither 1 nor 128.
Reduction
Reduction is an operation that removes one or more dimensions from a tensor by performing certain operations across those dimensions. There are many operators for dimension reduction, such as Sum, Min, Max, All, and Mean in TensorFlow and the Reduction operator in Caffe. The following uses the Reduction operator in Caffe as an example.
- Attributes of the Reduction operator
- ReductionOp: operation type. Four operation types are supported.
Table 2-3 Operation types supported by the Reduction operator

| Operation Type | Description |
| --- | --- |
| SUM | Computes the sum of elements across specified dimensions of a tensor. |
| ASUM | Computes the sum of absolute values of elements across specified dimensions of a tensor. |
| SUMSQ | Computes the sum of squares of elements across specified dimensions of a tensor. |
| MEAN | Computes the mean of elements across specified dimensions of a tensor. |
- axis: the dimension to reduce. The value range is [–N, N – 1]. For example, for an input tensor with shape (5, 6, 7, 8):
- If axis = 3, the shape of the output tensor is (5, 6, 7).
- If axis = 2, the shape of the output tensor is (5, 6, 8).
- If axis = 1, the shape of the output tensor is (5, 7, 8).
- If axis = 0, the shape of the output tensor is (6, 7, 8).
- coeff: a scalar scaling factor. The value 1 indicates that the output is not scaled.
For example, to sum-reduce the 2D matrix [[1, 1, 1], [1, 1, 1]] (a sketch in code follows the list):
- If axis = 0, compute the sum of elements across the rows, obtaining [2, 2, 2]. That is, the 2D matrix is reduced to 1D.
- If axis = 1, compute the sum of elements across the columns, obtaining [3, 3].
- If axis = [0, 1], perform reduction with axis = 0 to obtain [2, 2, 2], and then with axis = 1 to obtain 6, resulting in a 0D scalar.
- If axis = [], reduction is not performed and the dimensions are retained.
- If axis is NULL, all dimensions are reduced, resulting in a 0D scalar.
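These cases can be reproduced with NumPy's sum reduction (a stand-in for the Caffe Reduction operator, using the matrix [[1, 1, 1], [1, 1, 1]] from the example):

```python
import numpy as np

m = np.ones((2, 3))             # [[1, 1, 1], [1, 1, 1]]
print(np.sum(m, axis=0))        # [2. 2. 2.] -> reduced to 1D
print(np.sum(m, axis=1))        # [3. 3.]    -> reduced to 1D
print(np.sum(m, axis=(0, 1)))   # 6.0        -> reduced to a 0D scalar
print(np.sum(m))                # 6.0        -> axis omitted: reduce all dimensions
```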
Cast Operator
Cast operators convert tensor attributes such as the data type and format to facilitate the computation of upstream and downstream engines.
Because the graph has gone through a series of processing steps, such as offloading, optimization, and fusion, the tensor attributes of its nodes may have changed, and cast operators are needed. During network topology building, FE automatically inserts cast operators, so you do not need to manually convert formats between upstream and downstream operators.
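Conceptually, a cast node changes the data type or layout of a tensor between two operators, as in this NumPy sketch (illustrative only; the actual conversion is performed by the operators that FE inserts):

```python
import numpy as np

x = np.random.rand(1, 3, 20, 20).astype(np.float32)  # NCHW, float32
y = x.astype(np.float16)                             # data type cast
z = y.transpose(0, 2, 3, 1)                          # format cast: NCHW -> NHWC
print(z.shape, z.dtype)                              # (1, 20, 20, 3) float16
```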