Pooling2d Compute API
pooling2d
Description
Downsamples tensor_in by applying the selected pooling mode to each sliding window.
The pooling mode can be MAX, AVG, GMP, or GAP.
- MAX: max pooling, used to compute the maximum of the elements covered by each sliding window
- AVG: average pooling, used to compute the average of the elements covered by each sliding window
- GMP: global max pooling, a special mode of max pooling, that is, max pooling whose window size is the same as the feature map size.
- GAP: global avg pooling, a special mode of avg pooling, that is, avg pooling whose window size is the same as the feature map size.
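The four modes can be illustrated with a minimal pure-Python sketch on a single 2D feature map. This is not the TBE implementation: the function name pool2d and its argument order are hypothetical, no padding is applied, and the real API operates on 5D NC1HWC0 tensors.

```python
def pool2d(feature_map, window_h, window_w, stride_h, stride_w, mode):
    """Pool a 2D list of numbers without padding (illustrative only)."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    if mode in ("GMP", "GAP"):
        # Global modes: the window spans the entire feature map.
        window_h, window_w = in_h, in_w
    out = []
    for top in range(0, in_h - window_h + 1, stride_h):
        row = []
        for left in range(0, in_w - window_w + 1, stride_w):
            # Collect the elements covered by the current window.
            values = [feature_map[top + i][left + j]
                      for i in range(window_h)
                      for j in range(window_w)]
            if mode in ("MAX", "GMP"):
                row.append(max(values))
            else:  # "AVG" or "GAP"
                row.append(sum(values) / len(values))
        out.append(row)
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
max_out = pool2d(fmap, 2, 2, 2, 2, "MAX")  # [[6, 8], [14, 16]]
avg_out = pool2d(fmap, 2, 2, 2, 2, "AVG")  # [[3.5, 5.5], [11.5, 13.5]]
gmp_out = pool2d(fmap, 2, 2, 2, 2, "GMP")  # [[16]]
gap_out = pool2d(fmap, 2, 2, 2, 2, "GAP")  # [[8.5]]
```

In the global modes the window arguments are ignored, which mirrors the description of GMP and GAP as pooling whose window size equals the feature map size.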
When pooling_mode is set to MAX and padding_mode is set to SAME for tensor_in, the pooling result is as follows:
The parameters in the figure are as follows:
- input_w: width of tensor_in
- input_h: height of tensor_in
- kernel_w: width of the window
- kernel_h: height of the window
- pad_top: number of top padding rows in the H direction of tensor_in. The value is 1 in the figure.
- pad_bottom: number of bottom padding rows in the H direction of tensor_in. The value is 1 in the figure.
- pad_left: number of left padding columns in the W direction of tensor_in. The value is 1 in the figure.
- pad_right: number of right padding columns in the W direction of tensor_in. The value is 1 in the figure.
- stride_w: width of the stride
- stride_h: height of the stride
The API is defined in python/site-packages/te/lang/cce/te_compute/pooling2d_compute.py in the ATC installation path.
This API supports the basic pooling functions as well as the output quantization function. The quantization function is disabled by default. To enable the quantization function, set quantize_params based on the quantization algorithm requirements. For details, see the parameter description.
Restrictions
This API cannot be used in conjunction with other TBE DSL APIs.
- The supported input data type is float16.
- tensor_in is a 5D tensor of format NC1HWC0.
- The last dimension C0 of tensor_in must be 16.
- window must contain two elements, each a positive integer within the range of [1, 32768].
- stride must contain two elements, each a positive integer. The width and height of the stride must be within the range of [1, 63].
- If pad is specified, it must contain four elements, each greater than or equal to 0.
- dilation must contain two elements, each a positive integer within the range of [1, 255].
- When pooling_mode is set to MAX or AVG in VALID mode, the following condition must be met:
out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16 < ub_size
- When pooling_mode is set to AVG in SAME mode, the following condition must be met:
out_w * window_h * window_w * C0 * SIZE_OF_FP16 + out_w * C0 * SIZE_OF_FP16
+ out_w * C0 * SIZE_OF_FP16 < ub_size
- When pooling_mode is set to MAX or AVG, the following conditions must be met: stride_h ≤ 2 × window_h, and stride_w ≤ 2 × window_w
- When pooling_mode is set to MAX or AVG, the following condition must be met: window_w × window_h < 256
- When pooling_mode is set to MAX, the window width/height must be less than or equal to 20.
- When pooling_mode is set to MAX or AVG, the tensor_in, pad, and window must meet the following conditions:
stride_h <= in_size_h + pad_top + pad_bottom - window_h
stride_w <= in_size_w + pad_left + pad_right - window_w
- When pooling_mode is set to GAP or GMP, the following conditions must be met: window_h = in_size_h and window_w = in_size_w
- When pooling_mode is set to GAP or GMP, the following condition must be met: padding_mode = "VALID"
In the preceding conditions:
- ub_size indicates the available size of the unified buffer (UB).
- out_w indicates the width of the output tensor.
- window_h indicates the height of the window.
- window_w indicates the width of the window.
- C0 indicates C0 of tensor_in.
- SIZE_OF_FP16 indicates the size of the float16 type, that is, 2 bytes.
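The UB size and window/stride conditions above can be checked before calling the API. The following is a minimal sketch, not part of TBE: the helper names ub_bytes_needed and check_window_and_stride are hypothetical, and the ub_size value used in the example is a placeholder, because the available UB size depends on the target hardware.

```python
SIZE_OF_FP16 = 2  # bytes per float16 element
C0 = 16           # last dimension of tensor_in, per the restrictions above


def ub_bytes_needed(out_w, window_h, window_w, pooling_mode, padding_mode):
    """Left-hand side of the UB size conditions, in bytes.

    Returns None for combinations with no UB condition in the text.
    """
    window_buf = out_w * window_h * window_w * C0 * SIZE_OF_FP16
    row_buf = out_w * C0 * SIZE_OF_FP16
    if pooling_mode in ("MAX", "AVG") and padding_mode == "VALID":
        return window_buf + row_buf
    if pooling_mode == "AVG" and padding_mode == "SAME":
        return window_buf + 2 * row_buf
    return None


def check_window_and_stride(window_h, window_w, stride_h, stride_w,
                            pooling_mode):
    """Return a list of violated window/stride conditions (empty if none)."""
    errors = []
    if pooling_mode in ("MAX", "AVG"):
        if stride_h > 2 * window_h or stride_w > 2 * window_w:
            errors.append("stride must not exceed 2 x window")
        if window_w * window_h >= 256:
            errors.append("window_w x window_h must be < 256")
    if pooling_mode == "MAX" and max(window_h, window_w) > 20:
        errors.append("MAX window width/height must be <= 20")
    return errors


# Placeholder UB size for illustration only; query the real value
# for the target chip.
ub_size = 256 * 1024
needed = ub_bytes_needed(208, 3, 3, "AVG", "SAME")  # 73216 bytes
fits = needed < ub_size
problems = check_window_and_stride(3, 3, 2, 2, "AVG")  # []
```

Returning None for combinations that the text does not cover (for example, MAX in SAME mode) keeps the sketch from implying a limit that the restrictions do not state.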
Prototype
te.lang.cce.pooling2d(tensor_in, window, stride, pooling_mode, padding_mode="SAME", pad = (0,0,0,0), dilation = (1,1), data_mode=1, ceil_mode=0)
Parameters
- tensor_in: input feature map of the tvm.tensor type. A 5D tensor of format NC1HWC0.
- window: size of the input sliding window, list or tuple type. window[0] indicates the width of the input window, and window[1] indicates the height of the input window.
- stride: stride of the input sliding window, list or tuple type. stride[0] indicates the stride of the window in the W direction of the feature map, and stride[1] indicates the stride of the window in the H direction of the feature map.
- pooling_mode: pooling mode selected from MAX, AVG, GMP, and GAP.
- MAX: max pooling, used to compute the maximum of the elements covered by each sliding window
- AVG: average pooling, used to compute the average of the elements covered by each sliding window
- GMP: global max pooling, which is a special mode of max pooling. The feature map size is the same as the window size. The maximum value of feature map elements is used as the computation output.
- GAP: global average pooling, which is a special mode of avg pooling. The feature map size is the same as the window size. The average value of feature map elements is used as the computation output.
- padding_mode: padding mode selected from VALID (padding disabled) and SAME (padding enabled).
- In VALID mode:
When the window movement in the W or H direction can cover only some parts of the feature map, the data that does not cover a complete window is discarded. That is, this data in the feature map is not involved in the computation.
MAX, AVG, GMP, and GAP all support the VALID mode.
- In SAME mode:
When the window movement in the W or H direction can cover only some parts of the feature map, zeros are padded so that a complete window can be covered. That is, this data in the feature map is involved in the computation.
MAX and AVG support the SAME mode, while GMP and GAP do not.
- pad: a list or tuple for the padding sizes. An optional argument provided for compatibility with Caffe pooling. pad[0], pad[1], pad[2], and pad[3] indicate the padding in the top, bottom, left, and right, respectively. The default value is (0, 0, 0, 0).
- dilation: a list or tuple for the dilation factors. An optional argument. dilation[0] and dilation[1] indicate the dilation factors of the window in the H and W directions, respectively. The default value is (1, 1).
- data_mode: template type. 0 = CAFFE_DATA_MODE; 1 = TENSORFLOW_DATA_MODE (default).
- ceil_mode: rounding mode, corresponding to round_mode in Caffe. 0: ceiling (default); 1: floor
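The text does not state how SAME padding is split between the two sides of an axis. As a rough illustration only, the sketch below uses the TensorFlow convention (total padding just large enough to cover the last window, with the extra row or column placed at the bottom or right). Whether this API splits padding the same way is an assumption, and the helper name same_padding is hypothetical.

```python
import math


def same_padding(in_size, window, stride):
    """Return (pad_before, pad_after) along one axis, TensorFlow-style.

    Assumption: SAME output size is ceil(in_size / stride) and any odd
    padding element goes after the data (bottom or right).
    """
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + window - in_size, 0)
    pad_before = total // 2
    return pad_before, total - pad_before


pads_416 = same_padding(416, 3, 2)  # (0, 1)
pads_5 = same_padding(5, 3, 1)      # (1, 1)
pads_4 = same_padding(4, 2, 2)      # (0, 0)
```

For the H axis the returned pair would correspond to (pad_top, pad_bottom), and for the W axis to (pad_left, pad_right).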
Returns
res_tensor: the result tensor, a 5D tvm.tensor of format NC1HWC0
The shape of tensor_in is [N, C1, H, W, C0=16], the shape of the window is [F, F], and the stride is [S, S].
In VALID mode and SAME mode of MAX pooling and AVG pooling, the shape of the output tensor is computed as follows:
- In VALID mode:
- The N and C dimensions remain unchanged.
- The dimensions of Hout and Wout are as follows:
- In SAME mode:
- The N and C dimensions remain unchanged.
- The dimensions of Hout and Wout are as follows:
W is the input size, F is the filter size, S is the stride, and [] is the round-up symbol.
- In GMP or GAP mode:
- The N and C dimensions remain unchanged.
- Hout = Wout = 1
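The output shape rules can be sketched as a small helper. The formulas used here are the TensorFlow-convention ones with round-up, VALID: ceil((in - F + 1) / S) and SAME: ceil(in / S). They are assumed rather than taken from this page, although they agree with the definitions of W, F, S, and the round-up symbol above and with the shape given in the example below. The function name pooling2d_out_shape is hypothetical, and dilation, pad, and ceil_mode are ignored.

```python
import math


def pooling2d_out_shape(in_shape, window, stride, pooling_mode,
                        padding_mode="SAME"):
    """Estimate the NC1HWC0 output shape (illustrative sketch)."""
    n, c1, h, w, c0 = in_shape
    if pooling_mode in ("GMP", "GAP"):
        # Global pooling collapses H and W to 1.
        return (n, c1, 1, 1, c0)
    # Per the parameter description: index 0 is width, index 1 is height.
    window_w, window_h = window
    stride_w, stride_h = stride
    if padding_mode == "VALID":
        h_out = math.ceil((h - window_h + 1) / stride_h)
        w_out = math.ceil((w - window_w + 1) / stride_w)
    else:  # "SAME"
        h_out = math.ceil(h / stride_h)
        w_out = math.ceil(w / stride_w)
    return (n, c1, h_out, w_out, c0)


shape = (1, 2, 416, 416, 16)
same_shape = pooling2d_out_shape(shape, (3, 3), (2, 2), "AVG", "SAME")
valid_shape = pooling2d_out_shape(shape, (3, 3), (2, 2), "MAX", "VALID")
global_shape = pooling2d_out_shape(shape, (416, 416), (1, 1), "GAP", "VALID")
```

Here same_shape is (1, 2, 208, 208, 16), valid_shape is (1, 2, 207, 207, 16), and global_shape is (1, 2, 1, 1, 16).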
Example
from te import tvm
import te.lang.cce
shape = (1, 2, 416, 416, 16)
input_dtype = "float16"
data = tvm.placeholder(shape, name="data", dtype=input_dtype)
res = te.lang.cce.pooling2d(data, (3, 3), (2, 2), "AVG", "SAME")
# res.shape = (1, 2, 208, 208, 16)