Custom Network Modification
Overview
Operators supported by the Ascend AI Processor are classified as follows:
- Standard operators: standard Caffe operators, such as Convolution.
- Extended operators: open-source but non-standard Caffe operators, including:
- Operators extended based on the Caffe framework, such as ROIPooling in Faster R-CNN and Normalize in SSD.
- Operators extended based on other deep learning frameworks, such as PassThrough in YOLOv2.
Networks such as Faster R-CNN and SSD include some operator structures not defined in the Caffe framework, such as ROIPooling, Normalize, PSROI Pooling, and Upsample. To support these networks, the Caffe operators are extended for the Ascend AI Processor to reduce developers' workload of operator customization and post-processing. If these extended operators are used in Caffe networks, you need to modify or add the definition of the extension layer in the .prototxt file prior to model conversion.
This chapter lists the extended operators supported by the Ascend AI Processor and explains how to modify the .prototxt file.
Custom Operator List
Category | Operator Type | Description |
---|---|---|
Computation operators | Reverse | Reverses the dimensions of a tensor. |
Computation operators | ROIPooling | Performs region-of-interest pooling in Faster R-CNN, which is mainly used for object detection tasks. |
Computation operators | PSROIPooling | Performs position-sensitive region-of-interest pooling in R-FCN, which is mainly used for object detection tasks. |
Computation operators | Upsample | Performs upsampling using a pooling mask. Used in the YOLO network. |
Computation operators | Normalize | Normalizes the input tensor along the channel dimension in an SSD network using an L2 norm. |
Computation operators | Reorg | Rearranges blocks of spatial data into depth, or vice versa, in Darknet. Implemented as a PassThrough operator as defined in the operator specifications. |
Computation operators | Proposal | Filters bounding boxes (BBoxes) and outputs only those with the highest prediction confidence based on the foreground output of rpn_cls_prob and BBox regression output of rpn_bbox_pred in Faster R-CNN. |
Computation operators | ROIAlign | A regional feature aggregation method that solves the misalignment caused by the two quantization steps in the ROIPooling operation. |
Computation operators | ShuffleChannel | Permutes data in the channel dimension of the input. |
Computation operators | Yolo (Yolo/Detection/Region) | Replaces the Yolo, Detection, and Region operators. Generates coordinates, confidence scores, and category probability of the anchor boxes on the feature map output by the convolutional network. |
Computation operators | PriorBox | Generates prior boxes based on the input parameters in the SSD network. |
Computation operators | SpatialTransformer | Performs affine transformation. |
Post-processing operators | YoloV3DetectionOutput | Generates coordinates, confidence scores, and category probability of the anchor boxes on the feature map output by the convolutional network for the post-processing of YOLOv3. |
Post-processing operators | YoloV2DetectionOutput | Generates coordinates, confidence scores, and category probability of the anchor boxes on the feature map output by the convolutional network for the post-processing of YOLOv2. |
Post-processing operators | SSDDetectionOutput | Integrates the BBoxes, BBox offsets, and scores, and outputs object predictions of SSD. |
Post-processing operators | FSRDetectionOutput | Classifies the results, and outputs the final number, coordinates, category probability, and category indexes of BBoxes of Faster R-CNN. |
Custom Operator Description
Custom operators can be implemented in deep learning frameworks including Caffe, TensorFlow, and MindSpore. The first type of operator is customized from the Caffe framework, such as the ROIPooling, Normalize, PSROIPooling, and Upsample layers; these operators are already defined in Caffe prototxt files. For operators customized from frameworks other than Caffe, corresponding definitions must also be given in prototxt format.
Reverse
Reverses the dimensions of a tensor. For example, reverse from [1, 2, 3] to [3, 2, 1].
Define the operator as follows:
- Add ReverseParameter to LayerParameter.
message LayerParameter { ... optional ReverseParameter reverse_param = 157; ... }
- Define the data types and attributes of ReverseParameter.
message ReverseParameter{ repeated int32 axis = 1; }
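As a rough illustration of what the axis attribute controls, the following Python sketch reverses a nested-list tensor along the listed axes. The helper name reverse_axes and the nested-list representation are assumptions made for illustration only; they are not part of the Caffe or Ascend APIs.

```python
def reverse_axes(tensor, axes):
    """Return a copy of a nested-list tensor with the given axes reversed."""
    def walk(node, depth):
        if not isinstance(node, list):
            return node  # scalar leaf, nothing to reverse
        children = [walk(child, depth + 1) for child in node]
        return children[::-1] if depth in axes else children
    return walk(tensor, 0)


print(reverse_axes([1, 2, 3], axes=[0]))         # [3, 2, 1]
print(reverse_axes([[1, 2], [3, 4]], axes=[1]))  # [[2, 1], [4, 3]]
```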
ROIPooling
The major hurdle in going from image classification to object detection is that the network requires a fixed-size input because of its fully connected (FC) layers. In object detection, different proposals have different shapes, so all proposals must be converted to the fixed shape required by the FC layers.
Region of Interest pooling (ROIPooling) reuses a single feature map for all generated proposals in a single pass, which removes the fixed image size requirement for object detection networks.
You need to extend the caffe.proto file and define ROIPoolingParameter as follows:
- spatial_scale: multiplicative spatial scale factor to translate ROI coordinates from their input scale to the scale used when pooling
- pooled_h and pooled_w: height and width of the ROI output feature map
- Add ROIPoolingParameter to LayerParameter.
message LayerParameter { ... optional ROIPoolingParameter roi_pooling_param = 161; ... }
- Define the data types and attributes of ROIPoolingParameter.
message ROIPoolingParameter { required int32 pooled_h = 1; required int32 pooled_w = 2; optional float spatial_scale = 3 [default=0.0625]; optional float spatial_scale_h = 4; optional float spatial_scale_w = 5; }
Example .prototxt definition of ROIPooling:
layer { name: "roi_pooling" type: "ROIPooling" bottom: "res4f" bottom: "rois" bottom: "actual_rois_num" top: "roi_pool" roi_pooling_param { pooled_h: 14 pooled_w: 14 spatial_scale:0.0625 spatial_scale_h:0.0625 spatial_scale_w:0.0625 } }
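The following Python sketch shows how pooled_h, pooled_w, and spatial_scale interact for a single ROI on a single-channel feature map. It follows the bin arithmetic of the open-source Fast R-CNN ROIPooling layer, which is an assumption here; the function name roi_max_pool and the nested-list layout are hypothetical, and the Ascend implementation may differ in detail.

```python
import math


def roi_max_pool(feature, roi, pooled_h, pooled_w, spatial_scale):
    """Max-pool one ROI of an H x W feature map into pooled_h x pooled_w bins.

    feature: list of rows; roi: (x1, y1, x2, y2) in input-image pixels.
    """
    height, width = len(feature), len(feature[0])
    # Translate ROI corners to feature-map scale (round half up).
    x1, y1, x2, y2 = (int(math.floor(v * spatial_scale + 0.5)) for v in roi)
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    bin_h, bin_w = roi_h / pooled_h, roi_w / pooled_w
    out = []
    for ph in range(pooled_h):
        hstart = min(max(y1 + math.floor(ph * bin_h), 0), height)
        hend = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), height)
        row = []
        for pw in range(pooled_w):
            wstart = min(max(x1 + math.floor(pw * bin_w), 0), width)
            wend = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), width)
            values = [feature[h][w]
                      for h in range(hstart, hend)
                      for w in range(wstart, wend)]
            row.append(max(values) if values else 0.0)  # empty bins yield 0
        out.append(row)
    return out


feature = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(roi_max_pool(feature, (0, 0, 48, 48), 2, 2, 0.0625))  # [[5, 7], [13, 15]]
```

Whatever the ROI size, the output is always pooled_h x pooled_w, which is what lets the FC layers receive a fixed shape.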
PSROIPooling
Position Sensitive ROI Pooling (PSROIPooling) works in a similar way to ROIPooling. However, unlike ROIPooling, the feature map output from PSROIPooling is obtained from different feature map channels, and average pooling (instead of max pooling) is performed on each divided bin.
PSROIPooling divides the ROI into k * k bins and outputs a k * k feature map with output_dim channels. Each bin is average-pooled from its own dedicated input channel, so the input score map carries k * k position-sensitive channels for every output channel.
You need to extend the caffe.proto file and define PSROIPoolingParameter as follows:
- spatial_scale: multiplicative spatial scale factor to translate ROI coordinates from their input scale to the scale used when pooling
- output_dim: number of output channels
- group_size: number of groups to encode position-sensitive score maps, that is, k
- Add PSROIPoolingParameter to LayerParameter.
message LayerParameter { ... optional PSROIPoolingParameter psroi_pooling_param = 207; ... }
- Define the data types and attributes of PSROIPoolingParameter.
message PSROIPoolingParameter {
  required float spatial_scale = 1;
  required int32 output_dim = 2;  // output channel number
  required int32 group_size = 3;  // number of groups to encode position-sensitive score maps
}
Example .prototxt definition of PSROIPooling:
layer { name: "psroipooling" type: "PSROIPooling" bottom: "some_input" bottom: "some_input" top: "some_output" psroi_pooling_param { spatial_scale: 0.0625 output_dim: 21 group_size: 7 } }
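To make the relationship between output_dim and group_size concrete, the sketch below computes which input channel feeds each output bin. The channel layout is taken from the open-source R-FCN implementation and is an assumption here; both function names are hypothetical.

```python
def psroi_input_channel(out_channel, bin_row, bin_col, group_size):
    """Input channel read by bin (bin_row, bin_col) of one output channel."""
    return (out_channel * group_size + bin_row) * group_size + bin_col


def required_input_channels(output_dim, group_size):
    """Number of position-sensitive score maps the input must provide."""
    return output_dim * group_size * group_size


# With output_dim: 21 and group_size: 7, as in the example above:
print(required_input_channels(21, 7))     # 1029
print(psroi_input_channel(20, 6, 6, 7))   # 1028, the last input channel
```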
Upsample
The Upsample layer is the reverse of the Pooling layer. Each decoder upsamples the activations generated by the corresponding encoder.
You need to extend the caffe.proto file and define UpsampleParameter as follows. The stride parameter is the upsampling factor, for example, 2.
- Add UpsampleParameter to LayerParameter.
message LayerParameter { ... optional UpsampleParameter upsample_param = 160; ... }
- Define the data types and attributes of UpsampleParameter.
message UpsampleParameter{ optional float scale = 1[default = 1]; optional int32 stride = 2[default = 2]; optional int32 stride_h = 3[default = 2]; optional int32 stride_w = 4[default=2]; }
Example .prototxt definition of Upsample:
layer { name: "layer86-upsample" type: "Upsample" bottom: "some_input" top: "some_output" upsample_param { scale: 1 stride: 2 } }
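A minimal sketch of what stride and scale do to a single-channel feature map, assuming nearest-neighbor replication as in Darknet's upsample layer (the mask-based variant is not shown). The function name upsample is hypothetical.

```python
def upsample(feature, stride=2, scale=1.0):
    """Repeat every element stride times in both directions and multiply by scale."""
    out = []
    for row in feature:
        wide = [value * scale for value in row for _ in range(stride)]
        out.extend(list(wide) for _ in range(stride))
    return out


for row in upsample([[1, 2], [3, 4]], stride=2, scale=1.0):
    print(row)
```

An H x W input becomes (H * stride) x (W * stride), so stride: 2 doubles each spatial dimension.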
Normalize
The Normalize layer is a normalization layer in the SSD network. It is mainly used to normalize elements in a space or a channel to the range [0, 1]. For a c*h*w three-dimensional input tensor, the Normalize layer outputs a tensor of the same size. Each element is divided by the square root of the sum of squares taken along the channel direction:
$$y_{c,h,w} = \frac{x_{c,h,w}}{\sqrt{\sum_{c'} x_{c',h,w}^{2}}}$$
where the sum of squares in the denominator runs over the channel vector that shares the same height and width, shown as the orange part in Figure 2-6.
After the preceding normalization calculation, the Normalize layer scales each feature map using separate scale factors.
You need to extend the caffe.proto file and define NormalizeParameter as follows:
- across_spatial: a bool. If true, the norm is computed over the whole 1 x c x h x w tensor. If false, it is computed separately for every pixel over 1 x c x 1 x 1, that is, along the channel dimension only.
- channel_shared: a bool. If true, a single scale parameter is shared across all channels. Defaults to true.
- eps: a small number to avoid division by zero while normalizing.
Define the operator as follows:
- Add NormalizeParameter to LayerParameter.
message LayerParameter { ... optional NormalizeParameter norm_param = 206; ... }
- Define the data types and attributes of NormalizeParameter.
message NormalizeParameter {
  optional bool across_spatial = 1 [default = true];
  // Initial value of scale. Default is 1.0 for all
  optional FillerParameter scale_filler = 2;
  // Whether or not scale parameters are shared across channels.
  optional bool channel_shared = 3 [default = true];
  // Epsilon for not dividing by zero while normalizing variance
  optional float eps = 4 [default = 1e-10];
}
Example .prototxt definition of Normalize:
layer {
  name: "normalize_layer"
  type: "Normalize"
  bottom: "some_input"
  top: "some_output"
  norm_param {
    across_spatial: false
    scale_filler {
      type: "constant"
      value: 20
    }
    channel_shared: false
  }
}
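The computation configured by this example (across_spatial: false, channel_shared: false) can be sketched in Python as follows. Placing eps inside the square root is an assumption borrowed from the open-source SSD Caffe layer, and the function name normalize and the nested-list layout are hypothetical.

```python
import math


def normalize(x, scales, eps=1e-10):
    """L2-normalize a C x H x W tensor per pixel, then scale each channel.

    x: nested lists indexed [c][h][w]; scales: one scale factor per channel.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for h in range(height):
        for w in range(width):
            # Norm of the channel vector at this pixel (eps placement is assumed).
            norm = math.sqrt(sum(x[c][h][w] ** 2 for c in range(channels)) + eps)
            for c in range(channels):
                out[c][h][w] = scales[c] * x[c][h][w] / norm
    return out


# A single pixel with channel vector (3, 4) has norm 5.
print(normalize([[[3.0]], [[4.0]]], scales=[20.0, 20.0]))  # roughly 12 and 16
```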
Reorg
The Reorg operator is implemented as a PassThrough operator on the Ascend AI Processor. It rearranges blocks of spatial data into depth, or vice versa.
The PassThrough layer is not implemented in the Caffe framework, so there is no standard definition for this layer. The PassThrough layer concatenates higher-resolution features with lower-resolution ones by stacking adjacent features into different channels instead of spatial locations.
Define the operator as follows:
- Add ReorgParameter to LayerParameter.
message LayerParameter { ... optional ReorgParameter reorg_param = 155; ... }
- Define the data types and attributes of ReorgParameter.
message ReorgParameter{ optional uint32 stride = 2 [default = 2]; optional bool reverse = 1 [default = false]; }
Example of Reorg .prototxt definition:
layer { bottom: "some_input" top: "some_output" name: "reorg" type: "Reorg" reorg_param { stride: 2 } }
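The space-to-depth direction (reverse: false) can be sketched as follows. The function name reorg is hypothetical, and the order in which the new channels are emitted is an assumption; Darknet and the Ascend PassThrough operator may order them differently.

```python
def reorg(x, stride=2):
    """Move each stride x stride spatial block of a C x H x W tensor into channels.

    Returns a tensor of shape (C * stride * stride) x (H / stride) x (W / stride).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for dy in range(stride):
        for dx in range(stride):
            for c in range(channels):
                # Pick every stride-th pixel starting at offset (dy, dx).
                out.append([[x[c][h][w] for w in range(dx, width, stride)]
                            for h in range(dy, height, stride)])
    return out


print(reorg([[[1, 2], [3, 4]]], stride=2))  # [[[1]], [[2]], [[3]], [[4]]]
```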
Proposal
The proposal operator modifies anchors based on foreground of rpn_cls_prob and the BBox regression of rpn_bbox_pred to obtain accurate proposals.
Three operators are used: decoded_bbox, topk, and nms, as shown in Figure 2-7.
Define the operator as follows:
- Add ProposalParameter to LayerParameter.
message LayerParameter { ... optional ProposalParameter proposal_param = 201; ... }
- Define the data types and attributes of ProposalParameter.
message ProposalParameter {
  optional float feat_stride = 1 [default = 16];
  optional float base_size = 2 [default = 16];
  optional float min_size = 3 [default = 16];
  repeated float ratio = 4;
  repeated float scale = 5;
  optional int32 pre_nms_topn = 6 [default = 3000];
  optional int32 post_nms_topn = 7 [default = 304];
  optional float iou_threshold = 8 [default = 0.7];
  optional bool output_actual_rois_num = 9 [default = false];
}
Example .prototxt definition of Proposal:
layer {
  name: "faster_rcnn_proposal"
  type: "Proposal"  # Operator type
  bottom: "rpn_cls_prob_reshape"
  bottom: "rpn_bbox_pred"
  bottom: "im_info"
  top: "rois"
  top: "actual_rois_num"  # Added operator output
  proposal_param {
    feat_stride: 16
    base_size: 16
    min_size: 16
    pre_nms_topn: 3000
    post_nms_topn: 304
    iou_threshold: 0.7
    output_actual_rois_num: true
  }
}
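The roles of pre_nms_topn, post_nms_topn, and iou_threshold can be illustrated with a small Python sketch of the topk and nms stages, applied to boxes that are assumed to be already decoded. The greedy NMS shown here is the textbook variant and the function names are hypothetical; the operator's internal implementation is not described in this document.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, pre_nms_topn, post_nms_topn, iou_threshold):
    """Return the indices of the boxes kept after top-k selection and greedy NMS."""
    # topk: keep the pre_nms_topn highest-scoring boxes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_topn]
    keep = []
    for i in order:
        # nms: drop a box that overlaps an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
        if len(keep) == post_nms_topn:
            break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(select_proposals(boxes, scores, 3000, 304, 0.5))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the second one is suppressed at a threshold of 0.5 but would survive at the default threshold of 0.7.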
ROIAlign
ROIAlign is a regional feature aggregation method proposed in Mask R-CNN. It solves the misalignment caused by the two quantization steps in the ROIPooling operation.
The size of the feature map after pooling is pooled_w * pooled_h. Each bin of the ROI is divided into sampling_ratio * sampling_ratio grid cells of the same size, and the center of each cell is a sampling point. As shown in Figure 2-8, the dashed line indicates the feature map, and the solid line indicates the ROI, which is divided into 2 x 2 bins. Assuming that the number of sampling points is 4, each bin is divided evenly into four cells, and the center of each cell is taken as a sampling point. The value at each center point (denoted by the four arrows in Figure 2-8) is calculated by bilinear interpolation. Finally, the four values are averaged to give the ROIAlign result for that bin.
Define the operator as follows:
- Add ROIAlignParameter to LayerParameter.
message LayerParameter { ... optional ROIAlignParameter roi_align_param = 154; ... }
- Define the data types and attributes of ROIAlignParameter.
message ROIAlignParameter {
  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in height and width or as Y, X pairs.
  optional uint32 pooled_h = 1 [default = 0];  // The pooled output height
  optional uint32 pooled_w = 2 [default = 0];  // The pooled output width
  // Multiplicative spatial scale factor to translate ROI coords from their
  // input scale to the scale used when pooling
  optional float spatial_scale = 3 [default = 1];
  optional int32 sampling_ratio = 4 [default = -1];
  optional int32 roi_end_mode = 5 [default = 0];
}
You can customize the .prototxt file based on the preceding data types and attributes.
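The sampling procedure described for ROIAlign can be sketched for one bin as follows. Coordinates are assumed to be in feature-map pixel units with pixel values located at integer coordinates; both function names are hypothetical, and the handling of roi_end_mode is not covered.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at the fractional point (y, x)."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0
    return (feature[y0][x0] * (1 - ly) * (1 - lx)
            + feature[y0][x1] * (1 - ly) * lx
            + feature[y1][x0] * ly * (1 - lx)
            + feature[y1][x1] * ly * lx)


def roi_align_bin(feature, y_start, x_start, bin_h, bin_w, sampling_ratio=2):
    """Average sampling_ratio x sampling_ratio bilinear samples inside one bin."""
    total = 0.0
    for iy in range(sampling_ratio):
        for ix in range(sampling_ratio):
            # Each sample sits at the center of one grid cell of the bin.
            y = y_start + (iy + 0.5) * bin_h / sampling_ratio
            x = x_start + (ix + 0.5) * bin_w / sampling_ratio
            total += bilinear(feature, y, x)
    return total / (sampling_ratio * sampling_ratio)


feature = [[0.0, 1.0], [2.0, 3.0]]
print(roi_align_bin(feature, 0.0, 0.0, 1.0, 1.0, sampling_ratio=2))  # 1.5
```

Because no coordinate is rounded to an integer, the bin boundaries stay aligned with the original ROI, which is the point of ROIAlign.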
ShuffleChannel
ShuffleChannel permutes data in the channel dimension of the input.
For example, if channel = 4 and group = 2, ShuffleChannel swaps channel[1] and channel[2], so the channel order changes from [0, 1, 2, 3] to [0, 2, 1, 3].
Define the operator as follows:
- Add ShuffleChannelParameter to LayerParameter.
message LayerParameter { ... optional ShuffleChannelParameter shuffle_channel_param = 159; ... }
- Define the data types and attributes of ShuffleChannelParameter.
message ShuffleChannelParameter {
  optional uint32 group = 1 [default = 1];  // The number of group
}
Example .prototxt definition of ShuffleChannel:
layer { name: "layer_shuffle" type: "ShuffleChannel" bottom: "some_input" top: "some_output" shuffle_channel_param { group: 3 } }
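The permutation can be written down directly. The sketch below returns, for each output position, the index of the input channel placed there; it assumes the usual reshape-transpose-flatten definition of channel shuffle, and the function name is hypothetical.

```python
def shuffle_channel_order(channels, group):
    """Return the input channel index that ends up at each output position."""
    if channels % group != 0:
        raise ValueError("channels must be divisible by group")
    per_group = channels // group
    # View the channels as a (group, per_group) grid and read it column by column.
    return [g * per_group + i for i in range(per_group) for g in range(group)]


print(shuffle_channel_order(4, 2))  # [0, 2, 1, 3]
print(shuffle_channel_order(6, 3))  # [0, 2, 4, 1, 3, 5]
```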
Yolo
The YOLO operator is introduced to the YOLOv2 network and is applied only on the YOLOv2 and YOLOv3 networks. It performs sigmoid and softmax operations on input.
- In YOLOv2, there are four scenarios based on the background and softmax parameters:
- background = false, softmax = true:
sigmoid is performed on (x, y) in (x, y, h, w), sigmoid is performed on b, and softmax is performed on classes.
- background = false, softmax = false:
sigmoid is performed on (x, y) in (x, y, h, w), sigmoid is performed on b, and sigmoid is performed on classes.
- background = true, softmax = false:
sigmoid is performed on (x, y) in (x, y, h, w), b is ignored, and sigmoid is performed on classes.
- background = true, softmax = true:
sigmoid is performed on (x, y) in (x, y, h, w), and softmax is performed on b and classes .
- In YOLOv3, there is only one scenario: sigmoid is performed on (x,y) in (x,y,h,w), sigmoid is performed on b, and sigmoid is performed on classes.
The input data format is Tensor(n, coords+background+classes, l.h, l.w), where n indicates the number of anchor boxes and coords indicates x, y, w, and h.
Define the operator as follows:
- Add YoloParameter to LayerParameter.
message LayerParameter { ... optional YoloParameter yolo_param = 199; ... }
- Define the data types and attributes of YoloParameter.
message YoloParameter { optional int32 boxes = 1 [default = 3]; optional int32 coords = 2 [default = 4]; optional int32 classes = 3 [default = 80]; optional string yolo_version = 4 [default = "V3"]; optional bool softmax = 5 [default = false]; optional bool background = 6 [default = false]; optional bool softmaxtree = 7 [default = false]; }
Example .prototxt definition of Yolo:
layer { bottom: "layer82-conv" top: "yolo1_coords" top: "yolo1_obj" top: "yolo1_classes" name: "yolo1" type: "Yolo" yolo_param { boxes: 3 coords: 4 classes: 80 yolo_version: "V3" softmax: true background: false } }
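The four YOLOv2 scenarios listed earlier in this section can be condensed into one small Python function that operates on the values of a single anchor box. This is a sketch of the activation logic only: the function names are hypothetical, and returning None for an ignored b value is an illustrative choice.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def yolo_v2_activate(xy, b, classes, background, use_softmax):
    """Apply the YOLOv2 activations to (x, y), the objectness b, and the classes."""
    xy_out = [sigmoid(v) for v in xy]
    if background and use_softmax:
        # softmax is taken jointly over b and the class scores.
        joint = softmax([b] + list(classes))
        return xy_out, joint[0], joint[1:]
    b_out = None if background else sigmoid(b)  # b is ignored when background = true
    if use_softmax:
        return xy_out, b_out, softmax(classes)
    return xy_out, b_out, [sigmoid(v) for v in classes]


print(yolo_v2_activate([0.0, 0.0], 0.0, [0.0, 0.0], background=False, use_softmax=True))
```

The YOLOv3 case corresponds to background = false and use_softmax = false in this sketch.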
PriorBox
The prior box is generated according to the arguments.
The following uses conv7_2_mbox_priorbox as an example. The definition is as follows:
layer {
  name: "conv7_2_mbox_priorbox"
  type: "PriorBox"
  bottom: "conv7_2"
  bottom: "data"
  top: "conv7_2_mbox_priorbox"
  prior_box_param {
    min_size: 162.0
    max_size: 213.0
    aspect_ratio: 2
    aspect_ratio: 3
    flip: true
    clip: false
    variance: 0.1
    variance: 0.1
    variance: 0.2
    variance: 0.2
    img_size: 300
    step: 64
    offset: 0.5
  }
}
- A prior box is generated when the width and height are both min_size.
- If max_size is available, sqrt(min_size x max_size) is used to determine the width and height of generated boxes (max_size > min_size).
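- Prior boxes are generated based on the aspect ratios (2 and 3 in this definition, plus their reciprocals 1/2 and 1/3 because flip is true).
Therefore, the number of prior boxes per location is num_priors_ = (number of min_size values) + (number of aspect ratios, including flipped ones) * (number of min_size values) + (number of max_size values). For the definition above, this gives 1 + 4 * 1 + 1 = 6.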
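The count can be checked with a short Python sketch. It follows the bookkeeping of the open-source SSD Caffe PriorBox layer (an implicit ratio of 1, optional flipping, duplicates ignored), which is assumed here; the function name num_priors is hypothetical.

```python
def num_priors(min_sizes, max_sizes, aspect_ratios, flip=True):
    """Number of prior boxes generated per feature-map location."""
    ratios = [1.0]  # ratio 1 is always present and covers the min_size box
    for ratio in aspect_ratios:
        candidates = [ratio, 1.0 / ratio] if flip else [ratio]
        for candidate in candidates:
            # Duplicate ratios are ignored.
            if all(abs(candidate - known) > 1e-6 for known in ratios):
                ratios.append(candidate)
    return len(ratios) * len(min_sizes) + len(max_sizes)


# conv7_2_mbox_priorbox: one min_size, one max_size, aspect ratios 2 and 3, flip on.
print(num_priors([162.0], [213.0], [2, 3], flip=True))  # 6
```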
Define the operator as follows:
- Add PriorBoxParameter to LayerParameter.
message LayerParameter { ... optional PriorBoxParameter prior_box_param = 203; ... }
- Define the data types and attributes of PriorBoxParameter.
message PriorBoxParameter {
  // Encode/decode type.
  enum CodeType {
    CORNER = 1;
    CENTER_SIZE = 2;
    CORNER_SIZE = 3;
  }
  // Minimum box size (in pixels). Required!
  repeated float min_size = 1;
  // Maximum box size (in pixels). Required!
  repeated float max_size = 2;
  // Various aspect ratios. Duplicate ratios will be ignored.
  // If none is provided, we use default ratio 1.
  repeated float aspect_ratio = 3;
  // If true, will flip each aspect ratio.
  // For example, if there is aspect ratio "r",
  // we will generate aspect ratio "1.0/r" as well.
  optional bool flip = 4 [default = true];
  // If true, will clip the prior so that it is within [0, 1]
  optional bool clip = 5 [default = false];
  // Variance for adjusting the prior bboxes.
  repeated float variance = 6;
  // By default, we calculate img_height, img_width, step_x, step_y based on
  // bottom[0] (feat) and bottom[1] (img). Unless these values are explicitly
  // provided.
  // Explicitly provide the img_size.
  optional uint32 img_size = 7;
  // Either img_size or img_h/img_w should be specified; not both.
  optional uint32 img_h = 8;
  optional uint32 img_w = 9;
  // Explicitly provide the step size.
  optional float step = 10;
  // Either step or step_h/step_w should be specified; not both.
  optional float step_h = 11;
  optional float step_w = 12;
  // Offset to the top left corner of each cell.
  optional float offset = 13 [default = 0.5];
}
Example .prototxt definition of PriorBox:
layer { name: "layer_priorbox" type: "PriorBox" bottom: "some_input" bottom: "some_input" top: "some_output" prior_box_param { min_size: 30.0 max_size: 60.0 aspect_ratio: 2 flip: true clip: false variance: 0.1 variance: 0.1 variance: 0.2 variance: 0.2 step: 8 offset: 0.5 } }
SpatialTransformer
This operator performs affine transformation in the computation process. If you need only one set of parameters of affine transformation, define them in the .prototxt file and use them for multiple batches. Alternatively, use dynamic parameters as the second input of the operator layer. In this way, parameters for each batch are different.
The procedure is as follows:
- Convert the output coordinates into values in the range of [–1,1] by using the following formulas:
The corresponding code is as follows:
Dtype* data = output_grid.mutable_cpu_data();
for (int i = 0; i < output_H_ * output_W_; ++i) {
  data[3 * i] = (i / output_W_) * 1.0 / output_H_ * 2 - 1;
  data[3 * i + 1] = (i % output_W_) * 1.0 / output_W_ * 2 - 1;
  data[3 * i + 2] = 1;
}
- Perform affine transformation to convert the output coordinates into input coordinates. In the following formula, s indicates input coordinates, and t indicates output coordinates.
The corresponding code is as follows:
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, output_H_ * output_W_, 2, 3, (Dtype)1., output_grid_data, full_theta_data + 6 * i, (Dtype)0., coordinates);
- Obtain the value of a specific position based on the input coordinates and assign the value to the corresponding output position. The output coordinates are converted in step 1. Therefore, you need to convert the input coordinates in the same way. The following is a code example:
Dtype x = (px + 1) / 2 * H;
Dtype y = (py + 1) / 2 * W;
if (debug) std::cout << prefix << "(x, y) = (" << x << ", " << y << ")" << std::endl;
for (int m = floor(x); m <= ceil(x); ++m) {
  for (int n = floor(y); n <= ceil(y); ++n) {
    if (debug) std::cout << prefix << "(m, n) = (" << m << ", " << n << ")" << std::endl;
    if (m >= 0 && m < H && n >= 0 && n < W) {
      // Bilinear weight: product of the closeness along both axes.
      res += (1 - abs(x - m)) * (1 - abs(y - n)) * pic[m * W + n];
      if (debug) std::cout << prefix << " pic[m * W + n] = " << pic[m * W + n] << std::endl;
    }
  }
}
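The three steps can be followed end to end in a compact Python port of the fragments above. The port is a sketch under a few assumptions: theta is a 2 x 3 affine matrix given as two rows, the picture is a flat row-major list as in the C++ code, and the function names are hypothetical.

```python
import math


def output_grid(out_h, out_w):
    """Step 1: homogeneous coordinates (row, col, 1) of each output pixel in [-1, 1)."""
    return [((i // out_w) / out_h * 2 - 1, (i % out_w) / out_w * 2 - 1, 1.0)
            for i in range(out_h * out_w)]


def to_input_coords(grid, theta):
    """Step 2: apply the 2 x 3 affine matrix theta to every grid point."""
    return [tuple(sum(t * g for t, g in zip(row, point)) for row in theta)
            for point in grid]


def sample(pic, height, width, px, py):
    """Step 3: bilinear sampling of a flat row-major picture at (px, py)."""
    x = (px + 1) / 2 * height
    y = (py + 1) / 2 * width
    res = 0.0
    for m in range(math.floor(x), math.ceil(x) + 1):
        for n in range(math.floor(y), math.ceil(y) + 1):
            if 0 <= m < height and 0 <= n < width:
                res += (1 - abs(x - m)) * (1 - abs(y - n)) * pic[m * width + n]
    return res


pic = [1.0, 2.0, 3.0, 4.0]           # a 2 x 2 picture
identity = [[1, 0, 0], [0, 1, 0]]    # an affine matrix that changes nothing
coords = to_input_coords(output_grid(2, 2), identity)
print([sample(pic, 2, 2, px, py) for px, py in coords])  # [1.0, 2.0, 3.0, 4.0]
```

With the identity matrix the output reproduces the input, which is a convenient sanity check before trying other affine parameters.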
Define the operator as follows:
- Add SpatialTransformParameter to LayerParameter.
message LayerParameter { ... optional SpatialTransformParameter spatial_transform_param = 153; ... }
- Define the data types and attributes of SpatialTransformParameter.
message SpatialTransformParameter {
  optional uint32 output_h = 1 [default = 0];
  optional uint32 output_w = 2 [default = 0];
  optional float border_value = 3 [default = 0];
  repeated float affine_transform = 4;
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
}
Example .prototxt definition of SpatialTransformer:
layer { name: "st_1" type: "SpatialTransformer" bottom: "data" bottom: "theta" top: "transformed" st_param { to_compute_dU: false theta_1_1: -0.129 theta_1_2: 0.626 theta_2_1: 0.344 theta_2_2: 0.157 } }
Sample Reference
This section provides instructions for modifying frequently-used networks.
Modifying Faster R-CNN Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
The following uses the Faster R-CNN ResNet-34 model as an example.
- Modify the Proposal operator. According to Operator Specifications (Caffe), the operator has three inputs and two outputs. Change the type argument to the one defined in the caffe.proto file and add the actual_rois_num output node. Add the attribute description by referring to the attribute definition in the caffe.proto file. Figure 2-9 shows the .prototxt file before and after modification for adapting to the Ascend AI Processor. A code example is as follows.
layer {
  name: "faster_rcnn_proposal"
  type: "Proposal"  # Operator type
  bottom: "rpn_cls_prob_reshape"
  bottom: "rpn_bbox_pred"
  bottom: "im_info"
  top: "rois"
  top: "actual_rois_num"  # Added operator output
  proposal_param {
    feat_stride: 16
    base_size: 16
    min_size: 16
    pre_nms_topn: 3000
    post_nms_topn: 304
    iou_threshold: 0.7
    output_actual_rois_num: true
  }
}
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add an FSRDetectionOutput operator to the last layer to output the final detection result.
On the Faster R-CNN network, add a post-processing layer FSRDetectionOutput at the end of the original .prototxt file by referring to Custom Operator List. The FSRDetectionOutput operator has five inputs and two outputs as described in Operator Specifications (Caffe). Define the data types and the attributes of the operator accordingly.
A code example is as follows.
layer { name: "FSRDetectionOutput_1" type: "FSRDetectionOutput" bottom: "rois" bottom: "bbox_pred" bottom: "cls_prob" bottom: "im_info" bottom: "actual_rois_num" top: "actual_bbox_num1" top: "box1" fsrdetectionoutput_param { num_classes:3 score_threshold:0.0 iou_threshold:0.7 batch_rois:1 } }
For details about parameter descriptions, see Operator Specifications (Caffe).
Modifying YOLOv3 Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
- Modify the upsample_param attribute of the Upsample operator.
Change scale:2 in the .prototxt file of the original operator to scale:1 stride:2 by referring to Operator Specifications (Caffe).
Figure 2-10 shows the .prototxt file before and after modification for adapting to Ascend AI Processor.
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add three Yolo operators.
The Yolo and DetectionOutput operators complete the post-processing logic of the feature detection network. According to the original operator .prototxt file, three Yolo operators should be added before adding the YoloV3DetectionOutput operator.
According to Operator Specifications (Caffe), a Yolo operator has one input and three outputs. The code examples of the three Yolo operators are as follows.
- Code example of operator 1
layer { bottom: "layer82-conv" top: "yolo1_coords" top: "yolo1_obj" top: "yolo1_classes" name: "yolo1" type: "Yolo" yolo_param { boxes: 3 coords: 4 classes: 80 yolo_version: "V3" softmax: true background: false } }
- Code example of operator 2
layer { bottom: "layer94-conv" top: "yolo2_coords" top: "yolo2_obj" top: "yolo2_classes" name: "yolo2" type: "Yolo" yolo_param { boxes: 3 coords: 4 classes: 80 yolo_version: "V3" softmax: true background: false } }
- Code example of operator 3
layer { bottom: "layer106-conv" top: "yolo3_coords" top: "yolo3_obj" top: "yolo3_classes" name: "yolo3" type: "Yolo" yolo_param { boxes: 3 coords: 4 classes: 80 yolo_version: "V3" softmax: true background: false } }
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add a YoloV3DetectionOutput operator to the last layer.
On the YOLOv3 network, add a post-processing layer YoloV3DetectionOutput to the end of the original .prototxt file by referring to Custom Operator List. The YoloV3DetectionOutput operator has 10 inputs and two outputs as described in Operator Specifications (Caffe).
layer {
  name: "detection_out3"
  type: "YoloV3DetectionOutput"
  bottom: "yolo1_coords"
  bottom: "yolo2_coords"
  bottom: "yolo3_coords"
  bottom: "yolo1_obj"
  bottom: "yolo2_obj"
  bottom: "yolo3_obj"
  bottom: "yolo1_classes"
  bottom: "yolo2_classes"
  bottom: "yolo3_classes"
  bottom: "img_info"
  top: "box_out"
  top: "box_out_num"
  yolov3_detection_output_param {
    boxes: 3
    classes: 80
    relative: true
    obj_threshold: 0.5
    score_threshold: 0.5
    iou_threshold: 0.45
    pre_nms_topn: 512
    post_nms_topn: 1024
    biases_high: 10
    biases_high: 13
    biases_high: 16
    biases_high: 30
    biases_high: 33
    biases_high: 23
    biases_mid: 30
    biases_mid: 61
    biases_mid: 62
    biases_mid: 45
    biases_mid: 59
    biases_mid: 119
    biases_low: 116
    biases_low: 90
    biases_low: 156
    biases_low: 198
    biases_low: 373
    biases_low: 326
  }
}
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add the input.
The YoloV3DetectionOutput operator has the img_info input. Add img_info to model inputs. Figure 2-11 shows the .prototxt file before and after modification for adapting to Ascend AI Processor.
The following is a code example. img_info has shape [batch, 4]. The four values are [netH, netW, scaleH, scaleW], where netH and netW are the height and width of the network model input, and scaleH and scaleW are the height and width of the original image.
input: "img_info" input_shape { dim: 1 dim: 4 }
Modifying YOLOv2 Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
- Modify the Region operator.
The Yolo and DetectionOutput operators complete the post-processing logic of the feature detection network. Before adding the YoloV2DetectionOutput operator, replace the Region operator with a Yolo operator.
A Yolo operator has one input and three outputs according to Operator Specifications (Caffe). Figure 2-12 shows the .prototxt file before and after modification for adapting to the Ascend AI Processor.
A code example is as follows.
layer { bottom: "layer31-conv" top: "yolo_coords" top: "yolo_obj" top: "yolo_classes" name: "yolo" type: "Yolo" yolo_param { boxes: 5 coords: 4 classes: 80 yolo_version: "V2" softmax: true background: false } }
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add a YoloV2DetectionOutput operator to the last layer.
On the YOLOv2 network, add a post-processing layer YoloV2DetectionOutput to the end of the original .prototxt file by referring to Custom Operator List. The YoloV2DetectionOutput operator has four inputs and two outputs as described in Operator Specifications (Caffe).
layer {
  name: "detection_out2"
  type: "YoloV2DetectionOutput"
  bottom: "yolo_coords"
  bottom: "yolo_obj"
  bottom: "yolo_classes"
  bottom: "img_info"
  top: "box_out"
  top: "box_out_num"
  yolov2_detection_output_param {
    boxes: 5
    classes: 80
    relative: true
    obj_threshold: 0.5
    score_threshold: 0.5
    iou_threshold: 0.45
    pre_nms_topn: 512
    post_nms_topn: 1024
    biases: 0.572730
    biases: 0.677385
    biases: 1.874460
    biases: 2.062530
    biases: 3.338430
    biases: 5.474340
    biases: 7.882820
    biases: 3.527780
    biases: 9.770520
    biases: 9.168280
  }
}
For details about parameter descriptions, see Operator Specifications (Caffe).
- Add the input.
The YoloV2DetectionOutput operator has the img_info input. Add img_info to model inputs. Figure 2-13 shows the .prototxt file before and after modification for adapting to Ascend AI Processor.
The following is a code example. img_info has shape [batch, 4]. The four values are [netH, netW, scaleH, scaleW], where netH and netW are the height and width of the network model input, and scaleH and scaleW are the height and width of the original image.
input: "img_info" input_shape { dim: 1 dim: 4 }
Modifying SSD Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
On the SSD network, add a post-processing layer SSDDetectionOutput at the end of the original .prototxt file by referring to Custom Operator List.
For details, see the caffe.proto file (file path: ${install_path}/atc/include/proto). Add the declaration of the custom layer parameters to message LayerParameter.
message LayerParameter { ... optional SSDDetectionOutputParameter ssddetectionoutput_param = 232; ... }
According to the caffe.proto file, the operator type and attributes are defined as follows.
message SSDDetectionOutputParameter {
  optional int32 num_classes = 1 [default = 2];
  optional bool share_location = 2 [default = true];
  optional int32 background_label_id = 3 [default = 0];
  optional float iou_threshold = 4 [default = 0.45];
  optional int32 top_k = 5 [default = 400];
  optional float eta = 6 [default = 1.0];
  optional bool variance_encoded_in_target = 7 [default = false];
  optional int32 code_type = 8 [default = 2];
  optional int32 keep_top_k = 9 [default = 200];
  optional float confidence_threshold = 10 [default = 0.01];
}
As described in Operator Specifications (Caffe), the SSDDetectionOutput operator has three inputs and two outputs. A code example is provided as follows.
layer {
  name: "detection_out"
  type: "SSDDetectionOutput"
  bottom: "bbox_delta"
  bottom: "score"
  bottom: "anchors"
  top: "out_boxnum"
  top: "y"
  ssddetectionoutput_param {
    num_classes: 2
    share_location: true
    background_label_id: 0
    iou_threshold: 0.45
    top_k: 400
    eta: 1.0
    variance_encoded_in_target: false
    code_type: 2
    keep_top_k: 200
    confidence_threshold: 0.01
  }
}
- In the bottom input, bbox_delta corresponds to mbox_loc in the original Caffe network, score corresponds to mbox_conf_flatten in the original Caffe network, and anchors corresponds to mbox_priorbox in the original Caffe network. The value of num_classes must be the same as that in the original network.
- In the scenario where the top output has a batch size greater than 1:
- The output shape of out_boxnum is (batchnum, 8). For each batch, the first of the 8 elements is the number of actual boxes.
- The output shape of y is (batchnum, len, 8), where len is the value of keep_top_k after 128-byte alignment. For example, if batch = 2 and keep_top_k = 200, the output shape is (2, 256, 8), and the first 256 x 8 data elements are the result of the first batch.
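The len value can be reproduced with a one-line round-up. The sketch assumes that the alignment amounts to rounding keep_top_k up to the next multiple of 128, which is consistent with the 200 to 256 example above but is not stated explicitly; the function names are hypothetical.

```python
def aligned_len(keep_top_k, multiple=128):
    """Round keep_top_k up to the next multiple of `multiple`."""
    return (keep_top_k + multiple - 1) // multiple * multiple


def ssd_output_shape(batch, keep_top_k):
    """Shape of the y output of SSDDetectionOutput under the assumption above."""
    return (batch, aligned_len(keep_top_k), 8)


print(ssd_output_shape(2, 200))  # (2, 256, 8)
```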
For details about parameter descriptions, see Operator Specifications (Caffe).
Modifying BatchedMatMul Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
The BatchedMatMul operator multiplies two tensors: y = x1 * x2. (The number of dimensions of x1 and x2 must be greater than 2 and less than or equal to 8.) To use this operator in a network model, you need to modify its .prototxt file by referring to this section and then convert the model.
For details, see the caffe.proto file (file path: ${install_path}/atc/include/proto). Add the declaration of the custom layer parameters to message LayerParameter.
message LayerParameter { ... optional BatchMatMulParameter batch_matmul_param = 235; ... }
According to the caffe.proto file, the operator type and attributes are defined as follows.
message BatchMatMulParameter{ optional bool adj_x1 = 1 [default = false]; optional bool adj_x2 = 2 [default = false]; }
According to Operator Specifications (Caffe), the BatchedMatMul operator has two inputs and one output. An example of the constructed operator code is as follows.
layer {
  name: "batchmatmul"
  type: "BatchedMatMul"
  bottom: "matmul_data_1"
  bottom: "matmul_data_2"
  top: "batchmatmul_1"
  batch_matmul_param {
    adj_x1: false
    adj_x2: true
  }
}
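The effect of adj_x1 and adj_x2 can be sketched in plain Python for tensors with one batch dimension. The sketch assumes that each flag transposes the last two dimensions of the corresponding input before multiplication (the adjoint of a real matrix); the function names are hypothetical.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


def batched_matmul(x1, x2, adj_x1=False, adj_x2=False):
    """Multiply matching matrices of two batches, optionally transposing inputs."""
    out = []
    for a, b in zip(x1, x2):
        a = transpose(a) if adj_x1 else a
        b = transpose(b) if adj_x2 else b
        out.append(matmul(a, b))
    return out


x1 = [[[1, 2], [3, 4]]]
x2 = [[[5, 6], [7, 8]]]
print(batched_matmul(x1, x2))               # [[[19, 22], [43, 50]]]
print(batched_matmul(x1, x2, adj_x2=True))  # [[[17, 23], [39, 53]]]
```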
For details about parameter descriptions, see Operator Specifications (Caffe).
Modifying SENet Prototxt
The code samples in this section cannot be copied directly into a network model. Adjust the parameters to suit your use case. For example, the bottom and top parameters must match those in the corresponding network model, and their order is fixed.
The Axpy operator in the network model must be replaced with the Reshape, Scale, and Eltwise operators. The following figure shows the modification.
An example of the modified code is as follows.
layer {
  name: "conv3_1_axpy_reshape"
  type: "Reshape"
  bottom: "conv3_1_1x1_up"
  top: "conv3_1_axpy_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: -1
    }
  }
}
layer {
  name: "conv3_1_axpy_scale"
  type: "Scale"
  bottom: "conv3_1_1x1_increase"
  bottom: "conv3_1_axpy_reshape"
  top: "conv3_1_axpy_scale"
  scale_param {
    axis: 0
    bias_term: false
  }
}
layer {
  name: "conv3_1_axpy_eltwise"
  type: "Eltwise"
  bottom: "conv3_1_axpy_scale"
  bottom: "conv3_1_1x1_proj"
  top: "conv3_1"
}
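The replacement is valid because Axpy computes a * x + y with one coefficient per channel, which is exactly a per-channel Scale followed by an element-wise sum. The Python sketch below checks this for one sample; it assumes the Axpy semantics of the open-source SENet Caffe layer and the default SUM operation of Eltwise, and all function names are hypothetical.

```python
def axpy(a, x, y):
    """Axpy on C x H x W tensors: out[c][h][w] = a[c] * x[c][h][w] + y[c][h][w]."""
    return [[[a[c] * x[c][h][w] + y[c][h][w] for w in range(len(x[c][h]))]
             for h in range(len(x[c]))]
            for c in range(len(x))]


def scale_per_channel(x, a):
    """Scale layer without bias: multiply channel c by a[c]."""
    return [[[a[c] * value for value in row] for row in x[c]]
            for c in range(len(x))]


def eltwise_sum(p, q):
    """Eltwise layer with the SUM operation."""
    return [[[u + v for u, v in zip(row_p, row_q)]
             for row_p, row_q in zip(chan_p, chan_q)]
            for chan_p, chan_q in zip(p, q)]


a = [2, 3]
x = [[[1, 2]], [[3, 4]]]
y = [[[10, 10]], [[20, 20]]]
print(axpy(a, x, y))                                # [[[12, 14]], [[29, 32]]]
print(eltwise_sum(scale_per_channel(x, a), y))      # same result
```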
For details about parameter descriptions, see Operator Specifications (Caffe).