Quantization Configuration
This section describes the quantization configuration file for image classification networks.
Overview
If the inference accuracy of the model quantized with the config.json configuration file generated by the create_quant_config call does not meet the requirements, you need to tune the config.json file until the accuracy is as expected. The following is an example of the file content.
{ "version": 1, "batch_num": 30, "activation_offset": true, "do_fusion":true, "skip_fusion_layers":[], "layer_name1":{ "quant_enable": false, "activation_quant_params":[ { "max_percentile":0.999999, "min_percentile":0.999999, "search_range":[0.7, 1.3], "search_step":0.01 } ], "weight_quant_params":[ { "channel_wise":true } ] }, "layer_name2":{ "quant_enable": true, "activation_quant_params":[ { "max_percentile":0.999999, "min_percentile":0.999999, "search_range":[0.7, 1.3], "search_step":0.01 } ], "weight_quant_params":[ { "channel_wise":true } ] } }
Configuration File Options
The following describes the configuration options available in the configuration file.
**version**

| Item | Value |
|---|---|
| Function | Version number of the quantization configuration file |
| Type | int |
| Value Range | 1 |
| Description | Currently, only version 1 is available. |
| Recommended Value | 1 |
| Required/Optional | Optional |
**batch_num**

| Item | Value |
|---|---|
| Function | Batch count for quantization |
| Type | int |
| Value Range | At least 0 |
| Description | If this option is not set, the default value 1 is used. batch_num × batch_size equals the number of images in the calibration dataset used for quantization, where batch_size is the number of images per batch. It is recommended that the calibration dataset contain at most 50 images. |
| Recommended Value | 1 |
| Required/Optional | Optional |
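As a concrete illustration of the batch_num × batch_size arithmetic (the image counts below are examples, not defaults):

```python
# batch_num x batch_size must equal the number of calibration images.
calibration_images = 32  # recommended to keep the calibration set <= 50 images
batch_size = 16          # images per batch, determined by the input pipeline

batch_num = calibration_images // batch_size
assert batch_num * batch_size == calibration_images
print(batch_num)  # 2
```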
**activation_offset**

| Item | Value |
|---|---|
| Function | Whether to quantize data with offset |
| Type | bool |
| Value Range | true or false |
| Description | If set to true, data is quantized with offset; if set to false, data is quantized without offset. |
| Recommended Value | true |
| Required/Optional | Optional |
**do_fusion**

| Item | Value |
|---|---|
| Function | BN fusion switch |
| Type | bool |
| Value Range | true or false |
| Description | If set to true, BN fusion is enabled; if set to false, BN fusion is disabled. |
| Recommended Value | true |
| Required/Optional | Optional |
**skip_fusion_layers**

| Item | Value |
|---|---|
| Function | Layers to skip in BN fusion |
| Type | A list of strings |
| Value Range | Layer names. Currently, only Conv+BN fusion, Depthwise_Conv+BN fusion, and Group_conv+BN fusion are supported. |
| Description | Names of the layers to skip in BN fusion. |
| Recommended Value | - |
| Required/Optional | Optional |
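For example, to keep BN fusion enabled globally while skipping it for particular layers, the generated config.json can be edited as in this sketch (the layer names are hypothetical):

```python
import json

with open("config.json") as f:
    config = json.load(f)

# Keep BN fusion on globally, but skip the two listed layers
# (hypothetical names) during Conv+BN fusion.
config["do_fusion"] = True
config["skip_fusion_layers"] = ["conv1", "depthwise_conv2"]

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```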
**layer_name**

| Item | Value |
|---|---|
| Function | Quantization configuration of a layer |
| Type | object |
| Value Range | None |
| Description | Replace layer_name with the name of the layer to configure. Includes the following parameters: quant_enable, activation_quant_params, and weight_quant_params. |
| Recommended Value | None |
| Required/Optional | Optional |
**quant_enable**

| Item | Value |
|---|---|
| Function | Quantization enable by layer |
| Type | bool |
| Value Range | true or false |
| Description | If set to true, the layer is quantized; if set to false, it is not. |
| Recommended Value | true |
| Required/Optional | Optional |
**activation_quant_params**

| Item | Value |
|---|---|
| Function | Data quantization parameters of a layer |
| Type | object |
| Value Range | None |
| Description | Includes the following parameters: max_percentile, min_percentile, search_range, and search_step. |
| Recommended Value | None |
| Required/Optional | Optional |
**weight_quant_params**

| Item | Value |
|---|---|
| Function | Weight quantization parameters of a layer |
| Type | object |
| Value Range | None |
| Description | Includes the following parameter: channel_wise. |
| Recommended Value | None |
| Required/Optional | Optional |
**max_percentile**

| Item | Value |
|---|---|
| Function | Upper search limit |
| Type | float |
| Value Range | (0.5, 1] |
| Description | Indicates which number, among the values sorted in descending order, is taken as the upper clip limit. For example, given 100 numbers, the value 1.0 selects number 0 (100 - 100 × 1.0) in descending order, that is, the largest number. A larger value moves the upper clip limit closer to the maximum of the data to be quantized. |
| Recommended Value | 0.999999 |
| Required/Optional | Optional |
**min_percentile**

| Item | Value |
|---|---|
| Function | Lower search limit |
| Type | float |
| Value Range | (0.5, 1] |
| Description | Indicates which number, among the values sorted in ascending order, is taken as the lower clip limit. For example, given 100 numbers, the value 1.0 selects number 0 (100 - 100 × 1.0) in ascending order, that is, the smallest number. A larger value moves the lower clip limit closer to the minimum of the data to be quantized. |
| Recommended Value | 0.999999 |
| Required/Optional | Optional |
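The index arithmetic in the two descriptions above can be sketched as follows (illustrative only; the tool's exact rounding behavior is not documented here):

```python
import numpy as np

data = np.random.randn(100)  # 100 activation values to be clipped
max_percentile = min_percentile = 0.999999

# Upper clip limit: pick index (n - n * max_percentile) of the values
# sorted in descending order; a percentile of 1.0 picks index 0, the maximum.
desc = np.sort(data)[::-1]
right = desc[int(len(desc) - len(desc) * max_percentile)]

# Lower clip limit: the same rule applied to the ascending order.
asc = np.sort(data)
left = asc[int(len(asc) - len(asc) * min_percentile)]

clipped = np.clip(data, left, right)
```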
**search_range**

| Item | Value |
|---|---|
| Function | Quantization factor search range: [search_range_start, search_range_end] |
| Type | A list of two floats |
| Value Range | 0 < search_range_start < search_range_end |
| Description | Controls the quantization factor search range: candidate factors are searched within [search_range_start, search_range_end]. A larger range may yield higher quantization accuracy but takes longer to search. |
| Recommended Value | [0.7, 1.3] |
| Required/Optional | Optional |
**search_step**

| Item | Value |
|---|---|
| Function | Quantization factor search step |
| Type | float |
| Value Range | (0, search_range_end - search_range_start] |
| Description | Controls the quantization factor search step. A smaller step searches more candidate factors, which may improve accuracy but takes longer. |
| Recommended Value | 0.01 |
| Required/Optional | Optional |
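Together, search_range and search_step define a grid of candidate multipliers around the initial quantization factor; the sketch below counts how many candidates the defaults produce (the enumeration itself happens inside the tool, so this is only an illustration of the range/step trade-off):

```python
import numpy as np

search_range = [0.7, 1.3]
search_step = 0.01

# Number of candidate multipliers: one per step across the range.
num = int(round((search_range[1] - search_range[0]) / search_step)) + 1
candidates = np.linspace(search_range[0], search_range[1], num)
print(num)  # 61: a wider range or a finer step means more candidates,
            # hence potentially better accuracy but longer quantization
```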
**channel_wise**

| Item | Value |
|---|---|
| Function | Whether to use a different quantization factor for each channel |
| Type | bool |
| Value Range | true or false |
| Description | If set to true, each channel is quantized separately using its own quantization factor; if set to false, all channels share the same quantization factor. Has no effect on channel-irrelevant layers such as MatMul and AVE Pooling. |
| Recommended Value | true |
| Required/Optional | Optional |
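The effect of channel_wise can be sketched with numpy, assuming symmetric INT8 weight quantization (an illustration of the concept, not the tool's internal code):

```python
import numpy as np

# Conv weights laid out as (height, width, in_channels, out_channels).
weights = np.random.randn(3, 3, 16, 32).astype(np.float32)

# channel_wise = false: one scale shared by all output channels.
per_tensor_scale = np.abs(weights).max() / 127.0

# channel_wise = true: each output channel gets its own scale, so a
# channel with small weights is not crushed by a large-valued channel.
per_channel_scale = np.abs(weights).reshape(-1, 32).max(axis=0) / 127.0

q_per_tensor = np.round(weights / per_tensor_scale).astype(np.int8)
q_per_channel = np.round(weights / per_channel_scale).astype(np.int8)
```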
Configuration Tuning
If the inference accuracy of the model quantized based on the default configuration in the config.json file drops significantly, perform the following steps to tune the quantization configuration file:
1. Execute the quantization script in the amct_tensorflow_sample.tar.gz package to perform quantization based on the default configuration generated by the create_quant_config API.
2. If the inference accuracy of the model quantized in step 1 is as expected, configuration tuning ends. Otherwise, go to step 3.
3. Tune batch_num in the quantization configuration file.
   batch_num controls the batch count for quantization. Tune it based on the batch size and the number of images required for quantization. Generally, the more data samples used in a quantization process, the smaller the accuracy loss after quantization. However, excessive data does not improve accuracy further; it occupies more memory, slows down quantization, and may exhaust memory, video RAM, or thread resources. Therefore, it is recommended that the product of batch_num and batch_size be 16 or 32 (see the sketch after these steps).
4. If the inference accuracy of the model quantized in step 3 is as expected, configuration tuning ends. Otherwise, go to step 5.
5. Tune quant_enable in the quantization configuration file.
   quant_enable specifies whether to quantize a layer. If set to true, the layer is quantized; if set to false, it is not. If a layer has no configuration entry, its quantization is skipped. Generally, quantizing fewer layers improves quantization accuracy. When the network accuracy is not as expected, locate the quantization-sensitive layers in the network (those whose error increases significantly after quantization, such as the top layer, the bottom layer, depthwise convolutional layers, and layers with few parameters), and disable quantization on these layers as needed.
6. If the inference accuracy of the model quantized in step 5 is as expected, configuration tuning ends. Otherwise, go to step 7.
7. Tune the values of activation_quant_params and weight_quant_params in the quantization configuration file.
   - Data is clipped to the range [left, right] determined by the activation_quant_params parameters. Generally, values near the boundaries of the distribution are sparse, so clipping them can improve accuracy. A larger value of min_percentile (max_percentile) moves left (right) closer to the minimum (maximum) of the data to be quantized. search_range and search_step affect the resulting [left, right]. Generally, a larger search_range and a smaller search_step may achieve higher quantization accuracy, but quantization takes more time.
   - channel_wise in weight_quant_params determines whether each channel uses its own quantization factor during weight quantization. If set to true, channels are quantized separately using different quantization factors; if set to false, all channels are quantized together using the same quantization factor. Generally, inference accuracy is higher when channels are quantized separately. However, the MatMul and AVE Pooling layers are channel-irrelevant, so this parameter has no effect on them.
8. If the inference accuracy of the model quantized in step 7 is as expected, configuration tuning ends. Otherwise, quantization has a severe adverse impact on the inference accuracy of this model; in this case, remove the quantization configuration.
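A minimal sketch of how steps 3, 5, and 7 can be applied by editing config.json programmatically; the layer names and tuned values are hypothetical, and only keys documented in this section are used:

```python
import json

with open("config.json") as f:
    config = json.load(f)

# Step 3: keep batch_num * batch_size at 16 or 32 calibration images.
config["batch_num"] = 2  # e.g. 2 batches of 16 images

# Step 5: disable quantization on quantization-sensitive layers,
# e.g. the top/bottom layers or depthwise convolutions.
for layer in ["conv1", "fc1000"]:  # hypothetical layer names
    config.setdefault(layer, {})["quant_enable"] = False

# Step 7: widen the search range and refine the step for a layer
# whose activations are hard to quantize.
config.setdefault("conv2_1", {})["activation_quant_params"] = [{
    "max_percentile": 0.999999,
    "min_percentile": 0.999999,
    "search_range": [0.5, 1.5],  # wider than the default [0.7, 1.3]
    "search_step": 0.005         # finer than the default 0.01
}]

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```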