Quantization Configuration
Overview
If inference with the quantized model shows a significant accuracy drop when using the config.json quantization configuration file generated by the create_quant_retrain_config call, tune the config.json file until the accuracy meets expectations. The following is an example of the file content. Keep the layer names in the JSON file unique.
{ "version":1, "conv1":{ "retrain_enable":true, "retrain_data_config":{ "algo":"ulq_quantize" }, "retrain_weight_config":{ "algo":"arq_retrain", "channel_wise":true } }, "conv2_1/expand":{ "retrain_enable":true, "retrain_data_config":{ "algo":"ulq_quantize" }, "retrain_weight_config":{ "algo":"arq_retrain", "channel_wise":true } }, "conv2_1/dwise":{ "retrain_enable":true, "retrain_data_config":{ "algo":"ulq_quantize" }, "retrain_weight_config":{ "algo":"arq_retrain", "channel_wise":true } }, }
Configuration File Options
The following describes the configuration options available in the configuration file.
version

Function | Version number of the quantization configuration file
---|---
Type | int
Value Range | 1
Description | Currently, only version 1 is available.
Recommended Value | 1
Required/Optional | Optional
retrain_enable

Function | Retrain enable by layer
---|---
Type | bool
Value Range | true or false
Description | If set to true, retrain-based quantization is performed on the layer; if set to false, it is not.
Recommended Value | true
Required/Optional | Optional
retrain_data_config

Function | Data quantization configuration by layer
---|---
Type | object
Value Range | None
Description | Includes the following parameters: algo, fixed_min, clip_max, and clip_min.
Recommended Value | None
Required/Optional | Optional
retrain_weight_config

Function | Weight quantization configuration by layer
---|---
Type | object
Value Range | None
Description | Includes the following parameters: algo and channel_wise.
Recommended Value | None
Required/Optional | Optional
algo

Function | Quantization algorithm by layer
---|---
Type | string
Value Range | ulq_quantize or arq_retrain
Description | Quantization algorithm applied to the layer: ulq_quantize for data quantization (in retrain_data_config) and arq_retrain for weight quantization (in retrain_weight_config).
Recommended Value | Set to ulq_quantize for data quantization or arq_retrain for weight quantization.
Required/Optional | Optional
channel_wise

Function | Whether to use different quantization factors for each channel
---|---
Type | bool
Value Range | true or false
Description | If set to true, a separate quantization factor is used for each channel of the weights; if set to false, a single quantization factor is shared by all channels.
Recommended Value | true
Required/Optional | Optional
fixed_min

Function | Lower limit enable of the data quantization algorithm
---|---
Type | bool
Value Range | true or false
Description | If this option is not included, the AMCT automatically sets the lower limit of the data quantization algorithm according to the graph structure. If this option is included, set it manually: if the upstream layer of the quantization layer is ReLU, set it to true; otherwise, set it to false.
Recommended Value | Do not include this option.
Required/Optional | Optional
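When setting fixed_min by hand, the upstream-ReLU check can be automated by inspecting the TensorFlow graph. The following is a minimal sketch, assuming the layer names in config.json match operation names in the graph and that the quantized layer's direct input is the activation in question; upstream_is_relu is an illustrative helper, not an AMCT API:

```python
import tensorflow as tf

def upstream_is_relu(graph, layer_name):
    # True if any direct input of `layer_name` is produced by a ReLU op.
    op = graph.get_operation_by_name(layer_name)
    return any(tensor.op.type == "Relu" for tensor in op.inputs)

# Usage: decide fixed_min for each quantized layer in the config.
graph = tf.compat.v1.get_default_graph()
for layer in ("conv2_1/expand", "conv2_1/dwise"):
    print(layer, "fixed_min =", upstream_is_relu(graph, layer))
```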
clip_max

Function | Upper limit of the data quantization algorithm
---|---
Type | float
Value Range | clip_max > 0. Set the upper limit max based on the data distribution of the feature map at each layer; the recommended value range is [0.3 * max, 1.7 * max].
Description | If this option is included, the clip upper limit of the data quantization algorithm is fixed. If this option is not included, the clip upper limit is learned using the IFMR algorithm.
Recommended Value | Do not include this option.
Required/Optional | Optional
clip_min

Function | Lower limit of the data quantization algorithm
---|---
Type | float
Value Range | clip_min < 0. Set the lower limit min based on the data distribution of the feature map at each layer; the recommended value range is [0.3 * min, 1.7 * min].
Description | If this option is included, the clip lower limit of the data quantization algorithm is fixed. If this option is not included, the clip lower limit is learned using the IFMR algorithm.
Recommended Value | Do not include this option.
Required/Optional | Optional
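When hand-editing many layers, it can be less error-prone to generate the per-layer entries programmatically and dump the result to config.json. The following is a minimal sketch under the assumption that every layer shares the defaults shown in the Overview; make_layer_entry is an illustrative helper, not an AMCT API:

```python
import json

def make_layer_entry(clip_max=None, clip_min=None, fixed_min=None):
    # One per-layer entry mirroring the structure produced by
    # create_quant_retrain_config; clip/fixed options are added only
    # when explicitly requested (see Configuration Tuning below).
    data_cfg = {"algo": "ulq_quantize"}
    if clip_max is not None:
        data_cfg["clip_max"] = clip_max
    if clip_min is not None:
        data_cfg["clip_min"] = clip_min
    if fixed_min is not None:
        data_cfg["fixed_min"] = fixed_min
    return {
        "retrain_enable": True,
        "retrain_data_config": data_cfg,
        "retrain_weight_config": {"algo": "arq_retrain", "channel_wise": True},
    }

config = {"version": 1}
for layer in ("conv1", "conv2_1/expand", "conv2_1/dwise"):
    config[layer] = make_layer_entry()
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```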
Configuration Tuning
If the inference accuracy of the model quantized based on the default configuration in the config.json file drops significantly, perform the following steps to tune the quantization configuration file:
1. Execute the quantization script in the amct_tensorflow_sample.tar.gz package to perform quantization based on the default configuration generated by the create_quant_retrain_config API.
2. If the inference accuracy of the model quantized in step 1 is as expected, configuration tuning ends. Otherwise, go to step 3.
3. Tune the quantization configuration file by adding the fixed_min, clip_max, and clip_min options:
   - fixed_min: if the upstream layer of the quantization layer is ReLU, manually set this option to true; if it is not ReLU, set it to false.
   - clip_max and clip_min control the clip upper limit max and lower limit min based on the data distribution of the feature map at each layer. The recommended value range of clip_max is [0.3 * max, 1.7 * max], and that of clip_min is [0.3 * min, 1.7 * min]. Compute the mean square error (MSE) between the original data and the data after quantization and dequantization to find the optimal clip_max and clip_min values (see the sketch after this procedure).

   The following is an example of the tuned quantization configuration file:

   ```json
   {
       "version": 1,
       "inference/Conv2D": {
           "retrain_enable": true,
           "retrain_data_config": {
               "algo": "ulq_quantize",
               "clip_max": 3.0,
               "clip_min": -3.0,
               "fixed_min": true
           },
           "retrain_weight_config": {
               "algo": "arq_retrain",
               "channel_wise": true
           }
       },
       "inference/Conv2D_1": {
           "retrain_enable": true,
           "retrain_data_config": {
               "algo": "ulq_quantize",
               "clip_max": 3.0,
               "clip_min": -3.0,
               "fixed_min": true
           },
           "retrain_weight_config": {
               "algo": "arq_retrain",
               "channel_wise": true
           }
       }
   }
   ```
4. Configuration tuning ends if the inference accuracy meets the requirement. Otherwise, retraining has a severely adverse impact on the inference accuracy; in this case, remove the retrain configuration.
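The MSE-based search in step 3 can be prototyped offline on a captured feature map. The following is a minimal sketch that assumes a simple 8-bit linear fake-quantizer as a stand-in for the actual ULQ algorithm (whose exact behavior may differ); it sweeps the recommended [0.3, 1.7] ratio range and returns the clip pair with the lowest reconstruction MSE:

```python
import numpy as np

def fake_quant(x, clip_min, clip_max, num_bits=8):
    # Clip, linearly quantize to 2**num_bits levels, then dequantize.
    levels = 2 ** num_bits - 1
    scale = (clip_max - clip_min) / levels
    q = np.round((np.clip(x, clip_min, clip_max) - clip_min) / scale)
    return q * scale + clip_min

def search_clip(feature_map, ratios=np.arange(0.3, 1.7 + 1e-9, 0.1)):
    # Sweep the recommended ranges [0.3*max, 1.7*max] and [0.3*min, 1.7*min],
    # keeping the (clip_max, clip_min) pair with the lowest MSE.
    fmax, fmin = float(feature_map.max()), float(feature_map.min())
    best = (None, None, np.inf)
    for rmax in ratios:
        for rmin in ratios:
            cmax, cmin = rmax * fmax, rmin * fmin
            if cmax <= cmin:  # degenerate range (e.g. all-positive data)
                continue
            err = feature_map - fake_quant(feature_map, cmin, cmax)
            mse = float(np.mean(err ** 2))
            if mse < best[2]:
                best = (cmax, cmin, mse)
    return best

# Usage with a captured activation (random data here, for illustration only):
fm = np.random.randn(1, 56, 56, 32).astype(np.float32)
clip_max, clip_min, mse = search_clip(fm)
print(f"clip_max={clip_max:.3f} clip_min={clip_min:.3f} mse={mse:.6f}")
```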