If you try to quantize the whole model to 16-bit using the alls command:
'model_optimization_config(compression_params, auto_16bit_weights_ratio=1.0)'
and it fails because not all of the model's layers support 16-bit quantization, please try to quantize only the convolution layers by using:
'quantization_param({conv*}, precision_mode=a16_w16)'
in the alls file.
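
For reference, here is a minimal sketch of how the fallback script could be applied with the Hailo Dataflow Compiler Python API. The HAR path and the calibration dataset below are placeholders, so adapt them to your own workflow:

```python
# Minimal sketch, assuming the model was already translated to a HAR file with
# the Hailo Dataflow Compiler; "my_model.har" and calib_dataset are placeholders.
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="my_model.har")

# Fallback: request 16-bit activations and weights only for convolution layers.
alls_script = "quantization_param({conv*}, precision_mode=a16_w16)\n"
runner.load_model_script(alls_script)

calib_dataset = np.random.rand(64, 224, 224, 3).astype(np.float32)  # placeholder calibration data
runner.optimize(calib_dataset)
runner.save_har("my_model_16bit.har")
```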
Note that if you do this, the input layer might be modified to accept 16-bit data instead of 8-bit data when the first compute layer is a convolution layer.
To verify this, you can use the 'hailortcli parse-hef' tool
to check the format of the input layer, i.e. whether it is UINT8 or UINT16.
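
If you prefer to check this programmatically instead of via the CLI, a small sketch using the HailoRT Python bindings could look like this (the HEF path is a placeholder):

```python
# Minimal sketch, assuming the HailoRT Python package (hailo_platform) is
# installed; "my_model_16bit.hef" is a placeholder path.
from hailo_platform import HEF

hef = HEF("my_model_16bit.hef")
for info in hef.get_input_vstream_infos():
    # format.type reports the expected input data type, e.g. UINT8 or UINT16
    print(info.name, info.shape, info.format.type)
```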
If it is UINT16, please refer to the post 'Can I run inference on a model that was quantized to have fully 16-bit weights?' to see how to handle this situation in your inference code.