Can I run inference on a model that was quantized to have fully 16-bit weights?

First of all, yes: Hailo supports compiling a model with all weights quantized to 16-bit. However, note that this will not work out of the box in every case. I'll elaborate:
For the HailoRT CLI tool (hailortcli run or hailortcli run2) - there's no problem running a HEF file with 16-bit weights.
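For example (model_16bit.hef is just a placeholder file name):

hailortcli run model_16bit.hef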
Python API - The HailoRT Python API supports 16-bit HEFs, but the format type of the input and output vstream params needs to be set to FormatType.UINT16, like so:

from hailo_platform import InputVStreamParams, OutputVStreamParams, FormatType

input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.UINT16)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.UINT16)
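For context, here is a minimal sketch of how those params would then be used, assuming network_group was already configured from the HEF and following the standard HailoRT Python streaming-inference flow (the dummy uint16 input is only an illustration):

import numpy as np
from hailo_platform import InferVStreams

input_info = network_group.get_input_vstream_infos()[0]
# Input buffers must be uint16 to match the vstream format set above
dataset = np.zeros((1, *input_info.shape), dtype=np.uint16)

with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
    with network_group.activate(network_group.create_params()):
        results = infer_pipeline.infer({input_info.name: dataset})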

C++ API - It will work, BUT note that if you are running inference on actual video/images, they are typically loaded with OpenCV, and OpenCV loads image data as uint8 by default, which means it WILL NOT work with 16-bit inputs as-is. You'll either get a segmentation fault when trying to load the data, or the inference results will be bad because the remaining 8 bits of each element are filled with "garbage" data. The simplest workaround in this case is to keep the entire model in 16-bit except for the input layer(s), which stay 8-bit.
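To make the dtype issue concrete in Python terms (cv2 behaves the same way as the C++ OpenCV API here; the image path is a placeholder, and whether a plain cast is numerically correct for a 16-bit input depends on the model's quantization parameters, so treat this only as a sketch):

import cv2
import numpy as np

frame = cv2.imread('input.jpg')      # OpenCV returns a uint8 HxWx3 array by default
assert frame.dtype == np.uint8
# Feeding this buffer to a uint16 input vstream mismatches the expected element size;
# a naive widening keeps the values but fixes the dtype:
frame_16 = frame.astype(np.uint16)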

In conclusion - Hailo supports models with fully 16-bit weights, but you need to take care when testing their accuracy through the HailoRT API, especially the C++ API when working with OpenCV.