Hello.
I am running inference with RegNetX-800MF on a Hailo-8 attached to a Raspberry Pi.
Recently I encountered FPS drops when applying 4-bit quantization during compilation. I know that applying 4-bit quantization to small models is not a recommended configuration, but I still wanted to figure out the root cause of the issue.
The hardware utilization metrics printed right after compilation by the Hailo DFC showed a trend consistent with the FPS, so I suspect the FPS drop is caused by underutilization of the hardware units.
Consequently, I am wondering what actually causes the hardware to be underutilized when 4-bit quantization is applied to the model. I would like to inspect memory dumps to see how the 4-bit and 8-bit data values are laid out in memory during inference, and to determine whether the data arrangement is actually causing a bottleneck.
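For context, here is roughly what I imagine the weight packing looks like: a generic two-nibbles-per-byte sketch in NumPy, not necessarily Hailo's actual on-chip layout. My working hypothesis is that mixing data packed like this with plain 8-bit data could create alignment or unpacking overhead.

```python
import numpy as np

def pack_int4(weights: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 4-bit values (0..15) into bytes,
    two values per byte (generic scheme, assumed for illustration)."""
    assert weights.ndim == 1 and len(weights) % 2 == 0
    lo = weights[0::2] & 0x0F          # even indices -> low nibble
    hi = (weights[1::2] & 0x0F) << 4   # odd indices  -> high nibble
    return (lo | hi).astype(np.uint8)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the original 4-bit values."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = (packed >> 4) & 0x0F
    return out

w = np.array([1, 15, 7, 2], dtype=np.uint8)
packed = pack_int4(w)                      # two bytes instead of four
assert np.array_equal(unpack_int4(packed), w)
```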
Could you please point me to any method by which I can observe how the data are actually stored and managed in memory (L1, L2, L3, L4)?
And if possible, could you also explain the underlying mechanisms behind the observed FPS drops when 4-bit quantization is applied to relatively small models?
Thanks!
- The example model script used for compilation:
normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
model_optimization_flavor(optimization_level=2, compression_level=0, batch_size=8)
model_optimization_config(compression_params, auto_4bit_weights_ratio=0.4)
