Data understanding on hailo quantization

SAN · November 29, 2024, 3:38am

Hi everyone,

I have a question. During quantization, the calibration dataset should be in floating point so that the quantization algorithm can learn the range and distribution of the data, which it then uses to map floating-point values to integers.

If my understanding above is correct, then after I compile the quantized model to .hef and run inference on a Hailo-8L on a Raspberry Pi, why does it require the input to be converted to INT8? Shouldn't that be handled inside the quantization algorithm?
If I'm wrong, feeding INT8 input would presumably produce incorrect raw output, so should I post-process it to get floating-point output? Is that how it works? Or should the model's output be nearly the same whether or not it is quantized?

I would appreciate a step-by-step example of converting a YOLO model.

omria · December 2, 2024, 1:17pm

Hey @SAN

When you quantize a model to run on the Hailo-8L, it’s important to understand that the quantized model expects the input data to be in INT8 format during inference. This is because the quantization process maps the floating-point values to integers using scale and zero-point values.

So, even though you use floating-point data during the model training and quantization process, you need to convert the input data to INT8 format before running inference on the Hailo-8L. If you don’t preprocess the input data, it won’t align with the model’s quantized values, and you’ll get incorrect results.

Here’s a quick overview of the workflow:

1. Train your model using floating-point data.
2. Quantize the model using a representative dataset to get the scale and zero-point values.
3. When you’re ready to run inference, preprocess your input data by normalizing it and converting it to INT8 using the scale and zero-point values.
4. Run inference on the Hailo-8L using the quantized model and the preprocessed INT8 input data.
5. If needed, postprocess the model’s output by converting it back to floating-point.
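
The input conversion and output conversion steps in the workflow above can be sketched with the usual affine quantization formulas. This is a minimal illustration in plain Python, not Hailo's implementation: the function names, the `scale` and `zero_point` values, and the unsigned 0..255 range are all assumptions chosen for the example, and the real values and integer type depend on the compiled model.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers: q = round(x / scale) + zero_point, then clip."""
    out = []
    for x in values:
        q = round(x / scale) + zero_point
        out.append(max(qmin, min(qmax, q)))  # clip to the integer range
    return out


def dequantize(q_values, scale, zero_point):
    """Map integers back to floats: x is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical parameters, for illustration only.
scale, zero_point = 0.0078125, 128  # scale = 1/128

x = [-1.0, 0.0, 0.5, 1.0]
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)

print(q)      # integer representation of x
print(x_hat)  # reconstructed floats; the last value is clipped, so it is slightly below 1.0
```

The round trip is exact for values that fall on the integer grid and lossy otherwise (rounding and clipping), which is where the "minor precision loss" comes from.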

The quantized model should perform very similarly to the original floating-point model, with only minor precision loss.

Let me know if you have any other questions!

Best Regards,
Omria

SAN · December 3, 2024, 5:33am

Noted. But how do you get the scale and zero-point values? Are they the values produced by quantization, i.e. after running runner.optimize(calib_dataset_dict)?
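
For intuition on where such values come from, here is a generic min/max calibration sketch in plain Python. It is an assumption about how affine quantization parameters are commonly derived from a calibration set, not a description of the algorithm used by runner.optimize; `compute_qparams` and the sample data are hypothetical.

```python
def compute_qparams(x_min, x_max, qmin=0, qmax=255):
    """Derive (scale, zero_point) from an observed float range."""
    # Widen the range so that 0.0 is always exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all observed values are zero
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


# Hypothetical calibration activations (floating point), in two batches.
calib_batches = [[-0.5, 0.1, 1.2], [0.0, 1.55, -1.0]]

lo = min(min(batch) for batch in calib_batches)
hi = max(max(batch) for batch in calib_batches)
scale, zero_point = compute_qparams(lo, hi)

print(scale, zero_point)
```

The calibration data only has to reveal the float range of each tensor; the integers are derived from that range afterwards.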

Right now I'm not doing any post-processing, because the end nodes are the 6 conv layers from yolov8n. So without any post-processing, the raw results before and after quantization should be the same, right, provided I follow the preprocessing and quantization steps correctly?

To confirm, the scale and zero-point values come from running runner.optimize(calib_dataset_dict), correct? Here is my model script:

    alls_lines = [
        # add normalization layer
        # Batch size is 8 by default
        "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n",
        "resize_input1 = resize(resize_shapes=[640,640])\n",
        "model_optimization_flavor(optimization_level=2, compression_level=2, batch_size=8)\n",
    ]
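
As a rough illustration of what the normalization line in this script expresses, the sketch below applies (pixel - mean) / std per channel in plain Python. The `normalize` helper is hypothetical and only mirrors the arithmetic. With mean 0 and std 255, raw 0..255 pixel values map to 0..1, which suggests the images fed to the model should not be normalized a second time beforehand.

```python
def normalize(pixel_rgb, mean, std):
    """Per-channel normalization: (value - mean) / std."""
    return [(p - m) / s for p, m, s in zip(pixel_rgb, mean, std)]


# Same constants as in the model script above.
mean = [0.0, 0.0, 0.0]
std = [255.0, 255.0, 255.0]

pixel = [0, 128, 255]  # one raw RGB pixel in the 0..255 range
normalized = normalize(pixel, mean, std)

print(normalized)  # values in the 0..1 range
```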

With this model script, if I follow the correct preprocessing and quantization steps, the output from the 6 conv layers should be the same before and after quantization, right?
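
One way to check this is to compare the float model's raw output with the dequantized output of the quantized model and look at the error, rather than expecting bit-identical values. The sketch below is a plain-Python illustration under stated assumptions: the sample values are made up, and `compare_outputs` is a hypothetical helper, not part of any Hailo tool.

```python
import math


def compare_outputs(float_out, dequant_out):
    """Return (max absolute error, signal-to-noise ratio in dB)."""
    diffs = [a - b for a, b in zip(float_out, dequant_out)]
    max_abs = max(abs(d) for d in diffs)
    signal = sum(a * a for a in float_out)
    noise = sum(d * d for d in diffs)
    if noise == 0:
        return max_abs, math.inf   # outputs are identical
    if signal == 0:
        return max_abs, -math.inf  # nothing but noise
    return max_abs, 10 * math.log10(signal / noise)


# Made-up example: float outputs and the same values snapped to a 1/16 grid,
# as a stand-in for a dequantized output with scale = 0.0625.
float_out = [0.10, -0.42, 1.37, 0.88]
dequant_out = [0.125, -0.4375, 1.375, 0.875]

max_abs, snr_db = compare_outputs(float_out, dequant_out)
print(max_abs, snr_db)
```

With correct preprocessing, the outputs should be close (error on the order of the output quantization step) but generally not identical.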
