I have a question. During quantization, the calibration dataset should be in floating point so that the quantization algorithm can learn the range and distribution of the data and map the floating-point values to integers.
If my understanding above is correct, then after I compile the quantized model to a .hef and run inference on the Hailo-8L on a Raspberry Pi, why do I have to convert the input to INT8 myself? Shouldn't that conversion be part of the quantized model?
If I'm wrong and I do have to feed INT8 input, the raw output will obviously not be directly usable, so should I process it to get floating-point output? Is that how it works? And should the model's outputs be roughly the same whether it is quantized or not?
I would appreciate a step-by-step example of converting a YOLO model.
When you quantize a model to run on the Hailo-8L, it’s important to understand that the quantized model expects the input data to be in INT8 format during inference. This is because the quantization process maps the floating-point values to integers using scale and zero-point values.
So, even though you use floating-point data during the model training and quantization process, you need to convert the input data to INT8 format before running inference on the Hailo-8L. If you don’t preprocess the input data, it won’t align with the model’s quantized values, and you’ll get incorrect results.
Here’s a quick overview of the workflow:
1. Train your model using floating-point data.
2. Quantize the model using a representative dataset to get the scale and zero-point values.
3. When you’re ready to run inference, preprocess your input data by normalizing it and converting it to INT8 using the scale and zero-point values.
4. Run inference on the Hailo-8L using the quantized model and the preprocessed INT8 input data.
5. If needed, postprocess the model’s output by converting it back to floating-point.
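To make the scale and zero-point steps above concrete, here is a minimal NumPy sketch of generic affine quantization. It is not the Hailo API: `calibrate`, `quantize`, and `dequantize` are hypothetical helpers, and the simple min/max calibration is an assumption for illustration, not necessarily what the Hailo optimizer does internally.

```python
import numpy as np

def calibrate(samples, dtype=np.int8):
    """Derive scale and zero-point from the min/max of calibration samples.

    Assumption: plain min/max calibration over an affine (asymmetric) scheme.
    """
    info = np.iinfo(dtype)
    lo = min(float(np.min(samples)), 0.0)  # keep 0.0 inside the range
    hi = max(float(np.max(samples)), 0.0)
    scale = (hi - lo) / (info.max - info.min) or 1.0  # avoid a zero scale
    zero_point = int(round(info.min - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, dtype=np.int8):
    """Map float values to integers: q = round(x / scale) + zero_point."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    """Map integers back to floats: x is approximately (q - zero_point) * scale."""
    return (np.asarray(q, dtype=np.float64) - zero_point) * scale

# Hypothetical calibration data: normalized pixel values in [0, 1].
calib_samples = np.linspace(0.0, 1.0, 1000)
scale, zero_point = calibrate(calib_samples)

x = np.array([0.0, 0.25, 1.0])          # float input after normalization
q = quantize(x, scale, zero_point)      # integer data fed to the device
x_hat = dequantize(q, scale, zero_point)  # float values recovered afterwards
print(scale, zero_point, q, x_hat)
```

The same `dequantize` step is what "converting back to floating-point" means for the output side, using the scale and zero-point of each output layer rather than those of the input.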
The quantized model should perform very similarly to the original floating-point model, with only minor precision loss.
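One way to check how large that precision loss is would be to compare the raw outputs of the two models numerically instead of expecting them to be identical. The sketch below is an assumption-laden illustration: `compare_outputs` is a hypothetical helper, the tensor shape is made up, and the "quantized" output is simulated by rounding to a fixed step rather than produced by a real device.

```python
import numpy as np

def compare_outputs(reference, candidate):
    """Return (max absolute error, signal-to-noise ratio in dB)."""
    reference = np.asarray(reference, dtype=np.float64).ravel()
    candidate = np.asarray(candidate, dtype=np.float64).ravel()
    err = candidate - reference
    max_abs_err = float(np.max(np.abs(err)))
    noise = float(np.sum(err ** 2))
    signal = float(np.sum(reference ** 2))
    snr_db = float("inf") if noise == 0 else float(10 * np.log10(signal / noise))
    return max_abs_err, snr_db

# Stand-in for the float model's raw output (hypothetical shape).
rng = np.random.default_rng(0)
float_out = rng.normal(size=(1, 20, 20, 64))

# Stand-in for the dequantized output: values snapped to a fixed step.
step = 0.05  # hypothetical quantization step (the layer's scale)
quant_out = np.round(float_out / step) * step

max_err, snr = compare_outputs(float_out, quant_out)
print(f"max abs error: {max_err:.4f}, SNR: {snr:.1f} dB")
```

With pure rounding the error per value is bounded by half the step, so the outputs are close but not bit-identical; a real quantized network accumulates error across layers, so the gap can be larger than in this toy case.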
Noted, but how do you get the scale and zero-point values? Are those the values produced by quantization, i.e. after calling runner.optimize(calib_dataset_dict)?
Right now I'm not doing any post-processing, because the end nodes are the 6 conv layers from yolov8n. So without any post-processing, the raw results before and after quantization should be the same, provided I follow the preprocessing and quantization steps correctly, right?
How do we obtain the scale and zero-point values? Are they available after running runner.optimize(calib_dataset_dict)?
With this model script, if I follow the correct preprocessing and quantization steps, the outputs of the 6 conv layers should be the same before and after quantization, right?