The Hailo model conversion starts from a floating-point model in TFLite or ONNX format.
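For example, the first (parsing) step can be done through the Python API of the Dataflow Compiler. This is only a minimal sketch based on the DFC tutorials; the file names and the `hailo8` target are placeholders, so check the API reference for your DFC version:

```python
# Sketch: translating a floating-point ONNX model into Hailo's representation.
# Assumes the Hailo AI Software Suite / DFC (hailo_sdk_client) is installed.
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")  # target device architecture

# Parse the ONNX model; "my_model" is just an arbitrary network name.
hn, npz = runner.translate_onnx_model("my_model.onnx", "my_model")

# Save the parsed model as a HAR archive for the later optimize/compile steps.
runner.save_har("my_model.har")
```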
To understand the workflow, I would recommend starting by working through the tutorials in the Hailo AI Software Suite Docker. Just call:
```
hailo tutorial
```
This will start a Jupyter Notebook server with notebooks for each step.
The model optimization requires a number of calibration images that depends on the optimization level you are using. You can find more details in the Hailo Dataflow Compiler User Guide. More is not always better.
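As an illustration, both knobs can be set through a model script passed to the runner before optimization. Treat this as a hedged sketch: the `model_optimization_flavor` and `model_optimization_config(calibration, ...)` commands and their parameter names should be verified against the Dataflow Compiler User Guide for your release, and the file names below are placeholders:

```python
import numpy as np
from hailo_sdk_client import ClientRunner

# Load the parsed model from the previous (parsing) step.
runner = ClientRunner(har="my_model.har")

# Calibration images, e.g. an (N, H, W, 3) float32 array (placeholder file).
calib_dataset = np.load("calib_images.npy")

# optimization_level: higher levels use more data and run longer.
# calibset_size: caps how many images the calibration pass consumes.
runner.load_model_script(
    "model_optimization_flavor(optimization_level=2, compression_level=0)\n"
    "model_optimization_config(calibration, batch_size=8, calibset_size=1024)\n"
)

runner.optimize(calib_dataset)          # quantize using the calibration set
runner.save_har("my_model_quantized.har")
```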
I followed the Hailo DFC tutorial documentation and noticed that, with the default settings, only a subset of my dataset is used for calibration. As a result, I observed performance degradation in the quantized model. In contrast, when quantizing a TensorFlow model to an int8 TFLite model with TFLite's default quantization method, every sample of the representative dataset is used, and I saw no performance loss.
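For reference, this is the TFLite flow I am comparing against, using the standard TensorFlow converter API; `calib_images.npy` and `saved_model_dir` are placeholders for my own data and model:

```python
import numpy as np
import tensorflow as tf

calibration_images = np.load("calib_images.npy")  # float32 samples

def representative_dataset():
    # TFLite iterates over the whole generator, one sample per call.
    for image in calibration_images:
        yield [image[None, ...].astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```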
I would like to apply the same basic quantization approach that TFLite uses to my Hailo model. How can I configure Hailo's optimization to replicate this behavior?
Any suggestions or insights would be greatly appreciated!