How to Replicate TFLite’s int8 Quantization in Hailo Optimization?

I achieved satisfactory performance when quantizing a TensorFlow model to an int8 TFLite model with 600,000 representative samples, using:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
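
For completeness, the full conversion flow looks roughly like this (representative_samples and saved_model_dir are placeholders for my actual data pipeline and model path):

import tensorflow as tf

def representative_dataset():
    # Placeholder for my real 600,000-sample pipeline; each yielded
    # sample is a float32 array that already includes a batch dimension.
    for sample in representative_samples:
        yield [sample.astype("float32")]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()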

How can I apply the same quantization approach using Hailo’s optimization? Would the following configuration work?

model_script_lines = [
    # Basic flavor: plain calibration only, no compression.
    "model_optimization_flavor(optimization_level=0, compression_level=0)\n",
    # Intended to calibrate on all 600,000 samples, one at a time.
    "model_optimization_config(calibration, batch_size=1, calibset_size=600000)\n",
]
runner.load_model_script("".join(model_script_lines))
runner.optimize(calib_dataset)

Any feedback or suggestions would be appreciated!

The Hailo model conversion starts from floating-point models in TFLite or ONNX format.
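
For example, parsing a floating-point ONNX model into a Hailo archive looks roughly like this (a sketch; the model path, network name, and hardware architecture are placeholders):

from hailo_sdk_client import ClientRunner

# Target architecture is a placeholder; use the one matching your device.
runner = ClientRunner(hw_arch="hailo8")

# Translate the floating-point ONNX model into Hailo's internal representation.
hn, npz = runner.translate_onnx_model("model.onnx", "my_model")
runner.save_har("my_model.har")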

To understand the workflow, I would recommend starting by working through the tutorials in the Hailo AI Software Suite Docker. Just call:

hailo tutorial

This will start a Jupyter Notebook server with notebooks for each step.

Model optimization requires a certain number of images, depending on which optimization level you are using. You can find more details in the Hailo Dataflow Compiler User Guide. More is not always better.
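
As a sketch, these knobs are set via the model script; the values below are illustrative only, and the User Guide lists the calibration set sizes each optimization level actually needs:

# Illustrative values, not recommendations; consult the DFC User Guide
# for the calibration set size your chosen optimization level requires.
model_script = (
    "model_optimization_flavor(optimization_level=2)\n"
    "model_optimization_config(calibration, batch_size=8, calibset_size=1024)\n"
)
runner.load_model_script(model_script)
runner.optimize(calib_dataset)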

Thanks for your reply.

I followed the Hailo DFC tutorial documentation and noticed that with the default settings, only a subset of my entire dataset is being used for calibration. As a result, I observed a performance degradation in the quantized model. In contrast, when quantizing a TensorFlow model to an int8 TFLite model, the process uses every sample from the representative dataset with TFLite's default quantization method, and I didn't see any performance loss.

I would like to apply TFLite's basic quantization approach to Hailo. How can I configure Hailo's optimization to replicate this behavior?
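
Concretely, what I am currently trying is to match calibset_size to the full dataset length (the path and array layout here are placeholders for my data):

import numpy as np

# Full representative set; placeholder path and layout for my data.
calib_dataset = np.load("full_calib_set.npy")  # shape: (600000, H, W, C)

# My guess: set calibset_size to the dataset length so every sample
# is seen during calibration, as TFLite does.
runner.load_model_script(
    "model_optimization_config(calibration, batch_size=1, "
    f"calibset_size={len(calib_dataset)})\n"
)
runner.optimize(calib_dataset)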

Any suggestions or insights would be greatly appreciated!