Accuracy degradation after quantization for Hailo HW

alexf · December 28, 2023, 12:18pm

Basic Checklist of “immediate suspects”:

Is the model accurate in SDK_FP_OPTIMIZED inference emulation mode?
1. If it is not, the problem is not a pure quantization issue.
2. Make sure to use actual images from the calibration set, apply the postprocessing, and inspect the final result. This validates the SW pipeline running the model within the Hailo block before the actual HW-targeted compression.
3. Common pitfalls: (A) Faulty distribution (duplication or omission) of pre/post-processing blocks (e.g., normalization) between implementations: programmatic vs. Hailo-integrated via a model-script command. (B) Inconsistency between the parsing start/end nodes and the pre/post-processing applied around the Hailo blocks.
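To illustrate pitfall (A), here is a minimal NumPy sketch of what happens when normalization is applied twice, once in host code and once more on-chip. The `MEAN`/`STD` values and the `normalize` helper are assumptions made for this example and are not part of any Hailo API.

```python
import numpy as np

# Hypothetical per-channel statistics, chosen only for illustration.
MEAN = np.array([123.675, 116.28, 103.53])
STD = np.array([58.395, 57.12, 57.375])

def normalize(image):
    """Per-channel normalization of an HxWx3 image."""
    return (image - MEAN) / STD

rng = np.random.default_rng(0)
# Synthetic "image" whose channels follow the statistics above.
raw = rng.normal(MEAN, STD, size=(64, 64, 3))

once = normalize(raw)    # correct: normalization applied in exactly one place
twice = normalize(once)  # faulty: applied in host code AND again in the model

print("once : mean=%.3f std=%.3f" % (once.mean(), once.std()))
print("twice: mean=%.3f std=%.3f" % (twice.mean(), twice.std()))
```

The doubly normalized input ends up shifted and squashed into a narrow band, so the model sees data far from its training distribution even before any quantization takes place.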
Is the calibration set diverse and representative enough of the use case?
1. Common pitfalls: (A) Using a single image, possibly repeated. (B) Using random, single-color, or otherwise out-of-domain images.
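A quick sanity check for the repeated-image and single-color pitfalls could look like the sketch below. The function name `calibration_set_report` and the flatness threshold are assumptions for illustration; the check cannot tell whether images are in-domain, which still needs a manual look.

```python
import hashlib
import numpy as np

def calibration_set_report(images, flat_std_threshold=1e-3):
    """Count exact duplicates and (near-)constant images in a calibration set.

    images: array of shape (N, H, W, C).
    flat_std_threshold: hypothetical cutoff below which an image is
    treated as single-color.
    """
    images = np.asarray(images)
    # Exact duplicates share the same byte-level hash.
    hashes = {
        hashlib.sha1(np.ascontiguousarray(img).tobytes()).hexdigest()
        for img in images
    }
    # A single-color image has (almost) zero pixel standard deviation.
    per_image_std = images.reshape(len(images), -1).std(axis=1)
    return {
        "num_images": len(images),
        "num_unique": len(hashes),
        "num_flat": int(np.sum(per_image_std < flat_std_threshold)),
    }

rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gray = np.full((32, 32, 3), 128, dtype=np.uint8)

# Toy set: one image repeated, one single-color image, one distinct image.
report = calibration_set_report(np.stack([img_a, img_a, gray, img_b]))
print(report)
```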
Is the test methodology used actually indicative of failure?
1. The proper test is three-way - compare both original and compressed model outputs to the ground-truth using the task-specific postprocessing and metric. Ideally, using a significant number of images.
2. Common pitfalls: (A) Using a single image and visual inspection - good only for the most basic proof-of-principle and detecting catastrophic failures. (B) Two-way test - only comparing the outputs of full-precision and compressed model. This can yield seemingly significant errors even in very good cases.
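The difference between the three-way and two-way tests can be sketched with a toy classification example. The labels below are made up, and top-1 accuracy stands in for whatever task-specific metric applies to your model.

```python
import numpy as np

def top1_accuracy(pred_labels, true_labels):
    """Fraction of positions where the two label arrays agree."""
    return float(np.mean(np.asarray(pred_labels) == np.asarray(true_labels)))

def three_way_report(fp_pred, quant_pred, ground_truth):
    """Score both models against ground truth, plus the two-way comparison."""
    fp_acc = top1_accuracy(fp_pred, ground_truth)
    quant_acc = top1_accuracy(quant_pred, ground_truth)
    return {
        "fp_accuracy": fp_acc,
        "quant_accuracy": quant_acc,
        "degradation": fp_acc - quant_acc,
        # Two-way view: how often the compressed model differs from FP.
        "fp_vs_quant_disagreement": 1.0 - top1_accuracy(quant_pred, fp_pred),
    }

# Made-up labels for 8 images.
ground_truth = np.array([0, 1, 2, 1, 0, 2, 1, 0])
fp_pred      = np.array([0, 1, 2, 1, 0, 2, 0, 1])
quant_pred   = np.array([0, 1, 2, 1, 0, 1, 1, 2])

report = three_way_report(fp_pred, quant_pred, ground_truth)
print(report)
```

In this toy case the two models disagree on 3 of 8 images, yet both reach the same accuracy against the ground truth, so the two-way comparison alone would overstate the problem.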
Do you use BatchNormalization (BN) after convolution layers?
1. BN-less convolutions typically make quantization very challenging because of their wide output ranges, as you can see in the histograms provided by the Layer Analysis Tool.
2. Clipping can alleviate the issue, but its potential is limited. The most recommended course of action is to retrain the network with full usage of BatchNorm.
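Why a wide output range hurts, and why clipping helps only partially, can be seen in a small NumPy experiment. This is a generic uniform 8-bit fake-quantization sketch on synthetic activations, not Hailo's actual quantizer; the `fake_quantize` helper and the percentile choice are assumptions for illustration.

```python
import numpy as np

def fake_quantize(x, lo, hi, bits=8):
    """Uniformly quantize x to 2**bits levels over [lo, hi], then dequantize."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)
acts[:10] = 200.0  # a few extreme values stretch the range

# Option 1: cover the full min/max range.
full = fake_quantize(acts, acts.min(), acts.max())

# Option 2: clip the range to the 0.5th..99.5th percentiles.
lo, hi = np.percentile(acts, [0.5, 99.5])
clipped = fake_quantize(acts, lo, hi)

bulk = slice(10, None)  # everything except the injected outliers
err_full = np.mean(np.abs(full[bulk] - acts[bulk]))
err_clip = np.mean(np.abs(clipped[bulk] - acts[bulk]))
err_outliers_clip = np.mean(np.abs(clipped[:10] - acts[:10]))

print("bulk error, full range   :", err_full)
print("bulk error, clipped range:", err_clip)
print("outlier error after clip :", err_outliers_clip)
```

Clipping restores resolution for the bulk of the values, but the clipped outliers are now badly misrepresented. If the network actually relies on those large values, clipping trades one error for another, which is one way to read the advice to retrain with BatchNorm instead.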

For further information, check out the DFC guide -> Model Optimization under the Hailo Developer Zone, and specifically the Debugging Accuracy section.
TODO link to degradation-debugging Webinar (video/presentation)

victorc · March 1, 2024, 7:10pm

Note that quantization-aware training (QAT) can be performed even if the original model was not created using Keras. Thanks to the Hailo runner's API, runner.get_keras_model(), it is possible to obtain the Keras equivalent of any quantized model stored in a HAR file. Once the Keras model is obtained, it is possible to add new Keras layers to it and retrain the model with quantized weights using the training dataset. Please refer to: https://hailo.ai/developer-zone/documentation/dataflow-compiler-v3-26-0/?sp_referrer=tutorials_notebooks/notebooks/DFC_6_QAT_Tutorial.html#Running-QAT
