Accuracy degradation after quantization for Hailo HW

Basic Checklist of “immediate suspects”:

  1. Is the model accurate in SDK_FP_OPTIMIZED inference emulation mode?
    1. If not, it’s not a pure quantization issue.
    2. Make sure to use actual images from the calibration set, apply the postprocessing, and inspect the final result, so as to validate the SW pipeline running the model within the Hailo block before the actual HW-targeted compression.
    3. Common pitfalls: (A) Faulty distribution (duplication or omission) of pre/post-processing blocks (e.g., normalization) between implementations - programmatic vs. Hailo-integrated via a model-script command. (B) Parsing start/end nodes that are inconsistent with the pre/post-processing applied around the Hailo blocks.
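Pitfall (A) can be sketched with plain numpy: if normalization is applied both programmatically and via a model-script command, the model effectively sees inputs normalized twice. The mean/std values below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical normalization parameters (illustration only).
MEAN, STD = 127.5, 127.5

def normalize(img):
    """Map uint8 pixels [0, 255] to roughly [-1, 1]."""
    return (np.asarray(img, dtype=np.float32) - MEAN) / STD

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

once = normalize(img)              # correct: normalization applied exactly once
twice = normalize(normalize(img))  # pitfall (A): duplicated between the programmatic
                                   # pipeline AND a model-script normalization command

print(once.min(), once.max())      # within the expected [-1, 1] input range
print(twice.min(), twice.max())    # collapses to a tiny band near -1
```

A model fed the `twice` tensor sees essentially constant inputs, so the failure appears long before quantization - exactly what the SDK_FP_OPTIMIZED emulation run is meant to catch.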
  2. Is the calibration set diverse and representative enough of the use case?
    1. Common pitfalls: (A) Using a single image, possibly repeated. (B) Using random, single-color, or otherwise out-of-domain images.
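A quick sanity check for both pitfalls can be done before calibration; the helper below (a hypothetical name, not part of the Hailo SDK) flags exact duplicates and near single-color images in a calibration set.

```python
import numpy as np

def calib_set_report(images):
    """Flag common calibration-set pitfalls: duplicated or near-constant images.

    `images` is an iterable of HxWxC uint8 arrays (assumed layout).
    """
    hashes, flat_count = set(), 0
    for img in images:
        hashes.add(img.tobytes())   # exact duplicates collapse to one entry
        if img.std() < 1.0:         # near single-color image
            flat_count += 1
    return {"unique": len(hashes), "near_constant": flat_count}

# A "bad" calibration set: one image repeated 7 times, plus a flat gray image.
rng = np.random.default_rng(0)
real = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
flat = np.full((64, 64, 3), 128, dtype=np.uint8)

report = calib_set_report([real] * 7 + [flat])
print(report)   # {'unique': 2, 'near_constant': 1}
```

A healthy calibration set should report a `unique` count equal to its size and zero near-constant images.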
  3. Is the test methodology used actually indicative of failure?
    1. The proper test is three-way - compare both the original and the compressed model’s outputs to the ground truth, using the task-specific postprocessing and metric, and ideally over a significant number of images.
    2. Common pitfalls: (A) Using a single image and visual inspection - good only for the most basic proof-of-principle and for detecting catastrophic failures. (B) A two-way test - comparing only the outputs of the full-precision and compressed models. This can yield seemingly significant errors even in very good cases.
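The two-way vs. three-way distinction can be demonstrated with synthetic classification logits (the noise magnitudes below are assumptions, chosen only to make the point): a sizable raw-output difference can coexist with a negligible drop in the task metric.

```python
import numpy as np

rng = np.random.default_rng(0)
n, classes = 1000, 10
gt = rng.integers(0, classes, n)               # ground-truth labels

# Hypothetical logits: the FP model is "good" (biased toward the true class)...
fp_logits = rng.normal(0.0, 1.0, (n, classes))
fp_logits[np.arange(n), gt] += 3.0
# ...and "quantization" adds moderate noise to the raw outputs.
q_logits = fp_logits + rng.normal(0.0, 0.5, (n, classes))

def accuracy(logits, labels):
    return (logits.argmax(axis=1) == labels).mean()

# Two-way test: the raw-output difference looks alarming...
rel_err = np.abs(q_logits - fp_logits).mean() / np.abs(fp_logits).mean()
print(f"mean relative output error: {rel_err:.1%}")

# ...but the three-way, task-metric test tells the real story.
print(f"FP accuracy:        {accuracy(fp_logits, gt):.1%}")
print(f"quantized accuracy: {accuracy(q_logits, gt):.1%}")
```

Here the relative output error is in the tens of percent, yet both models classify almost identically - which is why only the three-way, metric-based comparison is indicative of real degradation.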
  4. Do you use BatchNormalization (BN) after convolution layers?
    1. BN-less convolutions typically make quantization very challenging because of wide output ranges, as you can see in the histograms provided by the Layer Analysis Tool.
    2. Clipping can alleviate the issue, but its potential is limited. The most recommended course of action is to re-train the network with full usage of BatchNorm.
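A minimal numpy sketch of the mechanism (a simplified uniform-quantization model, not the actual Hailo quantizer): a few extreme activations widen the quantization range, which inflates the step size and destroys resolution for the bulk of well-behaved values; clipping the range restores it.

```python
import numpy as np

def quantize_8bit(x, lo, hi):
    """Simplified uniform 8-bit quantization to the range [lo, hi]."""
    scale = (hi - lo) / 255.0
    return np.clip(np.round((x - lo) / scale), 0, 255) * scale + lo

rng = np.random.default_rng(0)

# BN-less-like activations: mostly well-behaved, with rare large outliers.
acts = rng.normal(0.0, 1.0, 100_000)
acts[rng.integers(0, acts.size, 50)] *= 100.0   # a few extremes widen the range

bulk = np.abs(acts) <= 4.0                      # the ~99.95% typical activations

def bulk_rms(lo, hi):
    err = acts - quantize_8bit(acts, lo, hi)
    return np.sqrt(np.mean(err[bulk] ** 2))

err_full = bulk_rms(acts.min(), acts.max())   # outliers dictate a huge step size
err_clipped = bulk_rms(-4.0, 4.0)             # clipping restores bulk resolution

print(err_clipped < err_full)   # True
```

Clipping sacrifices the outliers themselves, which is why its potential is limited; BatchNorm avoids the wide ranges in the first place.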

For further information check out DFC guide -> Model Optimization in the Hailo Developer Zone, and specifically the Debugging Accuracy section.
TODO link to degradation-debugging Webinar (video/presentation)


Note that QAT can be performed even if the original model was not created in Keras. Thanks to the Hailo runner’s API, runner.get_keras_model(), it is possible to obtain the Keras equivalent of any quantized model stored in a HAR file. Once the Keras model is obtained, new Keras layers can be added to it, and the model can be retrained with quantized weights using the training dataset. Please refer to: