I noticed a clear difference in performance between my original PyTorch model (.pt) and the compiled Hailo Executable Format (.hef) version. While the .pt model works fine, the .hef model sometimes fails to detect objects or produces different inference results.
For example, on the same input image, the .hef model in production may detect nothing, whereas the .pt model detects all objects correctly.
I initially thought that the production model wasn’t receiving images of the same input size, but even after adjusting this, it didn’t improve. (It’s possible I did something incorrectly during this adjustment.)
One possible cause could be the optimization level used during compilation. I’m currently using level 0, and my local machine has no GPU, which might affect how certain operations are executed.
I’m trying to compile on Google Colab, which has NVIDIA GPUs, but the Hailo SDK doesn’t seem to detect the GPU, likely because it expects specific Hailo hardware (Hailo-8 or Hailo-15), not generic CUDA GPUs.
Additionally, I don’t fully understand the calibration dataset requirements: how different the images should be, how many images are needed, and whether I should only include images from my production cameras.
There could be a couple of reasons for the behavior you are seeing.
In your inference script, you could be passing BGR image arrays instead of RGB.
During compilation, you might not have used calibration images from your use case.
One way to find the root cause is to compute the mAP of the HEF file on your validation set and compare it with the mAP of the PyTorch checkpoint. You can also try compiling with our cloud compiler, which lets you upload some calibration images: Early Access to DeGirum Cloud Compiler
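To rule out the channel-order issue mentioned above, a minimal sketch (assuming your frames arrive as BGR NumPy arrays, e.g. from OpenCV's `cv2.VideoCapture`; the function name is illustrative) is to reverse the last axis before preprocessing:

```python
import numpy as np

def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Reverse the channel order of an HxWx3 BGR array to RGB.

    Equivalent to cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) but without
    an OpenCV dependency; a no-op for grayscale data duplicated
    across channels.
    """
    return frame[..., ::-1]
```

Running this once on a sample frame and comparing detections before/after is a quick way to confirm or eliminate a BGR/RGB mismatch.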
Just to clarify, my dataset is black and white, so there’s no issue with RGB vs BGR.
Regarding the calibration images: do they need to be very diverse? My videos come from surveillance cameras where the backgrounds barely change (only 3–4 different backgrounds), and it’s mainly certain elements in the scene that change. In this case, how many images would you recommend for calibration?
Also, my model was optimized at level 0 because I didn’t have access to an NVIDIA GPU. Could the difference in optimization level be affecting the performance of the compiled HEF?
Finally, I have already submitted a request to access the DeGirum Cloud Compiler.
Yes, higher optimization levels should give you better FPS from the HEF file.
Since you’re working with fixed surveillance cameras, focus on lighting variety and object diversity rather than background changes. Use 100–300 well-selected frames from actual production footage, including different times of day, object occlusion levels, motion blur, and varying object distances/sizes. Avoid using too many similar frames.
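One way to avoid too many similar frames is a simple greedy filter. This is a sketch, not an official Hailo recipe: the `min_mean_diff` threshold is illustrative and would need tuning for your footage. It keeps a frame only if it differs enough, on average, from the last frame kept:

```python
import numpy as np

def select_calibration_frames(frames, max_frames=300, min_mean_diff=8.0):
    """Greedily keep frames whose mean absolute pixel difference from
    the previously kept frame exceeds a threshold, dropping
    near-duplicates. `frames` is an iterable of uint8 arrays."""
    kept = []
    for frame in frames:
        f = frame.astype(np.float32)
        if not kept or np.abs(f - kept[-1]).mean() > min_mean_diff:
            kept.append(f)
        if len(kept) >= max_frames:
            break
    return [k.astype(np.uint8) for k in kept]
```

Sampling frames at a fixed stride (e.g. one per few seconds of video) before applying this filter also helps cover different times of day.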
About grayscale input: Most Hailo networks expect 3-channel RGB input. If you’re using grayscale, make sure you’re either duplicating the single channel three times ([gray, gray, gray]) or using custom parsing for single-channel input. The key is keeping your calibration images, original model, and inference pipeline all consistently using the same format (either grayscale or 3-channel).
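Assuming NumPy arrays, the channel-duplication option above could look like this sketch (function name is illustrative; apply the same step to calibration images and inference frames alike):

```python
import numpy as np

def gray_to_3ch(gray: np.ndarray) -> np.ndarray:
    """Duplicate a single-channel HxW (or HxWx1) image into HxWx3,
    so a grayscale frame matches a 3-channel network input."""
    if gray.ndim == 2:
        gray = gray[..., None]  # add a channel axis: HxW -> HxWx1
    return np.repeat(gray, 3, axis=-1)
```

Because all three channels are identical, BGR-vs-RGB ordering becomes a non-issue for such frames, which matches your earlier observation.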
For calibration, how should I determine whether two images are too similar? For example, if an object moves slightly between frames but the background stays the same, are those frames redundant, or can both be used? Also, does the Hailo optimization level affect only the FPS of the HEF file, or can it also influence the model’s accuracy? The logs mention that optimization level 0 is not recommended for production and might reduce precision; could you clarify how it impacts performance?