Poor performance

Hello,
I have a YOLO11 medium model that I fine-tuned.

Like the poster here, I am seeing serious performance degradation when I use the .hef model produced by the optimization and compilation process. Like that poster, the compiler was unable to recognize my GPU – I was on a workstation with a T550 laptop GPU. I had CUDA 12 on the system and also tried 11.8 in the conda environment – no luck with either.

I got this message during the optimization step:

[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune

which I wasn’t sure how to interpret.

I saw in the other post that having the optimization level set to 0 (as I ended up with) will affect FPS, but will it affect accuracy as well?

For what it’s worth, I was running the model through the script in Hailo-Application-Code-Examples/runtime/hailo-8/python/object_detection.

Should the accuracy of the HEF usually be comparable to what you get from the .pt model on an NVIDIA card?

Thanks in advance for any insights!

Hey @Justin_Brody,

Let me break down what’s happening with your model compilation:

That dataset warning you’re seeing just means you provided more calibration data than the optimization step was set up to use. By default it only takes 1024 entries for the fine-tune pass, but you can bump this up with --calibration-size to use more of your data. More calibration data usually means better quantization results, especially for YOLO models.
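If you end up driving the optimization from Python instead of the Model Zoo CLI, a minimal sketch of raising that limit might look like this. The dataset_size command and all file names here are assumptions based on the DFC model-script reference, so double-check them against your installed DFC version:

```python
import numpy as np
from hailo_sdk_client import ClientRunner

# Load the parsed model; "yolo11m.har" is a placeholder for your HAR file
runner = ClientRunner(har="yolo11m.har")

# Ask the fine-tune pass to use more calibration images than the
# 1024-entry default. dataset_size is taken from the DFC model-script
# docs -- verify it exists in your DFC version.
runner.load_model_script(
    "post_quantization_optimization(finetune, policy=enabled, dataset_size=4096)\n"
)

# calib_set.npy: an (N, H, W, C) array of representative, preprocessed images
calib_data = np.load("calib_set.npy")
runner.optimize(calib_data)
runner.save_har("yolo11m_optimized.har")
```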

About optimization level 0 - yeah, it definitely affects accuracy, not just speed. When it’s set to 0, the compiler skips a bunch of the post-quantization algorithms that help recover precision. The most likely reason it defaulted to this conservative setting is your GPU problem: the heavier algorithms need GPU acceleration, so when the compiler can’t see a usable GPU it drops down to level 0.
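Once the GPU is visible, you can try forcing a higher level in your model script. Hedged again - the command name below is from the DFC docs and may differ between versions:

```
model_optimization_flavor(optimization_level=2)
```

Levels above 0 are where the fine-tune and other precision-recovery passes actually run, which is where most of the lost accuracy comes back.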

The GPU detection issue - can you share the specific error messages you’re getting? That way I can help you troubleshoot what’s actually going wrong with your T550. It might be a CUDA version mismatch, driver issue, or memory requirements that we can work around. The exact error output would tell us whether it’s a compatibility problem or something we can fix.
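In the meantime, here’s a quick, generic sanity check you can run inside the same conda environment you launch the compiler from. As far as I know the DFC does its GPU work through TensorFlow, so if TF can’t see the T550, the optimizer won’t either (this is plain TensorFlow, nothing Hailo-specific):

```python
# Run inside the conda env you use for the Hailo DFC.
import tensorflow as tf

# If this prints False or an empty list, the problem is the CUDA/driver
# setup in the environment, not anything Hailo-specific.
print("TF built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```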

As for accuracy differences - don’t expect your HEF to match your PyTorch model exactly. You’re going from full precision (FP32) on NVIDIA to quantized INT8 on specialized hardware, so some drop is inevitable. Hailo aims for less than 2% mAP loss with proper fine-tuning, but without it (or with optimization level 0), you could see much bigger drops.
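If you want to put a number on the gap for your own data rather than eyeball it, you could dump detections from both pipelines and compare them directly. A rough, self-contained sketch - the JSON dump format and file names here are hypothetical, so adapt them to however your scripts emit boxes:

```python
import json

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) pixel coordinates."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def recall_vs_reference(ref_dets, test_dets, iou_thr=0.5):
    """Fraction of reference (PyTorch) boxes recovered by the HEF model."""
    matched = total = 0
    for img_id, ref_boxes in ref_dets.items():
        test_boxes = test_dets.get(img_id, [])
        for rb in ref_boxes:
            total += 1
            matched += any(iou(rb, tb) >= iou_thr for tb in test_boxes)
    return matched / max(total, 1)

# Hypothetical dumps: {image_id: [[x1, y1, x2, y2], ...]} per model
with open("pt_dets.json") as f:
    pt_dets = json.load(f)
with open("hef_dets.json") as f:
    hef_dets = json.load(f)

print(f"HEF recovers {recall_vs_reference(pt_dets, hef_dets):.1%} "
      f"of the PyTorch boxes at IoU 0.5")
```

If that number comes back dramatically low, I’d check for a preprocessing mismatch between the two pipelines before blaming quantization.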

Could you share your config and alls files, plus those GPU error messages? That would help me give you more specific advice on how to improve things.