Custom YOLO11 model conversion to Hailo HEF

Hi,
I have a yolo11m model trained at 1920x1920 on a custom dataset of 10 classes, and I exported it to ONNX at 1088x1920 (H, W) with opset 11.

I want to convert that ONNX model to a Hailo .hef using the Hailo AI Suite Docker, for running inference on a Hailo-8.

I tried to follow the YOLOv8 retraining example given in the Hailo Model Zoo Git repo.
For the calibration dataset I am using the same validation dataset that was used in the model training. The dataset has more than 3000 images at different resolutions, going up to 2K.

In the tutorial I saw that I have to generate a TFRecord using the create_coco_tfrecord.py file. I generated the TFRecord for my custom dataset with COCO annotations.
Then I ran the model compilation using hailomz after modifying yolov11m.yaml, base/yolo.yaml, and yolov11m_nms_config.yaml with the input size set to 1088,1920.

hailomz compile --ckpt atr_yolo11m_hailo_op11_1088x1920.onnx --calib-path val.tfrecord --yaml yolov11m.yaml --classes 10 --hw-arch hailo8 --performance

I get this error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ReduceDataset_Targuments_0_Tstate_1_output_types_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} Error in user-defined function passed to MapDataset:3 transformation with iterator: ReduceIterator::Root::Prefetch::FiniteTake::map: Paddings must be non-negative: 0 -88 [[{{node PadV2}}]] [Op:ReduceDataset] name:

Just to give it a try, when I used the TFRecord generated with the COCO val2017 dataset, I don't see this error and the compilation proceeds.
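For context on the error itself: a negative padding like `0 -88` suggests that, after the preprocessing scales an image, one dimension ends up larger than the target shape, so the computed pad goes negative. The sketch below assumes the pipeline scales to the target width and pads the height; that assumption is mine, not something confirmed from the Model Zoo code:

```python
TARGET_W, TARGET_H = 1920, 1088

def height_pad_after_width_fit(img_w, img_h, target_w=TARGET_W, target_h=TARGET_H):
    # Scale the image to the target width (keeping aspect ratio), then
    # report how much height padding is left over. A negative value
    # would reproduce the "Paddings must be non-negative" failure.
    scale = target_w / img_w
    return target_h - round(img_h * scale)

# A 1920x1080 frame fits with 8 px of height padding to spare:
print(height_pad_after_width_fit(1920, 1080))   # 8
# A taller 2000x1225 photo overflows the 1088 target height:
print(height_pad_after_width_fit(2000, 1225))   # -88
```

Running this over every image in the calibration set (e.g. via PIL's `Image.open(path).size`) would show whether any aspect ratios trigger the negative pad, which might explain why COCO val2017 works while this dataset does not.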

In the tutorial I also saw that the calib dataset can be a directory of images. So when I pointed the hailomz --calib-path to the directory containing the val images of my custom dataset, hailomz proceeds with the model optimization, but then fails to compile at the end with the following error.

512/512 ━━━━━━━━━━━━━━━━━━━━ 751s 1s/step - _distill_loss_yolov11m/conv102: 0.4609 - _distill_loss_yolov11m/conv105: 0.6831 - _distill_loss_yolov11m/conv67: 0.3585 - _distill_loss_yolov11m/conv71: 0.3621 - _distill_loss_yolov11m/conv74: 0.5544 - _distill_loss_yolov11m/conv83: 0.6819 - _distill_loss_yolov11m/conv87: 0.3968 - _distill_loss_yolov11m/conv90: 0.4695 - _distill_loss_yolov11m/conv99: 0.6219 - total_distill_loss: 4.5892
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:50:42.38)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|█████████████████████████████████████████| 8/8 [02:00<00:00, 15.10s/iterations]
[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:02:04.91)
[info] Model Optimization is done
[info] Saved HAR to: /local/shared_with_docker/yolov11/yolov11m.har
Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11m.alls because there is no specific hardware alls
[info] Loading model script commands to yolov11m from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11m.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov11m from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv71
[info] Adding an output layer after conv74
[info] Adding an output layer after conv87
[info] Adding an output layer after conv90
[info] Adding an output layer after conv102
[info] Adding an output layer after conv105
[info] Building optimization options for network layers…
[info] Successfully built optimization options - 51s 2ms
[error] Mapping Failed (allocation time: 51s)
Performance Flow requires automatic resource utilization

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Performance Flow requires automatic resource utilization

There is a lot of confusion here. The calib path has only images, but no ground truths. How will the Hailo Model Zoo run the quantization process and compile the model without the ground truths?

I request @HAILO to please let me know the correct and exact way to compile a custom-trained YOLO11 model (and, if possible, YOLO26 models), along with the best parameters for the model to work with the best accuracy possible. Let's assume that there will be only one model running at a time on the Hailo chip.

Thank you.

FYI, I also tried an approach where I downloaded the yolov11m.onnx file from the Hailo Model Zoo link and created the COCO TFRecord for val2017. With all the YAML files left at their internal defaults, I ran the model compilation command:


hailomz compile yolov11m --ckpt yolo11m.onnx --calib-path coco_coco_val2017.tfrecord --hw-arch hailo8 --performance


Still I get the same Mapping Failed error, as below.

[info] Model Optimization is done
[info] Saved HAR to: /local/shared_with_docker/yolov11/yolov11m.har
<Hailo Model Zoo INFO> Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11m.alls because there is no specific hardware alls
[info] Loading model script commands to yolov11m from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11m.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov11m from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv71
[info] Adding an output layer after conv74
[info] Adding an output layer after conv87
[info] Adding an output layer after conv90
[info] Adding an output layer after conv102
[info] Adding an output layer after conv105
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 55s 331ms
[error] Mapping Failed (allocation time: 55s)
Performance Flow requires automatic resource utilization

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Performance Flow requires automatic resource utilization

Please tell me what is happening here and how I can successfully compile my yolov11m model trained on custom data.

UPDATE: after removing the --performance flag, I do not see the "Performance Flow requires automatic resource utilization" error and the compilation proceeds further.

But later on I get a different error:
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv71
[info] Adding an output layer after conv74
[info] Adding an output layer after conv87
[info] Adding an output layer after conv90
[info] Adding an output layer after conv102
[info] Adding an output layer after conv105
[info] Building optimization options for network layers…
[info] Successfully built optimization options - 50s 852ms
[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Building optimization options for network layers…
[info] Successfully built optimization options - 1m 31s 314ms
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=80%, max_compute_utilization=80%, max_compute_16bit_utilization=80%, max_memory_utilization (weights)=80%, max_input_aligner_utilization=80%, max_apu_utilization=80%
[info] Finding the best partition to contexts…
[…<==>…] Elapsed: 01:10:16
[error] Mapping Failed (Timeout, allocation time: 1h 11m 48s)
Value doesn’t fit in field (1,266)
Mapping Failed (Timeout, allocation time: 1h 11m 48s)

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Value doesn’t fit in field (1,266)
Mapping Failed (Timeout, allocation time: 1h 11m 48s)

Hi @vaishnav_raju ,

Thanks for all the details.
Can you please confirm that this is now the only error that occurs?

[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv71
[info] Adding an output layer after conv74
[info] Adding an output layer after conv87
[info] Adding an output layer after conv90
[info] Adding an output layer after conv102
[info] Adding an output layer after conv105
[info] Building optimization options for network layers…
[info] Successfully built optimization options - 50s 852ms
[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Building optimization options for network layers…
[info] Successfully built optimization options - 1m 31s 314ms
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=80%, max_compute_utilization=80%, max_compute_16bit_utilization=80%, max_memory_utilization (weights)=80%, max_input_aligner_utilization=80%, max_apu_utilization=80%
[info] Finding the best partition to contexts…
[…<==>…] Elapsed: 01:10:16
[error] Mapping Failed (Timeout, allocation time: 1h 11m 48s)
Value doesn’t fit in field (1,266)
Mapping Failed (Timeout, allocation time: 1h 11m 48s)

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Value doesn’t fit in field (1,266)
Mapping Failed (Timeout, allocation time: 1h 11m 48s)

Yes Michael,
Now only this error occurs.
Please help me solve this.

Hi @vaishnav_raju,

The issue might be that YOLO11m at 1088x1920 is too large to fit on the Hailo-8. The combination of a medium-sized model with such a high resolution might exceed the device's resources.

To resolve this, try one or more of the following:

  1. Reduce the input resolution
  2. Use a smaller model variant
  3. If you must keep the high resolution, consider using the tiling approach - please see here: hailo-apps/hailo_apps/python/pipeline_apps/tiling at main · hailo-ai/hailo-apps · GitHub

Thanks,

Hi Michael,
I am able to convert the yolov11m model if I use the standard 640x640.
Can you please let me know the following:

  1. What is the maximum input resolution that the Hailo-8 or Hailo-10 can handle?
  2. The calibration dataset requires only an images directory and has no ground-truth labels, yet I see QAT running while the model is compiling. That doesn't make sense, as there are no ground-truth labels for my dataset being used. Can you please elaborate on this?
  3. How do I get the best accuracy from a yolov11m model after compiling it to a Hailo .hef? Please let me know all of the parameters to play with that can maximize the accuracy of my model running on Hailo.
  4. The FPS performance of yolov11m at 640x640 is low: whether I use the standard yolov11m.hef from the hailo_model_zoo or my custom-compiled model, the FPS on a Raspberry Pi 5 is around 10 FPS (even with PCIe Gen 3 enabled).
  5. How do I improve the yolov11m model performance on the RPi to give a minimum of 15-20 FPS?
  6. Please let me know all the best practices to follow for the best performance and accuracy when converting and running a YOLO model on an RPi5.
  7. Will FPS performance and accuracy be higher for the same model running on a HAILO10/HAILO15 compared to a HAILO8?

Hi @vaishnav_raju,

  1. Max input resolution on Hailo-8?
    No hard limit - it depends on model size.

  2. Why QAT without ground truth labels?
    Calibration images are only used to collect activation statistics for quantization. The QAT you see uses distillation - it compares the quantized model against the original float model, not against labels. No ground truth needed.

  3. How to maximize accuracy?

  • Use calibration images from your dataset (256-512 images).
  • Remove the --performance flag.

4 & 5. YOLO11m gets ~10 FPS, how to reach 15-20?
Maybe worth trying YOLO11s/YOLO11n?

  6. Best practices?
    One strategy might be to use the smallest model that meets your accuracy needs.

  7. Hailo-10/15 vs Hailo-8?
    FPS for YOLO on 10 vs 8 should be roughly the same.
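The label-free distillation mentioned in answer 2 can be sketched like this: the fine-tuning signal is simply the distance between the float model's outputs and the quantized model's outputs on the calibration images, so no annotations are needed (a conceptual illustration, not Hailo's actual implementation):

```python
def distill_loss(float_out, quant_out):
    # Mean squared error between the float "teacher" model's outputs and
    # the quantized "student" model's outputs on the same calibration
    # image; no ground-truth labels are involved anywhere.
    return sum((f - q) ** 2 for f, q in zip(float_out, quant_out)) / len(float_out)

teacher = [0.9, 0.1, 0.4]   # hypothetical float model output on a calib image
student = [0.8, 0.2, 0.4]   # quantized model output on the same image
print(round(distill_loss(teacher, student), 5))  # 0.00667
```

Minimizing this loss nudges the quantized weights so the quantized network mimics the float network, which is why unlabeled calibration images are sufficient.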

Thanks,

Hi @Michael

  1. I was able to compile yolo11m at 720p resolution and deploy it successfully.
  2. Thanks for the clarity on QAT.
  3. For maximizing the accuracy:
  • I am using my validation dataset of 3000+ images. During compilation I see this in the terminal log: "[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results" followed by "[info] Using dataset with 1024 entries for finetune".
  • It seems the process is hardcoded to use 1024 entries. Would accuracy actually improve if there were a way to make the compilation use more than 1024 images? If yes, please tell me how.
  4. In the Hailo-apps inference code, are the input images resized to the model input size by the inference script, or does the compiled .hef model take the image and convert it to the model input size internally?
  5. Since the Hailo-10H is a more capable chip than the Hailo-8, why is the YOLO FPS performance the same on both? Can you please give some clarity?
  6. I am using the compiled yolo11m.hef model on a Hailo-8 M.2 card connected to an NVIDIA Orin as well as an RPi 5. The same model running on the same Hailo-8 chip gives different FPS:

    1. NVIDIA Orin - 15 FPS
    2. RPi5 AI HAT+ - under 8 FPS (with PCIe Gen 3 enabled)

Why am I seeing such a drastic change in performance on the RPi5? Even the standard Hailo Model Zoo yolov11m, which is quoted at 50 FPS in the Hailo Model Zoo GitHub, gives me an average of 8 FPS on the RPi5. Why is this so?

Please tell me how I can achieve the FPS quoted by Hailo in the Hailo Model Zoo repo on a Raspberry Pi 5 with a Hailo-8.
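On the 1024-entry finetune limit mentioned above: the Hailo Dataflow Compiler model script (.alls) has a `post_quantization_optimization` command that, as far as I recall, accepts a `dataset_size` argument for the finetune stage. The exact syntax below is from memory and should be verified against the DFC user guide before relying on it:

```
post_quantization_optimization(finetune, policy=enabled, dataset_size=3000)
```

If that is supported in your DFC version, appending it to the model's .alls script would be one way to let the finetune use more of the 3000+ calibration images; whether more entries actually improve accuracy would still need to be measured on your validation set.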