Hailo-8L model conversion issues

I am in the process of converting several models to the HEF format for the Hailo-8L architecture and have run into some technical challenges. I would appreciate any clarification or suggestions you might have regarding these issues.

Here are the details of my environment:

  • Hailo Dataflow Compiler version 3.33.0

  • ONNX version 1.16.0

  • NumPy version 1.26.4

  • Ubuntu 24 running under WSL2

  • No GPU

1. Parsing Problems with DETR and yolo26l-pose

I have tried converting these two models:

  • DETR (Roboflow object detection for IR)

  • yolo26l-pose

Both fail during the parsing phase when executing hailo parser onnx. The error isn’t just a generic “list index out of range”; it specifically originates within the ONNX translator during the creation of a Tile layer:

File “…/onnx_translator.py”, line 1371, in _create_tile_layer

axis, repeats = filtered_repeats[0]

IndexError: list index out of range

According to the stack trace, this failure occurs inside these functions:

  • _create_tile_layer

  • _layer_callback_from_vertex

  • _add_direct_layers

  • convert_model()

So, my question is:

  • Is it currently feasible to convert DETR and yolo26l-pose to the HEF format for the Hailo-8L?

2. Compilation Failures with yolo11l-pose, yolo11m-pose, and yolov8m-pose

The following models:

  • yolo11l-pose

  • yolo11m-pose

  • yolov8m-pose

successfully complete the parsing and optimization stages. However, compilation then fails, returning errors such as “No valid partition found” and “Mapping Failed”.

This occurs during the allocation or mapping phase, despite several attempts at optimization beforehand.

Conversion process used:

  1. Exported the model from Ultralytics to ONNX format (opset version 12, image size set at 640 by 640 pixels).

  2. Parsed the ONNX file using the following command:

hailo parser onnx --hw-arch hailo8l

  3. Performed optimization with hailo optimize utilising a calibration dataset derived from COCO:
  • Dataset contained 1500 images

  • Images were resized to 640 by 640 pixels

  • No normalization to the 0–1 range was applied

  4. Compiled the model using:

hailo compiler --hw-arch hailo8l

The model script used included the line:

performance_param(compiler_optimization_level=max)

Despite following these steps, compilation repeatedly fails with errors related to partitioning or allocation.
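One detail worth double-checking in the steps above (an editorial side note, not from the original posts): Ultralytics models are trained on inputs scaled to the 0–1 range, so when feeding raw 0–255 calibration images, the model script usually adds on-chip normalization. The script shared later in this thread does exactly that:

```
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
```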

I have a few questions regarding this:

  • Is there a preprocessing step specific to YOLO pose models on the Hailo-8L platform that I might have overlooked?

  • Or is there a problem within my current conversion process?

  • Given that yolov8m-pose is listed in the Model Explorer for Hailo-8L, does the version available there differ structurally from a standard Ultralytics export in a way that must be accounted for prior to compilation, and could this difference explain the issues I am encountering?

Hi @Kac_Zal,

Good questions indeed!

  1. There isn’t any specific pre-processing for the YOLO26/YOLO11 family. In any case, I don’t believe this affected the compiler.
  2. I need to test these models; I don’t think there is a blocker on running them.
    1. I expect RT-DETR to be a much tougher conversion than the YOLOs.

This is what I’ve used to compile yolo26-pose for the Hailo-8L. Please note that this will NOT yield an accurate model; I only ran it with a fake calibration set to test that it compiles.

#!/usr/bin/env python3

from hailo_sdk_client import ClientRunner
from zenlog import log
import numpy as np

def main():
    runner = ClientRunner(hw_arch='hailo8l')
    model = 'yolo26n-pose'
    end_node_names = [
        '/model.23/one2one_cv2.0/one2one_cv2.0.2/Conv',
        '/model.23/one2one_cv2.1/one2one_cv2.1.2/Conv',
        '/model.23/one2one_cv2.2/one2one_cv2.2.2/Conv',
        '/model.23/one2one_cv4_kpts.0/Conv',
        '/model.23/one2one_cv4_kpts.1/Conv',
        '/model.23/one2one_cv4_kpts.2/Conv',
        '/model.23/one2one_cv3.0/one2one_cv3.0.2/Conv',
        '/model.23/one2one_cv3.1/one2one_cv3.1.2/Conv',
        '/model.23/one2one_cv3.2/one2one_cv3.2.2/Conv'
    ]
    runner.translate_onnx_model(model + '.onnx', end_node_names=end_node_names)
    log.info('Model translation completed successfully.')
    runner.save_har(model + '.har')
    log.info('Model saved as HAR file successfully.')
    calibset = np.random.rand(10, 640, 640, 3).astype(np.float32)  # random placeholder calibset; real images are needed for accuracy
    runner.optimize(calibset)
    log.info('Model optimization completed successfully.')
    runner.save_har(model + '_q.har')
    hef = runner.compile()
    with open(model + '.hef', 'wb') as f:
        f.write(hef)
    log.info('Model compiled and saved as HEF file successfully.')




if __name__ == "__main__":
    main()
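To swap the random calibset for real data, runner.optimize() just needs an (N, H, W, 3) float32 array. A minimal sketch (an assumption-laden helper, not from the original post; it assumes frames are already decoded and resized to 640×640 HWC, e.g. with OpenCV or PIL):

```python
import numpy as np

def build_calibset(frames, size=640):
    """Stack pre-resized HWC frames into the (N, size, size, 3)
    float32 array that runner.optimize() accepts."""
    arr = np.stack(frames).astype(np.float32)
    if arr.shape[1:] != (size, size, 3):
        raise ValueError(f"expected (N, {size}, {size}, 3), got {arr.shape}")
    return arr
```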

Thanks for the help. I modified your script to work with a specific dataset and tried using the COCO dataset. The conversion itself went through fine, but now I’m stuck with a bunch of problems:

  1. Detections are way off and keep showing super high confidence

    • The model almost never actually spots a human correctly.

    • The confidence scores are weirdly high—usually around 0.991 to 0.999, sometimes dipping to about 0.95—but it still fires detections all over the place.

    • It looks like the model “sees humans” even when none are present.

  2. Warning about “GPU not found” during conversion

    • While converting, I keep getting this warning that no GPU was detected, so the optimization level drops to zero.

    • How does this affect the conversion and the converted model itself?

  3. Odd behavior regarding dataset usage during conversion

    • Even when I use a dataset with like 200–1500 images, the logs say optimization only runs on about 64 images. Any idea why it caps at that amount?

    • Also, does the optimizer really use the images I provide, or is it pulling some internal samples?

  4. Questions about dataset formatting

    • What’s the ideal layout for the dataset when doing HEF conversion?

    • How big should it be? And should it be just humans in various poses, or is it fine to use other objects and backgrounds?

The more advanced optimization flows (AdaRound, QFT, etc.) require a GPU to run. The lack of a GPU is mentioned just to let you know that these flows are not available on that setup.

How would that affect you? It means you cannot run these flows to try to fix accuracy issues.

The optimization process is built from 3 major parts:

  1. Pre-Quantization optimizations (e.g. Tiled-Squeeze and Excite, Equalization)

  2. Quantization

  3. Post-Quantization (e.g. QFT, Adaround)

In the quantization part, we use 64 images by default for calibration; there is no need for more. For the latter parts, more images are crucial, but a GPU is also needed. So, depending on the stage, the optimizer could use all the images you provide.
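If you do want calibration itself to see more than the default 64 images, the calibset size can be raised via a model-script directive; the same directive appears in the script shared later in this thread (the values here are illustrative):

```
model_optimization_config(calibration, batch_size=8, calibset_size=256)
```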

Not sure what you mean by ideal layout; it depends on the format the model expects (e.g. BGR, RGB, grayscale, NV12). The dataset should be representative of the actual setup you’re going to use, in terms of different lighting, angles, and object sizes.

To be practical about debugging, I suggest running the noise analysis to see what your SNR is. While not exactly a measure of accuracy, it can give you good hints about where you are losing accuracy.

Hi folks, quoted below is the allocation script I used to successfully optimize and compile the yolo26m-pose (medium) model for the Hailo-10H.

yolo26m_pose/normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

model_optimization_config(calibration, batch_size=8, calibset_size=1024)
model_optimization_flavor(optimization_level=4, compression_level=0)

post_quantization_optimization(adaround, batch_size=1, policy=enabled)
quantization_param({yolo26m_pose/dw*}, precision_mode=a16_w16)
quantization_param([yolo26m_pose/conv73, yolo26m_pose/conv74, yolo26m_pose/conv77, yolo26m_pose/conv92, yolo26m_pose/conv93, yolo26m_pose/conv96, yolo26m_pose/conv109, yolo26m_pose/conv110, yolo26m_pose/conv113], precision_mode=a16_w16)
quantization_param([yolo26m_pose/output_layer1, yolo26m_pose/output_layer2, yolo26m_pose/output_layer3, yolo26m_pose/output_layer4, yolo26m_pose/output_layer5, yolo26m_pose/output_layer6, yolo26m_pose/output_layer7, yolo26m_pose/output_layer8, yolo26m_pose/output_layer9], precision_mode=a16_w16)

Note that the optimization runs overnight. Quite possibly good precision could also be reached with QFT alone, without AdaRound; I just didn’t test it. Compilation also takes an hour or two… This assumes parsing up to the last convs before the postprocessing, leading to the following correspondence of layer names/shapes:

"yolo26m_pose/conv73": \["/model.23/one2one_cv2.0/one2one_cv2.0.2/Conv_output_0", \[4, 80, 80\]\],

"yolo26m_pose/conv92": \["/model.23/one2one_cv2.1/one2one_cv2.1.2/Conv_output_0", \[4, 40, 40\]\],

"yolo26m_pose/conv109": \["/model.23/one2one_cv2.2/one2one_cv2.2.2/Conv_output_0", \[4, 20, 20\]\],

"yolo26m_pose/conv74": \["/model.23/one2one_cv4_kpts.0/Conv_output_0", \[51, 80, 80\]\],

"yolo26m_pose/conv93": \["/model.23/one2one_cv4_kpts.1/Conv_output_0", \[51, 40, 40\]\],

"yolo26m_pose/conv110": \["/model.23/one2one_cv4_kpts.2/Conv_output_0", \[51, 20, 20\]\],

"yolo26m_pose/conv77": \["/model.23/one2one_cv3.0/one2one_cv3.0.2/Conv_output_0", \[1, 80, 80\]\],

"yolo26m_pose/conv96": \["/model.23/one2one_cv3.1/one2one_cv3.1.2/Conv_output_0", \[1, 40, 40\]\],

"yolo26m_pose/conv113": \["/model.23/one2one_cv3.2/one2one_cv3.2.2/Conv_output_0", \[1, 20, 20\]\]

This reaches 48 FPS with apparently very good visual results, although I didn’t benchmark exact COCO precision. Let me know if you need other variants (n/s/l).
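For host-side decoding of the 51-channel keypoint heads above: 51 = 17 COCO keypoints × (x, y, visibility). A minimal reshape sketch (an editorial illustration, assuming the standard Ultralytics channel layout):

```python
import numpy as np

def split_kpts(kpt_map):
    """Reshape one (51, H, W) keypoint head output into (H*W, 17, 3):
    per grid cell, 17 COCO keypoints as (x, y, visibility)."""
    c, h, w = kpt_map.shape
    assert c == 51, "expected 17 keypoints x 3 channels"
    # channels are kpt0_x, kpt0_y, kpt0_v, kpt1_x, ... so split 51 -> (17, 3)
    return kpt_map.reshape(17, 3, h * w).transpose(2, 0, 1)
```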