Can't generate detection models with Hailo 4.22

We just updated from Hailo 4.20 to Hailo 4.22 in order to get PaddleOCR working reliably (see paddle_ocr example times out · Issue #353 · hailo-ai/Hailo-Application-Code-Examples · GitHub). We run an object detection pipeline before the OCR, and that pipeline is now broken. What previously worked with 4.20 now returns:

Aug 21 15:15:17 raspberrypi bash[12695]: terminate called after throwing an instance of 'std::invalid_argument'
Aug 21 15:15:17 raspberrypi bash[12695]:   what():  Output tensor best/yolov8_nms_postprocess is not an NMS type
Aug 21 15:15:17 raspberrypi bash[12695]: --------------------------------------
Aug 21 15:15:17 raspberrypi bash[12695]: C++ Traceback (most recent call last):
Aug 21 15:15:17 raspberrypi bash[12695]: --------------------------------------
Aug 21 15:15:17 raspberrypi bash[12695]: 0   filter_letterbox
Aug 21 15:15:17 raspberrypi bash[12695]: 1   filter
Aug 21 15:15:17 raspberrypi bash[12695]: 2   HailoNMSDecode::HailoNMSDecode(std::shared_ptr<HailoTensor>, std::map<unsigned char, std::string, std::less<unsigned char>, std::allocator<std::pair<unsigned char const, std::string > > >&, float, unsigned int, bool)

The model was compiled with NMS; here is the log of the ONNX-to-HEF conversion:

xxx:~$ hailo parser onnx best.onnx --hw-arch hailo8
[info] No GPU chosen, Selected GPU 0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1755811062.500358    9861 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755811062.507213    9861 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[info] Current Time: 21:17:49, 08/21/25
[info] CPU: Architecture: x86_64, Model: Intel(R) Xeon(R) CPU @ 2.20GHz, Number Of Cores: 12, Utilization: 0.0%
[info] Memory: Total: 83GB, Available: 81GB
[info] System info: OS: Linux, Kernel: 6.8.0-1034-gcp
[info] Hailo DFC Version: 3.32.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo parser onnx best.onnx --hw-arch hailo8`
[info] Translation started on ONNX model best
[info] Restored ONNX model best (completion time: 00:00:00.25)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.93)
[info] Simplified ONNX model for a parsing retry attempt (completion time: 00:00:02.17)
Parsing failed with recommendations for end node names: ['/model.23/Concat_3'].
Would you like to parse again with the recommendation? (y/n)
y
[info] According to recommendations, retrying parsing with end node names: ['/model.23/Concat_3'].
[info] Translation started on ONNX model best
[info] Restored ONNX model best (completion time: 00:00:00.18)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.84)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.23/cv2.0/cv2.0.2/Conv /model.23/cv3.0/cv3.0.2/Conv /model.23/cv2.1/cv2.1.2/Conv /model.23/cv3.1/cv3.1.2/Conv /model.23/cv2.2/cv2.2.2/Conv /model.23/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'best/input_layer1'.
[info] End nodes mapped from original model: '/model.23/Concat_3'.
[info] Translation completed on ONNX model best (completion time: 00:00:02.12)
Would you like to parse the model again with the mentioned end nodes and add nms postprocess command to the model script? (y/n)
y
[info] Translation started on ONNX model best
[info] Restored ONNX model best (completion time: 00:00:00.19)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.89)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.23/cv2.0/cv2.0.2/Conv /model.23/cv3.0/cv3.0.2/Conv /model.23/cv2.1/cv2.1.2/Conv /model.23/cv3.1/cv3.1.2/Conv /model.23/cv3.2/cv3.2.2/Conv /model.23/cv2.2/cv2.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'best/input_layer1'.
[info] End nodes mapped from original model: '/model.23/cv2.0/cv2.0.2/Conv', '/model.23/cv3.0/cv3.0.2/Conv', '/model.23/cv2.1/cv2.1.2/Conv', '/model.23/cv3.1/cv3.1.2/Conv', '/model.23/cv2.2/cv2.2.2/Conv', '/model.23/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model best (completion time: 00:00:02.40)
[info] Appending model script commands to best from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /home/stanislas.duprey/tmp/best.har

xxx:~$ hailo optimize best.har --calib-set-path calib_set.npy --hw-arch hailo8
[info] No GPU chosen, Selected GPU 0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
[info] Current Time: 21:20:22, 08/21/25
[info] CPU: Architecture: x86_64, Model: Intel(R) Xeon(R) CPU @ 2.20GHz, Number Of Cores: 12, Utilization: 0.1%
[info] Memory: Total: 83GB, Available: 81GB
[info] System info: OS: Linux, Kernel: 6.8.0-1034-gcp
[info] Hailo DFC Version: 3.32.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo optimize best.har --calib-set-path calib_set.npy --hw-arch hailo8`
[info] For NMS architecture yolov8 the default engine is cpu. For other engine please use the 'engine' flag in the nms_postprocess model script command. If the NMS has been added during parsing, please parse the model again without confirming the addition of the NMS, and add the command manually with the desired engine.
[info] The layer best/conv80 was detected as cls_layer.
[info] Using the default score threshold of 0.001 (range is [0-1], where 1 performs maximum suppression) and IoU threshold of 0.7 (range is [0-1], where 0 performs maximum suppression).
Changing the values is possible using the nms_postprocess model script command.
[info] The activation function of layer best/conv80 was replaced by a Sigmoid
[info] Starting Model Optimization
[warning] Reducing optimization level to 1 (the accuracy won't be optimized and compression won't be used) because there's less data than the recommended amount (1024)
[info] Model received quantization params from the hn
...
[info] The calibration set seems to not be normalized, because the values range is [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)].
Since the neural core works in 8-bit (between 0 to 255), a quantization will occur on the CPU of the runtime platform.
Add a normalization layer to the model to offload the normalization to the neural core.
Refer to the user guide Hailo Dataflow Compiler user guide / Model Optimization / Optimization Related Model Script Commands / model_modification_commands / normalization for details.
[info] Model Optimization is done
[info] Saved HAR to: /home/stanislas.duprey/tmp/best_optimized.har
xxx:~$ hailo compiler best_optimized.har --hw-arch hailo8
[info] No GPU chosen, Selected GPU 0
[info] Current Time: 21:40:13, 08/21/25
[info] CPU: Architecture: x86_64, Model: Intel(R) Xeon(R) CPU @ 2.20GHz, Number Of Cores: 12, Utilization: 0.0%
[info] Memory: Total: 83GB, Available: 81GB
[info] System info: OS: Linux, Kernel: 6.8.0-1034-gcp
[info] Hailo DFC Version: 3.32.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo compiler best_optimized.har --hw-arch hailo8`
[info] Compiling network
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv80
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 10s 337ms
[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 17s 878ms
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%, max_compute_16bit_utilization=60%, max_memory_utilization (weights)=60%, max_input_aligner_utilization=60%, max_apu_utilization=60%
[info] Finding the best partition to contexts...
[info] Searching for a better partition...
[...<==>.................................] Elapsed: 00:00:38
[info] Partition to contexts finished successfully
[info] Partitioner finished after 152 iterations, Time it took: 26m 15s 958ms
[info] Applying selected partition to 3 contexts...
[info] Validating layers feasibility
...
[info] Successful Mapping (allocation time: 29m 0s)
[info] Compiling kernels of best_context_2...
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 6.79321 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 14.0625 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 60.9375 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 28s)
[info] Compilation complete
[info] Saved HEF to: /home/stanislas.duprey/tmp/best.hef
[info] Saved HAR to: /home/stanislas.duprey/tmp/best_compiled.har
Done with the following Hailo version:
Hailo Dataflow Compiler v3.32.0

Hey @Stanislas_Duprey,

Welcome to the Hailo Community!

Looking at your issue, it appears the update from version 4.20 to 4.22 has caused problems with the detection component of your model, while the OCR part is still working fine. From what I can see in the logs, it looks like the detection model previously relied on NMS (Non-Maximum Suppression) post-processing, but that’s no longer happening after the update.

Could you share some additional details about your model setup? This would help me better understand what’s going on and figure out how to resolve this issue.

Thanks!

Our detection model is a YOLOv11s retrained on a custom dataset. We run it in a GStreamer pipeline based on Hailo's examples. Here's the relevant snippet of our GStreamer pipeline:

f"hailonet name=inference_hailonet hef-path={model_path} batch-size=4 multi-process-service=true device-count=1 vdevice-group-id=SHARED "
f"nms-score-threshold=0.6 nms-iou-threshold=0.65 output-format-type=HAILO_FORMAT_TYPE_FLOAT32 force-writable=true ! "
f"queue name=inference_hailonet_output_q leaky=downstream max-size-buffers=16 max-size-bytes=0 max-size-time=0 ! "
f"hailofilter name=inference_hailofilter so-path={constants.POST_PROCESS_PATH} config-path={config_path} function-name=filter_letterbox qos=true ! "

Can you run the following command and share the output?

hailortcli parse-hef {model.hef}

I’d suggest managing your NMS thresholds directly in your model script rather than through hailonet properties. When an HEF already contains built-in NMS, setting the nms-score-threshold or nms-iou-threshold properties can lead to:

  • Parameters being silently ignored
  • Runtime errors if the output isn’t properly detected as NMS-enabled
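For example, the thresholds can be baked in at compile time with the `nms_postprocess` model script command the optimizer log already points to. As a sketch only — the config path below is a placeholder, and the exact keyword and JSON key names should be verified against the DFC 3.32.0 user guide:

```
# model script (.alls) line, applied before `hailo optimize`
# "nms_config.json" is a hypothetical path to a Model Zoo-style NMS config
nms_postprocess("nms_config.json", meta_arch=yolov8, engine=cpu)
```

In the Model Zoo-style JSON configs, the keys `nms_scores_th` and `nms_iou_th` carry the score and IoU thresholds, so the values end up inside the HEF and the hailonet properties can be dropped entirely.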

Hi! Thank you for your suggestion. Unfortunately it didn’t resolve the issue. I ran with:

f"hailonet name=inference_hailonet hef-path={model_path} batch-size=4 multi-process-service=true device-count=1 vdevice-group-id=SHARED "
f"output-format-type=HAILO_FORMAT_TYPE_FLOAT32 force-writable=true ! "
f"queue name=inference_hailonet_output_q leaky=downstream max-size-buffers=16 max-size-bytes=0 max-size-time=0 ! "
f"hailofilter name=inference_hailofilter so-path={constants.POST_PROCESS_PATH} config-path={config_path} function-name=filter_letterbox qos=true ! "

But I keep getting the same error.

Here is the output of the hailortcli command:

HailoRT warning: Cannot create log file hailort.log! Please check the file ./hailort.log write permissions.
Architecture HEF was compiled for: HAILO8
Network group name: best, Multi Context - Number of contexts: 3
    Network name: best/best
        VStream infos:
            Input  best/input_layer1 UINT8, NHWC(640x640x3)
            Output best/yolov8_nms_postprocess FLOAT32, HAILO NMS BY CLASS(number of classes: 53, maximum bounding boxes per class: 100, maximum frame size: 106212)
            Operation:
                Op YOLOV8
                Name: YOLOV8-Post-Process
                Score threshold: 0.001
                IoU threshold: 0.70
                Classes: 53
                Cross classes: false
                NMS results order: BY_CLASS
                Max bboxes per class: 100
                Image height: 640
                Image width: 640

Any thoughts? This is quite blocking for us :folded_hands:

Hey @Stanislas_Duprey,

Your HEF file has YOLOv8 NMS already integrated, but your current pipeline configuration is attempting to apply hailofilter post-processing designed for pre-NMS tensors.

The issue occurs because:

  1. HEF Output: Your model outputs NMS-processed detection tensors
  2. Pipeline Expectation: The hailofilter with function-name=filter_letterbox expects raw YOLO output tensors
  3. Configuration Conflict: The batch-size=4 setting compounds the incompatibility

The filter_letterbox function in the post-processing library is specifically designed to handle raw YOLO outputs, not NMS-processed tensors.

Technical Details

In the Hailo Apps Infrastructure:

  • The INFERENCE_PIPELINE helper automatically adds hailofilter when post_process_so is provided
  • This assumes the HEF outputs raw tensors requiring post-processing
  • Your NMS-enabled HEF breaks this assumption
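For reference, here is what those NMS-processed tensors look like on the wire. This is a minimal sketch of a decoder for the FLOAT32 BY_CLASS buffer, assuming the standard HailoRT layout (per class: one float32 holding the box count, followed by that many (y_min, x_min, y_max, x_max, score) records). Note that 53 classes × (1 + 100 × 5) floats × 4 bytes = 106212, which matches the maximum frame size your parse-hef output reports:

```python
import struct

def parse_nms_by_class(buf: bytes, num_classes: int):
    """Decode a HAILO NMS BY_CLASS FLOAT32 buffer into a detection list.

    Assumed per-class layout: a float32 box count, then `count` records of
    (y_min, x_min, y_max, x_max, score), coordinates normalized to [0, 1].
    """
    detections = []
    offset = 0
    for cls in range(num_classes):
        (count,) = struct.unpack_from("<f", buf, offset)
        offset += 4
        for _ in range(int(count)):
            y0, x0, y1, x1, score = struct.unpack_from("<5f", buf, offset)
            offset += 20
            detections.append(
                {"class": cls, "score": score, "bbox": (x0, y0, x1, y1)}
            )
    return detections

# Toy buffer: 2 classes, one box for class 0, none for class 1.
buf = struct.pack("<f5ff", 1.0, 0.1, 0.2, 0.5, 0.6, 0.9, 0.0)
print(parse_nms_by_class(buf, 2))
```

This is illustrative only, not the hailo-apps API; in a GStreamer pipeline you would normally read the detections from the buffer metadata in a pad probe instead of decoding raw bytes by hand.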

Recommended Solution

For HEFs with integrated NMS, modify your pipeline configuration:

  1. Remove post-processing: Omit the post_process_so parameter from INFERENCE_PIPELINE
  2. Adjust batch size: Set batch-size=1 unless your HEF was compiled with batching support
  3. Handle NMS tensors: Process the NMS output directly in your callback or implement a lightweight NMS-to-metadata converter
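Putting steps 1 and 2 together, a stripped-down version of your hailonet branch might look like the sketch below. The element properties are taken from your snippet; `model_path` and the `identity` probe point are placeholders for illustration:

```python
# Sketch of a hailonet branch for an HEF with built-in NMS.
# model_path is a placeholder; adapt element names to your pipeline.
model_path = "best.hef"

pipeline_snippet = (
    f"hailonet name=inference_hailonet hef-path={model_path} batch-size=1 "
    f"multi-process-service=true device-count=1 vdevice-group-id=SHARED "
    f"output-format-type=HAILO_FORMAT_TYPE_FLOAT32 force-writable=true ! "
    f"queue name=inference_hailonet_output_q leaky=downstream "
    f"max-size-buffers=16 max-size-bytes=0 max-size-time=0 ! "
    # No hailofilter here: NMS already ran on-chip, so attach a pad probe
    # or appsink downstream and read the detections from the buffer instead.
    f"identity name=detection_probe_point"
)
print(pipeline_snippet)
```

The key points are that the hailofilter stage is gone and batch-size is back to 1; the rest of the branch is unchanged.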

Hope this Helps!

I have a problem with

`Remove post-processing: Omit the post_process_so parameter from INFERENCE_PIPELINE`

I tried removing

f"queue name=inference_hailonet_output_q leaky=downstream max-size-buffers=16 max-size-bytes=0 max-size-time=0 ! "
f"hailofilter name=inference_hailofilter so-path={constants.POST_PROCESS_PATH} config-path={config_path} function-name=filter_letterbox qos=true ! "

But this fails silently. Did I misunderstand your recommendation?

We would love to use Paddle, but this issue is blocking for us :folded_hands: :prayer_beads: