Error when converting .onnx to .hef

I'm using the DFC to convert a .onnx model trained on a custom dataset into a .hef to run on a Hailo-8 HAT device. My system information:

  • CPU: Xeon E3-1270V3
  • RAM: 16GB
  • GPU: NVIDIA GTX 1060 6GB
  • GPU Driver version: 575.64.03
  • CUDA version: 12.5.1
  • CUDNN version: 9.10
  • Hailo DFC version: 3.32.0
  • Hailo Model Zoo version: 2.16

The command that I use: hailomz compile yolov11s --ckpt=yolov11s-vehicles.onnx --hw-arch hailo8 --calib-path train/images --classes 4 --performance

The output of the command:
[info] No GPU chosen, Selected GPU 0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1755710630.788529 6570 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755710630.792892 6570 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
In file included from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1929,
from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from /home/hyann/.pyxbld/temp.linux-x86_64-3.10/home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/core/postprocessing/cython_utils/cython_nms.c:1145:
/home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with "
| ^~~~~~~
Start run for network yolov11s …
Initializing the hailo8 runner…
[info] Translation started on ONNX model yolov11s
[info] Restored ONNX model yolov11s (completion time: 00:00:00.30)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.79)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.23/cv3.0/cv3.0.2/Conv /model.23/cv2.0/cv2.0.2/Conv /model.23/cv3.1/cv3.1.2/Conv /model.23/cv2.1/cv2.1.2/Conv /model.23/cv2.2/cv2.2.2/Conv /model.23/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov11s/input_layer1'.
[info] End nodes mapped from original model: '/model.23/cv2.0/cv2.0.2/Conv', '/model.23/cv3.0/cv3.0.2/Conv', '/model.23/cv2.1/cv2.1.2/Conv', '/model.23/cv3.1/cv3.1.2/Conv', '/model.23/cv2.2/cv2.2.2/Conv', '/model.23/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov11s (completion time: 00:00:01.70)
[info] Appending model script commands to yolov11s from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /home/hyann/vehicle-detection-model-training/yolov11s.har
Using generic alls script found in /home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11s.alls because there is no specific hardware alls
Preparing calibration data…
[info] Loading model script commands to yolov11s from /home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11s.alls
[info] Loading model script commands to yolov11s from string
[info] Found model with 3 input channels, using real RGB images for calibration instead of sampling random data.
[info] Starting Model Optimization
I0000 00:00:1755710644.504764 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
I0000 00:00:1755710644.709806 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
I0000 00:00:1755710646.735036 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
I0000 00:00:1755710646.952869 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.90)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 0%| | 0/64 [00:00<?, ?entries/s]I0000 00:00:1755710690.993486 6728 cuda_dnn.cc:529] Loaded cuDNN version 91200
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1755710691.291733 6728 conv_ops_gpu.cc:330] None of the algorithms provided by cuDNN frontend heuristics worked; trying fallback algorithms. Conv: batch: 8
in_depths: 3
out_depths: 32
in: 641
in: 641
data_format: 1
filter: 3
filter: 3
filter: 3
dilation: 1
dilation: 1
stride: 2
stride: 2
padding: 0
padding: 0
dtype: DT_FLOAT
group_count: 1
device_identifier: "sm_6.1 with 6357188608B RAM, 10 cores, 1784500KHz clock, 4004000KHz mem clock, 1572864B L2$"
version: 3

Traceback (most recent call last):
………..

2 root error(s) found.
(0) NOT_FOUND: No algorithm worked! Error messages:
Profiling failure on CUDNN engine eng1{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng28{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng0{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
[[{{node yolov11s_1/conv1_1/conv_op_1/Conv2D}}]]
[[yolov11s_1/ew_sub_softmax1_1/act_op_1/StatefulPartitionedCall/critical_section_execute/StatefulPartitionedCall/cond_2/pivot_f/_50/_35]]
(1) NOT_FOUND: No algorithm worked! Error messages:
Profiling failure on CUDNN engine eng1{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng28{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng0{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
[[{{node yolov11s_1/conv1_1/conv_op_1/Conv2D}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_data_distributed_162809]

Does anyone have any idea why this error is happening? Please help, many thanks.

Hey @Long_Nguyen,

Welcome to the Hailo Community!

I see you’re encountering a TensorFlow/cuDNN kernel selection failure during the calibration/optimization stage. Looking at your log, there are a couple of key issues:

  • cuDNN is testing multiple convolution algorithms but failing with “No algorithm worked! … CUDNN_STATUS_EXECUTION_FAILED” on a 3→32 convolution
  • Your convolution input appears to be 641×641 pixels. YOLOv8/11 models typically expect inputs that are multiples of 32 (like 640×640), and these odd dimensions can cause cuDNN algorithm selection problems
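A quick way to sanity-check a candidate resolution before re-exporting (plain shell arithmetic, nothing Hailo-specific; the 32-pixel figure is the total downsampling stride of the YOLO family):

```shell
# YOLOv8/11 downsample by a total stride of 32, so each input side
# should be a multiple of 32. 641 (seen in the log above) is not; 640 is.
for side in 641 640; do
  if [ $((side % 32)) -eq 0 ]; then
    echo "${side}: OK, multiple of 32"
  else
    echo "${side}: not a multiple of 32"
  fi
done
```

If the check fails for your model's input, re-exporting the ONNX at a stride-aligned size such as 640x640 is worth trying before anything else.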

Try running the stages separately for better debugging control:

# 1) Parse → HAR
hailomz parse yolov11s --ckpt yolov11s-vehicles.onnx --hw-arch hailo8 --output yolov11s_unopt.har

# 2) Optimize (calibrate/quantize) → optimized HAR
# Note: For Pascal GPU + cuDNN 9, consider CPU-only calibration to avoid cuDNN issues
CUDA_VISIBLE_DEVICES="" \
hailomz optimize yolov11s \
  --har yolov11s_unopt.har \
  --calib-path train/images \
  --classes 4 \
  --calib-batch-size 1 \
  --input-shape 1x640x640x3 \
  --output yolov11s_opt.har

# 3) Compile → HEF
hailomz compile yolov11s --har yolov11s_opt.har --hw-arch hailo8 --output yolov11s.hef

This follows the “parse first, compile last” workflow, using the HAR file as the intermediate format between stages. This approach gives you better control over each stage of the compilation.
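A note on the CUDA_VISIBLE_DEVICES="" trick in step 2: CUDA enumerates only the devices listed in that variable, so an empty value hides every GPU from that one process, forcing the CPU path; the rest of your shell session is unaffected. A minimal illustration (plain shell, no Hailo or CUDA tooling required):

```shell
# The empty variable applies only to the command it prefixes; the parent
# shell keeps its own environment, so later GPU runs are unaffected.
CUDA_VISIBLE_DEVICES="" sh -c 'echo "calibration process sees: [${CUDA_VISIBLE_DEVICES}]"'
echo "parent shell sees: [${CUDA_VISIBLE_DEVICES-unset}]"
```

The first line prints `calibration process sees: []`, i.e. the variable is set but empty, which is what makes CUDA report zero devices.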

Hope this helps!


Thanks for the reply; the conversion completed successfully after I followed your steps. Still, the compatibility between the Hailo DFC and the NVIDIA driver stack is not great. After I got the error above, I restarted the computer, and then hailomz no longer recognized my GPU and had to fall back to the CPU.
(hailodfc) hyann@hyann-MS-7817:~/vehicle-detection-model-training$ hailomz compile yolov11s --har yolov11s.har --hw-arch hailo8
[info] No GPU chosen and no suitable GPU found, falling back to CPU.
Is there any long-term fix for this problem?

Updated R&D with this; hopefully we will have a solution for it in the next release!

How are you using the GPU? Currently I am running the optimization of a YOLO model on the Linux OS that runs on the camera alongside the Hailo accelerator chip.

I do have a 4090 on my local desktop. Can I quantize a model from .onnx into .hef on my local system (which is not connected to Hailo hardware) and then send the .hef file over to my target system?