I'm using the Hailo DFC to convert an .onnx model trained on a custom dataset into a .hef to run on Hailo-8 HAT devices. My system information:
- CPU: Xeon E3-1270V3
- RAM: 16GB
- GPU: NVIDIA GTX 1060 6GB
- GPU Driver version: 575.64.03
- CUDA version: 12.5.1
- CUDNN version: 9.10
- Hailo DFC version: 3.32.0
- Hailo Model Zoo version: 2.16
The command I use: hailomz compile yolov11s --ckpt=yolov11s-vehicles.onnx --hw-arch hailo8 --calib-path train/images --classes 4 --performance
The output of the command:
[info] No GPU chosen, Selected GPU 0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1755710630.788529 6570 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755710630.792892 6570 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
In file included from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1929,
from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from /home/hyann/.pyxbld/temp.linux-x86_64-3.10/home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/core/postprocessing/cython_utils/cython_nms.c:1145:
/home/hyann/hailodfc/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with "
| ^~~~~~~
Start run for network yolov11s ...
Initializing the hailo8 runner...
[info] Translation started on ONNX model yolov11s
[info] Restored ONNX model yolov11s (completion time: 00:00:00.30)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.79)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.23/cv3.0/cv3.0.2/Conv /model.23/cv2.0/cv2.0.2/Conv /model.23/cv3.1/cv3.1.2/Conv /model.23/cv2.1/cv2.1.2/Conv /model.23/cv2.2/cv2.2.2/Conv /model.23/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov11s/input_layer1'.
[info] End nodes mapped from original model: '/model.23/cv2.0/cv2.0.2/Conv', '/model.23/cv3.0/cv3.0.2/Conv', '/model.23/cv2.1/cv2.1.2/Conv', '/model.23/cv3.1/cv3.1.2/Conv', '/model.23/cv2.2/cv2.2.2/Conv', '/model.23/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov11s (completion time: 00:00:01.70)
[info] Appending model script commands to yolov11s from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /home/hyann/vehicle-detection-model-training/yolov11s.har
Using generic alls script found in /home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11s.alls because there is no specific hardware alls
Preparing calibration data...
[info] Loading model script commands to yolov11s from /home/hyann/vehicle-detection-model-training/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov11s.alls
[info] Loading model script commands to yolov11s from string
[info] Found model with 3 input channels, using real RGB images for calibration instead of sampling random data.
[info] Starting Model Optimization
I0000 00:00:1755710644.504764 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
I0000 00:00:1755710644.709806 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
I0000 00:00:1755710646.735036 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
I0000 00:00:1755710646.952869 6570 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5242 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.90)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 0%| | 0/64 [00:00<?, ?entries/s]I0000 00:00:1755710690.993486 6728 cuda_dnn.cc:529] Loaded cuDNN version 91200
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1755710691.291733 6728 conv_ops_gpu.cc:330] None of the algorithms provided by cuDNN frontend heuristics worked; trying fallback algorithms. Conv: batch: 8
in_depths: 3
out_depths: 32
in: 641
in: 641
data_format: 1
filter: 3
filter: 3
filter: 3
dilation: 1
dilation: 1
stride: 2
stride: 2
padding: 0
padding: 0
dtype: DT_FLOAT
group_count: 1
device_identifier: "sm_6.1 with 6357188608B RAM, 10 cores, 1784500KHz clock, 4004000KHz mem clock, 1572864B L2$"
version: 3
Traceback (most recent call last):
………..
2 root error(s) found.
(0) NOT_FOUND: No algorithm worked! Error messages:
Profiling failure on CUDNN engine eng1{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng28{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng0{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
[[{{node yolov11s_1/conv1_1/conv_op_1/Conv2D}}]]
[[yolov11s_1/ew_sub_softmax1_1/act_op_1/StatefulPartitionedCall/critical_section_execute/StatefulPartitionedCall/cond_2/pivot_f/_50/_35]]
(1) NOT_FOUND: No algorithm worked! Error messages:
Profiling failure on CUDNN engine eng1{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng28{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
Profiling failure on CUDNN engine eng0{}: UNKNOWN: CUDNN_STATUS_EXECUTION_FAILED
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(6121): 'status'
[[{{node yolov11s_1/conv1_1/conv_op_1/Conv2D}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_data_distributed_162809]
Does anyone have any idea why this error is happening? Please help. Many thanks.
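One detail that may be worth double-checking: the log prints "Loaded cuDNN version 91200", while the system information above lists cuDNN 9.10. Below is a minimal sketch for decoding that packed number. It assumes the usual cuDNN encoding (major * 10000 + minor * 100 + patch from cuDNN 9 onward, major * 1000 + minor * 100 + patch for earlier releases). The helper names are hypothetical and not part of any Hailo or TensorFlow API.

```python
import re


def decode_cudnn_version(code: int) -> tuple[int, int, int]:
    """Decode cuDNN's packed integer version into (major, minor, patch).

    Assumption: cuDNN 9+ packs the version as major*10000 + minor*100 + patch,
    while cuDNN 8 and earlier use major*1000 + minor*100 + patch.
    """
    if code >= 90000:
        return code // 10000, (code // 100) % 100, code % 100
    return code // 1000, (code // 100) % 10, code % 100


def cudnn_version_from_log(log_text: str):
    """Return the decoded cuDNN version found in a TensorFlow log, or None."""
    match = re.search(r"Loaded cuDNN version (\d+)", log_text)
    if match is None:
        return None
    return decode_cudnn_version(int(match.group(1)))


# Line copied from the output above.
line = ("I0000 00:00:1755710690.993486 6728 cuda_dnn.cc:529] "
        "Loaded cuDNN version 91200")
print(cudnn_version_from_log(line))  # (9, 12, 0)
```

Under that assumption the runtime actually loaded cuDNN 9.12.0, so the installed library may differ from the version I listed. Whether this mismatch is related to the CUDNN_STATUS_EXECUTION_FAILED error is not established here.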