GPU CuDNN library is not recognized by Hailo SW Suite

I have a problem using a GPU with the Hailo SW Suite, which needs one for its optimization tasks. In short, the suite does not detect cuDNN (which is loaded and works fine) and prints this message:

[info] No GPU chosen and no suitable GPU found, falling back to CPU
(see detailed logs below)

I can reproduce this issue on two different hosts with different GPUs: RTX 5080 and RTX 2060.

  1. It fails in both the Hailo AI Software Suite Docker container and the Hailo AI Software Suite SDK environment.

  2. Furthermore, the latest Docker container, hailo8_ai_sw_suite_2025-07_docker, ships CUDA 11.8, while the DFC asks for CUDA 12.5 or higher (see the logs below).

I already reported this issue in two similar threads, but those threads went inactive, so I am raising it here as a standalone topic.

Log from hailo8_ai_sw_suite_2025-07_docker (note the highlighted items):

Welcome to Hailo AI Software Suite Container
To list available commands, please type:


HailoRT: hailortcli -h
Dataflow Compiler: hailo -h
Hailo Model Zoo: hailomz -h
Run TAPPAS Detection Application: tappas/detection/detection.sh


(hailo_virtualenv) hailo@ge46fox-Lenovo-Legion-5-15IMH05H:/local/workspace$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

(hailo_virtualenv) hailo@ge46fox-Lenovo-Legion-5-15IMH05H:/local/workspace$ hailo -h
[info] No GPU chosen and no suitable GPU found, falling back to CPU.
[info] First time Hailo Dataflow Compiler is being used. Checking system requirements… (this might take a few seconds)
[Warning] It is recommended to have 32 GB of RAM, while this system has only 17 GB.
[Warning] CUDA version should be 12.5 or higher, found 11.8 .
[Warning] CUDNN version should be 9 or higher, found ..

Component   Requirement      Found
==========  ===============  ==========  ===========
OS          Ubuntu           Ubuntu      Required
Release     20.04            22.04       Required
Package     python3-tk       V           Required
Package     graphviz         V           Required
Package     libgraphviz-dev  V           Required
Package     python3.10-dev   V           Required
RAM(GB)     16               17          Required
RAM(GB)     32               17          Recommended
CPU-Arch    x86_64           x86_64      Required
CPU-flag    avx              V           Required
GPU-Driver  560              575         Recommended
CUDA        12.5             11.8        Recommended
CUDNN       9                .           Recommended
Var:CC      unset            unset       Required
Var:CXX     unset            unset       Required
Var:LD      unset            unset       Required
Var:AS      unset            unset       Required
Var:AR      unset            unset       Required
Var:LN      unset            unset       Required
Var:DUMP    unset            unset       Required
Var:CPY     unset            unset       Required
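
As a reading aid for the output above, here is a minimal Python sketch that pulls the version warnings out of a captured `hailo -h` log. The regular expression is written only against the warning lines quoted in this thread, and the names `WARNING_RE` and `parse_version_warnings` are made up for illustration; they are not part of any Hailo tool.

```python
import re

# Written against the warning lines quoted in this thread, e.g.
#   [Warning] CUDA version should be 12.5 or higher, found 11.8 .
#   [Warning] CUDNN version should be 9 or higher, found ..
WARNING_RE = re.compile(
    r"\[Warning\]\s+(?P<component>\w+) version should be "
    r"(?P<required>\d+(?:\.\d+)*) or higher, "
    r"found\s*(?P<found>\d+(?:\.\d+)*)?\s*\.*\s*$"
)


def parse_version_warnings(log_text):
    """Return one dict per version warning; 'found' is None if no version was printed."""
    warnings = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            warnings.append(match.groupdict())
    return warnings


SAMPLE_LOG = """\
[info] No GPU chosen and no suitable GPU found, falling back to CPU.
[Warning] It is recommended to have 32 GB of RAM, while this system has only 17 GB.
[Warning] CUDA version should be 12.5 or higher, found 11.8 .
[Warning] CUDNN version should be 9 or higher, found ..
"""

if __name__ == "__main__":
    for item in parse_version_warnings(SAMPLE_LOG):
        found = item["found"] or "nothing detected"
        print(f"{item['component']}: needs >= {item['required']}, found {found}")
```

A `found` value of `None` marks the case seen here for cuDNN, where the checker printed no version at all rather than an older one.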

(hailo_venv) ge46fox@ge46fox-Lenovo-Legion-5-15IMH05H:~/hailo_ai_sw_suite$ python3 -c "import tensorflow as tf;
print('TensorFlow version:', tf.__version__);
print('Built with cuDNN:', tf.test.is_built_with_cuda());
print('GPU available:', tf.config.list_physical_devices('GPU'));
a = tf.random.normal([8, 224, 224, 3]);
b = tf.keras.layers.Conv2D(64, 3)(a);
print('Computation successful, output shape:', b.shape)"
TensorFlow version: 2.18.0
Built with cuDNN: True
GPU available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
I0000 00:00:1755105444.790273 16970 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4156 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
I0000 00:00:1755105445.295004 16970 cuda_dnn.cc:529] Loaded cuDNN version 91200
Computation successful, output shape: (8, 222, 222, 64)

Log from the locally installed Hailo AI Software Suite SDK (note the highlighted items):

(hailo_venv) ge46fox@ge46fox-Lenovo-Legion-5-15IMH05H:~/hailo_ai_sw_suite$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0

(hailo_venv) ge46fox@ge46fox-Lenovo-Legion-5-15IMH05H:~/hailo_ai_sw_suite$ python -c "import tensorflow as tf;
print('TensorFlow version:', tf.__version__);
print('Built with cuDNN:', tf.test.is_built_with_cuda());
print('GPU available:', tf.config.list_physical_devices('GPU'));
a = tf.random.normal([8, 224, 224, 3]);
b = tf.keras.layers.Conv2D(64, 3)(a);
print('Computation successful, output shape:', b.shape)"
2025-08-13 19:25:08.618908: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow version: 2.18.0
Built with cuDNN: True
GPU available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
I0000 00:00:1755105910.239850 17334 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4108 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
I0000 00:00:1755105910.735380 17334 cuda_dnn.cc:529] Loaded cuDNN version 91200
Computation successful, output shape: (8, 222, 222, 64)

(hailo_venv) ge46fox@ge46fox-Lenovo-Legion-5-15IMH05H:~/hailo_ai_sw_suite$ rm hailo_venv/etc/hailo/check_system_requirements_was_called
(hailo_venv) ge46fox@ge46fox-Lenovo-Legion-5-15IMH05H:~/hailo_ai_sw_suite$ hailo -h
[info] No GPU chosen and no suitable GPU found, falling back to CPU.
[info] First time Hailo Dataflow Compiler is being used. Checking system requirements… (this might take a few seconds)
[Warning] It is recommended to have 32 GB of RAM, while this system has only 17 GB.
[Warning] CUDNN version should be 9 or higher, found ..
Component   Requirement      Found
==========  ===============  ==========  ===========
OS          Ubuntu           Ubuntu      Required
Release     20.04            22.04       Required
Package     python3-tk       V           Required
Package     graphviz         V           Required
Package     libgraphviz-dev  V           Required
Package     python3.10-dev   V           Required
RAM(GB)     16               17          Required
RAM(GB)     32               17          Recommended
CPU-Arch    x86_64           x86_64      Required
CPU-flag    avx              V           Required
GPU-Driver  560              575         Recommended
CUDA        12.5             12.5        Recommended
CUDNN       9                .           Recommended
Var:CC      unset            unset       Required
Var:CXX     unset            unset       Required
Var:LD      unset            unset       Required
Var:AS      unset            unset       Required
Var:AR      unset            unset       Required
Var:LN      unset            unset       Required
Var:DUMP    unset            unset       Required
Var:CPY     unset            unset       Required
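
The two outputs above disagree: TensorFlow logs `Loaded cuDNN version 91200`, while the requirements check prints `.` for cuDNN. How the DFC checker looks for cuDNN is not stated in this thread. One possibility, and it is only an assumption, is that the checker inspects a different location than the one TensorFlow loads from (for example a system-wide header versus a pip-installed wheel). The sketch below decodes the packed version number and lists any `cudnn_version.h` headers in a few common locations. The search paths and helper names are illustrative guesses, not Hailo's logic.

```python
import re
import sys
from pathlib import Path


def decode_cudnn_version(code):
    """Split a packed cuDNN version such as 91200 into (major, minor, patch)."""
    # cuDNN 9 packs the version as major*10000 + minor*100 + patch;
    # earlier releases used major*1000 + minor*100 + patch (e.g. 8907).
    base = 10000 if code >= 90000 else 1000
    major, rest = divmod(code, base)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def parse_cudnn_header(text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn_version.h."""
    values = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"#define\s+CUDNN_{key}\s+(\d+)", text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


def find_cudnn_headers(extra_roots=()):
    """Map each cudnn_version.h found to its parsed version (or None)."""
    # Assumed locations: typical system include dirs plus the layout used by
    # pip wheels, which unpack under site-packages/nvidia/cudnn/include.
    roots = [
        Path("/usr/include"),
        Path("/usr/include/x86_64-linux-gnu"),
        Path("/usr/local/cuda/include"),
    ]
    roots += [Path(p) for p in extra_roots]
    roots += [Path(p) / "nvidia" / "cudnn" / "include" for p in sys.path if p]
    found = {}
    for root in roots:
        header = root / "cudnn_version.h"
        if header.is_file():
            found[str(header)] = parse_cudnn_header(header.read_text(errors="ignore"))
    return found


if __name__ == "__main__":
    print("TensorFlow log value 91200 decodes to", decode_cudnn_version(91200))
    headers = find_cudnn_headers()
    if not headers:
        print("No cudnn_version.h found in the searched locations")
    for path, version in headers.items():
        print(path, "->", version)
```

Run inside the same venv as `hailo`, this shows whether cuDNN is visible only as a pip wheel, only as a system install, or both, which may help narrow down why the checker comes back empty.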

Here are the logs from the RTX 5080 host, which has a similar issue:

It picks up GPU 0, but cuDNN is still not detected.

(hailo_venv) ge46fox@sim4000rtx5080:~/hailo_ai_sw_suite$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
(hailo_venv) ge46fox@sim4000rtx5080:~/hailo_ai_sw_suite$ nvidia-smi
Wed Aug 20 17:47:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08 Driver Version: 575.57.08 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+


=========================================+=======================
| 0 NVIDIA GeForce RTX 5080 On | 00000000:01:00.0 On | N/A |

(hailo_venv) ge46fox@sim4000rtx5080:~/hailo_ai_sw_suite$ hailo -h
[info] No GPU chosen, Selected GPU 0
[info] First time Hailo Dataflow Compiler is being used. Checking system requirements… (this might take a few seconds)
[Warning] CUDNN version should be 9 or higher, found ..
Component   Requirement      Found
==========  ===============  ==========  ===========
OS          Ubuntu           Ubuntu      Required
Release     20.04            22.04       Required
Package     python3-tk       V           Required
Package     graphviz         V           Required
Package     libgraphviz-dev  V           Required
Package     python3.10-dev   V           Required
RAM(GB)     16               62          Required
RAM(GB)     32               62          Recommended
CPU-Arch    x86_64           x86_64      Required
CPU-flag    avx              V           Required
GPU-Driver  560              575         Recommended
CUDA        12.5             12.5        Recommended
CUDNN       9                .           Recommended

The Hailo Docker container fails to start at all on this host:

ge46fox@sim4000rtx5080:~/Downloads$ ./hailo_ai_sw_suite_docker_run.sh --resume
Resuming an old container
Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Error: failed to start containers: hailo8_ai_sw_suite_2025-07_container

Hey @kd178,

So I looked into that Docker image mismatch issue you’re having. The problem is that the hailo8_ai_sw_suite_2025-07_docker image comes with CUDA 11.8, but the Hailo Dataflow Compiler actually needs CUDA 12.5 or higher plus cuDNN 9. That’s exactly why it keeps falling back to CPU mode when you run it in the container.

I’m going to reach out to our R&D team to figure out how this happened and get it resolved.

The Docker error on your RTX 5080 host (could not select device driver "" with capabilities: [[gpu]]) is a separate problem: the NVIDIA Container Toolkit isn't configured properly on that machine. You'll need to install nvidia-container-toolkit and restart Docker to fix it. After that, you should be able to run docker run --gpus all nvidia/cuda:12.5-runtime nvidia-smi successfully.
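
To check the toolkit setup before retrying the container, something along these lines may help. It is a minimal sketch, not an official diagnostic: it looks for the toolkit binaries on PATH and asks Docker which runtimes it knows about. The helper names are invented here, and the exact conditions under which Docker accepts `--gpus` are an assumption on my part.

```python
import json
import shutil
import subprocess


def has_nvidia_runtime(runtimes_json):
    """Return True if the JSON from `docker info` lists a runtime named 'nvidia'."""
    try:
        runtimes = json.loads(runtimes_json) or {}
    except json.JSONDecodeError:
        return False
    return "nvidia" in runtimes


def docker_runtimes_json():
    """Return the raw runtimes JSON from Docker, or None if Docker cannot be queried."""
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    # Binaries shipped by the NVIDIA Container Toolkit packages.
    for tool in ("nvidia-ctk", "nvidia-container-runtime-hook"):
        print(tool, "->", shutil.which(tool) or "not found on PATH")

    raw = docker_runtimes_json()
    if raw is None:
        print("Could not query Docker (not installed, not running, or no permission)")
    else:
        print("nvidia runtime registered with Docker:", has_nvidia_runtime(raw))
```

If the runtime is missing, NVIDIA's Container Toolkit install guide registers it with `sudo nvidia-ctk runtime configure --runtime=docker` followed by `sudo systemctl restart docker`.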

But honestly, if you’re going with bare metal and only need it for compilation, my advice would be to just install the DFC in a virtual environment and install the model zoo on top of that. It’s much cleaner and you’ll avoid all these Docker hassles.

Hope this helps!

Hey Omria,

Thanks a lot for looking into this. Could you please ping here once the problem with the Docker image is resolved so we can try it?

Regarding the bare-metal installation: as the logs above show, we did install the SW suite on bare metal using the Hailo AI Software Suite self-extractable installer, hailo8_ai_sw_suite_2025-07.run, which deploys a venv.

For the bare-metal setup, CUDA and cuDNN match the requirements, but:

  • On the RTX 2060 host it fails to find cuDNN and does not see the GPU at all:

Log from the locally installed Hailo AI Software Suite SDK

[info] No GPU chosen and no suitable GPU found, falling back to CPU.

CUDA 12.5 12.5 Recommended
CUDNN 9 . Recommended
..
(see the full log above)

  • On the RTX 5080 host it does see the GPU, but fails to find cuDNN and crashes during optimizer execution:

Logs from the RTX 5080 host:
(hailo_venv) ge46fox@sim4000rtx5080:~/hailo_ai_sw_suite$ hailo -h
[info] No GPU chosen, Selected GPU 0
….
CUDA 12.5 12.5 Recommended
CUDNN 9 . Recommended

Hey @kd178,

Have you tried installing just the DFC and Model Zoo directly on bare metal using a Python virtual environment? I think the error you’re seeing might be resolved by skipping the suite entirely. This approach tends to work better on bare metal anyway - I personally prefer it over using Docker.

Hi Omria,

As you suggested, I installed a standalone DFC in a clean venv, but that didn't help. I get exactly the same issues:

  1. The DFC fails to detect cuDNN, which is installed and works properly
  2. The DFC errors out when performing model optimization

Here are the logs for (1) and (2); the problems are highlighted in bold:

(1)

(hailo_venv) ge46fox@sim4000rtx5080:~/hailo_dfc$ hailo -h
[info] No GPU chosen, Selected GPU 0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1756912465.737767 37224 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1756912465.740256 37224 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[info] First time Hailo Dataflow Compiler is being used. Checking system requirements… (this might take a few seconds)
[Warning] CUDNN version should be 9 or higher, found ..
Component   Requirement      Found
==========  ===============  ==========  ===========
OS          Ubuntu           Ubuntu      Required
Release     20.04            22.04       Required
Package     python3-tk       V           Required
Package     graphviz         V           Required
Package     libgraphviz-dev  V           Required
Package     python3.10-dev   V           Required
RAM(GB)     16               62          Required
RAM(GB)     32               62          Recommended
CPU-Arch    x86_64           x86_64      Required
CPU-flag    avx              V           Required
GPU-Driver  560              575         Recommended
CUDA        12.5             12.5        Recommended
CUDNN       9                .           Recommended
Var:CC      unset            unset       Required
Var:CXX     unset            unset       Required
Var:LD      unset            unset       Required
Var:AS      unset            unset       Required
Var:AR      unset            unset       Required
Var:LN      unset            unset       Required
Var:DUMP    unset            unset       Required
Var:CPY     unset            unset       Required

(2)

[info] Loading model script commands to resnet_v1_18 from string
[info] Found model with 3 input channels, using real RGB images for calibration instead of sampling random data.
[info] Starting Model Optimization


InternalError Traceback (most recent call last)
Cell In[4], line 12
9 runner.load_model_script(alls)
11 # Call Optimize to perform the optimization process
---> 12 runner.optimize(calib_dataset)
14 # Save the result state to a Quantized HAR file
15 quantized_model_har_path = f"{model_name}_quantized_model.har"

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py:16, in allowed_states.<locals>.wrap.<locals>.wrapped_func(self, *args, **kwargs)
12 if self._state not in states:
13 raise InvalidStateException(
14 f"The execution of {func.__name__} is not available under the state: {self._state.value}",
15 )
---> 16 return func(self, *args, **kwargs)

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py:2206, in ClientRunner.optimize(self, calib_data, data_type, work_dir, checkpoint, memento)
2160 @allowed_states(States.HAILO_MODEL, States.FP_OPTIMIZED_MODEL, States.QUANTIZED_BASE_MODEL)
2161 def optimize(
2162 self,
(…)
2168 memento: Optional[FlowCheckPoint] = None,
2169 ):
2170 """
2171 Apply optimizations to the model:
2172
(…)
2204
2205 """
-> 2206 result = self._optimize(
2207 calib_data, data_type=data_type, work_dir=work_dir, checkpoint=checkpoint, memento=memento
2208 )
2209 return result.flow_memento

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py:16, in allowed_states.<locals>.wrap.<locals>.wrapped_func(self, *args, **kwargs)
12 if self._state not in states:
13 raise InvalidStateException(
14 f"The execution of {func.__name__} is not available under the state: {self._state.value}",
15 )
---> 16 return func(self, *args, **kwargs)

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py:2025, in ClientRunner._optimize(self, calib_data, data_type, work_dir, checkpoint, memento)
2018 checkpoint_info = self._sdk_backend.lora_quantization(
2019 adapters[-1],
2020 data_continer,
2021 work_dir=work_dir,
2022 checkpoint_info=checkpoint_info,
2023 )
2024 else:
-> 2025 checkpoint_info = self._sdk_backend.full_quantization(
2026 data_continer,
2027 work_dir=work_dir,
2028 checkpoint_info=checkpoint_info,
2029 )
2031 if checkpoint_info.quantization_done:
2032 self._state = States.QUANTIZED_MODEL

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py:1196, in SDKBackendQuantization.full_quantization(self, data_continer, work_dir, checkpoint_info)
1194 self.setup_quantization(data_continer, work_dir=work_dir)
1195 self.pre_quantization_structural()
-> 1196 new_checkpoint_info = self._full_acceleras_run(
1197 data_continer, work_dir=work_dir, checkpoint_info=checkpoint_info
1198 )
1200 if new_checkpoint_info.quantization_done:
1201 self._logger.verbose("Core and post Quantization is done with Acceleras")

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py:1415, in SDKBackendQuantization._full_acceleras_run(self, data_continer, adapter_name, work_dir, checkpoint_info)
1402 layers_defaults = {
1403 **self._model.get_per_layer_precision_config(),
1404 **self._model.get_per_layer_translation_config(),
1405 }
1407 parser = MOScriptParser(
1408 self._script_parser.original_script,
1409 layers_defaults,
(…)
1413 self.optimization_target,
1414 )
-> 1415 mo_config = parser.run()
1416 self._validate_commands(mo_config)
1417 self._messages.log_optimization_flavor_comments(parser.results)

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_model_optimization/tools/mo_script_parser.py:86, in MOScriptParser.run(self)
84 def run(self) -> ModelOptimizationConfig:
85 flavors_commands, mo_commands = self._parse_script_to_commands()
---> 86 default_cfg = self._parse_flavor_to_default_cfg(flavors_commands)
87 default_cfg.setdefault("precision_config", dict())
88 default_cfg["precision_config"].setdefault("target", self._target)

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_model_optimization/tools/mo_script_parser.py:113, in MOScriptParser._parse_flavor_to_default_cfg(self, flavors_commands)
109 """
110 Prase flavors commands to get the default configuration of the model.
111 """
112 gpu_info = get_gpu_availability_mode()
--> 113 data_length = get_tf_dataset_length(
114 self._data_continer.data,
115 self._data_continer.data_type,
116 RECOMMENDED_DATASET_SIZE,
117 gpu_info.gpu_availability,
118 )
119 has_gpu = gpu_info.gpu_availability != GPUAvailabilityMode.NOT_AVAILABLE
120 kwargs = {}

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_model_optimization/acceleras/utils/tf_utils.py:140, in get_tf_dataset_length(data, data_type, threshold, gpu_state)
134 dataset_length = result["dataset_length"]
136 else: # gpu_state in use:
137 # avoid using GPU in case of MO subprocess enabled
138 # If GPU is in use, we must use the same process.
139 # If GPU is not available, tf might've already worked on the CPU, and can't be forked
--> 140 dataset_length = convert_and_get_length(data, data_type, threshold)
142 return dataset_length

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_model_optimization/acceleras/utils/tf_utils.py:112, in get_tf_dataset_length.<locals>.convert_and_get_length(data, data_type, threshold)
110 def convert_and_get_length(data, data_type, threshold):
111 dataset, _ = data_to_dataset(data, data_type)
--> 112 dataset_length = get_dataset_length(dataset, threshold=threshold)
113 return dataset_length

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/hailo_model_optimization/acceleras/utils/dataset_util.py:39, in get_dataset_length(dataset, threshold)
37 dataset = dataset.take(threshold)
38 cardinality = dataset.cardinality()
---> 39 if (cardinality == tf.data.experimental.UNKNOWN_CARDINALITY).numpy():
40 return dataset.reduce(0, lambda x, _: x + 1).numpy()
41 else:

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/tensorflow/python/ops/tensor_math_operator_overrides.py:138, in _tensor_equals_factory(self, other)
135 def _tensor_equals_factory(self, other):
136 from tensorflow.python.ops import math_ops
--> 138 return math_ops.tensor_equals(self, other)

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb

File ~/hailo_dfc/hailo_venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py:6002, in raise_from_not_ok_status(e, name)
6000 def raise_from_not_ok_status(e, name) -> NoReturn:
6001 e.message += (" name: " + str(name if name is not None else ""))
-> 6002 raise core._status_to_exception(e) from None

InternalError: {{function_node _wrapped__Equal_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Equal] name:
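
One thing that may be worth ruling out for the kernel-launch failure above, although nothing in this thread confirms it as the cause, is whether the installed CUDA toolkit predates the GPU architecture. The sketch below compares each GPU's compute capability, as reported by `nvidia-smi --query-gpu=name,compute_cap`, with the toolkit version. The threshold constants are my assumption (native support for Blackwell-class GPUs, compute capability 10.0 and above, arrived in CUDA 12.8), and the helper names are hypothetical.

```python
import re
import subprocess

# Assumption for illustration: Blackwell-class GPUs (compute capability 10.0
# and above; NVIDIA lists 12.0 for RTX 50-series cards) gained native toolkit
# support in CUDA 12.8.
BLACKWELL_MIN_CAP = (10, 0)
BLACKWELL_MIN_TOOLKIT = (12, 8)

ROW_RE = re.compile(r"^(?P<name>.*),\s*(?P<major>\d+)\.(?P<minor>\d+)\s*$")


def parse_compute_caps(csv_text):
    """Parse 'name, major.minor' rows into [(name, (major, minor)), ...]."""
    gpus = []
    for line in csv_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            cap = (int(match["major"]), int(match["minor"]))
            gpus.append((match["name"].strip(), cap))
    return gpus


def toolkit_predates_gpu(cap, toolkit):
    """True if the GPU is Blackwell-class but the toolkit is older than 12.8."""
    return cap >= BLACKWELL_MIN_CAP and toolkit < BLACKWELL_MIN_TOOLKIT


def query_gpus():
    """Return nvidia-smi's CSV output, or None if the query is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    toolkit = (12, 5)  # the nvcc release shown in the logs of this thread
    output = query_gpus()
    if output is None:
        print("nvidia-smi query not available on this machine")
    else:
        for name, cap in parse_compute_caps(output):
            status = "toolkit older than GPU" if toolkit_predates_gpu(cap, toolkit) else "ok"
            print(f"{name}: compute capability {cap[0]}.{cap[1]} -> {status}")
```

For the RTX 2060 (compute capability 7.5 in the logs above) this check reports nothing unusual, so it would not explain the "no suitable GPU found" message on that host.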