Hi everyone,
I’m currently trying to use the Hailo Dataflow Compiler (v2025.10) to compile a custom face recognition model (ResNet50) and a YOLOv8m-seg model, but I’m running into a critical failure when trying to utilize my GPU for optimization.
The Environment:
- Host OS: Debian
- GPU: NVIDIA GeForce RTX 5060 Ti (16GB VRAM - Blackwell Architecture / Compute Capability 12.0)
- Host Drivers: 595.58.03 (CUDA 13.2)
- Software Suite: hailo8_ai_sw_suite_2025-10:1 (Docker)
- Container OS: Ubuntu 22.04 / Python 3.10
The Issue: Every time I attempt to run runner.optimize() with a calibration dataset, TensorFlow attempts to initialize the GPU and fails immediately with a CUDA_ERROR_INVALID_HANDLE. This happens even on a simple tf.constant(0.0) + 1.0 test.
Error Log:
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:AddV2]
I also receive the following warning regarding JIT compilation:
tensorflow/core/common_runtime/gpu/gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
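To make the mismatch behind this warning concrete, here is a minimal sketch that compares a GPU's compute capability with the architectures a TensorFlow build ships kernels for. It assumes that tf.sysconfig.get_build_info() exposes a cuda_compute_capabilities list with entries such as sm_80 or compute_90; I have not confirmed what that list contains for the TensorFlow build in the Hailo container. parse_arch, classify and report are hypothetical helper names.

```python
def parse_arch(entry):
    """Split an entry like 'sm_86' or 'compute_90' into (kind, (major, minor))."""
    kind, _, rest = entry.partition("_")
    digits = "".join(ch for ch in rest if ch.isdigit())
    # The last digit is the minor version, everything before it is the major.
    return kind, (int(digits[:-1]), int(digits[-1]))


def classify(device_cc, built_archs):
    """Return 'native', 'ptx-jit' or 'unsupported' for one device."""
    parsed = [parse_arch(a) for a in built_archs]
    # A prebuilt binary (sm_XY) runs only on the same major version,
    # with a device minor version at least as high.
    for kind, cc in parsed:
        if kind == "sm" and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return "native"
    # Embedded PTX (compute_XY) can be JIT-compiled for any newer device.
    for kind, cc in parsed:
        if kind == "compute" and cc <= device_cc:
            return "ptx-jit"
    return "unsupported"


def report():
    """Print the classification for every visible GPU (needs TensorFlow)."""
    import tensorflow as tf

    # Assumption: this key is present in the build info of the installed wheel.
    built = tf.sysconfig.get_build_info().get("cuda_compute_capabilities", [])
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        if cc is not None:
            print(gpu.name, cc, classify(tuple(cc), built))
```

Calling report() inside the container would show whether the card falls into the "ptx-jit" case that the warning describes. The architecture lists passed to classify in any standalone check are illustrative, not taken from the Hailo image.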
Steps Taken So Far:
- Verified GPU visibility inside the container via nvidia-smi and tf.config.list_physical_devices('GPU') (both successful).
- Set TF_FORCE_GPU_ALLOW_GROWTH=true.
- Set NVIDIA_DISABLE_REQUIRE=1 to bypass driver version mismatches.
- Expanded JIT cache via CUDA_CACHE_MAXSIZE.
- Attempted to install cuda-nvcc-12-5 inside the container to provide a more modern PTX compiler.
- Forced LD_LIBRARY_PATH to include cuDNN 9 and CUDA 12.5 libs.
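For reference, the environment-variable steps above can be sketched roughly as follows. The concrete values are assumptions for illustration only: the 4 GiB cache size and any library directories passed in are placeholders, and build_gpu_env and launch are hypothetical helper names. The variables are applied to a child process because LD_LIBRARY_PATH is read by the dynamic loader at process start. NVIDIA_DISABLE_REQUIRE is left out because, as far as I understand, it is consumed by the NVIDIA container runtime when the container is created, so it belongs on the docker run command line.

```python
import os
import subprocess
import sys


def build_gpu_env(base_env, extra_lib_dirs=(), cache_bytes=4 * 1024**3):
    """Return a copy of base_env with the GPU-related settings applied."""
    env = dict(base_env)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    # Size of the PTX JIT cache in bytes (illustrative value).
    env["CUDA_CACHE_MAXSIZE"] = str(cache_bytes)
    # Put the extra library directories in front of any existing path.
    paths = list(extra_lib_dirs)
    if env.get("LD_LIBRARY_PATH"):
        paths.append(env["LD_LIBRARY_PATH"])
    if paths:
        env["LD_LIBRARY_PATH"] = ":".join(paths)
    return env


def launch(script, extra_lib_dirs=()):
    """Run a script in a fresh interpreter so the settings apply from startup."""
    env = build_gpu_env(os.environ, extra_lib_dirs)
    return subprocess.run([sys.executable, script], env=env).returncode
```

With this, launch("compile_model.py") would start the compile script with the settings in place before TensorFlow is imported.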
The Problem: It appears that the TensorFlow version bundled with the current Hailo SDK cannot successfully JIT-compile kernels for the Blackwell (RTX 50-series) architecture using the internal CUDA 11.8/12.5 toolkits. This prevents me from using Level 1+ optimization (Adaround/QAT) which is required for the accuracy levels my project demands.
Questions:
- Is there a workaround to enable GPU support for Blackwell cards in the current SDK?
- Are there plans to update the internal TensorFlow/CUDA requirements in an upcoming release to support Compute Capability 12.0?
- Can I manually upgrade the TensorFlow version within the Hailo virtualenv without breaking the hailo_model_optimization dependencies?
Any help would be greatly appreciated!
My compile_model.py:
from hailo_sdk_client import ClientRunner
import numpy as np
import os
import glob
from PIL import Image

onnx_path = "w600k_r50.onnx"
model_name = "custom_face_recognition"
calib_dir = "calib_data"
image_height = 112
image_width = 112

print("1. Initializing Hailo-8 Compiler...")
runner = ClientRunner(hw_arch="hailo8")

print(f"2. Parsing ONNX model: {onnx_path}...")
runner.translate_onnx_model(
    onnx_path,
    model_name,
    start_node_names=None,
    end_node_names=None
)

print("3. Loading Calibration Data...")

def load_calib_data():
    images = glob.glob(os.path.join(calib_dir, "*.jpg"))
    if not images:
        raise ValueError(f"CRITICAL: No .jpg images found in {calib_dir}/")
    dataset = []
    for img_path in images[:500]:
        img = Image.open(img_path).convert('RGB')
        img = img.resize((image_width, image_height))
        # Convert to a float32 numpy array of shape (height, width, 3)
        img_array = np.array(img).astype(np.float32)
        dataset.append(img_array)
    # Repeat the list three times to enlarge the calibration set
    dataset = dataset * 3
    # Stacking the list adds the batch dimension: (N, height, width, 3)
    return np.array(dataset)

calib_data = load_calib_data()
print(f"Loaded {len(calib_data)} images. Starting Optimization (Quantization)...")
runner.load_model_script('face_opt.alls')
print("Starting Deep Optimization (Guided by .alls script)...")
runner.optimize(calib_data)

print("4. Compiling to .hef format...")
hef = runner.compile()

print("5. Saving model...")
with open(f"{model_name}.hef", "wb") as f:
    f.write(hef)
print(f"DONE! saved as {model_name}.hef")
The output of compile_model.py:
Traceback (most recent call last):
  File "/workspace/compile_model.py", line 1, in <module>
    from hailo_sdk_client import ClientRunner
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/__init__.py", line 29, in <module>
    import hailo_model_optimization # noqa: F401
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/__init__.py", line 53, in <module>
    tf.constant(0.0) + 1.0
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 6002, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:AddV2] name:
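Since the traceback ends in the tf.constant(0.0) + 1.0 check that hailo_model_optimization runs at import time, the failure can be reproduced without any of my own code. Below is a minimal sketch, assuming only that TensorFlow is importable in the container. build_env and run_smoke_test are hypothetical helper names, and hiding the GPU through CUDA_VISIBLE_DEVICES is only meant to confirm that the CPU path works, not to serve as a fix.

```python
import os
import subprocess
import sys

# The same expression that fails during the import in the traceback.
SMOKE_TEST = "import tensorflow as tf; print(float(tf.constant(0.0) + 1.0))"


def build_env(hide_gpu, base_env=None):
    """Copy the environment, optionally hiding every CUDA device."""
    env = dict(os.environ if base_env is None else base_env)
    if hide_gpu:
        # An empty CUDA_VISIBLE_DEVICES makes CUDA report no devices,
        # so TensorFlow places the op on the CPU.
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_smoke_test(hide_gpu):
    """Run the check in a fresh interpreter.

    Returns (exit code, stdout, tail of stderr).
    """
    proc = subprocess.run(
        [sys.executable, "-c", SMOKE_TEST],
        env=build_env(hide_gpu),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr[-500:]
```

Comparing run_smoke_test(False) with run_smoke_test(True) inside the container should show whether only the GPU path fails. I have not verified whether the optimization levels I need are usable on CPU.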