Issue with RTX 5060 Ti (Blackwell) - CUDA_ERROR_INVALID_HANDLE in Dataflow Compiler

Hi everyone,

I’m currently trying to use the Hailo Dataflow Compiler (v2025.10) to compile a custom face recognition model (ResNet50) and a YOLOv8m-seg model, but I’m running into a critical failure when trying to utilize my GPU for optimization.

The Environment:

  • Host OS: Debian

  • GPU: NVIDIA GeForce RTX 5060 Ti (16GB VRAM - Blackwell Architecture / Compute Capability 12.0)

  • Host Drivers: 595.58.03 (CUDA 13.2)

  • Software Suite: hailo8_ai_sw_suite_2025-10:1 (Docker)

  • Container OS: Ubuntu 22.04 / Python 3.10
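According to the Hailo reply below, GPU support is decided by architecture, and the architecture follows from the compute capability listed above. The sketch below makes that mapping explicit. The table covers only common compute capabilities and is not exhaustive. The supported set copies the four architectures named in that reply. `architecture` and `dfc_gpu_listed` are hypothetical helper names. Inside the container, the compute capability can be read with `tf.config.experimental.get_device_details(gpu)["compute_capability"]`.

```python
# Common compute capabilities -> NVIDIA architecture family (not exhaustive).
ARCH_BY_CC = {
    (6, 0): "Pascal", (6, 1): "Pascal", (6, 2): "Pascal",
    (7, 0): "Volta", (7, 2): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere", (8, 6): "Ampere", (8, 7): "Ampere",
    (8, 9): "Ada",
    (9, 0): "Hopper",
    (10, 0): "Blackwell", (12, 0): "Blackwell",
}

# Architectures named as supported in the Hailo reply in this thread.
DFC_LISTED = {"Pascal", "Turing", "Ampere", "Ada"}


def architecture(cc):
    """Map a (major, minor) compute capability to an architecture name, or 'unknown'."""
    return ARCH_BY_CC.get(tuple(cc), "unknown")


def dfc_gpu_listed(cc):
    """True if the GPU's architecture is one of those listed for the Dataflow Compiler."""
    return architecture(cc) in DFC_LISTED


if __name__ == "__main__":
    for cc in [(8, 9), (12, 0)]:
        print(cc, architecture(cc), dfc_gpu_listed(cc))
```

By this rule, the RTX 5060 Ti (12.0) maps to Blackwell and is not on the list. Volta (7.0) is not on the quoted list either.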

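The Issue: Every time I attempt to run runner.optimize() with a calibration dataset, TensorFlow tries to initialize the GPU and fails immediately with CUDA_ERROR_INVALID_HANDLE. The failure actually happens before optimization starts: hailo_model_optimization runs a trivial tf.constant(0.0) + 1.0 when it is imported, and even that fails (see the traceback at the end of this post).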

Error Log:

tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:AddV2]

I also receive the following warning regarding JIT compilation:

tensorflow/core/common_runtime/gpu/gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
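The warning above means TensorFlow's build contains no native (SASS) kernels for compute capability 12.0. As a result, the driver has to JIT-compile PTX instead.

The sketch below checks which case applies. It assumes a GPU-enabled TensorFlow 2.x whose `tf.sysconfig.get_build_info()` includes a `cuda_compute_capabilities` entry, with values such as `sm_75` or `compute_80`. The exact list depends on the build. `parse_arch`, `kernel_source` and `probe_tensorflow` are hypothetical helpers.

The classification follows CUDA's compatibility rules:
- An `sm_XY` binary runs on devices with the same major version and an equal or higher minor version.
- `compute_XY` PTX can be JIT-compiled for compute capability X.Y or newer.

```python
import re

# Matches build targets such as "sm_86", "compute_80" or "sm_90a".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})([a-z]?)$")


def parse_arch(entry):
    """Parse 'sm_86' -> ('sm', (8, 6), '') and 'compute_120' -> ('compute', (12, 0), '')."""
    match = _ARCH_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised architecture entry: {entry!r}")
    kind, digits, suffix = match.groups()
    # The last digit is the minor version; the digits before it form the major version.
    return kind, (int(digits[:-1]), int(digits[-1])), suffix


def kernel_source(build_archs, device_cc):
    """Classify how a device gets kernels: 'native', 'ptx-jit' or 'none'."""
    device_cc = tuple(device_cc)
    parsed = [parse_arch(entry) for entry in build_archs]
    for kind, (major, minor), suffix in parsed:
        if kind != "sm":
            continue
        if suffix:
            # Architecture-specific targets (e.g. sm_90a) run only on that exact architecture.
            if (major, minor) == device_cc:
                return "native"
        elif major == device_cc[0] and minor <= device_cc[1]:
            # SASS runs on the same major version with an equal or higher minor version.
            return "native"
    for kind, cc, suffix in parsed:
        if kind == "compute" and not suffix and cc <= device_cc:
            # The driver can JIT-compile PTX for the same or any newer architecture.
            return "ptx-jit"
    return "none"


def probe_tensorflow():
    """Print each visible GPU, its compute capability and where its kernels would come from."""
    import tensorflow as tf  # imported lazily so the helpers above work without TensorFlow

    archs = tf.sysconfig.get_build_info().get("cuda_compute_capabilities", [])
    if isinstance(archs, str):
        archs = archs.split(",")
    print("TensorFlow build targets:", archs)
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        verdict = kernel_source(archs, cc) if cc else "unknown"
        print(gpu.name, details.get("device_name"), cc, verdict)
```

If the bundled TensorFlow ships PTX for an older architecture, running `probe_tensorflow()` inside the container should report `ptx-jit` for the RTX 5060 Ti, consistent with the warning. The check only shows where the kernels would come from. It cannot tell whether the JIT step itself will succeed.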

Steps Taken So Far:

  1. Verified GPU visibility inside the container via nvidia-smi and tf.config.list_physical_devices('GPU') (both successful).

  2. Set TF_FORCE_GPU_ALLOW_GROWTH=true.

  3. Set NVIDIA_DISABLE_REQUIRE=1 to bypass driver version mismatches.

  4. Expanded JIT cache via CUDA_CACHE_MAXSIZE.

  5. Attempted to install cuda-nvcc-12-5 inside the container to provide a more modern PTX compiler.

  6. Forced LD_LIBRARY_PATH to include cuDNN 9 and CUDA 12.5 libs.
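Steps 2 and 4 only help if the variables are already set when TensorFlow initializes CUDA. The traceback further down shows that this happens as soon as `hailo_sdk_client` is imported.

The sketch below sets them from Python at the very top of the script. It assumes the shell has not already exported different values. `gpu_env` and `apply_env` are hypothetical helpers, and the 4 GiB cache size is an arbitrary example.

`NVIDIA_DISABLE_REQUIRE` is left out on purpose. The NVIDIA container runtime reads it when the container starts, so it has to be passed to `docker run` (for example with `-e`) rather than set in-process.

```python
import os

GIB = 1024 ** 3


def gpu_env(cache_bytes=4 * GIB):
    """Return the TensorFlow/CUDA variables from steps 2 and 4 as strings."""
    return {
        # Let TensorFlow allocate GPU memory on demand instead of reserving it all up front.
        "TF_FORCE_GPU_ALLOW_GROWTH": "true",
        # Size in bytes of the driver's JIT (PTX -> SASS) compilation cache.
        "CUDA_CACHE_MAXSIZE": str(cache_bytes),
    }


def apply_env(env):
    """Set the variables without overriding values already exported by the shell."""
    for key, value in env.items():
        os.environ.setdefault(key, value)


# Must run before `import tensorflow` or `from hailo_sdk_client import ClientRunner`.
apply_env(gpu_env())
```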

The Problem: It appears that the TensorFlow version bundled with the current Hailo SDK cannot successfully JIT-compile kernels for the Blackwell (RTX 50-series) architecture using its internal CUDA 11.8/12.5 toolkits. This prevents me from using Level 1+ optimization (Adaround/QAT), which my project needs to reach its required accuracy.

Questions:

  1. Is there a workaround to enable GPU support for Blackwell cards in the current SDK?

  2. Are there plans to update the internal TensorFlow/CUDA requirements in an upcoming release to support Compute Capability 12.0?

  3. Can I manually upgrade the TensorFlow version within the Hailo virtualenv without breaking the hailo_model_optimization dependencies?
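For question 3, a sensible first step before any manual upgrade is to check which TensorFlow versions the installed Hailo packages declare. The sketch below uses only the standard library's `importlib.metadata`.

The distribution names `hailo_dataflow_compiler` and `hailo_model_optimization` are assumptions; check `pip list` inside `hailo_virtualenv` for the real ones. The result only reflects what the wheel metadata declares, not what the code actually tolerates.

```python
import re
from importlib import metadata

# Leading project name of a requirement string such as "tensorflow==2.12.0; extra == 'gpu'".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def filter_tf_requirements(requirements):
    """Keep requirement strings whose project name starts with 'tensorflow'."""
    kept = []
    for req in requirements:
        match = _NAME_RE.match(req.strip())
        if match and match.group(0).lower().replace("_", "-").startswith("tensorflow"):
            kept.append(req.strip())
    return kept


def tensorflow_pins(dist_name):
    """Return the TensorFlow requirements of an installed distribution, or None if it is absent."""
    try:
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None
    return filter_tf_requirements(requirements)


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("tensorflow installed:", installed_version("tensorflow"))
    # Hypothetical distribution names; adjust to what `pip list` shows in hailo_virtualenv.
    for name in ("hailo_dataflow_compiler", "hailo_model_optimization"):
        print(name, installed_version(name), tensorflow_pins(name))
```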

Any help would be greatly appreciated!



My compile_model.py:

from hailo_sdk_client import ClientRunner
import numpy as np
import os
import glob
from PIL import Image

onnx_path = "w600k_r50.onnx"
model_name = "custom_face_recognition"
calib_dir = "calib_data"
image_height = 112
image_width = 112

print("1. Initializing Hailo-8 Compiler...")
runner = ClientRunner(hw_arch="hailo8")

print(f"2. Parsing ONNX model: {onnx_path}...")

runner.translate_onnx_model(
    onnx_path,
    model_name,
    start_node_names=None,
    end_node_names=None,
)

print("3. Loading Calibration Data...")
def load_calib_data():
    images = glob.glob(os.path.join(calib_dir, "*.jpg"))
    if not images:
        raise ValueError(f"CRITICAL: No .jpg images found in {calib_dir}/")

    dataset = []
    for img_path in images[:500]:
        img = Image.open(img_path).convert("RGB")
        img = img.resize((image_width, image_height))
        # Convert to a float32 (height, width, 3) array; the batch dimension
        # is added when the list is stacked by np.array() below
        img_array = np.array(img).astype(np.float32)
        dataset.append(img_array)

    # Repeat the (up to 500) images three times
    dataset = dataset * 3

    # Stack into a single (N, height, width, 3) array
    return np.array(dataset)

calib_data = load_calib_data()

print(f"Loaded {len(calib_data)} images. Starting Optimization (Quantization)...")

runner.load_model_script("face_opt.alls")

print("Starting Deep Optimization (Guided by .alls script)...")
runner.optimize(calib_data)

print("4. Compiling to .hef format...")
hef = runner.compile()

print("5. Saving model...")
with open(f"{model_name}.hef", "wb") as f:
    f.write(hef)

print(f"DONE! saved as {model_name}.hef")
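Before calling `runner.optimize()`, it may be worth confirming that the calibration array has the layout the script assumes: a single NHWC float32 array of shape (N, 112, 112, 3). The sketch below does that check. `check_calib_data` is a hypothetical helper, and the expected shape comes from `image_height`/`image_width` in the script above rather than from any Hailo API.

```python
import numpy as np


def check_calib_data(calib_data, height=112, width=112, channels=3):
    """Raise if the calibration set is not a non-empty NHWC float32 array; return N."""
    if not isinstance(calib_data, np.ndarray):
        raise TypeError(f"expected a numpy array, got {type(calib_data).__name__}")
    expected = (height, width, channels)
    if calib_data.ndim != 4 or calib_data.shape[1:] != expected:
        raise ValueError(f"expected shape (N, {height}, {width}, {channels}), got {calib_data.shape}")
    if calib_data.shape[0] == 0:
        raise ValueError("calibration set is empty")
    if calib_data.dtype != np.float32:
        raise ValueError(f"expected float32, got {calib_data.dtype}")
    return calib_data.shape[0]
```

Called as `check_calib_data(calib_data)` right after `load_calib_data()`, it fails fast on a wrong layout instead of deep inside the optimizer.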

The output of compile_model.py:
Traceback (most recent call last):
  File "/workspace/compile_model.py", line 1, in <module>
    from hailo_sdk_client import ClientRunner
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/__init__.py", line 29, in <module>
    import hailo_model_optimization  # noqa: F401
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/__init__.py", line 53, in <module>
    tf.constant(0.0) + 1.0
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 6002, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:AddV2] name:
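The traceback shows the failure happens at import time: `hailo_model_optimization/__init__.py` runs `tf.constant(0.0) + 1.0` at line 53, so the script never reaches `runner.optimize()`. If a CPU-only run is acceptable while Blackwell is unsupported, one option is to hide the GPU from CUDA before that import.

The sketch below assumes the bundled TensorFlow honours the standard `CUDA_VISIBLE_DEVICES` variable. `visible_gpus` is a hypothetical helper. Optimization will be much slower on CPU. The Dataflow Compiler may also pick a lower default optimization level when no GPU is found, so check the optimizer log for the level actually used.

```python
import os

# Hide every CUDA device so TensorFlow falls back to the CPU. This must happen
# before anything imports tensorflow (hailo_sdk_client imports it indirectly).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def visible_gpus():
    """Return the GPUs TensorFlow can see (expected to be empty after hiding them)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


try:
    from hailo_sdk_client import ClientRunner  # noqa: F401  (only available in the Hailo container)
except ImportError:
    ClientRunner = None
    print("hailo_sdk_client is not installed in this environment")
```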

Hi @Creoconcept,

The RTX 5060 Ti (Blackwell / Compute Capability 12.0) is not currently supported by the Hailo Dataflow Compiler. The documentation specifies supported GPU architectures as Pascal, Turing, Ampere, and Ada only. You can see this in the System Requirements section here:

Thanks,

Thanks for the reply.

When will support be added?

Hi @Creoconcept,

For the time being, unfortunately, there is nothing specific I can share about timelines.

Thanks,