Running a Model on Hailo with ONNX Runtime — Workflow Clarification

Hi everyone,

I’m attempting to run DinoV2/V3 models on Hailo hardware, but I’ve run into some challenges that I’d like to clarify with the community.

Context:
From what I understand, the Hailo Dataflow Compiler (DFC) doesn’t fully support certain internal layers used in Dino models. As a workaround, I found the ONNX Runtime Hailo Execution Provider and hoped to run the exported Dino ONNX model directly, leveraging at least partial acceleration from the Hailo device.

Current Status:

  • I successfully compiled the model and can run inference using ONNX Runtime.

  • However, when monitoring with hailortcli monitor, the Hailo device does not appear to be utilized.

  • It seems the inference falls back to CPU execution, even though the ONNX Runtime Hailo examples use the accelerator correctly.

Questions:

  1. What is the correct workflow to convert an ONNX model with the DFC when some layers are unsupported? Is there documentation or a working example for generating a Hailo-compatible ONNX file?

  2. Are there plans to further develop or expand ONNX Runtime Hailo integration, especially for models with unsupported layers?

  3. Is it feasible to run Dino models on Hailo in any form (e.g., partial offloading), or would you recommend an alternative approach?

  4. The ONNX Runtime README mentions potential “extraction capabilities” in the DFC — is this feature planned or available to help support models like Dino?

Any guidance or pointers would be greatly appreciated.

Thank you in advance for your help!

Best,
Francesco

Hey @Francesco_Iaia,

Welcome to the Hailo Community!

Converting ONNX Models with Unsupported Layers

Our standard workflow uses the ClientRunner API with three stages: translation (translate_onnx_model()), optimization, and compilation to generate the HEF file for deployment. We support ONNX opset versions 8 and 11–17.
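The three stages above can be sketched as a single helper. This is a shape-only sketch, assuming the `ClientRunner` API from `hailo_sdk_client` (`translate_onnx_model()`, `optimize()`, `compile()`); exact signatures and constructor arguments vary between DFC releases, so check the DFC user guide for your version:

```python
# Sketch of the DFC flow: translate -> optimize -> compile.
# The runner is passed in so the flow can be read (and tested) without
# the Hailo SDK installed; method names follow the DFC's ClientRunner.

def compile_onnx_to_hef(runner, onnx_path, model_name, calib_dataset, hef_path):
    """Run the three DFC stages and write the resulting HEF to disk."""
    # 1. Translation: parse the ONNX graph into Hailo's internal representation.
    runner.translate_onnx_model(onnx_path, model_name)
    # 2. Optimization: quantize the model using a calibration dataset.
    runner.optimize(calib_dataset)
    # 3. Compilation: produce the HEF binary for deployment on the device.
    hef = runner.compile()
    with open(hef_path, "wb") as f:
        f.write(hef)
    return hef

# With the real SDK this would look roughly like (unverified here):
#   from hailo_sdk_client import ClientRunner
#   runner = ClientRunner(hw_arch="hailo8")
#   compile_onnx_to_hef(runner, "dino.onnx", "dino", calib_data, "dino.hef")
```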

If your model has unsupported layers, the compiler will unfortunately fail. There's no automatic fallback or partial offloading; you'll need to handle these layers manually, either by replacing them with supported alternatives or by restructuring the model. I'd recommend checking our supported layers table to see which operations work with our hardware.
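A practical first step is to diff the op types in your exported graph against the supported layers table. A minimal stdlib sketch of that check — the `SUPPORTED` set below is a tiny illustrative placeholder, not the real table, and gathering the op types via the `onnx` package is shown only in a comment:

```python
# Report which op types in an ONNX graph fall outside a supported set.
# SUPPORTED is an illustrative subset only; consult the supported layers
# table for the real list for your hardware and DFC version.
SUPPORTED = {"Conv", "Relu", "Add", "MaxPool", "GlobalAveragePool", "Gemm"}

def unsupported_ops(op_types, supported=SUPPORTED):
    """Return the sorted op types that the compiler would reject."""
    return sorted(set(op_types) - set(supported))

# With the onnx package you would gather op_types like this:
#   import onnx
#   model = onnx.load("dino.onnx")
#   op_types = [node.op_type for node in model.graph.node]

print(unsupported_ops(["Conv", "Relu", "LayerNormalization", "Erf"]))
# -> ['Erf', 'LayerNormalization']
```

Anything this reports needs to be replaced or restructured before the DFC will accept the model.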

ONNX Runtime Integration

The ONNX Runtime Hailo Execution Provider works well, but only when your model is fully compiled and all layers are supported. If there are unsupported layers, inference simply falls back to the CPU (as you've seen). Right now we have no documented plans for partial offloading or automatic fallback for mixed models.
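You can check which execution provider your session actually ended up using. Below is a tiny stdlib model of the fallback rule, with the corresponding onnxruntime calls in comments; the provider name "HailoExecutionProvider" is an assumption based on the Hailo ONNX Runtime fork, so verify it against your build:

```python
# ONNX Runtime walks the requested providers in order and falls back
# to the CPU provider when a requested provider isn't available.
# A minimal model of that rule:

def resolve_provider(requested, available):
    """Return the first requested provider that is available,
    falling back to CPUExecutionProvider otherwise."""
    for ep in requested:
        if ep in available:
            return ep
    return "CPUExecutionProvider"

# With onnxruntime itself (provider name assumed, unverified here):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=["HailoExecutionProvider", "CPUExecutionProvider"])
#   print(sess.get_providers())  # shows what was actually registered

print(resolve_provider(["HailoExecutionProvider"], ["CPUExecutionProvider"]))
# -> CPUExecutionProvider
```

If `get_providers()` only lists the CPU provider, the Hailo EP was never registered for the session, which matches what hailortcli monitor is showing you.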

Running Dino Models

For DinoV2/V3 models: if they contain unsupported layers, they can't be fully compiled for Hailo hardware at this time. We don't currently have native support for partial offloading, where some layers run on Hailo and others on the CPU. Your only option is to modify the model manually, or to split it yourself and handle the data transfer between CPU and Hailo.

Model Extraction Feature - This is the Way Forward

The model extraction feature using get_hailo_runtime_model() is actually the recommended approach for integrating your model with ONNX Runtime. Here’s how it works:

Once you’ve successfully compiled your model (assuming it uses only supported layers), you can use this feature to extract a new ONNX model where the Hailo-compiled subgraph is represented as a special node that the Hailo Execution Provider can execute. This essentially creates a hybrid ONNX model that can seamlessly offload the supported portions to Hailo hardware during inference.

The key requirement is that your model needs to be cleanly divisible into pre-processing, main, and post-processing sections, with the main section being fully Hailo-compatible. The extraction process then creates an ONNX model optimized for the Hailo Execution Provider.
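Following the same pattern, the extraction step can be sketched as below. The method name get_hailo_runtime_model() is taken from the reply above; the save logic around it is illustrative and assumes the call returns either raw bytes or an onnx.ModelProto-like object:

```python
def extract_runtime_model(runner, out_path):
    """After a successful compile, extract the hybrid ONNX model in which
    the Hailo-compiled subgraph is a single node for the Hailo EP."""
    # get_hailo_runtime_model() is assumed to return a serializable ONNX
    # model (bytes here for simplicity; a ModelProto in the real SDK).
    model = runner.get_hailo_runtime_model()
    data = model if isinstance(model, bytes) else model.SerializeToString()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path

# The saved model can then be loaded with an ONNX Runtime session that
# lists the Hailo Execution Provider first, so the Hailo subgraph node
# is dispatched to the device and the rest runs on the CPU.
```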

You can see a working example of this workflow here: ONNX Runtime with Hailo Example.

Bottom line: For now, models need to use only supported layers to run on Hailo hardware. If you’re working with Dino models, you’d need to modify them to fit our supported layer set, then use the extraction feature to integrate with ONNX Runtime.

Hope this clarifies things!

More resources: