Hailo Emulation Failing

Has anyone been able to successfully run the emulator? I am getting a similar result to @user59 in this topic.

Basically, after building hailortcli with HAILO_BUILD_EMULATOR set to ON, it fails with a HAILO_DRIVER_NOT_INSTALLED(64) error.

Steps to reproduce:

  1. On a clean Ubuntu 22.04 system with no physical Hailo card installed, git clone the latest release: git clone --branch v4.21.0 https://github.com/hailo-ai/hailort.git (note: the master branch seems to be broken and does not build)

  2. Run cmake with emulation enabled: cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release -DHAILO_BUILD_EMULATOR=ON -DHAILO_BUILD_EXAMPLES=ON

  3. Install hailortcli with: sudo cmake --build build --config release --target install

  4. Run hailortcli scan or execute any of the examples, e.g. ./build/hailort/libhailort/examples/cpp/vstreams_example/cpp_vstreams_example

Result is:

[HailoRT] [error] Can't find hailort driver class. Can happen if the driver is not installed, if the kernel was updated or on some driver failure (then read driver dmesg log)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_NOT_INSTALLED(64) - Failed listing hailo devices
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_NOT_INSTALLED(64)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_NOT_INSTALLED(64)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_NOT_INSTALLED(64)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_NOT_INSTALLED(64)
Failed create vdevice, status = 64

Has anyone managed to get either CPU or GPU emulation working?

We want to be able to run CI release tests for Hailo without our build systems needing to have a physical chip in them, which is not practical in some cases.

Also worth noting: I tried to rebuild the Hailo drivers from source with the “for internal use only” EMULATION flag set. This gets you a little further: instead of the driver error, it now fails with a HAILO_OUT_OF_PHYSICAL_DEVICES error …

I’m stuck on the same thing as you are. I didn’t try to enable the EMULATOR flag though. Looking at the source code it also doesn’t appear to do much besides changing some timeouts.

Are there any examples for how to emulate a device?

I thought it would be helpful to follow up on this post and describe what I discovered regarding Hailo “emulation”.

What I was trying to achieve was to emulate the output of a Hailo processor running a HEF file using either a CPU or GPU on a system that does not have a physical Hailo processor. This is not possible.

What can be achieved through the Data Flow Compiler is to test a trained model at different stages of compilation using the Hailo Archive (HAR) file. A HAR stores the model in three different states as it progresses from ONNX to the HEF: translation, optimization, and quantization. If you have the HAR, you can load it and run any one of the three model stages using the inference context and the three flags: SDK_NATIVE, SDK_FP_OPTIMIZED, SDK_QUANTIZED. There is a defined Python interface for inference in those different modes from a HAR, but there are no C++ samples. You cannot do this with a HEF, as it only contains the final quantized form of the model.

Note that the HAILO_BUILD_EMULATOR flag in the HailoRT library is for internal use by the Hailo team: it allows them to build the runtime for testing with their internal FPGA. It is not related to HAR inference context described above.

Hey @julien.flack, @hgaiser,

To use the Dataflow Compiler (DFC) Emulator in the Hailo toolchain, you can simulate model inference at different stages without needing physical Hailo hardware. Here’s a practical guide:

Emulator Modes Overview

The DFC emulator has three main modes:

  1. SDK_NATIVE:
    • Runs the original float32 model (TensorFlow/ONNX)
    • Good for validating that your parsed model matches the source
  2. SDK_FP_OPTIMIZED:
    • Applies model modifications like normalization or resizing
    • Still uses float32 precision
  3. SDK_QUANTIZED:
    • Simulates the quantized model
    • Gives you a good estimate of final hardware accuracy (not bit-exact though)

Basic Usage with Python API

from hailo_sdk_client import ClientRunner, InferenceContext
import numpy as np

# Load your HAR file
runner = ClientRunner(har='path_to_model.har')

# Placeholder input: a float32 batch shaped to your model's input tensor
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Pick your emulation mode
with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
    results = runner.infer(ctx, input_data)

Just swap SDK_NATIVE for SDK_FP_OPTIMIZED or SDK_QUANTIZED depending on what you want to test.

Typical Workflow

  1. Test Original Model (Native Emulator):

    with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
        native_output = runner.infer(ctx, input_data_normalized)
    
  2. Test Model Modifications (FP Optimized Emulator):

    runner.optimize_full_precision()  # applies model modifications (normalization, resize, etc.) in float32
    with runner.infer_context(InferenceContext.SDK_FP_OPTIMIZED) as ctx:
        modified_output = runner.infer(ctx, input_data)
    
  3. Check Quantized Accuracy:

    # Assumes the model was quantized first (e.g. via runner.optimize(calib_dataset))
    with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
        quantized_output = runner.infer(ctx, input_data)
    

Quick Notes

  • The quantized emulator gives good accuracy estimates but isn’t bit-exact with hardware
  • Some input formats (like NV21 → YUV conversion) aren’t supported in emulator mode
  • Make sure your calibration dataset is high quality for good quantization results
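
If you want a quick end-to-end sanity check, here’s a minimal sketch comparing the native and quantized emulator outputs. The HAR path, input shape, and batch size are placeholders, and it assumes a single-output model that has already been quantized:

import numpy as np
from hailo_sdk_client import ClientRunner, InferenceContext

runner = ClientRunner(har='path_to_model.har')  # placeholder path
input_data = np.random.rand(8, 224, 224, 3).astype(np.float32)  # placeholder shape

# Float32 reference output from the parsed model
with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
    native_out = runner.infer(ctx, input_data)

# Quantized emulation (assumes runner.optimize(...) has already been run)
with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
    quant_out = runner.infer(ctx, input_data)

# Rough proxy for quantization error; use your real accuracy metric in practice
print('mean abs diff:', np.mean(np.abs(native_out - quant_out)))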

Thanks for your response @omria , I haven’t had time yet to look into this much, but I plan to do so soon.

I see from your response that this requires a .har file, instead of the .hef file used when deployed. Is there a way to use the .hef file directly?

Similarly, is it possible to use hailort instead of hailo-dataflow-compiler?

I want to develop the inference code locally and prefer to keep everything the same as much as possible. If I have to call a different code path for development vs deployment then I’m only partially testing my code.

Hey @hgaiser,

Unfortunately, there’s no way to run a HEF file directly in a CPU/GPU emulator like you can with HAR files in the DFC emulator.

Here’s the current situation:

HailoRT’s emulator build (HAILO_BUILD_EMULATOR) is only for internal FPGA testing at Hailo and isn’t available in public releases. It doesn’t provide CPU/GPU fallback for HEF files.

The only public software emulator is in the Dataflow Compiler (DFC), but it only works with HAR files (the intermediate format), not HEF files. You can use it through the DFC Python/C++ APIs or CLI with options like:

  • SDK_NATIVE
  • SDK_FP_OPTIMIZED
  • SDK_QUANTIZED

This gives you float32, FP-optimized, or quantized validation on CPU/GPU.

Why HEF won’t work: A HEF is already the final quantized hardware blob, so there’s nothing for the emulator to “re-interpret” - it’s specifically compiled for Hailo hardware.

Bottom line: If you need CPU/GPU emulation for development and debugging, you’ll need to work with the HAR file before final compilation to HEF.
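
For reference, here’s a rough sketch of where the HAR sits in the compilation flow, based on the public DFC tutorials (the hw_arch value, file paths, and calibration data below are placeholders):

import numpy as np
from hailo_sdk_client import ClientRunner

# Translation: ONNX -> HAR (runnable with SDK_NATIVE)
runner = ClientRunner(hw_arch='hailo8')  # placeholder architecture
runner.translate_onnx_model('model.onnx', 'model')
runner.save_har('model_native.har')

# Optimization/quantization (runnable with SDK_QUANTIZED)
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)  # use real, representative data
runner.optimize(calib_data)
runner.save_har('model_quantized.har')

# Compilation: HAR -> HEF (hardware only, no emulation)
hef = runner.compile()
with open('model.hef', 'wb') as f:
    f.write(hef)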

Hope this clarifies things!

Thanks again for your response @omria . Using the DFC in my inference application has two major downsides:

  1. It is an additional dependency that’s not normally required during deployment.
  2. The requirements of the DFC are quite strict, which makes using it in my inference code more difficult. For example:
    1. Requiring exact, older versions of dependencies, for example protobuf==3.20.3, onnxruntime==1.18.0, etc.
    2. No whl for python3.11 (while this does exist for hailort).
    3. No whl for ARM (exists for hailort).
    4. No whl for Windows (exists for hailort).

I understand the HEF format is a binary blob that can’t easily be processed by a CPU or GPU. If possible, I would consider HAR support in hailort a valid option as well. Even better would be if VDevice directly supported loading HAR files (in which case it would execute the HAR on CPU).

I’m not expecting a quick solution for this, but it would be great of course. Meanwhile I will figure out some other solution. If I have to run another code path during development I might as well work on ONNX files, so maybe I’ll go that route.
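
Something like this minimal onnxruntime sketch would be the fallback (model path and input shape are placeholders):

import numpy as np
import onnxruntime as ort

# Development-only path: run the source ONNX model on CPU
session = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = session.run(None, {input_name: dummy})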