Custom ONNX models on H8L Raspberry

Hey there!
I want to run inference on my own trained models converted to ONNX format.
How do I convert ONNX to HEF for inference?
Do I need the Dataflow Compiler, and is it available?

Also, what quantization precisions are supported? Can I get FP16 on the H8L?
And how do I measure model FPS during inference? Do I need HailoRT for that?

Hello @avoqun,

Welcome to the Hailo Community! To convert your ONNX model to HEF (Hailo Executable Format), you’ll need to use the DFC (Dataflow Compiler).
The DFC will be publicly available very soon.

Regarding quantization, you can use either 8-bit or 4-bit precision for the weights, and 8-bit or 16-bit unsigned integers for the data/activations. FP16 is not available; the hardware uses integer computation only.

To measure your model’s performance in frames per second (FPS), you can use the HailoRT tool. Simply run the following command:

hailortcli run {HEF}



Welcome to the Hailo Community!

The Hailo Dataflow Compiler contains tutorials that walk you through converting models from ONNX or TFLite to HEF step by step. Use the following command inside the DFC environment to start the Jupyter notebooks.

hailo tutorial

One goal of using the Hailo-8/8L AI accelerators is to run inference much more energy-efficiently than a GPU. This avoids active cooling, allows operation at higher ambient temperatures (e.g., outside in the sun), enables battery operation, and saves cost.

One way to achieve this is to use integer computation instead of floating point. That is why we use integer computation only.

A second way is to reduce external memory accesses. Quantizing data and weights keeps more data on chip. The hardware supports 4-, 8-, and 16-bit precision, selectable per layer; 8-bit is used by default. For large models and layers, you can quantize some layers to 4-bit to save memory, and if your input or output data needs a larger range, you can use 16-bit layers.
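As a sketch of how per-layer precision is selected: the DFC takes a model script during optimization, and quantization settings are applied per named layer. The layer names below are hypothetical examples; check the Dataflow Compiler user guide for the exact syntax supported by your version.

```
# Hypothetical model-script (.alls) fragment; layer names are examples.
# Quantize one large layer's weights to 4-bit to save memory:
quantization_param(conv5, precision_mode=a8_w4)
# Use 16-bit activations/weights where the data needs a larger range:
quantization_param(output_layer1, precision_mode=a16_w16)
```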

The Hailo Dataflow Compiler uses different optimization algorithms to ensure the quantized model runs with minimal degradation.

As my colleague wrote, you can use the HailoRT CLI; try --help for additional options. With run2 you can run multiple models and measure FPS, bandwidth, latency, and more.
If you would like to see FPS live, try the HailoRT CLI monitor.
Open two terminals and run the monitor in one

hailortcli monitor

and your app or hailortcli run/run2 in the other, with the HAILO_MONITOR environment variable set to 1

export HAILO_MONITOR=1
hailortcli run model.hef