Custom ONNX models on H8L Raspberry

Hey there!
I want to run inference on my own trained models converted to ONNX format.
How do I convert ONNX to HEF for inference?
Do I need the Dataflow Compiler, and is it available?

Also, what quantization precisions are supported? Can I get FP16 on the H8L?
And how do I measure model FPS during inference? Do I need HailoRT for that?

Hello @avoqun,

Welcome to the Hailo Community! To convert your ONNX model to HEF (Hailo Executable Format), you’ll need to use the DFC (Dataflow Compiler).
The DFC will be publicly available very soon.

Regarding quantization, you can use either 8-bit or 4-bit precision for the weights, and 8-bit or 16-bit unsigned integers for the data/activations. FP16 is not available; the hardware uses integer computation only.

To measure your model’s performance in frames per second (FPS), you can use the HailoRT tool. Simply run the following command:

hailortcli run {HEF}



Welcome to the Hailo Community!

The Hailo Dataflow Compiler contains tutorials that walk you through converting models from ONNX or TFLite to HEF step by step. Use the following command inside the DFC environment to start the Jupyter notebooks.

hailo tutorial

One goal of using the Hailo-8/8L AI accelerators is to run inference much more energy-efficiently than a GPU. This avoids active cooling, allows operation at higher ambient temperatures (e.g., outside in the sun), enables battery operation, and saves cost.

One way to achieve this is to use integer computation instead of floating point. That is why we use integer computation only.

A second way is to reduce external memory accesses. Quantizing data and weights keeps more data on chip. The hardware supports 4-, 8-, and 16-bit precision, selectable per layer; 8-bit is used by default. For large models and layers, you can quantize some layers to 4-bit to save memory, and if your input or output data needs a larger range, you can use 16-bit layers.
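As a sketch of how per-layer precision is selected: the DFC takes a model script during optimization, and quantization settings are applied per named layer. The layer names below are hypothetical examples; check the Dataflow Compiler user guide for the exact syntax supported by your version.

```
# Hypothetical model-script (.alls) fragment; layer names are examples.
# Quantize one large layer's weights to 4-bit to save memory:
quantization_param(conv5, precision_mode=a8_w4)
# Use 16-bit activations/weights where the data needs a larger range:
quantization_param(output_layer1, precision_mode=a16_w16)
```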

The Hailo Dataflow Compiler uses different optimization algorithms to ensure the quantized model runs with minimal degradation.

As my colleague wrote, you can use the HailoRT CLI; try --help for additional options. With run2 you can run multiple models and measure FPS, bandwidth, latency, and more.
If you would like to see FPS live, try the HailoRT CLI monitor.
Open two terminals and run the monitor in one

hailortcli monitor

and your app or hailortcli run/run2 in the other, with the HAILO_MONITOR environment variable set to 1

export HAILO_MONITOR=1
hailortcli run model.hef