HailoRT minimal working example for Python and Hailo8

Hi everybody,

Just recently, we started to experiment with the Hailo8 NPU on the official Raspberry Pi 5 AI HAT. We research machine learning methods for acoustic imaging (yes, this is related) and want to test them on smaller machines. Since NVIDIA embedded devices are very clunky and the Google edge devices are EOL, we were driven to check out the Hailo devices.

While the documentation for the SDK and the Dataflow Compiler is very good, we had a hard time getting started with the HailoRT Python inference wrapper. While the Model Zoo is rich in examples and TAPPAS exists, the provided applications focus heavily on well-established, pre-trained models, transfer learning, and use of the CLI. Since we have specialized models, none of the official model workflows were applicable to our problems. We couldn’t find a bare-bones Hello World example that would let us isolate the actual driver code that makes the hardware calls, which cost us some time. For this reason, we want to share a minimal code snippet that runs HailoRT in a streaming context without additional modules, specific models, or optional hardware, to help other developers get started more quickly.

A minimal working example

# running on HailoRT v4.19.0, Raspberry Pi 5 AI HAT (Hailo8), Python 3.10
import numpy as np
import hailo_platform as hpf

# load the compiled model (HEF) produced by the Dataflow Compiler
hef = hpf.HEF("my_model.hef")

with hpf.VDevice() as target:
    # configure the device with the HEF over PCIe
    configure_params = hpf.ConfigureParams.create_from_hef(hef, interface=hpf.HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    # query input/output stream metadata (this example assumes a single input and a single output)
    input_vstream_info = hef.get_input_vstream_infos()[0]
    output_vstream_info = hef.get_output_vstream_infos()[0]

    # let HailoRT quantize/dequantize internally so the host side works in float32
    input_vstreams_params = hpf.InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)
    output_vstreams_params = hpf.OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)

    input_shape = input_vstream_info.shape
    output_shape = output_vstream_info.shape

    print(f"Input shape: {input_shape}, Output shape: {output_shape}")

    # activate the network group and stream random data through the inference pipeline
    with network_group.activate(network_group_params):
        with hpf.InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
            for _ in range(10):
                random_input = np.random.rand(*input_shape).astype(np.float32)
                # the pipeline expects a dict keyed by vstream name, with a leading batch dimension
                input_data = {input_vstream_info.name: np.expand_dims(random_input, axis=0)}
                results = infer_pipeline.infer(input_data)
                output_data = results[output_vstream_info.name]
                print(f"Inference output: {output_data}")

Edits are welcome.

While we understand that other Hailo products and modules are efficient when used together, we don’t understand why it was so difficult to find an example like this. Is there any good reason not to have this code as an introductory example? Compare this to the Google Coral TPU inference documentation, which is incredibly easy to find and set up. Since the code snippet above is short and readable, we truly wonder why something like it is not hosted on one of your GitHub pages.

Hey @j.tschavoll,

First off, thanks a ton for your contribution! Your minimal working example (MWE) is awesome and really drives home the importance of having simple, clear, and beginner-friendly examples for developers to use as a starting point when working with Hailo devices.

I wanted to let you know that we’ve got some working inference examples ready to go in our GitHub repo. Check them out here:

Feel free to take a look and let us know what you think. Your feedback is super valuable in helping us make our resources even better for the developer community.

Thanks again for your contribution and for helping us improve the Hailo experience for everyone!

Best Regards,
Omria

@j.tschavoll and @omria
Here is our attempt at simple Python code to get started with Hailo hardware. We developed a Python SDK wrapper over HailoRT that makes working with Hailo devices easier: DeGirum/hailo_examples. We added Jupyter notebooks illustrating various use cases: running different models (detection, classification, pose detection, and segmentation), pipelining two models, running two models in parallel, tracking, and zone counting.

Here are some code snippets.

Working with images:

import degirum as dg, degirum_tools
inference_host_address = "@cloud" # set to @local if you want to run inference on your local machine
zoo_url = "degirum/models_hailort" 
token = degirum_tools.get_token() # paste your token here or leave it empty if running on local machine
# set model name, inference host address, zoo url, token, and image source
model_name = "yolov8n_relu6_coco--640x640_quant_hailort_hailo8l_1"
image_source='../assets/ThreePersons.jpg' # source can be image url, file path, numpy array

# load AI model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token
)

# perform AI model inference on given image source
print(f" Running inference using '{model_name}' on image source '{image_source}'")
inference_result = model(image_source)

# print('Inference Results \n', inference_result)  # numeric results
print(inference_result)
print("Press 'x' or 'q' to stop.")

# show results of inference
with degirum_tools.Display("AI Camera") as output_display:
    output_display.show_image(inference_result)

Working with video streams:

import degirum as dg, degirum_tools
inference_host_address = "@cloud" # set to @local if you want to run inference on your local machine
zoo_url = "degirum/models_hailort" 
token = degirum_tools.get_token() # paste your token here or leave it empty if running on local machine
# set model name and video source
model_name = "yolov8n_relu6_coco--640x640_quant_hailort_hailo8l_1"
video_source = '../assets/Traffic.mp4' # source can be web camera, rtsp url or video file

# load AI model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token
)

# run inference on video stream and display results
with degirum_tools.Display("AI Camera") as output_display:
    for inference_result in degirum_tools.predict_stream(model, video_source):
        output_display.show(inference_result)

Thank you for the link; this is indeed more generally usable than the specific model/architecture examples. However, I see some issues from a didactic perspective:

  1. A new user might miss these essential tools because they sit in a file called utils.py, which does not sound as if it contained such essentials. Consider referencing it in the getting started guide.
  2. The contents of utils.py are still too much for a hello-world-esque example. I can see from the way the HailoAsyncInference class is set up that you are using Python in a style more commonly found on embedded systems and in C++ (queues, callbacks, etc.), which I assume is meant to increase efficiency. While this should be put into practice, consider adding a variant of my MWE and clearly labelling it “Hello World”, or some pun like “Hailo World!”. This seems too important to skip and should be the first point of contact for new users, IMO. Again, I have to mention the simplicity of the (now) LiteRT (RIP TFLite) guide. I think the most valuable approach would be to introduce a new user to the “Hello World” application and then immediately follow up with a more advanced example that explains why it exists and why it is better, for example: “While it would be easy to call Hello World in a loop, consider the efficiency increase when using streams!” A rough sketch of such a queue-based variant follows below.
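
To make the contrast concrete, here is a rough sketch of what such a queue-based streaming variant of my MWE could look like. To be clear, this is not the HailoAsyncInference class from utils.py; it only reuses the hailo_platform calls from my example above plus the standard library, and names like frame_queue and producer are my own:

# rough sketch of a producer/consumer variant of the MWE above
# (not the official HailoAsyncInference; frame_queue/producer are made-up names)
import queue
import threading
import numpy as np
import hailo_platform as hpf

frame_queue = queue.Queue(maxsize=4)  # producer puts preprocessed frames here
STOP = object()                       # sentinel to shut the consumer down

def producer(input_shape, n_frames=10):
    # stand-in for a camera/preprocessing thread filling the queue
    for _ in range(n_frames):
        frame_queue.put(np.random.rand(*input_shape).astype(np.float32))
    frame_queue.put(STOP)

hef = hpf.HEF("my_model.hef")
with hpf.VDevice() as target:
    params = hpf.ConfigureParams.create_from_hef(hef, interface=hpf.HailoStreamInterface.PCIe)
    network_group = target.configure(hef, params)[0]
    in_info = hef.get_input_vstream_infos()[0]
    out_info = hef.get_output_vstream_infos()[0]
    in_params = hpf.InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)
    out_params = hpf.OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)

    # start the producer thread, then consume frames on the main thread
    threading.Thread(target=producer, args=(in_info.shape,), daemon=True).start()

    with network_group.activate(network_group.create_params()):
        with hpf.InferVStreams(network_group, in_params, out_params) as infer_pipeline:
            while True:
                frame = frame_queue.get()  # block until a frame is ready
                if frame is STOP:
                    break
                results = infer_pipeline.infer({in_info.name: np.expand_dims(frame, axis=0)})
                print(f"{out_info.name}: {results[out_info.name].shape}")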

Ultimately, you decide. For us, a framework/library/driver/API/engine becomes more useful the easier it is to work with. Getting started is part of that, and I believe many people feel the same. While I understand the importance of established models, an inference engine should always demonstrate how to use it with “my_model.x” before giving examples with neatly wrapped and managed public models.

To be clear, the example you provided seems perfect for us, regardless of how well it works for new users. Keep up the good work! Hailo seems to fill the hole that the Coral TPU discontinuation left!