HailoRT minimal working example for Python and Hailo8

Hi everybody,

Just recently, we started to experiment with the Hailo8 NPU on the official Raspberry Pi 5 AI HAT. We research machine learning methods for acoustic imaging (yes, this is related) and want to test them on smaller machines. Since NVIDIA embedded devices are very clunky and the Google edge devices are EOL, we were driven to check out the Hailo devices.

While the documentation for the SDK and the Dataflow Compiler is very good, we had a hard time getting started with the HailoRT Python inference wrapper. While the Model Zoo is rich in examples and TAPPAS exists, the provided applications focus heavily on well-established, pre-trained models, transfer learning, and use of the CLI. Since we have specialized models, none of the official model workflows were applicable to our problems. We couldn’t find a bare-bones Hello World example that would let us isolate the actual driver code that makes the hardware calls, which cost us some time. For this reason, we want to share a minimal code snippet that runs HailoRT in a streaming context without additional modules, specific models, or optional hardware, to help other developers get started more quickly.

A minimal working example

# running on HailoRT v4.19.0, Raspberry Pi 5 AI HAT (Hailo8), Python 3.10
import numpy as np
import hailo_platform as hpf

# load the compiled model (HEF) produced by the Dataflow Compiler
hef = hpf.HEF("my_model.hef")

with hpf.VDevice() as target:
    # configure the device with the HEF over PCIe
    configure_params = hpf.ConfigureParams.create_from_hef(hef, interface=hpf.HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    # query input/output stream metadata (this example assumes a single input and a single output)
    input_vstream_info = hef.get_input_vstream_infos()[0]
    output_vstream_info = hef.get_output_vstream_infos()[0]

    # let HailoRT quantize/dequantize internally so the host side works in float32
    input_vstreams_params = hpf.InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)
    output_vstreams_params = hpf.OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)

    input_shape = input_vstream_info.shape
    output_shape = output_vstream_info.shape

    print(f"Input shape: {input_shape}, Output shape: {output_shape}")

    # activate the network group and stream random data through the inference pipeline
    with network_group.activate(network_group_params):
        with hpf.InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
            for _ in range(10):
                random_input = np.random.rand(*input_shape).astype(np.float32)
                # the pipeline expects a dict keyed by vstream name, with a leading batch dimension
                input_data = {input_vstream_info.name: np.expand_dims(random_input, axis=0)}
                results = infer_pipeline.infer(input_data)
                output_data = results[output_vstream_info.name]
                print(f"Inference output: {output_data}")

Edits are welcome.

While we understand that other Hailo products and modules are efficient when used together, we don’t understand why it was so difficult to find an example like this. Is there any good reason not to have this code as an introductory example? Compare this to the Google Coral TPU inference documentation, which is incredibly easy to find and set up. Since the code snippet above is short and readable, we truly wonder why something like it is not hosted on one of your GitHub pages.

Hey @j.tschavoll,

First off, thanks a ton for your contribution! Your minimal working example (MWE) is awesome and really drives home the importance of having simple, clear, and beginner-friendly examples for developers to use as a starting point when working with Hailo devices.

I wanted to let you know that we’ve got some working inference examples ready to go in our GitHub repo. Check them out here:

Feel free to take a look and let us know what you think. Your feedback is super valuable in helping us make our resources even better for the developer community.

Thanks again for your contribution and for helping us improve the Hailo experience for everyone!

Best Regards,
Omria

@j.tschavoll and @omria
Here is our attempt at simple Python code to get started with Hailo hardware. We developed a Python SDK wrapper over HailoRT that makes working with Hailo devices easier: DeGirum/hailo_examples. We added Jupyter notebooks illustrating various use cases: running different models (detection, classification, pose detection, and segmentation), pipelining two models, running two models in parallel, tracking, and zone counting.

Here are some code snippets.

Working with images:

import degirum as dg, degirum_tools
inference_host_address = "@cloud" # set to @local if you want to run inference on your local machine
zoo_url = "degirum/models_hailort" 
token = degirum_tools.get_token() # paste your token here or leave it empty if running on local machine
# set model name, inference host address, zoo url, token, and image source
model_name = "yolov8n_relu6_coco--640x640_quant_hailort_hailo8l_1"
image_source='../assets/ThreePersons.jpg' # source can be image url, file path, numpy array

# load AI model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token
)

# perform AI model inference on given image source
print(f" Running inference using '{model_name}' on image source '{image_source}'")
inference_result = model(image_source)

# print('Inference Results \n', inference_result)  # numeric results
print(inference_result)
print("Press 'x' or 'q' to stop.")

# show results of inference
with degirum_tools.Display("AI Camera") as output_display:
    output_display.show_image(inference_result)

Working with video streams:

import degirum as dg, degirum_tools
inference_host_address = "@cloud" # set to @local if you want to run inference on your local machine
zoo_url = "degirum/models_hailort" 
token = degirum_tools.get_token() # paste your token here or leave it empty if running on local machine
# set model name and video source
model_name = "yolov8n_relu6_coco--640x640_quant_hailort_hailo8l_1"
video_source = '../assets/Traffic.mp4' # source can be web camera, rtsp url or video file

# load AI model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token
)

# run inference on video stream and display results
with degirum_tools.Display("AI Camera") as output_display:
    for inference_result in degirum_tools.predict_stream(model, video_source):
        output_display.show(inference_result)

Thank you for the link; this is indeed more generally usable than the specific model/architecture examples. However, I see some issues from a didactic perspective:

  1. A new user might miss these essential tools because they sit in a file called utils.py, which does not sound as if it contained such essentials. Consider referencing it in the getting started guide.
  2. The contents of utils.py are still too much for a hello-world-esque example. I can see from the way the HailoAsyncInference class is set up that you are using Python in a style more commonly found on embedded systems and in C++ (queues, callbacks, etc.), which I assume is meant to increase efficiency. While this should be put into practice, consider adding a variant of my MWE and clearly labelling it “Hello World”, or some pun like “Hailo World!”. This seems too important to skip and should be the first point of contact for new users, IMO. Again, I have to mention the simplicity of the (now) LiteRT (RIP TFLite) guide. I think the most valuable approach would be to introduce a new user to the “Hello World” application and then immediately follow up with a more advanced example that explains why it exists and why it is better, for example: “While it would be easy to call Hello World in a loop, consider the efficiency increase when using streams!” A rough sketch of such a queue-based variant follows below.
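
To make the contrast concrete, here is a rough sketch of what such a queue-based streaming variant of my MWE could look like. To be clear, this is not the HailoAsyncInference class from utils.py; it only reuses the hailo_platform calls from my example above plus the standard library, and names like frame_queue and producer are my own:

# rough sketch of a producer/consumer variant of the MWE above
# (not the official HailoAsyncInference; frame_queue/producer are made-up names)
import queue
import threading
import numpy as np
import hailo_platform as hpf

frame_queue = queue.Queue(maxsize=4)  # producer puts preprocessed frames here
STOP = object()                       # sentinel to shut the consumer down

def producer(input_shape, n_frames=10):
    # stand-in for a camera/preprocessing thread filling the queue
    for _ in range(n_frames):
        frame_queue.put(np.random.rand(*input_shape).astype(np.float32))
    frame_queue.put(STOP)

hef = hpf.HEF("my_model.hef")
with hpf.VDevice() as target:
    params = hpf.ConfigureParams.create_from_hef(hef, interface=hpf.HailoStreamInterface.PCIe)
    network_group = target.configure(hef, params)[0]
    in_info = hef.get_input_vstream_infos()[0]
    out_info = hef.get_output_vstream_infos()[0]
    in_params = hpf.InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)
    out_params = hpf.OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)

    # start the producer thread, then consume frames on the main thread
    threading.Thread(target=producer, args=(in_info.shape,), daemon=True).start()

    with network_group.activate(network_group.create_params()):
        with hpf.InferVStreams(network_group, in_params, out_params) as infer_pipeline:
            while True:
                frame = frame_queue.get()  # block until a frame is ready
                if frame is STOP:
                    break
                results = infer_pipeline.infer({in_info.name: np.expand_dims(frame, axis=0)})
                print(f"{out_info.name}: {results[out_info.name].shape}")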

Ultimately, you decide. For us, a framework/library/driver/API/engine becomes more useful the easier it is to work with. Getting started is part of that, and I believe many people feel the same. While I understand the importance of established models, an inference engine should always demonstrate how to use it with “my_model.x” before giving examples with neatly wrapped and managed public models.

To be clear, the example you provided seems perfect for us, regardless of how well it works for new users. Keep up the good work! Hailo seems to fill the hole that the Coral TPU discontinuation left!