How to use multiple models on one RPi5 video stream?

Hi,

I was finally able to convert YOLO models to HEF format for the Hailo-8L on the RPi5. Now I need your help understanding how to run multiple models on one input. On the RPi5, to run a HEF model on an image, I have to create a Hailo device and then run the model on it. But what if I have several models, as in an ALPR system: detect a car, then a plate, then run OCR? Can you suggest the steps for doing that on the RPi, with all the models working on a single video input?

Thanks

Hi DimaR,

Nice Work!
Here are several options for running multiple models on one input with Hailo-8L on Raspberry Pi:

  1. Sequential Execution:

    • Run one model after another in a sequence (car detection → plate detection → OCR) using a single Hailo device. This is straightforward and doesn’t require complex handling.
  2. Multi-threading / Parallel Execution:

    • Use Python’s threading or multiprocessing to run each model in parallel. Each thread/process can handle a different model, and they can all share the same video input. Suitable if your workload requires more concurrency.
  3. Scheduler with Context Switching:

    • If you’re using a scheduler, it can manage multiple models by switching between tasks based on priorities or timing. Python schedulers like schedule or APScheduler can help manage when each model runs. It’s useful for more complex systems.
  4. Model Pipelining:

    • Use a pipelined approach where each model takes the output from the previous one. For example, feed the detected car output into the plate detection model, then pass the result to OCR. This can be done in a single loop using a scheduler or manually coded pipeline.
  5. HailoRT Scheduler:

    • The Hailo runtime offers its own built-in scheduler, which can optimize the execution of multiple models on a single device. This is a more advanced option that takes advantage of Hailo’s parallelism (see the sketch just below).
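
To give a quick idea of option 5, here is a minimal sketch of enabling the scheduler (only the device setup is shown):

from hailo_platform import VDevice, HailoSchedulingAlgorithm

# One virtual device with the model scheduler enabled; every HEF
# configured on this vdevice shares the Hailo-8L, and the scheduler
# switches between models automatically at run time
params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
vdevice = VDevice(params)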

You can choose the method that best fits your performance and complexity requirements.

Let me know which one interests you, and I can help with the details!

Hey,

if possible, can you provide an example of how to run several models on one input for options 1 and 2? For option 1, running the models one by one; for option 2, running one model continuously (like car detection) and, only when a car is detected, running plate detection and OCR in another process without stopping the main loop.

I wasn’t able to find an option to load and run a model on an image/stream on the RPi. Maybe I’m missing something and it’s super simple, but if possible, please point me in the right direction. Thanks.

To get started, I recommend checking out the Hailo Application Code Examples on GitHub:

These examples demonstrate how to use the Hailo API across various use cases and frameworks. While they haven’t been specifically optimized for Raspberry Pi, they provide a great foundation that you can build upon and adapt to your needs.

To give you a head start, here’s some pseudo-code illustrating a couple of potential approaches. Keep in mind this is just a demo; you’ll need to modify it to fit your specific use case and environment.


1. Sequential Execution Example:

In this method, you run models one after the other in sequence. Here’s how you can do it using a single input stream on a Raspberry Pi:

from hailo_platform import Device, HEF

# Initialize the Hailo device
device = Device()

# Load each model (YOLO, plate detection, OCR)
yolo_model = HEF("yolo_model.hef")
plate_model = HEF("plate_model.hef")
ocr_model = HEF("ocr_model.hef")

# Run models sequentially on the same input
input_frame = capture_video_frame()  # placeholder for your frame capture

# NOTE: pseudo-code - device.infer(...) stands in for the real HailoRT calls
yolo_output = device.infer(yolo_model, input_frame)

# Crop the detected car region before feeding the plate detector,
# then crop the detected plate before feeding OCR
car_crop = crop_to_detection(input_frame, yolo_output)    # placeholder helper
plate_output = device.infer(plate_model, car_crop)

plate_crop = crop_to_detection(car_crop, plate_output)    # placeholder helper
ocr_output = device.infer(ocr_model, plate_crop)

# Process and use the final OCR output
process_results(ocr_output)

In this case, each model’s detection is cropped out of the frame and passed as the input to the next model.


2. Multi-threading / Parallel Execution Example:

Here, you use Python’s threading to run each model in parallel on the same input.

import threading
from hailo_platform import Device, HEF

# NOTE: pseudo-code - device.infer(...) and capture_video_frame() are
# placeholders for your actual inference and capture code

# Initialize the Hailo device
device = Device()

# Load multiple models
yolo_model = HEF("yolo_model.hef")
plate_model = HEF("plate_model.hef")
ocr_model = HEF("ocr_model.hef")

# Collect each model's output here, since Thread discards return values
results = {}

# Define function for running each model
def run_model(name, model, input_frame):
    results[name] = device.infer(model, input_frame)

# Capture input frame
input_frame = capture_video_frame()

# Run models in parallel, one thread per model
threads = [
    threading.Thread(target=run_model, args=("yolo", yolo_model, input_frame)),
    threading.Thread(target=run_model, args=("plate", plate_model, input_frame)),
    threading.Thread(target=run_model, args=("ocr", ocr_model, input_frame)),
]

# Start threads
for t in threads:
    t.start()

# Join threads (wait for all to finish)
for t in threads:
    t.join()

# Process the collected results
process_results(results)

In this example, each model runs in its own thread, and they all process the same input in parallel. Note that a single Hailo device still serializes the actual inference calls unless the HailoRT scheduler (option 5) is enabled, so the gain here is mainly in overlapping pre- and post-processing.
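
For the second flow you asked about (car detection running continuously, with plate detection and OCR handed off to another process only when a car is found), a queue feeding a worker process is a common pattern. Here is a minimal sketch; capture_video_frame, detect_car, and detect_plate_and_ocr are hypothetical stand-ins for your real capture and inference code:

import multiprocessing as mp
import numpy as np

# Hypothetical stand-ins for the real capture and Hailo inference calls
def capture_video_frame():
    return np.zeros((480, 640, 3), dtype=np.uint8)

def detect_car(frame):
    return True  # pretend a car was detected

def detect_plate_and_ocr(frame):
    return "ABC123"  # pretend plate detection + OCR result

def plate_worker(frame_queue):
    # Runs in a separate process so plate detection + OCR never block
    # the car-detection loop; in real code, open the Hailo model(s)
    # for this stage inside this process
    while True:
        frame = frame_queue.get()
        if frame is None:  # sentinel: shut down cleanly
            break
        print("Plate:", detect_plate_and_ocr(frame))

if __name__ == "__main__":
    frame_queue = mp.Queue(maxsize=8)
    worker = mp.Process(target=plate_worker, args=(frame_queue,))
    worker.start()

    for _ in range(100):  # main loop: car detection only
        frame = capture_video_frame()
        if detect_car(frame) and not frame_queue.full():
            frame_queue.put(frame)  # hand off; skip the frame if the queue is full

    frame_queue.put(None)  # tell the worker to exit
    worker.join()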

Thanks,
but there are still some problems. I tried the examples, and some of them worked, but only with one model, not several. I wasn’t able to create a working example with just an image. So, can you please create an extremely simple working example where we have an input image, run two models on that image, and just print the results? Nothing more. After that, I think I will be able to understand the situation better.


For example, here is one option showing how I was able to run all 3 detections in a single script. But it looks like a very weird implementation. Can you suggest or show some more correct examples?


The code is just a working example for testing purposes, without any cleanup or documentation.

import cv2
import numpy as np
from picamera2.devices import Hailo

def letterbox_image(img, size):
    iw, ih = img.shape[1], img.shape[0]
    w, h = size
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    image_resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    image_padded = np.full((h, w, 3), 128, dtype=np.uint8)
    top = (h - nh) // 2
    left = (w - nw) // 2
    image_padded[top:top + nh, left:left + nw, :] = image_resized
    return image_padded, scale, left, top

def extract_detections(hailo_output, original_w, original_h, padded_w, padded_h, scale, pad_left, pad_top, class_names, threshold=0.5):
    results = []
    for class_id, detections in enumerate(hailo_output):
        for detection in detections:
            score = detection[4]
            if score >= threshold:
                y0, x0, y1, x1 = detection[:4]
                x0_pixel = int((x0 * padded_w - pad_left) / scale)
                y0_pixel = int((y0 * padded_h - pad_top) / scale)
                x1_pixel = int((x1 * padded_w - pad_left) / scale)
                y1_pixel = int((y1 * padded_h - pad_top) / scale)
                bbox = (x0_pixel, y0_pixel, x1_pixel, y1_pixel)
                results.append([class_names[class_id], bbox, score])
    return results

def draw_detections(img, detections):
    for class_name, bbox, score in detections:
        x0, y0, x1, y1 = bbox
        label = f"{class_name}"
        cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
        cv2.putText(img, label, (x0, y0 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    return img

if __name__ == "__main__":
    with open("coco.txt", 'r', encoding="utf-8") as f:
        class_names_yolo = f.read().splitlines()

    hef_one = "/usr/share/hailo-models/yolov8s_h8l.hef"
    license_plate_model = "./license_plate_model.hef"
    hef_model_ocr = "./license_plate_ocr_new.hef"
    class_names = list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

    with Hailo(hef_one) as hailo:
        img = cv2.imread("./cars/car_17.jpg")
        original_w, original_h = img.shape[1], img.shape[0]
        input_size = (640, 640)
        padded_img, scale, pad_left, pad_top = letterbox_image(img, input_size)
        frame = cv2.resize(padded_img, (hailo.get_input_shape()[1], hailo.get_input_shape()[0]))
        results = hailo.run(frame)
        car_detections = extract_detections(results[0], original_w, original_h, input_size[0], input_size[1], scale, pad_left, pad_top, class_names_yolo, threshold=0.5)
        
        if car_detections:
            car_x0, car_y0, car_x1, car_y1 = car_detections[0][1]
            cropped_car_img = img[car_y0:car_y1, car_x0:car_x1]
            cv2.imwrite("detected_car.jpg", cropped_car_img)

    plate_detections = []
    if car_detections:
        with Hailo(license_plate_model) as hailo:
            img_car = cv2.imread("./detected_car.jpg")
            car_crop_w, car_crop_h = img_car.shape[1], img_car.shape[0]
            input_size = (640, 640)
            padded_img, scale, pad_left, pad_top = letterbox_image(img_car, input_size)
            frame = cv2.resize(padded_img, (hailo.get_input_shape()[1], hailo.get_input_shape()[0]))
            results = hailo.run(frame)
            plate_detections = extract_detections(results[0], car_crop_w, car_crop_h, input_size[0], input_size[1], scale, pad_left, pad_top, ['license plate'], threshold=0.5)
            
            if plate_detections:
                plate_x0, plate_y0, plate_x1, plate_y1 = plate_detections[0][1]
                cropped_plate_img = cropped_car_img[plate_y0:plate_y1, plate_x0:plate_x1]
                cv2.imwrite("detected_plate.jpg", cropped_plate_img)

    ocr_detections = []
    if plate_detections:
        with Hailo(hef_model_ocr) as hailo:
            img_plate = cv2.imread("./detected_plate.jpg")
            plate_crop_w, plate_crop_h = img_plate.shape[1], img_plate.shape[0]
            input_size = (640, 640)
            padded_img, scale, pad_left, pad_top = letterbox_image(img_plate, input_size)
            frame = cv2.resize(padded_img, (hailo.get_input_shape()[1], hailo.get_input_shape()[0]))
            results = hailo.run(frame)
            ocr_detections = extract_detections(results[0], plate_crop_w, plate_crop_h, input_size[0], input_size[1], scale, pad_left, pad_top, class_names, threshold=0.5)
            
            if ocr_detections:
                detections_sorted = sorted(ocr_detections, key=lambda x: x[1][0])
                license_plate_text = ''.join([det[0] for det in detections_sorted])
                print(f"License Plate: {license_plate_text}")

    final_img = img.copy()
    final_img = draw_detections(final_img, car_detections)
    if plate_detections:
        plate_x0, plate_y0, plate_x1, plate_y1 = plate_detections[0][1]
        adjusted_plate_detections = [[det[0], 
                                     (car_x0 + plate_x0 + det[1][0], car_y0 + plate_y0 + det[1][1], 
                                      car_x0 + plate_x0 + det[1][2], car_y0 + plate_y0 + det[1][3]), 
                                     det[2]] for det in ocr_detections]
        final_img = draw_detections(final_img, adjusted_plate_detections)

    cv2.imwrite("output_with_all_detections.jpg", final_img)

And here are some results showing how it works, if anyone is interested.



But sometimes I get the error below; maybe someone knows something about it?

terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided
Aborted

First of all, great job!

However, it seems you’re using the Pi Camera with Hailo, but you should be using our native Python API. I recommend you take a look at how we utilize the Python API by referring to the generic inference class in the following file:

Generic Inference Class - Hailo Example

This class simplifies the process, making it easier to work with our API.

Below is an example of how you can use the Python API with the infer_image() method from that utility:

import hailo_platform as hp
import numpy as np
from PIL import Image
from utils import infer_image  # Import the generic inference method

# Paths to your models (HEF files)
model1_path = 'path_to_model1.hef'
model2_path = 'path_to_model2.hef'

# Load and preprocess the input image
input_image_path = 'input_image.png'
image = Image.open(input_image_path).resize((224, 224))  # Adjust to the model's input size
input_data = np.array(image).astype(np.float32) / 255.0  # Normalize input image data

# Initialize the Hailo device
device = hp.Device()

# Load both models (HEF)
hef1 = hp.HEF(model1_path)
hef2 = hp.HEF(model2_path)

# Configure network groups for both models
network_group_1 = device.configure(hef1)
network_group_2 = device.configure(hef2)

# Prepare input data for inference
input_data = np.expand_dims(input_data, axis=0)  # Add batch dimension

# Perform inference for both models using the `infer_image()` function
with network_group_1, network_group_2:
    # Inference for Model 1
    results_model_1 = infer_image(device, hef1, input_data)
    
    # Inference for Model 2
    results_model_2 = infer_image(device, hef2, input_data)

# Print the results
print("Results from Model 1:", results_model_1)
print("Results from Model 2:", results_model_2)

Make sure to adjust the paths and input sizes as per your models’ requirements. This should align better with our native API and provide a cleaner, more efficient inference workflow.

Let me know if you need any further clarifications!

To get a better understanding:

from utils import infer_image # Import the generic inference method

This infer_image method doesn’t exist in the utils file, so should we create it? Or am I missing something, or is the URL pointing me to the wrong utils file?

Hello, this may be a basic question, but when I looked through the Hailo Model Zoo, I found that there is no model for plate detection or OCR. I was wondering whether you exported the models for the 8L yourself, or used ready-made plate detection and OCR models on the Hailo-8L.

My apologies! What I meant is that you can use the run function from the utils file if it suits your needs, or alternatively, you can create a custom image_infer function based on how the run function works.

Hey, I trained my own YOLO models for license plate detection and license plate OCR, and then converted them from pt → onnx → hef with the Model Zoo.

But it wasn’t super straightforward.


Hi @omria,

I spent some time investigating your previous suggestion about using the generic inference class. I compared it with the Picamera2 Hailo class, and they seem very similar, if not identical in logic. With that in mind, I was able to modify the official Hailo class slightly. Instead of creating the device each time, I passed the device into the class, allowing me to create and use different models in a single script, like this:

from hailo_platform import VDevice, HailoSchedulingAlgorithm

mod1 = "model1.hef"
mod2 = "model2.hef"
mod3 = "model3.hef"

params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
vdevice = VDevice(params)

hailo_yolov8 = Hailo(vdevice, mod1)
hailo_license_plate = Hailo(vdevice, mod2)
hailo_ocr = Hailo(vdevice, mod3)
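
For reference, the modification itself is tiny; roughly this (a sketch, with everything else in the picamera2 Hailo class left as in the original):

class Hailo:
    # Only change: the VDevice is passed in rather than created inside
    # __init__, so all model instances share one scheduler-enabled device
    def __init__(self, vdevice, hef_path):
        self.target = vdevice
        self.infer_model = self.target.create_infer_model(hef_path)
        # ... rest of the original picamera2 Hailo class unchanged ...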

Then, I integrated it with Picamera2 like this:

Initialize camera with desired resolution and settings
Start the camera stream

Set camera controls (e.g., exposure)

While capturing frames from the camera:
    Capture a low-resolution frame (lores_frame)
    Capture a high-resolution frame (main_frame)

    Try:
        Prepare the lores_frame for YOLO model input (resize, pad, etc.)
        Run YOLO model to detect cars in the frame
        Extract car detection results

        If any cars are detected:
            Track detected cars
            For each detected car:
                Crop the car image from the lores_frame
                Prepare the cropped car image for license plate detection model input
                Run the license plate detection model

                If any license plates are detected:
                    Crop the plate image from the car image
                    Prepare the cropped plate image for OCR model input
                    Run the OCR model to recognize the plate characters

                    If characters are detected:
                        Sort the characters based on their position
                        Combine detected characters to form the license plate text

                        Display the license plate text on the frame at the detected location
                        Optionally, send the frame and license plate info to external services (e.g., via Telegram)

            Draw tracking info (bounding boxes, labels) on the frame

    Catch and log any errors during the detection process

    Lock and update the output frame for display or further processing

Technically, this works; I even tested it in a real-world scenario, and the results were promising. However, I have several questions and concerns:

  1. Single-loop processing: Right now, everything runs in one loop, which feels inefficient. Ideally, I’d prefer to create a separate API that accepts an image and returns all detected cars, plates, and plate information with bounding boxes and results. My idea is to run the PiCamera and basic YOLO model on one script to detect cars. When a car is detected, I’d send the frame to a separate OCR API where the plate detection and recognition happen.
  2. API Approach: This approach would help avoid resource restrictions. We could even create a server queue for extracting plate numbers. For example, if we’re detecting thousands of cars on a highway, we could populate the queue and get the results slightly later, with detailed plate numbers. It wouldn’t be fully real-time but could be highly useful for future projects.
  3. Main issue: My biggest problem is separating the logic. When using the Hailo class, I need to provide the device. Unfortunately, the device cannot be used more than once simultaneously, which means I can’t run two detection scripts in parallel. This is confusing to me. Could you elaborate on what this “device” is, why we need it, and why I cannot run the same Hailo object detection script multiple times concurrently?
  4. After spending many hours struggling, I still haven’t been able to create a simple, working solution to load a model and use it. I’m wondering if I’m doing something completely wrong, or if this approach isn’t intended to work like this? Right now, the whole generic inference class seems unnecessarily complex for simple testing. At the same time, I find it too confusing to extract just the parts I need from it.
    If possible, could you create or show a custom image_infer function? Ideally, this function would allow us to load a model and run inference without all the extra features like queues, async runs, and other complexities.

Thanks in advance for your help!

I can share the complete code if you need to see exactly what I have done in order to answer my questions.

One more image of the created solution to show how it works; still excited about it :slight_smile:

@DimaR
Thank you for starting this discussion. Since last week I have been trying to find details about how to work with custom models. I am currently exploring, and this is going to help me a lot.
It would be really great if you could explain the compilation stages a bit (in detail, if possible). I have everything set up, but so far I haven’t been able to compile even a basic model. Right now I am just trying to convert a simple yolov8s model (no customization), although my end goal is to convert a custom pose model into HEF format.
I am not sure which piece of information I am missing. I am new to NNs, and it’s unclear to me whether Hailo lacks support for certain functionality. I don’t remember exactly, but I read somewhere that Hailo does not support a few operators (maybe Reshape)? Could you also share some useful resources that would help me actually understand Hailo’s requirements and limitations, and why it is not easy to compile?

I also have a few more questions about converting an ONNX model to HAR. How do we deal with something that is not supported on Hailo?


This was the error I got when trying to compile to HEF.

Hey @DimaR

This is a simple example of inference on an image. If you are trying to run object detection on pictures with a custom model, you can start by running your model here: Hailo-Application-Code-Examples/runtime/python/object_detection at main · hailo-ai/Hailo-Application-Code-Examples · GitHub (and just adjust the post-processing).

import numpy as np
import cv2
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)

def image_infer(hef_path, image_path):
    # Load the HEF (Hailo Executable File)
    hef = HEF(hef_path)

    # Create a Virtual Device (VDevice) and configure the network group
    with VDevice() as vdevice:
        configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
        network_group = vdevice.configure(hef, configure_params)[0]
        network_group_params = network_group.create_params()

        # Feed the input as UINT8; read the outputs back as FLOAT32
        input_vstreams_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
        output_vstreams_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)

        # Preprocess the input image to the model's expected input shape
        input_info = hef.get_input_vstream_infos()[0]
        height, width, _ = input_info.shape
        image = cv2.imread(image_path)
        image_resized = cv2.resize(image, (width, height))
        input_data = {input_info.name: np.expand_dims(image_resized.astype(np.uint8), axis=0)}

        # Run inference (synchronous)
        with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
            with network_group.activate(network_group_params):
                results = infer_pipeline.infer(input_data)

    # Post-process the results here (e.g., decode bounding boxes or classes)
    print("Inference result:", results)
    return results

# Example usage:
if __name__ == "__main__":
    hef_path = "path_to_your_model.hef"
    image_path = "path_to_input_image.jpg"
    result = image_infer(hef_path, image_path)
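
One note: image_infer above creates its own VDevice, so two concurrent calls will compete for the device. If you need several models at once, create a single scheduler-enabled VDevice outside the function (as in the ROUND_ROBIN snippet earlier in this thread) and share it between the models.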

Hi, I am working with two cameras, each with its own model for processing. How can I run the processing for both cameras in parallel, ensuring that both models operate independently?