Inference using Arducam frames

import argparse
import cv2
from picamera2 import MappedArray, Picamera2, Preview
from picamera2.devices import Hailo
import numpy as np
from control_settings_in_yaml import generate_controls_from_yaml


def extract_detections(hailo_output, w, h, class_names, threshold=0.5):
    """Extract detections from the HailoRT-postprocess output."""
    results = []
    for class_id, detections in enumerate(hailo_output):
        for detection in detections:
            detection_array = np.array(detection)
            score = detection_array[4]
            if score >= threshold:
                y0, x0, y1, x1 = detection_array[:4]
                bbox = (int(x0 * w), int(y0 * h), int(x1 * w), int(y1 * h))
                results.append([class_names[class_id], bbox, score])
                print(
                    f"Detection(s) found for class '{class_names[class_id]}', Score: {score:.2f}"
                )
    return results


def draw_objects(request):
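    """Draw the most recent detections onto the main stream; runs as the Picamera2 pre_callback."""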
    current_detections = detections
    if current_detections:
        with MappedArray(request, "main") as m:
            for class_name, bbox, score in current_detections:
                x0, y0, x1, y1 = bbox
                label = f"{class_name} %{int(score * 100)}"
                cv2.rectangle(m.array, (x0, y0), (x1, y1), (0, 255, 0), 2)
                cv2.putText(
                    m.array,
                    label,
                    (x0 + 5, y0 + 15),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.5,
                    (0, 255, 0),
                    1,
                    cv2.LINE_AA,
                )


if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        description="Record a video with Picamera2 and perform object detection."
    )
    parser.add_argument("--width", type=int, default=1080,
                        help="Width of the video")
    parser.add_argument("--height", type=int, default=720,
                        help="Height of the video")
    parser.add_argument(
        "--config_file_path",
        type=str,
        default="config.yaml",
        help="Configuration file path",
    )
    parser.add_argument(
        "-m", "--model", help="Path for the HEF model.", default="yolov8n.hef"
    )
    parser.add_argument(
        "-l",
        "--labels",
        default="coco_1.txt",
        help="Path to a text file containing labels.",
    )
    parser.add_argument(
        "-s",
        "--score_thresh",
        type=float,
        default=0.5,
        help="Score threshold, must be a float between 0 and 1.",
    )

    args = parser.parse_args()
    video_w = args.width
    video_h = args.height
    score_thresh = args.score_thresh
    labels = args.labels
    model = args.model

    # Get the Hailo model and the input size it expects.
    with Hailo(model) as hailo:
        model_h, model_w, _ = hailo.get_input_shape()

        # Load class names from the labels file
        with open(labels, "r", encoding="utf-8") as f:
            class_names = f.read().splitlines()

        # The list of detected objects to draw.
        detections = None

        with Picamera2() as picam2:

            # Configure and start Picamera2.
            main = {'size': (video_w, video_h), 'format': 'XBGR8888'}
            lores = {'size': (model_w, model_h), 'format': 'YUV420'}
            config = picam2.create_preview_configuration(main, lores=lores)
            picam2.configure(config)

            # Generate control dictionary from yaml file
            camera_control_dict = generate_controls_from_yaml(
                args.config_file_path)
            picam2.set_controls(camera_control_dict)

            picam2.start_preview(Preview.QTGL, x=0, y=0,
                                 width=video_w, height=video_h)

            picam2.start()

            # Draw the latest detections onto each main frame before display.
            picam2.pre_callback = draw_objects

            while True:
                # Grab the low-res frame, convert planar YUV420 to RGB for the
                # Hailo device, then update the detections for the callback.
                frame = picam2.capture_array('lores')
                rgb = cv2.cvtColor(frame, cv2.COLOR_YUV420p2RGB)
                results = hailo.run(rgb)
                detections = extract_detections(
                    results, video_w, video_h, class_names, score_thresh
                )

I am running inference with a YOLOv8n model on a Raspberry Pi CM4 with Hailo. I am using the script above, with the 'lores' camera stream feeding the inference.

I have tested passing the frame to hailo.run directly in every image format Picamera2 offers (XBGR8888, XRGB8888, RGB888, BGR888, and YUV420), but they all throw errors. I have to convert the frame first with OpenCV, and I am concerned that this conversion is adding latency.
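
For reference, the conversion itself can be timed in isolation with a quick sketch like this (synthetic frame, no camera needed; the 640x640 size is an assumption matching my model input):

import time
import cv2
import numpy as np

# Synthetic planar YUV420 buffer for a 640x640 frame; capture_array('lores')
# returns the same (h * 3 // 2, w) uint8 layout.
yuv = np.zeros((640 * 3 // 2, 640), dtype=np.uint8)

t0 = time.perf_counter()
for _ in range(100):
    rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV420p2RGB)
per_frame_ms = (time.perf_counter() - t0) * 1000 / 100
print(f"cvtColor YUV420 -> RGB: {per_frame_ms:.3f} ms per frame")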

In the visualization, the bounding boxes flicker and the model does not detect reliably. The model was compiled with the Hailo Dataflow Compiler at optimization level 2, and it performed very well on my test dataset, so the issue is probably in the inference script.

Could you suggest what might be wrong in this script?

Thanks!


This is the error raised when passing a YUV420 frame directly to hailo.run.

I had to comment out your 'control-file' bits, since I don't have whatever you keep in there. But otherwise, things work for me on a Pi 5, Hailo 4.20.1 & an Arducam 'PiCam v3-w-AF' :wink:

ubuntu@ubuntu-2404-pi5b:/tmp$ python3 foo.py 
[12:48:10.661803559] [7763]  INFO Camera camera_manager.cpp:327 libcamera v0.4.0+53-29156679
[12:48:10.684896543] [7778]  INFO RPI pisp.cpp:720 libpisp version v1.0.7 28196ed6edcf 14-03-2025 (00:01:06)
[12:48:10.779826451] [7778]  INFO RPI pisp.cpp:1179 Registered camera /base/axi/pcie@120000/rp1/i2c@88000/imx708@1a to CFE device /dev/media0 and ISP device /dev/media1 using PiSP variant BCM2712_C0
[12:48:10.782920553] [7763]  WARN V4L2 v4l2_pixelformat.cpp:346 Unsupported V4L2 pixel format RPBP
[12:48:10.783121145] [7763]  WARN V4L2 v4l2_pixelformat.cpp:346 Unsupported V4L2 pixel format RPBP
[12:48:10.783802179] [7763]  INFO Camera camera.cpp:1202 configuring streams: (0) 1080x720-XBGR8888 (1) 640x640-YUV420 (2) 1536x864-BGGR_PISP_COMP1
[12:48:10.783947716] [7778]  INFO RPI pisp.cpp:1484 Sensor: /base/axi/pcie@120000/rp1/i2c@88000/imx708@1a - Selected sensor format: 1536x864-SBGGR10_1X10 - Selected CFE format: 1536x864-PC1B
Detection(s) found for class 'person', Score: 0.88
Detection(s) found for class 'person', Score: 0.87
Detection(s) found for class 'person', Score: 0.87
Detection(s) found for class 'person', Score: 0.88
Detection(s) found for class 'person', Score: 0.86
Detection(s) found for class 'person', Score: 0.84
Detection(s) found for class 'person', Score: 0.85

Thank you, Marco, for the feedback. It's working for me too, but I'm seeing latency. Do you see latency in your detections as well? What detection threshold did you set?
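
To pin down where the time goes on my side, I am instrumenting the loop roughly like this (a sketch only, reusing the picam2, hailo, and helper objects from the script above):

import time

while True:
    t0 = time.perf_counter()
    frame = picam2.capture_array('lores')
    t1 = time.perf_counter()
    rgb = cv2.cvtColor(frame, cv2.COLOR_YUV420p2RGB)
    t2 = time.perf_counter()
    results = hailo.run(rgb)
    t3 = time.perf_counter()
    detections = extract_detections(
        results, video_w, video_h, class_names, score_thresh
    )
    print(f"capture {t1 - t0:.3f}s  convert {t2 - t1:.3f}s  infer {t3 - t2:.3f}s")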

@omria Sorry for tagging you. Could you please take a look at this code? The hailo.run function does not accept the frame directly from the camera (I tested all formats), and I have to convert it first using OpenCV. Do you know why? In this example, picamera2/examples/hailo/detect.py at main · raspberrypi/picamera2, they pass the frame directly without conversion, but when I tested that, it raised the error in the screenshot.
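
For what it's worth, a quick way to see what hailo.run is unhappy about is to compare the model's expected input with what capture_array actually returns (a sketch using the same objects as the script above):

frame = picam2.capture_array('lores')
print("model input shape:", hailo.get_input_shape())   # e.g. (640, 640, 3)
print("lores frame:", frame.shape, frame.dtype)
# A planar YUV420 frame comes back as a 2-D (h * 3 // 2, w) uint8 array, which
# does not match the model's 3-channel input until it is converted.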

Actually, with

            # lores = {'size': (model_w, model_h), 'format': 'YUV420'}
            lores = {'size': (model_w, model_h), 'format': 'RGB888'}

it works for me;-)

                frame = picam2.capture_array('lores')
                # rgb = cv2.cvtColor(frame, cv2.COLOR_YUV420p2RGB)
                # results = hailo.run(rgb)
                results = hailo.run(frame)

But even with the conversion in the mix, I did not see noticeable lag?! I did not really measure it, though; it was just me as the 'person' and a few other things as a test ;-)
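
Putting the two changes together, the relevant parts of the script become (a sketch; everything else as in your script above):

# Request the lores stream as packed RGB so it already matches the model input.
lores = {'size': (model_w, model_h), 'format': 'RGB888'}
config = picam2.create_preview_configuration(main, lores=lores)
picam2.configure(config)

while True:
    frame = picam2.capture_array('lores')   # already 3-channel, no cvtColor
    results = hailo.run(frame)
    detections = extract_detections(
        results, video_w, video_h, class_names, score_thresh
    )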


I’m using an RPi CM4, and it only works when converting first. Thank you!

What version of Hailo* are you running?

I’m using Hailo 4.20.1

At least we agree there;-)
