Hailo-8 Object Detection, Distance Estimation

I’ve been working with the Hailo-8 AI accelerator on a driver-warning system for the past few months. The goal is to detect objects of interest (cars, pedestrians, etc.) with two Raspberry Pi Camera Module 3 units, estimate their distance, and sound a buzzer if they encroach on a safe following distance (we calculate that as half the vehicle’s speed in meters, fed from an OBD-II reader).
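
As a minimal sketch of that warning rule (the buzzer helper below is a hypothetical placeholder for the actual GPIO code):

def safe_following_distance_m(speed_kmh: float) -> float:
    # half the vehicle's speed, read as meters: e.g. 100 km/h -> 50 m
    return speed_kmh / 2.0

def sound_buzzer() -> None:
    # hypothetical stand-in for driving the buzzer via GPIO
    print("WARNING: following distance too short")

def check_following_distance(object_distance_m: float, speed_kmh: float) -> None:
    if object_distance_m < safe_following_distance_m(speed_kmh):
        sound_buzzer()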

I considered two depth-estimation methods:

  1. Stereo inference within a single frame (e.g. StereoNet), but its ~10 fps throughput is too low for real-time warning.
  2. Detecting the same object in two independent camera streams and computing distance from the horizontal offset between the midpoints of their bounding boxes, as sketched below.
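
For option 2, depth follows from the horizontal offset (disparity) between the two bbox midpoints via the pinhole-stereo relation Z = f * B / d. A minimal sketch, assuming rectified, horizontally aligned cameras; the focal length and baseline below are placeholder values that would come from calibration:

def bbox_center_x(bbox):
    x1, y1, x2, y2 = bbox
    return (x1 + x2) / 2.0

def depth_from_bboxes(bbox_left, bbox_right,
                      focal_px=1000.0,   # focal length in pixels (from calibration)
                      baseline_m=0.12):  # camera separation in meters (placeholder)
    disparity = bbox_center_x(bbox_left) - bbox_center_x(bbox_right)
    if disparity <= 0:
        return None  # mismatched boxes, or object effectively at infinity
    return focal_px * baseline_m / disparity  # Z = f * B / d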

So far I’ve implemented single-camera detection in custom Python code (I haven’t yet mastered GStreamer), but I haven’t found a model for multi-stream detection. Since I’ll be running multiple inference modules concurrently, I also need a Python-based scheduling framework; any pointers to good documentation would be welcome.

Hardware: Raspberry Pi 5, Hailo-8 (upgraded from Hailo-8L), buzzer.

Hi @Furkan_Ayan
Welcome to the Hailo community. Your project sounds exciting. For your purposes, our PySDK package should be a good fit. Here are some resources:

  1. Simplifying Edge AI Development with DeGirum PySDK and Hailo
  2. DeGirum/hailo_examples: DeGirum PySDK with Hailo AI Accelerators
  3. Start Here | DeGirum Docs
  4. Running multiple models independently

Hi shashi,

I already work with DeGirum in one of my projects (unlocking and locking a door using face detection and recognition). However, I couldn’t find anything for this project: I need to track the same object in two separate frames, or to use some other method that fits this use case.

@Vlad_Klimov
Can you please help @Furkan_Ayan ?

@Furkan_Ayan,

We have dgstreams (aka gizmos), a lightweight Python-based scheduling framework, in our degirum_tools package (it is not part of PySDK itself, but an extension of PySDK).
We just released its documentation: Streams | DeGirum Docs

But for your task I think the best approach would be simply to zip two predict_stream() calls. Something like this:

import degirum as dg
import degirum_tools

zoo = dg.connect(dg.CLOUD, "degirum/models_hailort", "<your token>")
# or you can use local inference: zoo = dg.connect(dg.LOCAL, zoo_path)

videos = ["images/Traffic.mp4", "images/Traffic.mp4"]
# or you can use RTSP URLs or integer indexes of local cameras here

# Load one object detection model per video source; use the same model for each
models = [
    zoo.load_model("yolov8n_relu6_car--640x640_quant_hailort_hailo8_1")
    for _ in range(len(videos))
]
# adjust model name as necessary

for r1, r2 in zip(
    degirum_tools.predict_stream(models[0], videos[0]),
    degirum_tools.predict_stream(models[1], videos[1]),
):
    # here r1.results and r2.results are lists of detected objects with bboxes and scores;
    # r1.results[i]["bbox"] is the [x1, y1, x2, y2] list of bbox coordinates.
    # You need to match the detections between the two frames and do your distance calculations.
    pass

But please be advised that edge-quality object detection models are noisy: the same object (even a stationary one) may be detected on two consecutive frames with different bbox coordinates, or not detected at all (the score may fall below the confidence threshold). This means you need to implement robust bbox matching and smoothing. Possible approaches:

  1. Do temporal smoothing/filtering of the bbox coordinates using either a Kalman filter or some low-pass filter; this helps reject frame-to-frame noise. To do this you need to apply object tracking first (BYTETrack or similar) so that each bbox has a stable identity. See the sketch after this list.
  2. Average over a region: instead of taking just the center point of the bbox, sample several pixels inside the matched bounding boxes, compute a disparity per sample, then take the average or median.
  3. Discard far-away bboxes (those with small disparity); depth error grows quickly as disparity shrinks.
  4. Use the most consistent region of the object (e.g. the bottom center of the bbox for cars on the road).
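
A minimal sketch combining approaches 1 and 3 (a greedy matcher plus an exponential-moving-average low-pass filter, standing in for a full BYTETrack + Kalman pipeline; all thresholds are placeholder values):

def center(bbox):
    x1, y1, x2, y2 = bbox
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def match_detections(left_results, right_results, max_dy=20.0, min_disparity=2.0):
    # Pair each left-frame detection with the same-label right-frame detection
    # whose vertical center is closest: with rectified cameras the same object
    # sits on (nearly) the same image row in both frames.
    pairs = []
    used = set()
    for left in left_results:
        lx, ly = center(left["bbox"])
        best_j, best_dy = None, max_dy
        for j, right in enumerate(right_results):
            if j in used or right["label"] != left["label"]:
                continue
            rx, ry = center(right["bbox"])
            # approach 3: require a usable (not tiny) disparity
            if abs(ly - ry) < best_dy and lx - rx > min_disparity:
                best_j, best_dy = j, abs(ly - ry)
        if best_j is not None:
            used.add(best_j)
            pairs.append((left, right_results[best_j]))
    return pairs

class DisparitySmoother:
    # approach 1: first-order low-pass (EMA) filter of per-object disparity
    def __init__(self, alpha=0.3):
        self.alpha = alpha  # smaller alpha = stronger smoothing
        self.state = {}     # object key (e.g. a track id) -> smoothed disparity

    def update(self, key, disparity):
        prev = self.state.get(key, disparity)
        self.state[key] = self.alpha * disparity + (1 - self.alpha) * prev
        return self.state[key]

In a real pipeline the smoother key would be a tracker-assigned track id, so the filter state survives frames where a detection momentarily drops out.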