I’ve been working with the Hailo-8 AI accelerator on a driver-warning system for the past few months. The goal is to detect objects of interest (cars, pedestrians, etc.) with two Raspberry Pi Camera Module 3 units, estimate their distance, and sound a buzzer if they encroach on a safe following distance (we calculate that as half the vehicle’s speed in meters, fed from an OBD-II reader).
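Roughly what I have in mind for the warning rule, as a minimal sketch (I’m assuming the OBD-II speed arrives in km/h, so half of it gives the safe distance in meters; the function names are just placeholders):

def safe_following_distance_m(speed_kmh: float) -> float:
    # "half the speed" rule: e.g. 50 km/h -> 25 m safe following distance
    return speed_kmh / 2.0

def should_warn(object_distance_m: float, speed_kmh: float) -> bool:
    # sound the buzzer when the detected object is closer than the safe distance
    return object_distance_m < safe_following_distance_m(speed_kmh)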
I considered two depth-estimation methods:
- Stereo inference (e.g. StereoNet) with both camera views fed as a single frame, but its ~10 fps throughput is too low for real-time warning.
- Detecting the same object in two independent camera streams and computing distance from the disparity between the midpoints of its bounding boxes (see the triangulation sketch after this list).
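For the second method, distance follows from standard stereo triangulation; a minimal sketch, where the focal length (in pixels) and baseline are placeholders for my calibration values:

def distance_from_disparity(x_left_px: float, x_right_px: float,
                            focal_px: float, baseline_m: float) -> float:
    # Depth Z = f * B / d for a rectified, horizontally aligned stereo pair,
    # where d is the horizontal shift of the same point between the two images.
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad match or object too far away")
    return focal_px * baseline_m / disparity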
So far I’ve implemented single-camera detection in custom Python code (I haven’t yet mastered GStreamer), but I haven’t found a model for multi-stream detection. Since I’ll be running multiple inference modules concurrently, I also need a Python-based scheduling framework; any pointer to good documentation would be welcome.
Hardware: Raspberry Pi 5, Hailo-8 (upgraded from Hailo-8L), buzzer.
Hi shashi,
I already use DeGirum in one of my projects (locking and unlocking a door using face detection and recognition). However, I couldn’t find anything for this project. I need to track the same object in two separate frames, or find some other method that works here.
@Furkan_Ayan,
We have dgstreams (aka gizmos), a lightweight Python-based scheduling framework, in our degirum_tools package (it is not part of PySDK itself; it is an extension of PySDK).
We just released documentation: Streams | DeGirum Docs
But for your task I think the best approach would be simply to zip two predict_stream() calls. Something like this:
import degirum as dg
import degirum_tools
zoo = dg.connect(dg.CLOUD, "degirum/models_hailort", "<your token>")
# or you can use local inference: zoo = dg.connect(dg.LOCAL, zoo_path)
videos = ["images/Traffic.mp4", "images/Traffic.mp4"]
# or you can have rtsp URL or integer indexes of local cams here
# Load one object detection model per video source; use the same model for both
models = [
    zoo.load_model("yolov8n_relu6_car--640x640_quant_hailort_hailo8_1")
    for _ in range(len(videos))
]
# adjust model name as necessary
for r1, r2 in zip(
    degirum_tools.predict_stream(models[0], videos[0]),
    degirum_tools.predict_stream(models[1], videos[1]),
):
    # here you have r1.results and r2.results -- lists of detected objects
    # with bboxes and scores: r1.results[i]["bbox"] is the [x1, y1, x2, y2]
    # list of bbox coordinates. You need to match them between the two
    # images and make your calculations.
    pass
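For the matching step, here is a minimal sketch of one way to pair detections between the two frames (assuming the result dicts carry "bbox" and "label" keys as above; the max_dy threshold is an assumption you would tune, since for horizontally aligned, rectified cameras a correct pair should have nearly equal y):

def bbox_center(bbox):
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def match_detections(left_results, right_results, max_dy=30):
    # Greedily pair each left detection with the closest same-label right
    # detection by vertical center distance; reject pairs further apart
    # than max_dy pixels. Returns a list of (left, right) dict pairs.
    pairs, used = [], set()
    for l in left_results:
        ly = bbox_center(l["bbox"])[1]
        best, best_dy = None, None
        for j, r in enumerate(right_results):
            if j in used or r["label"] != l["label"]:
                continue
            dy = abs(bbox_center(r["bbox"])[1] - ly)
            if dy <= max_dy and (best is None or dy < best_dy):
                best, best_dy = j, dy
        if best is not None:
            used.add(best)
            pairs.append((l, right_results[best]))
    return pairs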
But please be advised that edge-quality object detection models are noisy: the same object (even a stationary one) may be detected on two consecutive frames with different bbox coordinates, or not detected at all (its score may fall below the confidence threshold). This means you need to implement robust bbox matching and smoothing. Possible approaches:
- Do temporal smoothing/filtering of bbox coordinates using either a Kalman filter or a simple low-pass filter; this helps reject frame-to-frame noise. To do this you need to apply object tracking first (BYTETrack or similar) so that coordinates can be associated with the same object across frames (see the sketch after this list).
- Average over a region: instead of taking just the center point of the bbox, sample several points inside the matched bounding boxes, compute the disparity per sample, then take the average or median.
- Discard far-away bboxes (those with small disparity), where the depth estimate is least reliable.
- Use the most consistent region of the bbox (e.g. the bottom center for cars on the road).
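To illustrate the first bullet, a minimal sketch of an exponential low-pass filter on bbox coordinates (a lightweight stand-in for a full Kalman filter; the track id is assumed to come from a tracker such as BYTETrack):

class BboxSmoother:
    # Per-track exponential moving average of [x1, y1, x2, y2] coordinates.

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # smoothing factor: lower = smoother but laggier
        self.state = {}     # track_id -> last smoothed bbox

    def update(self, track_id, bbox):
        prev = self.state.get(track_id)
        if prev is None:
            smoothed = list(bbox)
        else:
            smoothed = [self.alpha * b + (1 - self.alpha) * p
                        for b, p in zip(bbox, prev)]
        self.state[track_id] = smoothed
        return smoothed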