I am using a Raspberry Pi 5 + Hailo-8L for a robotics project.
My pipeline needs two detections per cycle:
1. material detection on the original image
2. reference-point detection on the undistorted image
Both currently use HEF models on the same Hailo device.
Single-model benchmark result:
- ~107 FPS
- ~8.4 ms HW latency
But in the real application, when I run two detections in the same loop, each inference becomes much slower. Typical logs are:
[HailoYOLO] infer:17ms total:24ms
[HailoYOLO] infer:19ms total:25ms
FPS:22.8 | Total:44ms
Sometimes inference goes above 20 ms or even 30 ms.
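For reference, here is a rough sanity check of the figures above, written as a small Python sketch. It only uses the numbers quoted in this post. The "ideal serial" estimate assumes both models cost about the same as the benchmarked one, which may not hold for the actual HEFs.

```python
# Figures copied from the benchmark and logs quoted above.
single_fps = 107.0            # single-model benchmark throughput
single_hw_latency_ms = 8.4    # single-model HW latency
logged_infer_ms = [17.0, 19.0]  # "infer:" values from the two log lines
logged_cycle_ms = 44.0        # "Total:" value from the FPS log line

# Time per frame implied by the benchmark throughput (about 9.3 ms).
single_frame_period_ms = 1000.0 / single_fps

# Loop rate implied by the logged cycle time (about 22.7, close to the logged 22.8).
cycle_fps = 1000.0 / logged_cycle_ms

# How much slower each in-application inference is than the benchmark HW latency.
slowdowns = [t / single_hw_latency_ms for t in logged_infer_ms]

# ASSUMPTION: both models behave like the benchmarked one. Under that
# assumption, two back-to-back inferences would cost roughly this much.
ideal_serial_ms = 2 * single_frame_period_ms

# Part of the logged cycle that is not covered by the two "infer:" values.
non_inference_ms = logged_cycle_ms - sum(logged_infer_ms)

print(f"frame period from benchmark: {single_frame_period_ms:.1f} ms")
print(f"FPS from logged cycle:       {cycle_fps:.1f}")
print(f"per-inference slowdown:      {[round(s, 2) for s in slowdowns]}")
print(f"ideal serial (assumed):      {ideal_serial_ms:.1f} ms")
print(f"non-inference time in cycle: {non_inference_ms:.1f} ms")
```

So each inference in the loop takes roughly twice the benchmark HW latency, and about 8 ms of the 44 ms cycle is spent outside the two inference calls.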
I already tried:
- GStreamer camera pipeline
- async inference
- reducing logic/postprocess time
- UINT8 output instead of FLOAT32
- smaller input size (but accuracy dropped)
My question:
Is this slowdown expected when running two detections on the same Hailo-8L?
If dual detection is required, what is the recommended low-latency architecture?
You can use batched inference to speed up the inference part.
You can use the hailocropper element to manage the cropping for you in the GStreamer pipeline.
The cropper can output both the full frame and a smaller crop, using a detection bbox passed to it. Ask your coding agent to use hailocropper to crop your requested ROI effectively.
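To make the bbox-to-crop step concrete, here is a minimal plain-Python sketch of what cropping an ROI from a detection bbox amounts to. This is not the hailocropper API. The function names are hypothetical, and it assumes the bbox is given as normalized (xmin, ymin, xmax, ymax) values in the 0..1 range and that the frame is a list of pixel rows.

```python
def bbox_to_pixel_roi(bbox, frame_width, frame_height):
    """Convert a normalized (xmin, ymin, xmax, ymax) bbox to a clamped
    pixel ROI (x, y, w, h). Hypothetical helper for illustration only."""
    xmin, ymin, xmax, ymax = bbox
    # Scale to pixels and clamp to the frame so the crop never goes out of bounds.
    x0 = max(0, min(frame_width, int(round(xmin * frame_width))))
    y0 = max(0, min(frame_height, int(round(ymin * frame_height))))
    x1 = max(0, min(frame_width, int(round(xmax * frame_width))))
    y1 = max(0, min(frame_height, int(round(ymax * frame_height))))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bbox does not overlap the frame")
    return x0, y0, x1 - x0, y1 - y0


def crop_rows(frame_rows, roi):
    """Cut the ROI out of a frame stored as a list of pixel rows."""
    x, y, w, h = roi
    return [row[x:x + w] for row in frame_rows[y:y + h]]


# Usage: a detection covering the lower middle of a 640x480 frame.
roi = bbox_to_pixel_roi((0.25, 0.5, 0.75, 1.0), 640, 480)
print(roi)

# Tiny 4x4 "frame" with distinct values, to show which pixels survive the crop.
tiny_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop_rows(tiny_frame, (1, 1, 2, 2)))
```

In the GStreamer pipeline the cropper element does this work for you, so the application does not need to copy and slice frames in Python.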