I am using a Raspberry Pi 5 with a Hailo-8L for a robotics project.
My pipeline needs two detections per cycle:
1. material detection on the original image
2. reference-point detection on the undistorted image
Both currently use HEF models on the same Hailo device.
Single-model benchmark result:
- ~107 FPS
- ~8.4 ms HW latency
But in the real application, when I run both detections in the same loop, each inference is much slower (roughly 17-19 ms instead of the 8.4 ms benchmark latency). Typical logs are:
[HailoYOLO] infer:17ms total:24ms
[HailoYOLO] infer:19ms total:25ms
FPS:22.8 | Total:44ms
Sometimes inference goes above 20 ms or even 30 ms.
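To put the gap in numbers, here is a small sketch of the arithmetic behind the figures above. The timings are copied from the benchmark and the log lines; the variable and function names are only illustrative.

```python
# Timings copied from the benchmark and logs above (milliseconds).
BENCH_FPS = 107.0
BENCH_HW_LATENCY_MS = 8.4
infer_ms = [17.0, 19.0]   # "infer" values from the two log lines
total_ms = [24.0, 25.0]   # "total" values from the two log lines
cycle_ms = 44.0           # "Total" for one full cycle


def fps_from_ms(ms: float) -> float:
    """Convert a per-cycle time in milliseconds to frames per second."""
    return 1000.0 / ms


# Throughput implied by the 44 ms cycle (about 22.7 FPS, matching the log).
cycle_fps = fps_from_ms(cycle_ms)

# Per-frame time implied by the single-model benchmark (about 9.3 ms).
bench_frame_ms = 1000.0 / BENCH_FPS

# How much slower each inference is than the benchmark HW latency.
slowdown = [t / BENCH_HW_LATENCY_MS for t in infer_ms]

# Time spent outside inference in each detection (pre/postprocess, copies, ...).
host_overhead_ms = [t - i for t, i in zip(total_ms, infer_ms)]

if __name__ == "__main__":
    print(f"cycle: {cycle_fps:.1f} FPS, benchmark frame time: {bench_frame_ms:.1f} ms")
    print("slowdown vs benchmark:", [f"{s:.2f}x" for s in slowdown])
    print("non-inference time per detection (ms):", host_overhead_ms)
```

So each inference takes a bit more than twice the benchmark latency, and another 6-7 ms per detection is spent outside inference.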
I already tried:
- GStreamer camera pipeline
- async inference
- reducing logic/postprocess time
- UINT8 output instead of FLOAT32
- smaller input size (but accuracy dropped)
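For context, the general shape of the "async" attempt can be sketched as below. This is not my actual code and does not use the HailoRT API: `detect_material`, `undistort`, and `detect_reference_points` are hypothetical stand-ins that only sleep, so the sketch shows the loop structure rather than real device behaviour.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def detect_material(frame):
    """Hypothetical stand-in for material detection on the original image."""
    time.sleep(0.010)  # pretend inference time, not a measured value
    return {"task": "material", "frame": frame}


def undistort(frame):
    """Hypothetical stand-in for the undistortion step."""
    return f"undistorted({frame})"


def detect_reference_points(frame):
    """Hypothetical stand-in for reference-point detection on the undistorted image."""
    time.sleep(0.010)  # pretend inference time, not a measured value
    return {"task": "reference", "frame": frame}


def run_cycle(pool, frame):
    """Submit both detections for one frame and wait for both results."""
    start = time.perf_counter()
    # Start material detection first so it runs while the frame is undistorted.
    material_future = pool.submit(detect_material, frame)
    reference_future = pool.submit(detect_reference_points, undistort(frame))
    material = material_future.result()
    reference = reference_future.result()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return material, reference, elapsed_ms


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        material, reference, elapsed_ms = run_cycle(pool, "frame0")
        print(material, reference, f"{elapsed_ms:.1f} ms")
```

With sleeping stand-ins the two calls overlap almost perfectly. Whether two real models overlap like this on a single Hailo-8L, or are effectively serialized by the device, is exactly what I am unsure about.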
My questions:
Is this slowdown expected when running two detections on the same Hailo-8L?
If dual detection is required, what is the recommended low-latency architecture?