How to speed up running two detections per frame on Raspberry Pi 5 + Hailo-8L

I am using Raspberry Pi 5 + Hailo-8L for a robotics project.

My pipeline needs two detections per cycle:
1. material detection on the original image
2. reference-point detection on the undistorted image
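
A minimal sketch of the per-cycle structure described above, with timing around each stage. The functions `detect_material`, `detect_reference_points`, and `undistort` are hypothetical stand-ins, not the actual application code or any Hailo API; the real versions would wrap the two HEF models and the camera calibration.

```python
import time

def undistort(frame):
    # Hypothetical stand-in for the camera undistortion step.
    return frame

def detect_material(frame):
    # Hypothetical stand-in for inference with the first HEF model.
    return []

def detect_reference_points(frame):
    # Hypothetical stand-in for inference with the second HEF model.
    return []

def run_cycle(frame):
    """Run both detections sequentially; return results and per-stage times in ms."""
    t0 = time.perf_counter()
    materials = detect_material(frame)           # detection 1: original image
    t1 = time.perf_counter()
    undistorted = undistort(frame)
    t2 = time.perf_counter()
    points = detect_reference_points(undistorted)  # detection 2: undistorted image
    t3 = time.perf_counter()
    timings = {
        "material_ms": (t1 - t0) * 1000.0,
        "undistort_ms": (t2 - t1) * 1000.0,
        "reference_ms": (t3 - t2) * 1000.0,
        "total_ms": (t3 - t0) * 1000.0,
    }
    return materials, points, timings
```

Timing each stage separately like this helps show whether the extra latency sits in the inference calls themselves or in the surrounding pre/postprocessing.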

Both currently use HEF models on the same Hailo device.

Single-model benchmark result:
- ~107 FPS
- ~8.4 ms HW latency

In the real application, however, running both detections in the same loop makes each inference much slower. Typical logs:

[HailoYOLO] infer:17ms total:24ms
[HailoYOLO] infer:19ms total:25ms
FPS:22.8 | Total:44ms

Sometimes inference goes above 20 ms or even 30 ms.
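
To put the numbers above side by side, here is a small arithmetic sketch. It assumes the two `[HailoYOLO]` log lines belong to the same cycle and that the benchmark HW latency is the right baseline for a single inference; both are assumptions about how the figures relate, not measured facts.

```python
# Figures copied from the benchmark and the logs above.
bench_hw_latency_ms = 8.4        # single-model benchmark HW latency
infer_ms = [17.0, 19.0]          # "infer" values from the two log lines
cycle_total_ms = 44.0            # "Total" from the FPS log line

# Best case if two inferences ran back to back at benchmark latency.
ideal_two_infer_ms = 2 * bench_hw_latency_ms

# What the application actually spends inside the two inference calls.
observed_two_infer_ms = sum(infer_ms)

# Ratio of observed to ideal inference time per cycle.
slowdown = observed_two_infer_ms / ideal_two_infer_ms

# Frame rate implied by the logged cycle time.
cycle_fps = 1000.0 / cycle_total_ms

print(f"ideal: {ideal_two_infer_ms:.1f} ms, observed: {observed_two_infer_ms:.1f} ms")
print(f"slowdown: {slowdown:.2f}x, cycle FPS: {cycle_fps:.1f}")
```

Under these assumptions the two inferences take roughly twice as long as two benchmark-speed inferences would (about 36 ms versus about 16.8 ms), and a 44 ms cycle corresponds to about 22.7 FPS, close to the logged 22.8.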

I already tried:
- GStreamer camera pipeline
- async inference
- reducing logic/postprocess time
- UINT8 output instead of FLOAT32
- smaller input size (but accuracy dropped)

My questions:
1. Is this slowdown expected when running two detections on the same Hailo-8L?
2. If two detections per cycle are required, what is the recommended low-latency architecture?