Please have a look at the following post I wrote a while back for some basic insights.
Hailo Community - My model runs slower than expected
You can measure the latency using the HailoRT CLI run command.
```
hailortcli run model.hef --measure-latency
```
If you are using the YOLOv11m HEF from the Model Zoo, the expected throughput is around 50 FPS on a Hailo-8.
Mostly yes.
Benchmark FPS numbers assume that the application feeds frames asynchronously and does not block on the result of each frame before sending the next one. When a network is compiled into a single context, all of its layers are resident on the device at once and frames flow through them as a pipeline, so reaching the maximum FPS requires the application to keep that pipeline full with several frames in flight.
For networks compiled into multiple contexts this matters less: context switching adds significant overhead either way, which limits the achievable throughput regardless of how the application feeds frames.
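To make that concrete, here is a minimal, self-contained Python sketch of the application-side difference between blocking on every frame and keeping several frames in flight. It deliberately does not use the HailoRT API: `infer_one_frame()` just sleeps to simulate a fixed per-frame device latency, and the frame count, latency, and in-flight limit are made-up numbers for illustration. In a real application you would submit frames through HailoRT's asynchronous inference interface rather than a thread pool, but the throughput effect is the same.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DEVICE_LATENCY_S = 0.020   # pretend the device needs 20 ms per frame (made up)
NUM_FRAMES = 100
MAX_IN_FLIGHT = 4          # how many frames we allow into the pipeline at once

def infer_one_frame(frame_id):
    """Placeholder for the real inference call; it only simulates latency."""
    time.sleep(DEVICE_LATENCY_S)
    return frame_id

# 1) Blocking on every frame: throughput is capped at 1 / per-frame latency.
start = time.perf_counter()
for i in range(NUM_FRAMES):
    infer_one_frame(i)
blocking_fps = NUM_FRAMES / (time.perf_counter() - start)

# 2) Several frames in flight: new frames are submitted while earlier ones are
#    still being processed, so throughput is no longer bound by single-frame latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    results = list(pool.map(infer_one_frame, range(NUM_FRAMES)))
pipelined_fps = NUM_FRAMES / (time.perf_counter() - start)

print(f"blocking:  {blocking_fps:.1f} FPS")
print(f"pipelined: {pipelined_fps:.1f} FPS")
```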
If you’re measuring end-to-end latency (including preprocessing, postprocessing, or CPU waits), values can be significantly higher than the accelerator-only latency.
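If you want to see where that extra time goes, timing each stage separately is usually enough. The sketch below is a toy example: `preprocess()`, `infer()`, and `postprocess()` are placeholders that just sleep for made-up durations, so only the timing pattern itself carries over to a real pipeline.

```python
import time

# Stand-ins for real pipeline stages; the sleep durations are made up.
def preprocess(frame):
    time.sleep(0.004)   # e.g. resize + normalize on the CPU
    return frame

def infer(tensor):
    time.sleep(0.020)   # device latency plus transfers and any CPU waits
    return tensor

def postprocess(raw):
    time.sleep(0.006)   # e.g. NMS / decoding on the CPU
    return raw

def timed(fn, x):
    t0 = time.perf_counter()
    out = fn(x)
    return out, (time.perf_counter() - t0) * 1000.0  # milliseconds

frame = object()  # placeholder for a real image
tensor, t_pre = timed(preprocess, frame)
raw, t_inf = timed(infer, tensor)
dets, t_post = timed(postprocess, raw)

print(f"pre {t_pre:.1f} ms | infer {t_inf:.1f} ms | post {t_post:.1f} ms | "
      f"end-to-end {t_pre + t_inf + t_post:.1f} ms")
```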