Official FPS benchmark on Hailo-8 using Raspberry Pi 5

I ran hailo benchmark path_to_hef on my 16 GB Raspberry Pi 5, and for any model (say, YOLOv11n) the FPS I get is lower than what is listed in the official Hailo Model Zoo GitHub repository or in Model Explorer. One reason I came across is that the Raspberry Pi 5 uses PCIe Gen3 x1, a single lane, while the official benchmarks are recorded on a four-lane PCIe interface. Is this the right reason? If so, are there any officially recorded benchmarks for the maximum FPS achievable with hailo benchmark path_to_hef on this setup, so that I can be sure I am reaching the FPS I should and that my AI accelerator is working fine?

Hi @Sameer_Nilkhan ,

You’re correct that the PCIe configuration is a major reason why your measured FPS on the Raspberry Pi 5 is lower than the official Hailo Model Zoo numbers.
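
Before comparing numbers, it may help to confirm what link the board actually negotiated. The sketch below reads the standard Linux sysfs attributes current_link_speed and current_link_width for a PCI device. The device directory is passed in by the reader, and the function names are made up for this illustration. 8.0 GT/s corresponds to Gen3 and 5.0 GT/s to Gen2.

```python
from pathlib import Path


def parse_link(speed_text: str, width_text: str) -> tuple[float, int]:
    """Parse sysfs link attributes, e.g. '8.0 GT/s PCIe' and '1'."""
    gts = float(speed_text.split()[0])  # leading number is the rate in GT/s
    lanes = int(width_text.strip())
    return gts, lanes


def read_link(device_dir: str) -> tuple[float, int]:
    """Read the negotiated link speed and width for one PCI device."""
    dev = Path(device_dir)
    speed = (dev / "current_link_speed").read_text()
    width = (dev / "current_link_width").read_text()
    return parse_link(speed, width)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the sysfs directory of the accelerator,
    # i.e. its entry under /sys/bus/pci/devices/
    if len(sys.argv) == 2:
        gts, lanes = read_link(sys.argv[1])
        print(f"{gts} GT/s x{lanes}")
```

If the speed attribute reports something non-numeric (for example when the link is down), parse_link raises ValueError rather than guessing.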

We have the detection simple app: hailo-apps/hailo_apps/python/pipeline_apps/detection_simple at main · hailo-ai/hailo-apps · GitHub
which is a good place to measure FPS.

Can you please share the FPS numbers you see in your hailo benchmark tests?

Thanks,

=======

Summary

FPS (hw_only) = 104.907
(streaming) = 104.901
Latency (hw) = 7.7544 ms
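
For anyone comparing several runs, here is a minimal sketch that pulls the numbers out of a summary like the one above and expresses the measured FPS as a fraction of a reference figure. The reference value is whatever the reader takes from Model Explorer, and none is assumed here. The function names are hypothetical, and the parsing only assumes the "(label) = value" layout shown in this post.

```python
import re

SUMMARY = """\
FPS (hw_only) = 104.907
(streaming) = 104.901
Latency (hw) = 7.7544 ms
"""


def parse_summary(text: str) -> dict[str, float]:
    """Extract '(label) = value' pairs, keyed by the parenthesised label."""
    results = {}
    for label, value in re.findall(r"\((\w+)\)\s*=\s*([\d.]+)", text):
        results[label] = float(value)
    return results


def fraction_of_reference(measured_fps: float, reference_fps: float) -> float:
    """Measured FPS as a fraction of a reference FPS supplied by the reader."""
    return measured_fps / reference_fps


if __name__ == "__main__":
    print(parse_summary(SUMMARY))
```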

Also, I have a question: Model Explorer Vision lists FPS for a 640 x 640 input resolution. When we run the benchmark, does it also run inference at 640 x 640, or at some other resolution?

Also, although I am attributing the difference to one PCIe lane versus four, it is not a direct scaling, since four lanes should be four times faster than one lane. Is it officially stated anywhere that the numbers posted on Hailo Model Explorer were measured using four lanes?

One more thing I observed: while the benchmark was running, the hailo monitor screen showed the NPU at 100% utilization, and the pipeline stages in the graph were completely saturated. Will PCIe still be a bottleneck in this case? If the NPU is already fully loaded, then increasing the data-transfer bandwidth to the NPU should not increase FPS.

Hi @Sameer_Nilkhan .

hailo benchmark runs inference at the resolution compiled into the HEF, which for the Model Zoo YOLOv11n is 640x640 - the same as what’s listed on Model Explorer.
The official Model Zoo benchmarks are measured on a system with PCIe Gen3 x2 (two lanes), not four, so the RPi5’s Gen3 x1 gives you roughly half the host-to-device bandwidth. Your ~105 FPS for YOLOv11n on RPi5 is in the expected range for an x1 configuration.
Regarding your observation about 100% NPU utilization: the monitor showing 100% doesn’t necessarily mean the NPU is compute-bound. It means the NPU is fully busy processing whatever data arrives through the available bandwidth. With a narrower PCIe link (x1), the NPU stays 100% active but at lower throughput; with x2 it would still show 100% utilization but process more frames per second, since data arrives faster.
In short, your Hailo-8 is working correctly and performing as expected for a PCIe Gen3 x1 setup.
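
To put rough numbers on the bandwidth side, here is a back-of-the-envelope sketch that compares raw input traffic with the nominal PCIe Gen3 link rate (8 GT/s per lane with 128b/130b encoding). It assumes one byte per channel (uint8) for a 640x640x3 input, which is an assumption rather than something read from the HEF. It counts only input frames and ignores output tensors, protocol overhead, and any other traffic to the device. The function names are hypothetical.

```python
def pcie_gen3_bytes_per_s(lanes: int) -> float:
    """Nominal PCIe Gen3 rate: 8 GT/s per lane, 128b/130b line encoding."""
    bits_per_s = 8e9 * (128 / 130) * lanes
    return bits_per_s / 8


def input_bytes_per_s(width: int, height: int, channels: int, fps: float) -> float:
    """Input traffic, assuming one byte per channel (uint8)."""
    return width * height * channels * fps


if __name__ == "__main__":
    # 104.907 FPS is the hw_only figure reported earlier in this thread.
    traffic = input_bytes_per_s(640, 640, 3, 104.907)
    for lanes in (1, 2):
        link = pcie_gen3_bytes_per_s(lanes)
        print(f"x{lanes}: input frames use {traffic / link:.1%} of the nominal link rate")
```

The result only bounds the input-frame traffic. On its own it neither confirms nor rules out the link as the limiting factor, since everything else crossing the bus is left out.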
