I can't reach the maximum FPS I expected

What I use:

  • custom object detection model based on yolov8n. Input layer 96x96.
  • RPI5 with AI Kit
  • Hailo8L
  • CPU Cortex-A76

My goal: increase FPS when I run my model on video.mp4

Explanation: I have been integrating your npu into our project for about 3 weeks now. I see the potential in this technology and I really want to implement it.

We have already tried running the model on the RPi 5 without the Hailo NPU and got ~110 FPS.

Now I have converted the same model from ONNX to HEF and tried to run it with this command:

python basic_pipelines/detection.py --labels-json resources/mylabels.json --hef-path resources/mymodel.hef --input resources/myvid.mp4 --disable-sync

I was hoping for an FPS increase of at least 1.5x, but I only got 120-125 FPS on average. I expected a bigger gain because benchmarking my HEF gave 250 FPS (batch size = 1).

After that, I checked the NPU and CPU usage to find out where the bottleneck might be.

I used htop and Hailo Monitor for this and got:
NPU utilization = 44%
CPU usage = about 100%
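
As a rough cross-check of these figures (a sketch, not a measurement): if each frame occupies the NPU for about 1/250 s, as the benchmark suggests, then a pipeline running at 120-125 FPS should keep the NPU busy roughly half the time. The function name below is hypothetical, and the model assumes per-frame NPU time in the pipeline matches the benchmark.

```python
def expected_npu_utilization(pipeline_fps: float, benchmark_fps: float) -> float:
    """Fraction of time the NPU is busy, assuming each frame costs
    1 / benchmark_fps seconds of NPU time (an assumption, not a measurement)."""
    return pipeline_fps / benchmark_fps


BENCHMARK_FPS = 250.0  # standalone HEF benchmark result quoted in the post

low = expected_npu_utilization(120.0, BENCHMARK_FPS)
high = expected_npu_utilization(125.0, BENCHMARK_FPS)
print(f"Expected NPU utilization: {low:.0%} to {high:.0%}")
```

The estimate is in the same range as the 44% reported by Hailo Monitor, which fits the picture of an NPU that spends much of its time waiting for frames.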

I also read some topics here about PCIe. I ran sudo lspci -vvv and got: LnkSta: Speed 8GT/s, Width x1 (downgraded)

I think it comes down to one of two problems: either the processor is too weak, or the single PCIe lane is limiting throughput.

So that's where I left off, and to be honest, I don't know where to go next. I know I can also use the DFC in a more advanced way to gain some FPS, but it seems to me the gain would be small. I have a feeling that I'm doing something wrong and that fixing it would give a significant increase in FPS.

Hi @Danil_Plokhikh
Welcome to the Hailo community. Since the hailortcli benchmark shows 250 FPS but you are unable to reach that number in the pipeline, the most likely cause is the weaker CPU. Specifically, because your input size is only 96x96, CPU resources could be going to video decode and resize. Your resource usage (CPU at about 100%, NPU at 44%) also points to this.
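
The reasoning above can be sketched as a simple bottleneck model. It assumes the CPU stage (decode, resize, post-processing) and the NPU stage overlap, so that end-to-end throughput is set by the slower of the two. The function name and the speedup factors are hypothetical; only the 120 FPS and 250 FPS figures come from the thread.

```python
def pipeline_fps(cpu_stage_fps: float, npu_stage_fps: float) -> float:
    """Throughput of a two-stage pipeline whose stages run concurrently:
    the slower stage sets the overall rate (a simplifying assumption)."""
    return min(cpu_stage_fps, npu_stage_fps)


NPU_FPS = 250.0  # hailortcli benchmark figure quoted in the thread
CPU_FPS = 120.0  # lower end of the observed 120-125 FPS, assumed CPU-limited

# Hypothetical CPU speedups, for illustration only.
for speedup in (1.0, 1.5, 2.0, 3.0):
    fps = pipeline_fps(CPU_FPS * speedup, NPU_FPS)
    print(f"CPU x{speedup:.1f}: ~{fps:.0f} FPS")
```

Under this model the FPS rises in proportion to CPU-side throughput until it reaches the NPU's limit, then stays flat. Real pipelines will deviate from this, since memory bandwidth and scheduling are not modeled.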

Thank you so much for your reply. It really helped me a lot in my research.
So if I move to a more powerful processor, will the FPS grow linearly or exponentially? If the growth is only linear, it will not be very worthwhile for us, because the NPU's contribution will hardly be felt.

What do you think about the single PCIe lane? Could it be related to my problem?

Hi @Danil_Plokhikh
The FPS can be higher on a 4-lane device, but that could simply be because CPUs that have 4 lanes of PCIe are also more powerful. So the combination of a more powerful CPU and a higher baseline performance from 4 lanes of PCIe can give higher performance. For example, on some x86 machines we see more than 900 FPS for yolov8n models.
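
For a sense of scale, here is a back-of-the-envelope estimate of how much of a Gen3 x1 link the input frames alone would use at the benchmark rate. It assumes 8-bit, 3-channel 96x96 input (the thread gives only the 96x96 size) and the 128b/130b line encoding used at 8 GT/s. It ignores protocol overhead, output tensors, and per-transfer latency, so it is a lower bound on link usage and not a verdict on the PCIe question.

```python
# Assumed input format: 96x96 pixels, 3 channels, 1 byte per channel.
WIDTH, HEIGHT, CHANNELS = 96, 96, 3
TARGET_FPS = 250  # benchmark figure quoted in the thread

# PCIe Gen3: 8 GT/s per lane with 128b/130b encoding.
RAW_BITS_PER_SECOND = 8e9
LINK_BYTES_PER_SECOND = RAW_BITS_PER_SECOND * (128 / 130) / 8  # one lane

frame_bytes = WIDTH * HEIGHT * CHANNELS
input_bytes_per_second = frame_bytes * TARGET_FPS
link_fraction = input_bytes_per_second / LINK_BYTES_PER_SECOND

print(f"Bytes per frame:       {frame_bytes}")
print(f"Input traffic:         {input_bytes_per_second / 1e6:.1f} MB/s")
print(f"Gen3 x1 payload limit: {LINK_BYTES_PER_SECOND / 1e6:.0f} MB/s")
print(f"Share of link used:    {link_fraction:.2%}")
```

With these assumptions the input stream needs under 1% of the single lane's raw bandwidth, so bandwidth alone is unlikely to explain the gap for a model this small. Latency per transfer could still matter and is not captured here.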