I can't reach as high an FPS as I expected

Danil_Plokhikh · March 4, 2025, 8:12pm

What I use:

custom object detection model based on yolov8n. Input layer 96x96.
RPI5 with AI Kit
Hailo8L
CPU Cortex-A76

My goal: increase the fps when I run my model on video.mp4

Explanation: I have been integrating your npu into our project for about 3 weeks now. I see the potential in this technology and I really want to implement it.

The fact is that we had already tried the RPi 5 without the Hailo NPU and got ~110 fps.

Now I've converted the same model from ONNX to HEF and tried to run it with this command:

python basic_pipelines/detection.py --labels-json resources/mylabels.json --hef-path resources/mymodel.hef --input resources/myvid.mp4 --disable-sync

I was hoping to see an increase in fps of at least 1.5 times, but I only got 120-125 fps on average. I expected a greater increase because I benchmarked my HEF at 250 fps (batch size = 1).

After that, I started checking the NPU and CPU usage to find out where the bottleneck might be.

I used htop and Hailo Monitor for this purpose and got:
NPU utilization = 44%
CPU usage = about 100%

I also read some topics here about PCIe. I ran sudo lspci -vvv and got LnkSta: Speed 8GT/s, Width x1 (downgraded)
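
For anyone repeating this check, the link status line can be parsed programmatically. This is only a minimal sketch: the helper name parse_lnksta is hypothetical, and it assumes the LnkSta line has the shape quoted in the post (real lspci output may carry extra annotations).

```python
import re

def parse_lnksta(line: str):
    """Extract (speed in GT/s, lane width, downgraded flag) from an lspci LnkSta line."""
    m = re.search(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)", line)
    if m is None:
        return None  # line does not look like a LnkSta entry
    return float(m.group(1)), int(m.group(2)), "(downgraded)" in line

# Sample line copied from the post above.
print(parse_lnksta("LnkSta: Speed 8GT/s, Width x1 (downgraded)"))
```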

I think it comes down to one of two problems: either the processor is too weak, or the single PCIe lane is limiting me.
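
A rough way to test the PCIe hypothesis is to compare the input data rate with the nominal capacity of one lane. The sketch below assumes a 3-channel uint8 input (the channel count and data type are assumptions, not stated in the post), counts only input frames, and uses the nominal 8 GT/s rate with 128b/130b encoding before any protocol overhead.

```python
# Assumptions: 3-channel uint8 input; output tensors and PCIe protocol overhead are ignored.
WIDTH, HEIGHT, CHANNELS = 96, 96, 3
TARGET_FPS = 250  # benchmark figure from the post

frame_bytes = WIDTH * HEIGHT * CHANNELS
input_rate = frame_bytes * TARGET_FPS  # bytes per second sent to the NPU

# At 8 GT/s with 128b/130b encoding, one lane carries 8e9 * 128/130 payload bits per second.
lane_bytes_per_s = 8e9 * 128 / 130 / 8

fraction = input_rate / lane_bytes_per_s
print(f"{frame_bytes} B/frame, {input_rate / 1e6:.1f} MB/s, {fraction:.2%} of one lane")
```

Under these assumptions the input stream uses well under 1% of a single lane, which suggests that the lane count alone is unlikely to cap a 96x96 model at this frame rate.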

So that's where I left off, and to be honest, I do not know where to go next. I know that I can also use the DFC in a more advanced way to gain some fps, but it seems to me that the gain will be small. I have a feeling that I'm doing something wrong and that fixing it would give a significant increase in fps.

shashi · March 4, 2025, 8:35pm

Hi @Danil_Plokhikh
Welcome to the Hailo community. Since the hailortcli benchmark shows 250FPS but you are unable to get this number, it is most likely due to the weaker CPU. Specifically, since your input size is 96x96, CPU resources could be spent on video decode and resize. Your resource usage also points to this.
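
The numbers from the first post can be cross-checked with a little arithmetic. This sketch assumes that the 250 fps benchmark approximates the NPU's ceiling for this model, and it takes the midpoint of the reported 120-125 fps range as the observed rate.

```python
benchmark_fps = 250.0           # hailortcli benchmark result from the first post
observed_fps = (120 + 125) / 2  # midpoint of the reported range

# If the NPU can sustain benchmark_fps, this is the share of its capacity actually used.
implied_utilization = observed_fps / benchmark_fps
print(f"implied NPU utilization: {implied_utilization:.0%}")
```

The implied figure, roughly half, is close to the 44% shown by Hailo Monitor, which is consistent with the NPU waiting on the host rather than being the bottleneck.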

Danil_Plokhikh · March 7, 2025, 2:44pm

Thank you so much for your reply. It really helped me a lot in my research.
So if I increase the processor power, will the fps grow linearly or exponentially? If the fps growth is only linear, it will not be so profitable for us, because the contribution of the NPU will not be felt.

What do you think about the single PCIe lane? Could it be related to my problem?

shashi · March 7, 2025, 2:52pm

Hi @Danil_Plokhikh
The FPS can be higher on a 4 lane device but that could simply be due to the fact that CPUs that have 4 lanes of PCIe are also more powerful. So the combination of more powerful CPU and higher baseline performance due to 4 lanes of PCIe can give higher performance. For example, on some x86 machines we see FPS>900 for yolov8n models.

Danil_Plokhikh · April 3, 2025, 3:10pm

A couple of months ago, I found out what the problem was. The answer was quite simple, but I had overlooked it. I will try to attach a screenshot with an explanation.

Lines 108 and 109 set the width and height of the model input. Since I use a 96x96 model, I needed to change the values there. This increased my fps dramatically (about 3.5 times), and I reached the fps I had seen in the benchmark.
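
To illustrate why the frame size matters, the sketch below compares the per-frame pixel count at a mismatched pipeline size with the model's 96x96 input. The 640x640 value is purely illustrative (the original setting is not given in the thread), and the helper name pixel_ratio is hypothetical.

```python
def pixel_ratio(pipeline_size, model_size):
    """How many times more pixels per frame the pipeline handles than the model needs."""
    pw, ph = pipeline_size
    mw, mh = model_size
    return (pw * ph) / (mw * mh)

model_size = (96, 96)       # model input size from the thread
pipeline_size = (640, 640)  # hypothetical mismatched setting, NOT taken from the thread

if pipeline_size != model_size:
    ratio = pixel_ratio(pipeline_size, model_size)
    print(f"mismatch: pipeline handles {ratio:.1f}x more pixels per frame than the model needs")
```

Every extra pixel has to be decoded, scaled, and copied on the CPU, so a mismatch of this kind can plausibly keep the CPU saturated while the NPU sits partly idle.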

Hope this helps someone with their custom models.
