What I use:
- custom object detection model based on yolov8n. Input layer 96x96.
- RPI5 with AI Kit
- Hailo8L
- CPU Cortex-A76
My goal: increase the FPS when I run my model on video.mp4
Explanation: I have been integrating your NPU into our project for about 3 weeks now. I see the potential in this technology and really want to make it work for us.
We have already run the same model on the RPi5 without the Hailo NPU and got ~110 FPS.
Now I have converted the same model from ONNX to HEF and tried running it with this command:
python basic_pipelines/detection.py --labels-json resources/mylabels.json --hef-path resources/mymodel.hef --input resources/myvid.mp4 --disable-sync
I was hoping for at least a 1.5x increase in FPS, but I only got 120-125 FPS on average. I expected a bigger gain, because benchmarking the HEF on its own gives 250 FPS (batch size = 1).
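To try to separate the NPU from the rest of the pipeline (video decode, postprocessing, drawing), I am thinking of measuring inference-only throughput on synthetic frames with something like the sketch below. It assumes the standard hailo_platform Python API from HailoRT and uses random 96x96 input, so no video decoding is involved; please correct me if this is not a fair comparison against the 250 FPS benchmark figure.

```python
# Minimal sketch (assuming HailoRT's hailo_platform Python API): measure
# inference-only FPS on random 96x96 frames, with no video decode or drawing.
import time
import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)

hef = HEF("resources/mymodel.hef")

with VDevice() as target:
    configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    input_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
    output_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)

    input_info = hef.get_input_vstream_infos()[0]
    frames = 1000
    # Synthetic input stands in for decoded video frames (shape: N x 96 x 96 x 3).
    dataset = np.random.randint(0, 255, (frames,) + tuple(input_info.shape), dtype=np.uint8)

    with network_group.activate(network_group_params):
        with InferVStreams(network_group, input_params, output_params) as infer_pipeline:
            start = time.time()
            infer_pipeline.infer({input_info.name: dataset})
            elapsed = time.time() - start

print(f"{frames / elapsed:.1f} FPS, inference only")
```

If this comes out close to 250 FPS, I would conclude the limit is on the CPU side of the GStreamer pipeline rather than the NPU or the PCIe link; does that reasoning make sense?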
After that, I started checking the NPU and CPU usage to find out where the bottleneck might be.
I used htop and the Hailo Monitor for this and got:
- NPU utilization = 44%
- CPU usage = about 100%
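For reference, this is roughly how I collected the NPU number (assuming I followed the monitor instructions correctly, i.e. HAILO_MONITOR has to be exported before the pipeline starts):

In terminal 1, before launching the pipeline:
export HAILO_MONITOR=1
python basic_pipelines/detection.py --labels-json resources/mylabels.json --hef-path resources/mymodel.hef --input resources/myvid.mp4 --disable-sync

In terminal 2:
hailortcli monitor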
I also read some topics here about PCIe and ran sudo lspci -vvv, which shows:
LnkSta: Speed 8GT/s, Width x1 (downgraded)
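If I understand correctly, 8 GT/s already corresponds to PCIe Gen3, which on the RPi5 is enabled with this line in /boot/firmware/config.txt:
dtparam=pciex1_gen=3
and the x1 width is fixed by the board itself, so the "downgraded" flag seems to refer only to the width. Is my reading correct?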
I think it all comes down to one of two problems: either the processor is too weak, or I only have 1 PCIe lane.
So that's where I left off, and to be honest, I don't know where to go next. I know I can also use the DFC in a more advanced way to gain some FPS, but it seems to me the increase would be small. I have a feeling that I'm doing something wrong and that fixing it would give a significant increase in FPS.
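For completeness, this is roughly the direction I had in mind for the "more advanced DFC" step: recompiling with maximum compiler effort. It is only a sketch based on my reading of the Dataflow Compiler docs (the ClientRunner API); calib_set.npy is a placeholder for my calibration data, and the ONNX translation arguments (start/end node names) are omitted. Is this worth pursuing, or is the bottleneck elsewhere?

```python
# Sketch only: recompile the ONNX with maximum compiler effort for the Hailo-8L.
# Assumes the Dataflow Compiler Python API (hailo_sdk_client); calib_set.npy is a
# placeholder for preprocessed 96x96 calibration images.
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8l")
runner.translate_onnx_model("mymodel.onnx", "mymodel")  # start/end node names omitted

# Ask the compiler to spend more time searching for a faster resource mapping.
runner.load_model_script("performance_param(compiler_optimization_level=max)\n")

runner.optimize(np.load("calib_set.npy"))  # quantization with calibration data

with open("mymodel_optimized.hef", "wb") as f:
    f.write(runner.compile())
```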