Low FPS with YOLOv8m @ 640×640 on Raspberry Pi 5 – Only 16 FPS vs 130+ Expected

Hi everyone,

I am running YOLOv8m (input size 640×640) on a Raspberry Pi 5 with Hailo accelerator. According to the Hailo documentation, the expected performance is around 130–150 FPS, but I am only getting about 16 FPS in my setup.

I would like to understand how I can optimize my pipeline to improve performance. Specifically:

  • Should I use INT8 quantization with a proper calibration dataset instead of FP16 to achieve higher FPS?

  • Does reducing the input resolution (e.g., 512×512 or 416×416) significantly boost FPS without much accuracy loss?

  • How can I tune batch size to increase throughput while keeping latency acceptable?

  • Could NMS (non-max suppression) post-processing on the CPU be causing the bottleneck, and how can I move it to the accelerator?

  • Are there any reference scripts or best practices for running YOLOv8m efficiently on Raspberry Pi 5 with Hailo?

My goal is to achieve >40 FPS with YOLOv8m while maintaining good accuracy. Any guidance, tips, or examples would be highly appreciated.
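On the resolution question, my own back-of-the-envelope estimate (not a Hailo figure) is that compute, and therefore FPS, scales roughly with the input pixel count:

```python
def expected_speedup(res_from: int, res_to: int) -> float:
    # First-order estimate: work scales with the number of input pixels,
    # so FPS scales roughly with the inverse ratio of pixel counts.
    # Real gains vary per model, and changing resolution means
    # recompiling the HEF for the new input shape.
    return (res_from ** 2) / (res_to ** 2)

print(expected_speedup(640, 512))  # ~1.56x
print(expected_speedup(640, 416))  # ~2.37x
```

So 640×640 → 512×512 is only about a 1.6× ceiling on its own; reaching >40 FPS probably needs several of the levers above combined.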

Thanks in advance!

Here is a general topic that will explain a few things. Please have a look.

Hailo Community - My model runs slower than expected

Note that the documented number was measured on an x86 machine with 4 PCIe lanes and a batch size of 8.

The Hailo device always uses integer operations; that is one reason our architecture is so power efficient.
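To make that concrete: the calibration dataset is what lets the optimizer pick good quantization scales for those integer operations. Below is a minimal plain-Python sketch of symmetric per-tensor INT8 quantization; it is illustrative only, not Hailo's actual algorithm:

```python
def calibrate_scale(samples):
    # Symmetric per-tensor quantization: pick a scale so the observed
    # dynamic range of the calibration data maps onto [-127, 127].
    max_abs = max(abs(v) for batch in samples for v in batch)
    return max_abs / 127.0

def quantize(values, scale):
    # Round to nearest integer and clamp to the int8 range.
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    return [q * scale for q in q_values]

# A tiny "calibration dataset" of activation samples.
calib = [[-0.8, 0.1, 0.4], [0.9, -0.2, 0.3]]
scale = calibrate_scale(calib)
q = quantize([0.4, -0.8], scale)
approx = dequantize(q, scale)
```

This is why an unrepresentative calibration set hurts accuracy: the scale is derived from the observed range, and values outside it get clipped.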

Every bit helps. If you need a significant boost, have a look at the yolov8n model. You can use the Model Explorer in the Developer Zone to compare models on accuracy vs. speed.

Hailo Model Explorer Vision

See the guide above about multi-context models.

Yes, on weak CPUs pre- and post-processing can be the bottleneck. The Hailo accelerators have been designed to do the heavy compute of inference at very high efficiency. But they cannot do everything.
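To show the kind of CPU-side work involved: a greedy IoU-based NMS like the sketch below runs per frame over hundreds of candidate boxes, which adds up on an Arm core (plain-Python illustration, not the Hailo implementation):

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and
    # discard remaining boxes that overlap it too much.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
```

If I remember correctly, the model script also has an NMS post-process command that can offload this step at compile time; please verify the exact command in the Dataflow Compiler guide rather than taking my word for it.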

We do have our application code examples and Raspberry Pi examples as a starting point. Have a look into our GitHub repositories.

https://github.com/hailo-ai

With a batch-size of 2 I measured 49 FPS on a Raspberry Pi 5 with Hailo-8.

Hey, I am facing the same issue. I am working on an i.MX-8 Plus with a Hailo-8 and I am getting 22 FPS at an input size of 640×512, batch size = 1. It is a real-time inference device.

  1. My goal is to reach 50 FPS too. How can I optimize it?
  2. Is the batch size you mentioned related to a tiling technique, or does it simply mean 2 frames are taken together for inference?

Hi @Suraj_Upadhyay

Batch size refers to sending multiple inputs together to the inference engine to get higher throughput. Batching is handled by HailoRT at runtime. See Custom batch_size for models. - General - Hailo Community for explanations by @KlausK .
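To illustrate the throughput vs. latency trade-off with a toy cost model (invented numbers, nothing measured on Hailo hardware): a fixed per-transfer overhead is amortized across the batch, so FPS rises with batch size while per-frame latency also rises:

```python
def fps_and_latency(batch_size, overhead_ms=5.0, per_frame_ms=7.0):
    # Toy model: each transfer pays a fixed overhead once per batch,
    # plus a per-frame compute cost. Larger batches amortize the
    # overhead (higher FPS) but a frame must wait for its whole batch
    # (higher latency). The constants here are made up for illustration.
    batch_time_ms = overhead_ms + batch_size * per_frame_ms
    fps = 1000.0 * batch_size / batch_time_ms
    latency_ms = batch_time_ms
    return fps, latency_ms

print(fps_and_latency(1))  # lower FPS, lower latency
print(fps_and_latency(4))  # higher FPS, higher latency
```

This is also why batching mainly helps offline or multi-stream workloads: a single live camera produces one frame at a time, so there is nothing to batch without adding delay.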

In addition to what @shashi mentioned: if you compiled without the highest compiler optimization level, that could be limiting the FPS you are seeing.

Hi @shashi @lawrence ,
As my use case is real-time detection with a single camera only, based on the above explanations I think batch size will not help me.
@lawrence, I actually kept the performance flag disabled during compilation because it caused a large drop in accuracy.

Is there a way by which I can increase FPS even by a small number from my current situation?

The performance flag should not change accuracy at all. Make sure you are looking at the compiler optimization level and not the (model) optimization level: the optimization level changes quantization settings, while the compiler optimization level only changes how the layers are split into contexts.
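For reference, in a Dataflow Compiler model script (.alls) these two knobs live in different commands. The exact names below are from memory, so please verify them against the DFC user guide before relying on them:

```
# Quantization behaviour -- this is what can affect accuracy:
model_optimization_flavor(optimization_level=2)

# Context splitting / scheduling only -- should not affect accuracy:
performance_param(compiler_optimization_level=max)
```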
