YOLOv8s performance results

Hey there,

I’m currently diving into some projects with a Raspberry Pi Compute Module 5 (CM5) that has 8 GB of RAM, and I’ve been putting the YOLOv8s model to the test using hailortcli. Here’s what I’ve found in terms of performance:

YOLOv8s – CM5 8GB (measured through hailortcli run):

PCIe Gen 2: 28.59 FPS

PCIe Gen 3: 55.73 FPS

From what I’ve gathered in various community discussions (check out the link below), it seems that higher FPS values have been reported under similar setups.
Forum link:

I have a couple of questions:

Are there any hardware or architectural differences between the Raspberry Pi 5 and the Raspberry Pi Compute Module 5 that might explain this performance gap?

Could you shed some light on how hailortcli works behind the scenes when running a model?
For instance:
Does it utilize synthetic input frames?
Does it implement an optimal pipeline for maximum throughput?
Are there specific settings or flags that could impact performance?

Is there anything I should tweak, check, or optimize on my system to ensure I’m getting the expected performance from the CM5?

Any insights or clarifications would be hugely appreciated.

Thanks a bunch!

I do not expect any significant difference between the Raspberry Pi 5 and the Compute Module 5.

Your FPS numbers look like you used a Hailo-8L. YOLOv8s on a Hailo-8 can be compiled into a single context and runs at around 490 FPS.

The hailortcli run command just sends random data to the device to measure performance, without a full video pipeline.
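To make the idea of a synthetic-input benchmark concrete, here is a minimal sketch in Python. It is not how hailortcli is implemented. The names measure_fps and dummy_infer are hypothetical, the 640x640 RGB input size is an assumption, and the dummy backend stands in for a real device call, so the number it prints says nothing about a Hailo device.

```python
import os
import time


def measure_fps(infer, frame_bytes, n_frames=200):
    """Time `infer` on a random frame and return frames per second."""
    # Generate the random buffer once so data generation is not timed.
    frame = os.urandom(frame_bytes)
    start = time.perf_counter()
    for _ in range(n_frames):
        infer(frame)
    # Guard against a zero interval on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_frames / elapsed


def dummy_infer(frame):
    # Hypothetical stand-in for a device call; it only touches the buffer.
    return len(frame)


if __name__ == "__main__":
    # 640x640 RGB input is an assumption, not read from any HEF file.
    fps = measure_fps(dummy_infer, 640 * 640 * 3)
    print(f"{fps:.1f} FPS (dummy backend)")
```

Because there is no camera, decoding, or post-processing in the loop, a measurement of this kind is an upper bound on what a complete application can reach.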

No, see above.

The Hailo Dataflow Compiler has a performance mode in which the compiler tries as hard as it can to find the highest-performance solution that fits in a single context. Compiling in this mode takes significantly longer.
Additionally, at runtime a model with multiple contexts can achieve higher throughput by using the batch-size parameter. Processing images in batches reduces the context-switching overhead, because the switching cost is spread over every image in the batch.
For single context models the batch size does not affect FPS.
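The effect of batch size can be illustrated with a simple throughput model: each batch pays one fixed context-switching cost plus a per-frame compute cost. This is a rough sketch, not a description of HailoRT internals. The function name modeled_fps and the timing values are hypothetical and chosen only for illustration.

```python
def modeled_fps(batch_size, compute_s_per_frame, switch_s_per_batch):
    """Modeled FPS when each batch pays one fixed context-switch cost."""
    batch_time = batch_size * compute_s_per_frame + switch_s_per_batch
    return batch_size / batch_time


if __name__ == "__main__":
    # Hypothetical timings, not measured on any device.
    compute = 0.005  # 5 ms of compute per frame
    switch = 0.030   # 30 ms of context switching per batch

    # Multi-context case: FPS rises with batch size and levels off.
    for batch in (1, 2, 4, 8, 16):
        print(batch, round(modeled_fps(batch, compute, switch), 1))

    # Single-context case: no switching cost, so batch size drops out.
    print(modeled_fps(1, compute, 0.0), modeled_fps(16, compute, 0.0))
```

In this model the FPS approaches 1 / compute_s_per_frame as the batch grows, and with a switching cost of zero the batch size cancels out, which matches the statement that batch size does not affect FPS for single-context models.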

The biggest impact on performance usually comes from the software framework (e.g. C++ vs. Python) and, on weaker host CPUs, from whether you make use of hardware accelerators (e.g. for video decoding).