The performance on the Raspberry Pi 5 with the HAILO-8 chip does not match the official results

On the Raspberry Pi 5 with the HAILO-8 chip, we have verified that every step is correct. With the batch size set to 1 and an input size of 640, the YOLOv8 test results are as follows:

  • YOLOv8n: 431 FPS

  • YOLOv8s: 491 FPS

  • YOLOv8m: 31 FPS

  • YOLOv8l: 19 FPS

I urgently want to know why the performance of YOLOv8s exceeds that of YOLOv8n, and why the performance of YOLOv8m and YOLOv8l is far lower than the official results obtained with an Intel® Core™ i5-9400 host CPU. We would like to confirm whether our test results are correct and to understand the reasons for this behavior.
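For reference, this is a minimal sketch of how the numbers above could be reproduced with `hailortcli` (the exact HEF file names and the `resources/` path are assumptions; substitute the paths of your compiled models):

```shell
# Benchmark each compiled model at batch size 1 (file names/paths are assumed)
for hef in yolov8n.hef yolov8s.hef yolov8m.hef yolov8l.hef; do
    hailortcli run "resources/${hef}" --batch-size 1
done
```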

Hey @_Mab,

How did you run these models? Where did you get them from - are they from the model zoo?

Also, I’m wondering if yolov8n might have post-processing built in while yolov8s doesn’t. That could explain the difference. Could you share more details about your setup? It would help us figure out what’s affecting the performance testing.
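One way to check whether post-processing (e.g. an NMS layer) is compiled into a HEF is to inspect the file. A sketch, assuming the `parse-hef` subcommand is available in your HailoRT version and that the file paths match your models:

```shell
# Inspect the compiled networks; an NMS output layer in the listing
# suggests post-processing is built into the HEF
hailortcli parse-hef resources/yolov8n.hef
hailortcli parse-hef resources/yolov8s.hef
```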

I have the same setup and same issue as @_Mab

I use the models found at: hailo_model_zoo/docs/public_models/HAILO8/HAILO8_object_detection.rst at master · hailo-ai/hailo_model_zoo · GitHub

hailortcli run resources/yolov11m.hef --batch-size 1
Running streaming inference (resources/yolov11m.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov11m/yolov11m: 100% | 121 | FPS: 24.17 | ETA: 00:00:00
> Inference result:
 Network group: yolov11m
    Frames count: 121
    FPS: 24.17
    Send Rate: 237.59 Mbit/s
    Recv Rate: 236.11 Mbit/s

So for yolov11m I would expect 50 FPS, but get only half.

I enabled PCIe Gen 3, which had zero effect.
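For completeness, this is how Gen 3 is typically enabled on the Raspberry Pi 5 (the config path is the standard Bookworm location); the `lspci` dump below already reports `Speed 8GT/s, Width x1`, i.e. Gen 3 is active on a single lane:

```shell
# /boot/firmware/config.txt — request PCIe Gen 3 on the Pi 5:
#   dtparam=pciex1_gen=3

# After a reboot, confirm the negotiated link speed (8GT/s = Gen 3):
lspci -vv | grep -E "LnkCap:|LnkSta:"
```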

lspci -vv:

0001:01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
	Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 38
	Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=16K]
	Region 2: Memory at 1800008000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at 1800004000 (64-bit, prefetchable) [size=16K]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x1 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Kernel driver in use: hailo
	Kernel modules: hailo_pci

hailortcli fw-control identify:

Executing on device: 0001:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.20.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: <N/A>
Part Number: <N/A>
Product Name: <N/A>

In fact, those models don’t have any post-processing. I used the command `hailortcli run yolov8s/n.hef --batch-size 1`, and the HEFs come from the official source; here is the link: https://hailo.ai/products/hailo-software/model-explorer-vision/. That is weird, huh.

I tested yolov11m.hef on my Hailo-8 + Raspberry Pi 5 and got the same result as you; the `hailortcli run` command and my C++ code give the same numbers. I don’t think the problem is with PCIe.
The physical-layer rate of PCIe Gen 3 is 8 GT/s (gigatransfers per second), using 128b/130b encoding (a coding efficiency of approximately 98.46%). The effective bandwidth of an x1 link (bidirectional full duplex, so transmit and receive run simultaneously) works out to: unidirectional effective bandwidth = 8 GT/s × (128/130) ≈ 7.877 Gbit/s ≈ 7877 Mbit/s (approximately 984.6 MB/s).
As you can see, the send rate is about 238 Mbit/s and the receive rate about 236 Mbit/s at 24 FPS, which means well under 600 Mbit/s per direction would be needed at 50 FPS. Although the bandwidth figure is an ideal number, I believe that roughly 6% of link utilization is unlikely to cause a PCIe transmission bottleneck.
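The arithmetic above can be checked with a quick sketch (the 238 Mbit/s and 24.17 FPS figures are taken from the `hailortcli run` output earlier in the thread):

```shell
# PCIe Gen 3 x1 effective bandwidth vs. what ~50 FPS would need
awk 'BEGIN {
    bw   = 8e9 * 128 / 130          # link rate after 128b/130b encoding, bit/s
    need = 238 * (50 / 24.17)       # scale measured 238 Mbit/s at 24.17 FPS up to 50 FPS
    printf "link: %.0f Mbit/s, needed at 50 FPS: %.0f Mbit/s (%.1f%% utilization)\n",
           bw / 1e6, need, 100 * need * 1e6 / bw
}'
```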

When I use my C++ code to run those HEFs (with no pre- or post-processing), yolov8m and yolov8l get the same FPS as the `hailortcli run` command, but yolov8n gets 144 FPS and yolov8s gets 105 FPS. Those numbers differ from the command-line results, but the ordering (yolov8n faster than yolov8s) is now as expected.
Finally, I want to know whether the FPS results measured on the Raspberry Pi 5 with the Hailo-8 are correct or not.

Hey @_Mab, @Kristijan_Vrban,

Regarding the large models with more than 3-4 contexts: this issue might be related to the Raspberry Pi 5’s single-lane PCIe configuration, which differs from the x86 systems used in the ModelZoo tests.

Thanks for providing that information! I’ve opened a ticket to address the model issue. This appears to be caused by an incorrect model URL (not pointing to the performance-optimized version).

I’ll test both configurations and provide updates on the results.

Thank you for addressing this issue, @omria. If it can be resolved by optimizing the models, that would be good news.

I am available if there is anything that needs to be tested.