Poor performance of Hailo-8L and RPi5

Hello guys,

Could you please advise me on my issue? First of all, I’ve read:

and

Here are the outputs:

hailortcli run yolov6n.hef --batch-size 1
Running streaming inference (yolov6n.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov6n/yolov6n: 100% | 1408 | FPS: 281.26 | ETA: 00:00:00
> Inference result:
 Network group: yolov6n
    Frames count: 1408
    FPS: 281.26
    Send Rate: 2764.91 Mbit/s
    Recv Rate: 1621.87 Mbit/s

Expected 355 according to model zoo.

hailortcli run yolov6n.hef --batch-size 8
Running streaming inference (yolov6n.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov6n/yolov6n: 100% | 1408 | FPS: 281.24 | ETA: 00:00:00
> Inference result:
 Network group: yolov6n
    Frames count: 1408
    FPS: 281.24
    Send Rate: 2764.74 Mbit/s
    Recv Rate: 1621.77 Mbit/s

Expected 355 according to model zoo.

hailortcli run yolov7.hef --batch-size 1
Running streaming inference (yolov7.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov7/yolov7: 100% | 45 | FPS: 9.00 | ETA: 00:00:00
> Inference result:
 Network group: yolov7
    Frames count: 45
    FPS: 9.00
    Send Rate: 88.46 Mbit/s
    Recv Rate: 155.67 Mbit/s

Expected 25 according to model zoo.

hailortcli run yolov7.hef --batch-size 8
Running streaming inference (yolov7.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov7/yolov7: 100% | 78 | FPS: 15.58 | ETA: 00:00:00
> Inference result:
 Network group: yolov7
    Frames count: 78
    FPS: 15.58
    Send Rate: 153.12 Mbit/s
    Recv Rate: 269.45 Mbit/s

Expected 35 according to model zoo.

sudo lspci -vv

0000:01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
	Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 185
	Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=16K]
	Region 2: Memory at 1800008000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at 1800004000 (64-bit, prefetchable) [size=16K]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s (downgraded), Width x1 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 000000ffffffe000  Data: 0008
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=26016ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Kernel driver in use: hailo
	Kernel modules: hailo_pci

What could be the cause of my slowdowns? Thanks for any help.

There are two points.

Your Raspberry Pi is set to PCIe Gen 2. Please follow this guide to upgrade it to PCIe Gen 3.

How to upgrade to PCIe gen-3 in RPi5 with Hailo-8L M.2
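
(For reference: on recent Raspberry Pi OS images this typically amounts to adding one line to /boot/firmware/config.txt and rebooting, as sketched below; the guide above has the exact steps for your setup.)

    # /boot/firmware/config.txt - request a Gen 3 link on the RPi5 PCIe connector
    dtparam=pciex1_gen=3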

The second point: the numbers in the Model Zoo are for a Hailo device connected to a system with 4 PCIe lanes and an x86 CPU. Your Raspberry Pi only has a single lane.

Especially for multi-context networks, PCIe bandwidth will influence the maximum FPS you can reach.
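
After applying the change and rebooting, you can verify the negotiated link directly from Linux. Below is a minimal sketch that reads the standard PCIe sysfs attributes; the device address 0000:01:00.0 is taken from your lspci output, so adjust it if yours differs. A Gen 3 link reports 8.0 GT/s, while 5.0 GT/s means the link is still at Gen 2.

    # Minimal sketch: print the negotiated PCIe link speed and width from sysfs.
    # Assumes the Hailo device sits at 0000:01:00.0, as in the lspci output above.
    from pathlib import Path

    dev = Path("/sys/bus/pci/devices/0000:01:00.0")
    for attr in ("current_link_speed", "current_link_width",
                 "max_link_speed", "max_link_width"):
        print(attr, "=", (dev / attr).read_text().strip())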

@klausk many thanks for your time. I’ve read somewhere on the forum that anything below 5 GT/s means the PCIe link is set to Gen 2; since mine was at 5 GT/s, I thought I was already on PCIe Gen 3.

After changing it to Gen 3 it works way faster, thank you.

May I ask one more thing: how do you measure the performance (FPS) in hailortcli?

I am measuring it like this (Python):

    inference_start = time.time()

    bindings.input().set_buffer(np.array(preprocessed_image))
    configured_infer_model.run([bindings], 1000)
    buffer = bindings.output().get_buffer()

    inference_end = time.time()
    inference_time = inference_end - inference_start

I am getting 0.011 s, which translates to about 90 FPS (for yolov6n.hef), which is nowhere near the 281 FPS from hailortcli run yolov6n.hef --batch-size 1 on my machine.

Even if I measure only the inference time:

    inference_start = time.time()

    configured_infer_model.run([bindings], 1000)

    inference_end = time.time()
    inference_time = inference_end - inference_start

I am getting 0.008 s, so around 125 FPS.

We push as many frames into the device as possible for a few seconds and count them.

For Hailo devices, FPS is not the inverse of latency. As soon as the first layer finishes computing the last row of an image, it can start working on the next frame. It is a true pipeline, which allows higher throughput.

This of course requires the software to work on the input and output streams independently. If your software blocks by calling infer and waiting for the result in a single thread, you will not reach the maximum throughput. Reaching it requires two independent threads.
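
To make that concrete, here is a rough sketch of the pipelined pattern using the asynchronous API of the same ConfiguredInferModel object you are already using. It assumes run_async() and AsyncInferJob.wait() from recent HailoRT Python releases, and the callback signature and create_bindings()/set_buffer() usage follow Hailo’s async examples; the make_bindings helper and the output buffer shape/dtype are placeholders you would adapt to your model, so treat it as an outline of the idea rather than a verified drop-in example.

    import time
    import numpy as np

    NUM_FRAMES = 200

    def make_bindings():
        # Hypothetical helper: build one bindings object per in-flight frame,
        # the same way you already do, so buffers are not reused while the
        # device is still writing to them. A real application would recycle a
        # small pool of bindings from the completion callback instead.
        b = configured_infer_model.create_bindings()
        b.input().set_buffer(np.array(preprocessed_image))
        # Placeholder output buffer; adjust shape/dtype to your model's output.
        b.output().set_buffer(np.empty(infer_model.output().shape, dtype=np.float32))
        return b

    def on_done(completion_info):
        # Completion callback: called once a frame has finished. Post-processing
        # (or handing the result to a queue/thread) would happen here.
        pass

    all_bindings = [make_bindings() for _ in range(NUM_FRAMES)]

    start = time.time()
    last_job = None
    for b in all_bindings:
        # run_async() queues the frame and returns almost immediately (it may
        # block briefly when the internal queue is full), so many frames are in
        # flight at once instead of waiting for each result.
        last_job = configured_infer_model.run_async([b], on_done)
    last_job.wait(10_000)  # wait (in ms) for the last queued frame to finish
    elapsed = time.time() - start
    print(f"Throughput: {NUM_FRAMES / elapsed:.1f} FPS")

The key point is that many frames are queued before the first result is read back, which is what keeps the device’s internal pipeline full.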

Many thanks, Klaus.

Which of the examples from here: GitHub - hailo-ai/Hailo-Application-Code-Examples uses that approach? I’ve analyzed the object detection (Python) example and I see two threads, queues and batch processing there. But batch processing does not necessarily mean the same thing as what you are talking about, right? How can I know that Hailo is ready to accept a new image?