Hailo-8 co-processor not detected by hailortcli etc. when on a Geekworm PCIe switch for RPi 5

Hello!

I’ve been running the RPi AI Kit with great success. I’ve used it with the original RPi M.2 HAT that came with the kit, and on the Pimoroni NVMe Base Duo PCIe switch alongside an NVMe SSD.

However, when I remove the Pimoroni NVMe Base Duo and replace it with the Geekworm X1011 PCIe to Four M.2 NVMe SSD Board for Raspberry Pi 5, the Hailo-8 co-processor is no longer recognized by the Hailo software.

Here’s my environment and a demonstration of the problem:

pi@aye:~ $ sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
pi@aye:~ $ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
pi@aye:~ $ uname -r
6.6.47+rpt-rpi-2712
pi@aye:~ $ lsmod | grep hailo
hailo_pci              98304  0
pi@aye:~ $ hailortcli -v
HailoRT-CLI version 4.18.0
pi@aye:~ $ sudo lspci | grep Hailo
0000:06:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
pi@aye:~ $ hailortcli fw-control identify
pi@aye:~ $ hailortcli scan
Hailo devices not found
pi@aye:~ $ hailortcli fw-control identify --bdf 0000:06:00.0
[HailoRT] [error] Requested device not found
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)
[HailoRT CLI] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)
pi@aye:~ $ 

As you can see, the co-processor is present in the output of lspci, but it is not recognized by the hailortcli command, even when I provide its PCIe address.

When I move the co-processor and NVMe back to the Pimoroni, everything works as expected:

pi@aye:~ $ lsmod | grep hailo
hailo_pci              98304  0
pi@aye:~ $ sudo lspci | grep Hailo
0000:03:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
pi@aye:~ $ hailortcli fw-control identify
Executing on device: 0000:03:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.18.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8L
Serial Number: HLDDLBB242500870
Part Number: HM21LB1C2LAE
Product Name: HAILO-8L AI ACC M.2 B+M KEY MODULE EXT TMP

pi@aye:~ $ hailortcli scan
Hailo Devices:
[-] Device: 0000:03:00.0
pi@aye:~ $ 

The only difference in configuration is that on the Geekworm PCIe switch, the Hailo co-processor enumerates at location 06:00.0, while on the Pimoroni PCIe switch, it enumerates at location 03:00.0.

So I realize this is most likely due to a deficiency in the Geekworm PCIe switch, but I’m hoping there’s a workaround I can try. Thank you for your time!

Jeremy

Hey @jeremy.impson

Welcome to the Hailo Community!

I understand that the device is detected by lspci but not recognized by hailortcli. Let’s go through some troubleshooting steps:

  1. Check PCIe Link:
    Run sudo lspci -vvv -s 06:00.0 and compare the output with your working Pimoroni setup. Look for differences in link speed or width.

  2. Examine Kernel Logs:
    Use dmesg | grep -i pcie to check for any PCIe-related errors or warnings.

  3. Test Power Supply:
    Try using the Hailo-8 alone on the switch to rule out power issues.

  4. Force Device Identification:
    Run hailortcli fw-control identify --bdf 0000:06:00.0 to attempt direct identification.

  5. Update HailoRT Driver:
    Ensure your HailoRT driver (v4.18.0) is compatible with your setup. Consider rebuilding if necessary.

  6. Re-enable the Device:
    Try sudo setpci -s 06:00.0 COMMAND=0x06 to set the Memory Space and Bus Master enable bits in the device’s COMMAND register. Note that this re-enables the device rather than performing a full reset or re-enumeration.

If these steps don’t resolve the issue, it might be a limitation of the Geekworm PCIe switch. Please let me know the results of your troubleshooting, and we can explore further options if needed.
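For reference, the value written in step 6 can be sanity-checked without any hardware: 0x06 in the PCI COMMAND register is Memory Space Enable plus Bus Master Enable. Here is a minimal sketch that decodes the hex value setpci prints; decode_pci_command is a made-up helper name, not part of pciutils:

```shell
# Hypothetical helper to decode the value printed by
# `sudo setpci -s 06:00.0 COMMAND` (or written in step 6).
# 0x06 = Memory Space Enable + Bus Master Enable; writing it
# re-enables the device, it does not reset it.
decode_pci_command() {
    val=$(( 0x${1#0x} ))
    flags=""
    [ $(( val & 0x001 )) -ne 0 ] && flags="$flags IO"            # bit 0: I/O Space Enable
    [ $(( val & 0x002 )) -ne 0 ] && flags="$flags MEM"           # bit 1: Memory Space Enable
    [ $(( val & 0x004 )) -ne 0 ] && flags="$flags BUSMASTER"     # bit 2: Bus Master Enable
    [ $(( val & 0x400 )) -ne 0 ] && flags="$flags INTX_DISABLE"  # bit 10: INTx Disable
    echo "${flags# }"
}

decode_pci_command 0006   # prints: MEM BUSMASTER
```

Comparing the decoded COMMAND register between the working and non-working setups can show whether the switch (or the kernel) left the device disabled.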

Best regards,
Omri

I have the same problem.
I’m using an ASM1184e PCIe switch with a CM5.
When I put the NVMe and the Hailo behind the ASM1184e, the Hailo doesn’t work.
On a CM4 with the same circuit, both the NVMe and the Hailo work.
Please show me the steps to find out why it doesn’t work.

horique

$ hailortcli -v
HailoRT-CLI version 4.19.0

$ hailortcli scan
Hailo devices not found

$ hailortcli fw-control identify --bdf 0000:04:00.0
[HailoRT] [error] Requested device not found
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)
[HailoRT CLI] [error] CHECK_SUCCESS failed with status=HAILO_INVALID_ARGUMENT(2)

$ dmesg | grep -i hailo
[ 6.064460] hailo: Init module. driver version 4.19.0
[ 6.064581] hailo 0000:04:00.0: Probing on: 1e60:2864…
[ 6.064585] hailo 0000:04:00.0: Probing: Allocate memory for device extension, 11632
[ 6.064604] hailo 0000:04:00.0: enabling device (0000 -> 0002)
[ 6.064616] hailo 0000:04:00.0: Probing: Device enabled
[ 6.064646] hailo 0000:04:00.0: Probing: mapped bar 0 - 00000000b3a063de 16384
[ 6.064651] hailo 0000:04:00.0: Probing: mapped bar 2 - 000000006ea5dca7 4096
[ 6.064654] hailo 0000:04:00.0: Probing: mapped bar 4 - 00000000ec0ef54e 16384
[ 6.064664] hailo 0000:04:00.0: Probing: Force setting max_desc_page_size to 4096 (recommended value is 16384)
[ 6.064681] hailo 0000:04:00.0: Probing: Enabled 64 bit dma
[ 6.064685] hailo 0000:04:00.0: Probing: Using userspace allocated vdma buffers
[ 6.064691] hailo 0000:04:00.0: Disabling ASPM L0s
[ 6.064695] hailo 0000:04:00.0: Successfully disabled ASPM L0s
[ 6.064777] hailo 0000:04:00.0: Failed to enable MSI -28
[ 6.064781] hailo 0000:04:00.0: Failed Enabling interrupts -28
[ 6.064783] hailo 0000:04:00.0: Failed activating board -28
[ 6.064797] hailo: probe of 0000:04:00.0 failed with error -28
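One hedged observation on the log above: the -28 in the probe messages is the kernel error code -ENOSPC, which the PCI core returns when it cannot allocate the requested MSI vectors, so the hailo_pci probe aborts before the device node is created. The code can be decoded without any Hailo hardware (Python is used here only as a portable errno lookup table):

```shell
# -28 is the kernel's -ENOSPC, returned by the MSI allocation path when no
# interrupt vector can be set up for the device behind the switch -- it is
# not a storage "disk full" error in this context.
python3 -c 'import errno, os; print(errno.errorcode[28], "-", os.strerror(28))'
# prints: ENOSPC - No space left on device
```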

$ sudo lspci -vvv -s 04:00.0
0000:04:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 41
Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 1800008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 1800004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 26W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel modules: hailo_pci

$ sudo lspci -vvv -s 05:00.0
0000:05:00.0 Non-Volatile memory controller: Yangtze Memory Technologies Co.,Ltd PC005 NVMe SSD (rev 03) (prog-if 02 [NVM Express])
Subsystem: Yangtze Memory Technologies Co.,Ltd PC005 NVMe SSD
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 39
Region 0: Memory at 1b80000000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 26W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <8us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00002100
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [158 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [178 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [180 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: nvme

$ dmesg | grep -i pcie
[ 0.000000] Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave numa=fake=8 system_heap.max_order=0 smsc95xx.macaddr=2C:CF:67:C2:A1:05 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 root=PARTUUID=9e330620-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=JP
[ 0.025363] /axi/pcie@120000/rp1: Fixed dependency cycle(s) with /axi/pcie@120000/rp1
[ 0.025612] /axi/pcie@120000/rp1: Fixed dependency cycle(s) with /axi/pcie@120000/rp1
[ 0.270544] brcm-pcie 1000110000.pcie: host bridge /axi/pcie@110000 ranges:
[ 0.270550] brcm-pcie 1000110000.pcie: No bus range found for /axi/pcie@110000, using [bus 00-ff]
[ 0.270558] brcm-pcie 1000110000.pcie: MEM 0x1b80000000..0x1bffffffff -> 0x0080000000
[ 0.270563] brcm-pcie 1000110000.pcie: MEM 0x1800000000..0x1b7fffffff -> 0x0400000000
[ 0.270567] brcm-pcie 1000110000.pcie: IB MEM 0x0000000000..0x007fffffff -> 0x0000000000
[ 0.271900] brcm-pcie 1000110000.pcie: Forcing gen 2
[ 0.272014] brcm-pcie 1000110000.pcie: PCI host bridge to bus 0000:00
[ 0.380387] brcm-pcie 1000110000.pcie: link up, 5.0 GT/s PCIe x1 (!SSC)
[ 0.394030] pci 0000:04:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[ 0.404682] pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[ 0.416719] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[ 0.416746] pcieport 0000:00:00.0: PME: Signaling with IRQ 38
[ 0.416821] pcieport 0000:00:00.0: AER: enabled with IRQ 38
[ 0.416866] pcieport 0000:01:00.0: enabling device (0000 -> 0002)
[ 0.417062] pcieport 0000:02:03.0: enabling device (0000 -> 0002)
[ 0.417157] pcieport 0000:02:05.0: enabling device (0000 -> 0002)
[ 0.435380] brcm-pcie 1000120000.pcie: host bridge /axi/pcie@120000 ranges:
[ 0.435385] brcm-pcie 1000120000.pcie: No bus range found for /axi/pcie@120000, using [bus 00-ff]
[ 0.435391] brcm-pcie 1000120000.pcie: MEM 0x1f00000000..0x1ffffffffb -> 0x0000000000
[ 0.435395] brcm-pcie 1000120000.pcie: MEM 0x1c00000000..0x1effffffff -> 0x0400000000
[ 0.435400] brcm-pcie 1000120000.pcie: IB MEM 0x1f00000000..0x1f003fffff -> 0x0000000000
[ 0.435403] brcm-pcie 1000120000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[ 0.436379] brcm-pcie 1000120000.pcie: Forcing gen 2
[ 0.436404] brcm-pcie 1000120000.pcie: PCI host bridge to bus 0001:00
[ 0.544377] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
[ 0.556458] pcieport 0001:00:00.0: enabling device (0000 -> 0002)
[ 0.556478] pcieport 0001:00:00.0: PME: Signaling with IRQ 49
[ 0.556545] pcieport 0001:00:00.0: AER: enabled with IRQ 4