Hailortcli fw-control identify: error 6

I have an Ubuntu 22.04.3 system with kernel 5.15.0-122-generic. As the driver I have hailort-pcie-driver_4.18.0_all.deb installed, and the installation looks successful:

>>lspci | grep Co-processor
03:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
04:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
05:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
06:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)

For the actual work I am running the hailo_ai_sw_suite_2024-07:1 Docker container, where the Hailo chips are found:

>>hailortcli scan
Hailo Devices:
[-] Device: 0000:03:00.0
[-] Device: 0000:06:00.0
[-] Device: 0000:04:00.0
[-] Device: 0000:05:00.0

Everything worked fine with the TAPPAS examples. The problem started after I tried to add Python post-processing as described in the documentation. I just added the function to read the tensors provided by my YOLOv8s object detection model.
After running with post-processing I get the following error:

2024-10-01 11:42:47,680 [INFO] Running Pipeline...
[HailoRT] [error] CHECK failed - Failed to open device file /dev/hailo0 with error 6
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)

I also get:

>>hailortcli fw-control identify
[HailoRT] [error] CHECK failed - Failed to open device file /dev/hailo0 with error 6
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT CLI] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)

I already tried purging and reinstalling the driver and starting a new Docker container, but nothing helps. Does anybody know what might help?
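
A minimal sketch for reading the number in the log, assuming the `6` in "Failed to open device file /dev/hailo0 with error 6" is the Linux `errno` from the failed `open()` call (the log does not say so explicitly). The helper name `describe_errno` is hypothetical; it only uses Python's standard library to translate the number:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# Assumption: the 6 in the HailoRT log line is an errno value.
print(describe_errno(6))
```

If that assumption holds, the code resolves to ENXIO on Linux, which would suggest that the device node exists but no device is responding behind it, rather than a permission problem (that would be EACCES, 13).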

Hey @dominik.wuttke,

Welcome to the Hailo Community!

I understand you’re facing an issue where HailoRT fails to open /dev/hailo0 with error 6 and then reports HAILO_DRIVER_FAIL (status 36). Here are some troubleshooting steps:

  1. Check Device Permissions:

    ls -l /dev/hailo*
    sudo chmod 666 /dev/hailo*  # If permissions need adjustment
    
  2. Verify Driver Loading:

    lsmod | grep hailo_pci
    sudo modprobe hailo_pci  # If driver isn't loaded
    
  3. Ensure Docker Container Access:

    docker run --device /dev/hailo0 --device /dev/hailo1 ...
    docker exec -it <container_id> ls -l /dev/hailo*
    
  4. Check Kernel Logs:

    dmesg | grep hailo
    
  5. Reinitialize the Device:

    sudo rmmod hailo_pci
    sudo modprobe hailo_pci
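
The permission check from step 1 can also be scripted, for example to run the same check on the host and inside the container. This is only a sketch: the function name `check_device_nodes` is hypothetical, and the default pattern `/dev/hailo*` is taken from the commands above.

```python
import glob
import os
import stat


def check_device_nodes(pattern: str = "/dev/hailo*") -> list:
    """Report type, mode and read/write access for every matching device node."""
    report = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode
        report.append({
            "path": path,
            "is_char_device": stat.S_ISCHR(mode),   # real device nodes are char devices
            "mode": stat.filemode(mode),            # e.g. 'crw-rw-rw-'
            "readable": os.access(path, os.R_OK),   # access for the current user
            "writable": os.access(path, os.W_OK),
        })
    return report


# Prints nothing if no node matches the pattern.
for entry in check_device_nodes():
    print(entry)
```

An empty result inside the container while the host lists nodes would point at a missing `--device` mapping. Nodes that are present and writable but still fail to open point away from permissions.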
    

Let me know if any of these steps help or if you need further assistance!

Best Regards,
Omri

Hi Omri,

Thanks for your reply. I tried your steps, but I get the same error for “hailortcli fw-control identify”.

Steps 1, 2 and 3 ran without any problem, but steps 4 and 5 still lead to the following messages:

>> sudo dmesg | grep hailo
[    5.883242] hailo_pci: loading out-of-tree module taints kernel.
[    5.883275] hailo_pci: module verification failed: signature and/or required key missing - tainting kernel
[    5.883679] hailo: Init module. driver version 4.18.0
[    5.883730] hailo 0000:03:00.0: Probing on: 1e60:2864...
[    5.883733] hailo 0000:03:00.0: Probing: Allocate memory for device extension, 11632
[    5.883740] hailo 0000:03:00.0: enabling device (0000 -> 0002)
[    5.883875] hailo 0000:03:00.0: Probing: Device enabled
[    5.883899] hailo 0000:03:00.0: Probing: mapped bar 0 - 00000000aee0b0d5 16384
[    5.883906] hailo 0000:03:00.0: Probing: mapped bar 2 - 000000004a79f0ec 4096
[    5.883912] hailo 0000:03:00.0: Probing: mapped bar 4 - 00000000bcfb12e2 16384
[    5.883915] hailo 0000:03:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    5.883947] hailo 0000:03:00.0: Probing: Enabled 64 bit dma
[    5.883949] hailo 0000:03:00.0: Probing: Using userspace allocated vdma buffers
[    5.883952] hailo 0000:03:00.0: Disabling ASPM L0s 
[    5.883954] hailo 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[    5.883957] hailo 0000:03:00.0: Successfully disabled ASPM L0s 
[    6.030763] hailo 0000:03:00.0: Firmware was loaded successfully
[    6.052744] hailo 0000:03:00.0: Probing: Added board 1e60-2864, /dev/hailo0
[    6.052768] hailo 0000:04:00.0: Probing on: 1e60:2864...
[    6.052770] hailo 0000:04:00.0: Probing: Allocate memory for device extension, 11632
[    6.052778] hailo 0000:04:00.0: enabling device (0000 -> 0002)
[    6.052899] hailo 0000:04:00.0: Probing: Device enabled
[    6.052914] hailo 0000:04:00.0: Probing: mapped bar 0 - 000000003f681360 16384
[    6.052920] hailo 0000:04:00.0: Probing: mapped bar 2 - 0000000071e6fe8d 4096
[    6.052923] hailo 0000:04:00.0: Probing: mapped bar 4 - 00000000f3cef42e 16384
[    6.052927] hailo 0000:04:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    6.052953] hailo 0000:04:00.0: Probing: Enabled 64 bit dma
[    6.052955] hailo 0000:04:00.0: Probing: Using userspace allocated vdma buffers
[    6.052957] hailo 0000:04:00.0: Disabling ASPM L0s 
[    6.052959] hailo 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[    6.052960] hailo 0000:04:00.0: Successfully disabled ASPM L0s 
[    6.192274] hailo 0000:04:00.0: Firmware was loaded successfully
[    6.208860] hailo 0000:04:00.0: Probing: Added board 1e60-2864, /dev/hailo1
[    6.208879] hailo 0000:05:00.0: Probing on: 1e60:2864...
[    6.208880] hailo 0000:05:00.0: Probing: Allocate memory for device extension, 11632
[    6.208888] hailo 0000:05:00.0: enabling device (0000 -> 0002)
[    6.209000] hailo 0000:05:00.0: Probing: Device enabled
[    6.209019] hailo 0000:05:00.0: Probing: mapped bar 0 - 00000000d1766da8 16384
[    6.209022] hailo 0000:05:00.0: Probing: mapped bar 2 - 00000000dcd0c669 4096
[    6.209025] hailo 0000:05:00.0: Probing: mapped bar 4 - 00000000185dfce4 16384
[    6.209028] hailo 0000:05:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    6.209052] hailo 0000:05:00.0: Probing: Enabled 64 bit dma
[    6.209053] hailo 0000:05:00.0: Probing: Using userspace allocated vdma buffers
[    6.209055] hailo 0000:05:00.0: Disabling ASPM L0s 
[    6.209056] hailo 0000:05:00.0: can't disable ASPM; OS doesn't have ASPM control
[    6.209057] hailo 0000:05:00.0: Successfully disabled ASPM L0s 
[    6.347531] hailo 0000:05:00.0: Firmware was loaded successfully
[    6.364830] hailo 0000:05:00.0: Probing: Added board 1e60-2864, /dev/hailo2
[    6.364846] hailo 0000:06:00.0: Probing on: 1e60:2864...
[    6.364847] hailo 0000:06:00.0: Probing: Allocate memory for device extension, 11632
[    6.364855] hailo 0000:06:00.0: enabling device (0000 -> 0002)
[    6.364958] hailo 0000:06:00.0: Probing: Device enabled
[    6.364975] hailo 0000:06:00.0: Probing: mapped bar 0 - 0000000050991ec7 16384
[    6.364979] hailo 0000:06:00.0: Probing: mapped bar 2 - 000000000d97e996 4096
[    6.364983] hailo 0000:06:00.0: Probing: mapped bar 4 - 00000000077b2227 16384
[    6.364985] hailo 0000:06:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    6.365011] hailo 0000:06:00.0: Probing: Enabled 64 bit dma
[    6.365012] hailo 0000:06:00.0: Probing: Using userspace allocated vdma buffers
[    6.365014] hailo 0000:06:00.0: Disabling ASPM L0s 
[    6.365015] hailo 0000:06:00.0: can't disable ASPM; OS doesn't have ASPM control
[    6.365016] hailo 0000:06:00.0: Successfully disabled ASPM L0s 
[    6.504941] hailo 0000:06:00.0: Firmware was loaded successfully
[    6.524929] hailo 0000:06:00.0: Probing: Added board 1e60-2864, /dev/hailo3
[  249.934934] hailo 0000:03:00.0: Device disconnected while opening device
[  344.862955] hailo 0000:03:00.0: Device disconnected while opening device

Here is the log after executing steps 4 and 5:

[  642.544463] hailo: Hailo PCIe driver unloaded.
[  656.343204] hailo: Init module. driver version 4.18.0
[  656.343237] hailo 0000:03:00.0: Probing on: 1e60:2864...
[  656.343238] hailo 0000:03:00.0: Probing: Allocate memory for device extension, 11632
[  656.361431] hailo 0000:03:00.0: enabling device (0000 -> 0002)
[  656.361631] hailo 0000:03:00.0: Probing: Device enabled
[  656.361656] hailo 0000:03:00.0: Probing: mapped bar 0 - 00000000c6e074d4 16384
[  656.361660] hailo 0000:03:00.0: Probing: mapped bar 2 - 00000000959050b7 4096
[  656.361666] hailo 0000:03:00.0: Probing: mapped bar 4 - 00000000ccec5d73 16384
[  656.361669] hailo 0000:03:00.0: Probing: Failed reading device BARs, device may be disconnected
[  656.361675] hailo 0000:03:00.0: Probing: Failed init pcie resources
[  656.361767] hailo 0000:04:00.0: Probing on: 1e60:2864...
[  656.361768] hailo 0000:04:00.0: Probing: Allocate memory for device extension, 11632
[  656.361775] hailo 0000:04:00.0: enabling device (0000 -> 0002)
[  656.361811] hailo 0000:04:00.0: Probing: Device enabled
[  656.361821] hailo 0000:04:00.0: Probing: mapped bar 0 - 0000000050991ec7 16384
[  656.361823] hailo 0000:04:00.0: Probing: mapped bar 2 - 0000000074cbf538 4096
[  656.361826] hailo 0000:04:00.0: Probing: mapped bar 4 - 000000009ff91e8f 16384
[  656.361828] hailo 0000:04:00.0: Probing: Failed reading device BARs, device may be disconnected
[  656.361831] hailo 0000:04:00.0: Probing: Failed init pcie resources
[  656.361883] hailo 0000:05:00.0: Probing on: 1e60:2864...
[  656.361884] hailo 0000:05:00.0: Probing: Allocate memory for device extension, 11632
[  656.361889] hailo 0000:05:00.0: enabling device (0000 -> 0002)
[  656.361937] hailo 0000:05:00.0: Probing: Device enabled
[  656.361943] hailo 0000:05:00.0: Probing: mapped bar 0 - 0000000024e8445e 16384
[  656.361945] hailo 0000:05:00.0: Probing: mapped bar 2 - 0000000047248e1d 4096
[  656.361947] hailo 0000:05:00.0: Probing: mapped bar 4 - 000000003bbda494 16384
[  656.361949] hailo 0000:05:00.0: Probing: Failed reading device BARs, device may be disconnected
[  656.361951] hailo 0000:05:00.0: Probing: Failed init pcie resources
[  656.361987] hailo 0000:06:00.0: Probing on: 1e60:2864...
[  656.361988] hailo 0000:06:00.0: Probing: Allocate memory for device extension, 11632
[  656.361992] hailo 0000:06:00.0: enabling device (0000 -> 0002)
[  656.362024] hailo 0000:06:00.0: Probing: Device enabled
[  656.362029] hailo 0000:06:00.0: Probing: mapped bar 0 - 00000000e20db826 16384
[  656.362031] hailo 0000:06:00.0: Probing: mapped bar 2 - 00000000e0e45c57 4096
[  656.362053] hailo 0000:06:00.0: Probing: mapped bar 4 - 0000000029f9fa23 16384
[  656.362056] hailo 0000:06:00.0: Probing: Failed reading device BARs, device may be disconnected
[  656.362058] hailo 0000:06:00.0: Probing: Failed init pcie resources

Any idea what’s missing?
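
Long probe logs like the ones above are easier to compare when reduced to one outcome per PCI address. The following is a rough sketch, not a Hailo tool: `summarize_probe` is a hypothetical helper, and the message fragments it looks for ("Added board", "Failed", "disconnected") are taken only from the log lines in this post.

```python
import re

# Matches driver lines such as:
# "[    6.052744] hailo 0000:03:00.0: Probing: Added board 1e60-2864, /dev/hailo0"
LINE_RE = re.compile(
    r"hailo (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)


def summarize_probe(dmesg_text: str) -> dict:
    """Map each PCI address to the last success or error message seen for it."""
    status = {}
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        bdf, msg = match.group("bdf"), match.group("msg").strip()
        if "Added board" in msg:
            # Keep only the device node, e.g. "/dev/hailo0"
            status[bdf] = "ok: " + msg.rsplit(", ", 1)[-1]
        elif "Failed" in msg or "disconnected" in msg.lower():
            status[bdf] = "error: " + msg
    return status


# Usage: pass the text of `dmesg | grep hailo`, e.g.
# print(summarize_probe(open("dmesg.txt").read()))
```

Fed the log taken after the reload, every one of the four addresses ends on "Probing: Failed init pcie resources", so the failure is not specific to 03:00.0 once the module has been reloaded.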

I just wanted to document the current state and progress; hopefully it helps people with similar problems.

I checked the driver assignment for device 03:00.0 by running

sudo lspci -vvv -s 03:00.0

and got the following result:

03:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
	Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at a0304000 (64-bit, prefetchable) [virtual] [size=16K]
	Region 2: Memory at a0308000 (64-bit, prefetchable) [virtual] [size=4K]
	Region 4: Memory at a0300000 (64-bit, prefetchable) [virtual] [size=16K]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000001 00000003 a0304098 d19e0a95
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Kernel driver in use: hailo
	Kernel modules: hailo_pci

The PCIe link itself looks healthy (LnkSta: Speed 8GT/s, Width x4). However, the power management status is D3 (Status: D3 NoSoftRst+ PME-Enable-), which means the device is in a low-power state. I assume (I may be wrong) that the device should be in state D0 (fully operational) when in use.
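
To check all four devices without reading the full dump each time, the relevant fields can be pulled out of the `lspci -vvv` text. This is a sketch under assumptions: `parse_lspci_state` is a hypothetical helper, and the regular expressions are written against the output format shown above, which may differ between lspci versions.

```python
import re


def parse_lspci_state(lspci_text: str) -> dict:
    """Extract the PM power state and the Mem/BusMaster flags from `lspci -vvv` output."""
    result = {"power_state": None, "mem_enabled": None, "bus_master": None}

    # The Power Management capability prints e.g. "Status: D3 NoSoftRst+ ..."
    power = re.search(
        r"^\s*Status:\s+(D[0-3](?:hot|cold)?)\b", lspci_text, re.MULTILINE
    )
    if power:
        result["power_state"] = power.group(1)

    # The command register prints e.g. "Control: I/O- Mem- BusMaster- ..."
    control = re.search(
        r"^\s*Control:.*?\bMem([+-]).*?\bBusMaster([+-])", lspci_text, re.MULTILINE
    )
    if control:
        result["mem_enabled"] = control.group(1) == "+"
        result["bus_master"] = control.group(2) == "+"
    return result


# Usage: pass the output of `sudo lspci -vvv -s 03:00.0` as a string.
```

For the dump above this yields D3 with both flags off, since the Control line also shows Mem- and BusMaster-. That would be consistent with the "Failed reading device BARs" messages, although it does not by itself confirm the cause.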

I tried to solve the problem by resetting the device:

echo 1 | sudo tee /sys/bus/pci/devices/0000:03:00.0/reset

This crashed my remote access to the server, but after rebooting the problem was solved: “hailortcli fw-control identify” ran successfully and processing works as before.

Another option might be to force the device into the full-power state:

echo on | sudo tee /sys/bus/pci/devices/0000:03:00.0/power/control

but I did not check this approach.
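
Before changing anything, the current runtime power-management settings can be read back from the same sysfs directory. A minimal sketch, assuming the standard Linux `power/control` and `power/runtime_status` attributes are present for the device; `read_runtime_pm` is a hypothetical helper, and whether setting `on` actually brings a Hailo-8 out of D3 is untested here as well.

```python
import os


def read_runtime_pm(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read the runtime power-management attributes of one PCI device from sysfs."""
    power_dir = os.path.join(sysfs_root, bdf, "power")
    values = {}
    for attr in ("control", "runtime_status"):
        try:
            with open(os.path.join(power_dir, attr)) as fh:
                values[attr] = fh.read().strip()
        except OSError:
            values[attr] = None  # attribute missing or not readable
    return values


# Address taken from the lspci output above; values are None if the device is absent.
print(read_runtime_pm("0000:03:00.0"))
```

A `control` value of `auto` means the kernel may runtime-suspend the device, while `on` keeps runtime suspend disabled.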

Thanks for the help. From my point of view the problem is solved.