Hailo-10H (AI HAT+ 2) on Raspberry Pi 5: HAILO_VDMA_LAUNCH_TRANSFER failed with 5 (EIO) on multi-context LLM/STT initialization

Hi everyone,

My new Raspberry Pi AI HAT+ 2 (Hailo-10H with 8GB RAM) fails to load multi-context models (such as Qwen2 and Whisper-Base) through the Python/C++ APIs, even though running individual network groups via hailortcli run2 set-net succeeds.

Here are the details of my setup and the diagnostic logs.

Environment & Specs

  • Host Platform: Raspberry Pi 5 (8GB)
  • OS: Raspberry Pi OS 64-bit (Debian Trixie base)
  • Kernel: 6.12.34+rpt-rpi-v8 #1 SMP PREEMPT
  • NPU Hardware: Raspberry Pi AI HAT+ 2 (Hailo-10H, 8GB dedicated memory)
  • Power Supply: Official Raspberry Pi 27W USB-C PSU
  • HailoRT Version: 5.3.0
  • PCIe Driver Version: 5.3.0 (from hailort-pcie-driver)
  • Firmware Version: 5.3.0 (programmed successfully by driver during boot)

What Works:

Running individual network groups of a compiled HEF on their own with hailortcli run2 set-net works at PCIe Gen 2 speed. For example, benchmarking the Whisper-Base and Qwen2 groups:

  • hailortcli run2 set-net Whisper-Base.hef --name whisper_base_10s_encoder: runs at 19.78 FPS (succeeded)
  • hailortcli run2 set-net qwen2_1.5b.hef --name base_model__prefill: runs at 3.40 FPS (succeeded)
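
For anyone who wants to confirm the negotiated link generation independently of the benchmark, here is a minimal sketch that reads the standard Linux sysfs attribute current_link_speed. The function names and the sysfs_root parameter are illustrative, and the device address 0001:01:00.0 is the one that appears in my dmesg output further down.

```python
from pathlib import Path

# Per-lane transfer rate (GT/s) mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}

def parse_link_gen(speed_text):
    """Map a sysfs speed string such as '5.0 GT/s PCIe' to a generation."""
    gts = float(speed_text.split()[0])
    return GEN_BY_GTS.get(gts)  # None if the rate is not in the table

def current_link_gen(bdf, sysfs_root="/sys"):
    """Read the negotiated speed for one device, e.g. bdf='0001:01:00.0'."""
    node = Path(sysfs_root) / "bus/pci/devices" / bdf / "current_link_speed"
    return parse_link_gen(node.read_text())

print(parse_link_gen("5.0 GT/s PCIe"))  # prints 2
```

On the Pi itself the call would be current_link_gen("0001:01:00.0").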

The Problem:

When attempting to initialize the model using high-level APIs that load the full pipeline (e.g. C++ LLM class in hailo-ollama, or Python LLM(vdevice, model_path) and Speech2Text(vdevice, model_path) classes), the initialization fails immediately on the first launch transfer:

Application Log:

[HailoRT] [error] Ioctl HAILO_VDMA_LAUNCH_TRANSFER failed with 5. Read dmesg log for more info
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_OPERATION_FAILED(36) - Failed launch transfer
[HailoRT] [error] Ioctl HAILO_SOC_CLOSE failed due to timeout
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_TIMEOUT(87) - Failed soc_close

Host dmesg Output during failure:

[55.963169] hailo1x 0001:01:00.0: Timeout waiting for soc control (timeout_ms=1000)
[55.963175] hailo1x 0001:01:00.0: soc_close failed with err=-110
[55.978742] Channel 0 num-available HW mismatch (20d!=65535d)
[55.978752] hailo1x 0001:01:00.0: Failed to launch transfer
[55.978865] Channel 0 num-available HW mismatch (20d!=65535d)
[55.978868] hailo1x 0001:01:00.0: Failed to launch transfer
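
One detail in these lines may be worth noting: 65535 is 0xFFFF, the all-ones 16-bit pattern. Reads from a PCIe device that has stopped responding commonly come back as all ones, so the mismatch could be a symptom of the device already being unresponsive rather than a separate fault. That is my interpretation, not something I have confirmed against Hailo documentation. A minimal sketch for pulling the values out of a dmesg capture (helper names are illustrative):

```python
import re

# Matches the driver line quoted above, e.g.
# "Channel 0 num-available HW mismatch (20d!=65535d)"
MISMATCH_RE = re.compile(
    r"Channel (\d+) num-available HW mismatch \((\d+)d!=(\d+)d\)"
)

def parse_mismatches(dmesg_text):
    """Return (channel, first_value, second_value) for every mismatch line."""
    return [
        tuple(int(group) for group in match.groups())
        for match in MISMATCH_RE.finditer(dmesg_text)
    ]

def is_all_ones_16(value):
    """True when a value equals 0xFFFF, the all-ones 16-bit pattern."""
    return value == 0xFFFF

sample = """\
[55.978742] Channel 0 num-available HW mismatch (20d!=65535d)
[55.978752] hailo1x 0001:01:00.0: Failed to launch transfer
"""
for channel, first, second in parse_mismatches(sample):
    print(channel, first, second, hex(second), is_all_ones_16(second))
```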

Once this timeout occurs, the PCIe NPU interface is wedged. If I attempt to programmatically remove and rescan the device to reload firmware, the driver fails to re-activate the board over vDMA:

[417.173000] hailo1x 0001:01:00.0: Timeout waiting for vDMA boot data completion
[417.173050] hailo1x 0001:01:00.0: Failed writing firmware files over vDMA. err -110
[417.173100] hailo1x 0001:01:00.0: Failed writing SOC firmware on stage 3
[421.199000] hailo1x 0001:01:00.0: SCU log could not be read from device
[421.199100] hailo1x 0001:01:00.0: Firmware load failed
[422.020000] hailo1x 0001:01:00.0: Failed activating board -110
[422.020100] hailo1x 0001:01:00.0: probe with driver hailo1x failed with error -110

Only a full hardware power cycle brings the board back to a detectable state.
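
For reference, the remove-and-rescan step uses the standard Linux sysfs nodes (the device's remove file, then /sys/bus/pci/rescan). A minimal sketch of that sequence follows; it needs root on real hardware. The function name, the sysfs_root parameter, and the settle delay are illustrative choices, not anything prescribed by the Hailo driver.

```python
import time
from pathlib import Path

def remove_and_rescan(bdf, sysfs_root="/sys", settle_seconds=1.0):
    """Detach one PCI device and ask the kernel to rescan the bus.

    bdf is the PCI address, e.g. '0001:01:00.0' from the dmesg lines above.
    """
    root = Path(sysfs_root)
    remove_node = root / "bus/pci/devices" / bdf / "remove"
    rescan_node = root / "bus/pci/rescan"

    remove_node.write_text("1")   # unbind the driver and drop the device
    time.sleep(settle_seconds)    # assumed pause before re-enumeration
    rescan_node.write_text("1")   # re-enumerate; the driver probes again
    return remove_node, rescan_node

# Usage on the Pi (as root): remove_and_rescan("0001:01:00.0")
```

In my case the probe that follows the rescan is what produces the firmware load failure shown above.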


Troubleshooting So Far:

  1. Physical Connections: Re-seated the FFC (ribbon) cable on both the Pi 5 and the AI HAT+ 2 connectors. The cable is flat and locked down.

  2. PCIe Gen Settings: Tested at both dtparam=pciex1_gen=2 and dtparam=pciex1_gen=3. The behavior is identical.

  3. PCIe Bus Integrity Audit: Polled the PCIe Advanced Error Reporting (AER) registers immediately after running the successful hailortcli run2 benchmarks. The correctable error status shows no link errors (RxErr-, BadTLP-, BadDLLP-), which suggests the physical cable and PCIe link layer are not dropping or corrupting traffic.
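
The AER check in step 3 can be scripted by parsing the status lines from lspci -vvv output. This is a minimal sketch: the helper name is illustrative, and the sample line only illustrates the lspci format, it is not a capture from my board.

```python
def parse_status_flags(lspci_text, register="CESta"):
    """Return {flag_name: is_set} for one AER status line.

    lspci prints each flag with a trailing '+' (set) or '-' (clear).
    """
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(register + ":"):
            tokens = stripped.split(":", 1)[1].split()
            return {t[:-1]: t.endswith("+") for t in tokens if t[-1] in "+-"}
    return {}  # register line not present in the text

# Illustrative line in lspci -vvv format (not from my board).
sample = "\t\tCESta:\tRxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-\n"
flags = parse_status_flags(sample)
print([name for name, is_set in flags.items() if is_set])  # prints []
```

Passing register="UESta" reads the uncorrectable status line the same way, which could be worth capturing right after the failure as well as after a successful run.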

Is this a known issue/bug with the 5.3.0 firmware and PCIe driver when allocating memory channels for multi-context pipelines, or is it likely a defect with the LPDDR4X memory/silicon on the AI HAT+ 2 board itself?

Thanks in advance for the help!