Hailo-10H COMMUNICATION_CLOSED after ~5000 frames of continuous inference (AI HAT+ 2, Pi 5)

Hi all,

I’ve been hitting a consistent crash running continuous YOLOv8m inference on a Pi 5 + AI HAT+ 2 (Hailo-10H). After some debugging I’ve narrowed it down to what I believe is a
firmware/driver issue rather than anything application-side.

Setup:

  • Pi 5 2GB, Raspbian Lite (Trixie), kernel 6.12.47
  • AI HAT+ 2 with active cooling
  • hailo-h10-all 5.1.1, EEPROM up to date
  • blacklist hailo_pci in modprobe.d
  • dkms installed before hailo packages

The problem:

Inference works fine for about 2-3 minutes, then I get HAILO_COMMUNICATION_CLOSED(62) and the device becomes unresponsive. Only a full power cycle recovers it.

To rule out my application code, I stripped everything down to the absolute minimum - no camera, no display, just VDevice + InferModel + dummy numpy array in a loop:

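import numpy as np
from hailo_platform import VDevice, FormatType

vd = VDevice()
im = vd.create_infer_model("yolov8m_h10.hef")
im.input().set_format_type(FormatType.UINT8)
ctx = im.configure()
cm = ctx.__enter__()  # enter the configured-model context by hand; never exited in this test
# One float32 output buffer per model output
bufs = {o.name: np.empty(o.shape, dtype=np.float32) for o in im.outputs}
bindings = cm.create_bindings(output_buffers=bufs)
dummy = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

while True:
    bindings.input().set_buffer(np.array(dummy))  # fresh copy of the dummy frame
    cm.run([bindings], 30000)  # timeout in milliseconds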

This crashes at frame ~5334 every time (~2 min 16s at 39fps). Totally reproducible.
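
For anyone trying to reproduce this, a small wrapper that records the frame count and elapsed time at the point of failure makes runs easier to compare. This is only a sketch: `run_stress` and `infer_once` are hypothetical names, and `infer_once` stands in for whatever performs one inference (in the loop above, the `set_buffer` + `cm.run` pair).

```python
import time
from typing import Callable, Optional, Tuple


def run_stress(infer_once: Callable[[], None],
               max_seconds: float = 300.0,
               max_frames: Optional[int] = None
               ) -> Tuple[int, float, Optional[Exception]]:
    """Call infer_once() until it raises or a limit is reached.

    Returns (frames_completed, elapsed_seconds, error_or_None).
    """
    frames = 0
    error: Optional[Exception] = None
    start = time.monotonic()
    while True:
        if time.monotonic() - start >= max_seconds:
            break
        if max_frames is not None and frames >= max_frames:
            break
        try:
            infer_once()
        except Exception as exc:  # whatever the runtime raises when the device drops
            error = exc
            break
        frames += 1
    return frames, time.monotonic() - start, error
```

With the loop above, `infer_once` could be a small function that sets the input buffer and calls `cm.run([bindings], 30000)`; the returned tuple then gives the frame number and time of the crash directly.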

What I’ve tried:

  • Fresh Raspbian image (twice)
  • pcie_aspm=off on the kernel cmdline - this extended the run to ~6246 frames (~3 min 9s), so PCIe power management is clearly a factor but not the whole story
  • Reseating the FPC ribbon cable multiple times
  • With and without dtparam=pciex1_gen=3 (no difference)
  • Attempted 5.2.0 driver but it ships with an empty dkms.conf and the compiled module fails to load with “Invalid argument” on kernel 6.12.47
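
Since edits to the boot command line only take effect after a reboot, it can be worth confirming that the running kernel actually booted with pcie_aspm=off. A minimal sketch (the function names are made up for illustration; /proc/cmdline is the standard place to read the booted command line):

```python
def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return f.read().strip()


def cmdline_has(option: str, cmdline: str) -> bool:
    """True if the exact token (e.g. 'pcie_aspm=off') is present."""
    return option in cmdline.split()
```

Usage would be `cmdline_has("pcie_aspm=off", read_cmdline())`. Matching whole tokens avoids a false positive from a longer option that merely contains the same text.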

Pi stays cool throughout (48C), Hailo has active cooling, 1.6GB free memory. lspci shows the link at 8GT/s x1 which I understand is expected for the Pi 5’s single PCIe lane.

dmesg after crash:
Nothing useful - no PCIe errors, no kernel warnings. The device just silently drops the connection from the Hailo side.

Has anyone else seen this? Is there a known issue with sustained inference on the 10H, or should I be looking at an RMA?

Cheers

Hi @SamSkjord, thanks for letting us know. We are checking internally and will get back to you here ASAP.

Hi @SamSkjord ,

Can you please try following the complete uninstall: Raspberry Trixie Error with Guide (Pi 5, AI Hat 2) - #18 by Michael

Then installing manually 5.2.0 from our developer zone?
You will need for arm64:

sudo apt install PCIe driver deb
sudo apt install HailoRT deb
sudo reboot now
sudo apt install Tappas deb
pip install HailoRT Py wheel --break-system-packages
pip install Tappas Py wheel --break-system-packages

There are also details here: Upgrading to HailoRT 5.2.0 - step by step (Raspberry PI & Hailo Apps)

Thanks,

Thanks Michael. I followed the uninstall and reinstall process but ran into compatibility issues with the AI HAT+ 2 (Hailo-10H):

  • The hailort and hailort-pcie-driver packages available via apt are 4.23.0, which only include Hailo-8 firmware and the hailo_pci driver module. They don’t support the Hailo-10H.
  • The 5.2.0 .deb files from the developer zone (hailort_5.2.0_arm64.deb, hailort-pcie-driver_5.2.0_all.deb) install the driver and runtime, but:
    • No Hailo-10H firmware is included (the /lib/firmware/hailo/hailo10h/ directory is empty)
    • No python3-hailort package at 5.2.0 is available - the apt version is 4.23.0 which needs libhailort.so.4.23.0 and won’t work with the 5.2.0 runtime
  • The only packages that support the Hailo-10H are the h10-hailort / h10-hailort-pcie-driver / hailo-h10-all set at version 5.1.1, which is where the crash occurs.

I should also mention that the Hailo firmware loading itself is unreliable - only about 1 in 3 cold boots successfully loads the firmware. Most fail with “Timeout waiting for firmware file on stage 2”. This is on a fresh Raspbian Lite Trixie image with updated EEPROM, dkms installed before hailo packages, and pcie_aspm=off on the kernel cmdline. I also tested with and without dtparam=pciex1_gen=3 in config.txt - no consistent difference.
lspci confirms the link runs at 8GT/s Gen3 x1 either way, so the HAT appears to auto-negotiate correctly. The boot failure happens regardless of which driver version is installed (5.1.1 or 5.2.0).

Could you clarify which specific packages and versions support the AI HAT+ 2 at 5.2.0? Or is there a separate Hailo-10H build that I’m missing?

For reference, here’s the full diagnostic info:

Kernel: 6.12.47+rpt-rpi-2712 (aarch64)
EEPROM: Wed 5 Nov 17:37:18 UTC 2025 (up to date)
Pi model: Raspberry Pi 5 2GB

Packages:
h10-hailort              5.1.1
h10-hailort-pcie-driver  5.1.1
hailo-h10-all            5.1.1
python3-h10-hailort      5.1.1-1
hailo-tappas-core        5.1.0

PCIe link:
LnkCap: Speed 8GT/s, Width x4, ASPM L0s L1
LnkSta: Speed 8GT/s, Width x1 (downgraded)

cmdline includes: pcie_aspm=off
config.txt: dtoverlay=vc4-kms-dsi-waveshare-panel,4_0_inchC (no pciex1_gen override)

dkms: hailo1x_pci/5.1.1 installed for 6.12.47+rpt-rpi-2712
Blacklist: hailo_pci in /etc/modprobe.d/hailo-blacklist.conf
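
To compare the negotiated link against the device capability across boots, the LnkCap/LnkSta lines in the diagnostics above can be parsed mechanically. This is a sketch under the assumption that the lines look like the ones quoted here; `parse_link` is a hypothetical helper, not part of any Hailo tool.

```python
import re
from typing import Dict

# Matches "LnkCap:" / "LnkSta:" lines but not "LnkCap2:" / "LnkSta2:".
_LINK_RE = re.compile(
    r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)"
)


def parse_link(lspci_text: str) -> Dict[str, Dict[str, float]]:
    """Extract speed (GT/s) and lane width from lspci -vv style text."""
    result: Dict[str, Dict[str, float]] = {}
    for line in lspci_text.splitlines():
        m = _LINK_RE.search(line)
        if m:
            result[m.group(1)] = {
                "speed_gts": float(m.group(2)),
                "width": int(m.group(3)),
            }
    return result
```

On the two lines above this yields 8.0 GT/s for both, with width 4 for LnkCap and width 1 for LnkSta, which matches the "downgraded" note: the device can do x4 but the link is running on one lane.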

The bare stress test that reproduces the crash is in my original post - just VDevice + InferModel + dummy data, no other hardware involved.

Hi @SamSkjord,

Just to make sure about the steps you took:

  1. Complete uninstall - starting from clean fresh env.
  2. Only manual installation of all the 5 files with version 5.2
  3. Specifically - skipping the dedicated sudo apt install hailo-h10-all

Can you please confirm?

Thanks,

Following the upgrade guide, I installed the 5 files from the developer zone (hailort-pcie-driver, hailort, hailo-tappas-core, hailo-gen-ai-model-zoo debs,
plus the Python wheel). A few issues along the way:

  • The hailort-pcie-driver_5.2.0_all.deb conflicted with h10-hailort-pcie-driver (5.1.1) - had to purge the h10 packages first
  • The Python wheel is cp312 only, but Trixie ships Python 3.13. Had to build Python 3.12.9 from source and create a separate venv to get the wheel to
    install
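
The wheel mismatch can be checked before attempting the install. A minimal sketch, assuming a wheel whose Python tag must match the interpreter exactly (as with the cp312-only wheel here); it deliberately ignores abi3 wheels, which can span versions, and both function names are hypothetical.

```python
import sys


def interpreter_tag(version_info=sys.version_info) -> str:
    """CPython tag for an interpreter version, e.g. (3, 13, ...) -> 'cp313'."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_supports(wheel_python_tag: str, version_info=sys.version_info) -> bool:
    """True if a version-specific wheel tag matches this interpreter."""
    return wheel_python_tag == interpreter_tag(version_info)
```

On stock Trixie (Python 3.13), `wheel_supports("cp312")` is False, which is why a separate 3.12 interpreter and venv were needed.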

Final working package state:
hailort 5.2.0
hailort-pcie-driver 5.2.0
hailo-tappas-core 5.2.0
hailo-gen-ai-model-zoo 5.2.0
hailo-models 1.0.0-2 (from apt, for the .hef files)
Python wheel: hailort-5.2.0-cp312 (in Python 3.12 venv)
Firmware: /lib/firmware/hailo/hailo10h/ (from old h10-hailort-pcie-driver 5.1.1)
Kernel: 6.12.75+rpt-rpi-2712

5.2.0 fixes the standalone inference crash!

The bare stress test (VDevice + InferModel + dummy data, no camera, no display) now runs the full 5 minutes with zero errors:

HailoRT 5.2.0: 11773 frames, 300s, 0 errors (39fps)
HailoRT 5.1.1: crashed at 5334 frames (136s)
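
As a quick sanity check on those two lines, the average throughput works out the same in both runs, so on average the 5.1.1 run was not slowing down before it died (this says nothing about the last few frames before the crash):

```python
def fps(frames: int, seconds: float) -> float:
    """Average frames per second over a run."""
    return frames / seconds


print(round(fps(11773, 300), 1))  # HailoRT 5.2.0 run -> 39.2
print(round(fps(5334, 136), 1))   # HailoRT 5.1.1 run up to the crash -> 39.2
```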

Camera + Hailo without display also runs clean:
Camera + Hailo (no pygame): 7754 frames, 309s, 0 errors (~27fps with real USB camera frames)

Display + Hailo still crashes

When pygame renders to a Waveshare DSI display via SDL2 KMSDRM, the Hailo crashes on the very first inference with HAILO_COMMUNICATION_CLOSED even when
running in a completely separate process via multiprocessing (spawn):

Camera + Hailo + pygame KMSDRM: crashes on frame 1

I’ve confirmed the conflict is specifically the V3D GPU. On the Pi 5 there are three DRM cards:

  • card0: rp1dsi (DSI display controller, no GPU)
  • card1: V3D (GPU)
  • card2: vc6 (video core, HDMI)

SDL2’s KMSDRM backend requires a render node, so it uses card1 (V3D) for compositing even though the DSI display is on card0. The V3D GPU’s DMA operations
appear to conflict with the Hailo’s PCIe VDMA at the kernel level. Process isolation doesn’t help, since both share kernel DMA resources.

I have tested rendering without a display (dummy driver), without the GPU (software render), and without pygame at all. The Hailo only crashes when V3D
GPU page flips are active.

Is there a known incompatibility between V3D GPU DMA and Hailo-10H PCIe VDMA on Pi 5? Or is there a way to render to the DSI display through card0 without
involving V3D?

Hi @SamSkjord,

I suspect the first step of complete uninstall was not successful.
The goal is to have the RPi completely clean of any previous Hailo leftovers.
In that case, the manual installation should proceed without any conflicts.
If you wish, please reach me via a private message to set up RPi Connect and I’ll assist remotely.

Thanks,
Michael.

Thanks Michael. The install was done on a completely fresh Raspbian Lite Trixie image with no previous Hailo packages installed. The steps were:

  1. Fresh flash of Raspbian Lite (Trixie) to SD card
  2. sudo apt update && sudo apt full-upgrade
  3. sudo apt install dkms
  4. Installed the 5 files from the developer zone (hailort-pcie-driver, hailort, hailo-tappas-core, hailo-gen-ai-model-zoo debs, plus the cp312 Python wheel)
  5. Built Python 3.12.9 from source since Trixie ships 3.13 and the wheel is cp312 only
  6. Created a Python 3.12 venv and installed the wheel there

No hailo-h10-all, no h10-hailort, no previous versions. Completely from scratch. The only package from apt was hailo-models, for the .hef files. The
Hailo-10H firmware files at /lib/firmware/hailo/hailo10h/ were copied from the h10-hailort-pcie-driver 5.1.1 package, since the 5.2.0 driver doesn’t
include them.

To be clear, 5.2.0 standalone inference works perfectly at 11,773 frames over 5 minutes with zero errors. The issue is specifically when SDL2 KMSDRM is
active in the same kernel, even in a different process. I’ve done a fair amount of debugging to narrow this down:

What I’ve tested on 5.2.0 (all from fresh boots, no hailortcli beforehand):

| Test | Result |
| --- | --- |
| Bare Hailo, dummy data, no display | 11,773 frames, 5 min, 0 errors |
| Camera + Hailo + pygame KMSDRM (same process) | Crashes on frame 1 |
| Camera + Hailo + pygame KMSDRM (separate process via multiprocessing spawn) | Crashes on frame 1 |
| SDL dummy driver + framebuffer mmap (/dev/fb0) | 5 min, 0 errors |
| USB camera + Hailo, no display | 7,754 frames, 5 min, 0 errors |
| V3D GPU blacklisted + KMSDRM | Ran briefly then crashed |

The DRM cards on my Pi 5 are:

  • card0: raspberrypi,rp1dsi (DSI panel, no GPU)
  • card1: brcm,2712-v3d (GPU)
  • card2: brcm,bcm2712-vc6 (video core, HDMI)

SDL2 KMSDRM uses card1/card2 for compositing even though the display is on card0. Blacklisting V3D removes card1 but vc4 still does DRM page flips through
card2 and the crash persists.
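
For anyone wanting to check the card-to-driver mapping on their own board, sysfs exposes it: each /sys/class/drm/cardN/device/driver is a symlink whose last path component is the bound driver. A sketch (`drm_card_drivers` is a made-up name; the `sysfs_root` parameter only exists so the function can be pointed at a test directory):

```python
import os
import re
from typing import Dict


def drm_card_drivers(sysfs_root: str = "/sys/class/drm") -> Dict[str, str]:
    """Map each DRM card (card0, card1, ...) to its kernel driver name."""
    cards: Dict[str, str] = {}
    for entry in sorted(os.listdir(sysfs_root)):
        # Skip connectors such as card1-HDMI-A-1 and render nodes (renderD*).
        if not re.fullmatch(r"card\d+", entry):
            continue
        driver_link = os.path.join(sysfs_root, entry, "device", "driver")
        if os.path.islink(driver_link):
            cards[entry] = os.path.basename(os.readlink(driver_link))
        else:
            cards[entry] = "unknown"
    return cards
```

Running this before and after blacklisting V3D would show whether card numbering shifts and which cards remain for KMSDRM to pick up.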

The only configuration that keeps Hailo stable is bypassing DRM entirely by rendering to pygame surfaces with the SDL dummy driver and writing raw pixels to
/dev/fb0 via mmap. This works but loses GPU acceleration and SDL input handling.
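
For reference, the framebuffer write in that workaround can be sketched with the standard library alone. This is not the exact code from my app: it assumes rows are not padded (stride equals width times bytes per pixel) and that the frame bytes are already in the framebuffer's pixel format. `fb_geometry` and `write_frame` are hypothetical names.

```python
import mmap
from typing import Tuple


def fb_geometry(fb_name: str = "fb0",
                sysfs_root: str = "/sys/class/graphics") -> Tuple[int, int, int]:
    """Read (width, height, bits_per_pixel) for a framebuffer from sysfs."""
    base = f"{sysfs_root}/{fb_name}"
    with open(f"{base}/virtual_size") as f:
        width, height = (int(v) for v in f.read().strip().split(","))
    with open(f"{base}/bits_per_pixel") as f:
        bpp = int(f.read().strip())
    return width, height, bpp


def write_frame(fb_path: str, frame: bytes, fb_size: int) -> None:
    """Copy raw pixel bytes to the start of a memory-mapped framebuffer."""
    if len(frame) > fb_size:
        raise ValueError("frame is larger than the framebuffer")
    with open(fb_path, "r+b") as f:
        # Length must be given explicitly: a character device reports size 0.
        with mmap.mmap(f.fileno(), fb_size) as fb:
            fb[:len(frame)] = frame
```

With `width, height, bpp = fb_geometry()`, the mapping size under the no-padding assumption is `width * height * bpp // 8`.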

Thanks @SamSkjord , we are checking it.

Thanks Michael

Further findings with HailoRT 5.2.0:
The standalone stress test (dummy numpy data, no camera, no display) remains stable at 11,773 frames / 5 minutes with zero errors, confirming 5.2.0 fixed the core runtime issue.

However, when using real USB camera frames (OpenCV V4L2, 1280x720 MJPG at 30fps) alongside inference, the Hailo crashes after 30 seconds to 4 minutes with “Failed to launch transfer” in dmesg, regardless of display method:

Testing matrix (all on HailoRT 5.2.0, kernel 6.12.75):

  • Bare Hailo, dummy data, no camera, no display: 11,773 frames, 5 min, 0 errors
  • Camera + Hailo, no display: 7,754 frames, 5 min, 0 errors (1 timeout at 309s)
  • Camera + Hailo + display (any method): crashes at 30s to 4min

The display method doesn’t matter. Tested KMSDRM, Wayland (labwc), and direct framebuffer (/dev/fb0 mmap with SDL dummy driver). All crash at roughly the same point. The common factor is USB camera V4L2 DMA + Hailo PCIe VDMA running simultaneously with any rendering.

The camera-only test (no display) ran nearly clean for 5 minutes, so the camera DMA alone doesn’t kill Hailo. It’s the three-way combination of camera DMA + inference DMA + display writes that causes the crash.

dmesg shows “Failed to launch transfer” errors starting at ~238s after boot, with no prior warnings. No PCIe errors, no kernel warnings. The device becomes completely unresponsive after the first transfer failure.
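
To pin down when the first failure appears relative to boot, the kernel log can be filtered by message text. A sketch that relies only on the standard `[seconds.microseconds]` dmesg timestamp prefix; `find_messages` is a hypothetical helper and nothing here depends on the exact wording of the Hailo driver's log lines beyond the substring searched for.

```python
import re
from typing import List, Tuple

# Standard dmesg prefix, e.g. "[  238.120000] message text"
_TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_messages(dmesg_text: str, needle: str) -> List[Tuple[float, str]]:
    """Return (seconds_since_boot, message) for every line containing needle."""
    hits: List[Tuple[float, str]] = []
    for line in dmesg_text.splitlines():
        m = _TS_RE.match(line)
        if m and needle in m.group(2):
            hits.append((float(m.group(1)), m.group(2)))
    return hits
```

Feeding it the output of `dmesg` (which may need sudo) with the needle "Failed to launch transfer" gives the timestamp of the first failure as `hits[0][0]`, which can then be lined up against the application's own frame log.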

Pi 5 2GB, USB camera on xhci-hcd.0-1, Hailo on PCIe 0001:01:00.0, DSI display on rp1-dsi.

Happy to provide remote access if that would help investigate.