Hi all,
I’ve been hitting a consistent crash running continuous YOLOv8m inference on a Pi 5 + AI HAT+ 2 (Hailo-10H). After some debugging I’ve narrowed it down to what I believe is a
firmware/driver issue rather than anything application-side.
Setup:
- Pi 5 2GB, Raspbian Lite (Trixie), kernel 6.12.47
- AI HAT+ 2 with active cooling
- hailo-h10-all 5.1.1, EEPROM up to date
- blacklist hailo_pci in modprobe.d
- dkms installed before hailo packages
The problem:
Inference works fine for about 2-3 minutes, then I get HAILO_COMMUNICATION_CLOSED(62) and the device becomes unresponsive. Only a full power cycle recovers it.
To rule out my application code, I stripped everything down to the absolute minimum - no camera, no display, just VDevice + InferModel + dummy numpy array in a loop:
import numpy as np
from hailo_platform import VDevice, FormatType

vd = VDevice()
im = vd.create_infer_model("yolov8m_h10.hef")
im.input().set_format_type(FormatType.UINT8)
bufs = {o.name: np.empty(o.shape, dtype=np.float32) for o in im.outputs}
dummy = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
with im.configure() as cm:
    bindings = cm.create_bindings(output_buffers=bufs)
    while True:
        bindings.input().set_buffer(np.array(dummy))
        cm.run([bindings], 30000)
This crashes at frame ~5334 every time (~2 min 16s at 39fps). Totally reproducible.
What I’ve tried:
- Fresh Raspbian image (twice)
- pcie_aspm=off on the kernel cmdline - this extended it to ~6246 frames (~2 min 40s at the same 39fps), so it's definitely PCIe power-management related, but that's not the whole story
- Reseating the FPC ribbon cable multiple times
- With and without dtparam=pciex1_gen=3 (no difference)
- Tried the 5.2.0 driver, but it ships with an empty dkms.conf, and the manually compiled module fails to load with "Invalid argument" on kernel 6.12.47
The Pi stays cool throughout (~48 °C), the Hailo has active cooling, and there's 1.6 GB of memory free. lspci shows the link at 8 GT/s x1, which I understand is expected for the Pi 5's single PCIe lane.
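Rather than parsing lspci output by hand, I've been reading the link state straight from sysfs (a small sketch; the sysfs attributes current_link_speed / current_link_width are standard on Linux, but the paths obviously only exist on the Pi itself):

```python
from pathlib import Path

def pcie_link_info(root="/sys/bus/pci/devices"):
    """Return {device: (link_speed, link_width)} for every PCI device
    that exposes the sysfs link attributes; {} if the tree is absent."""
    info = {}
    base = Path(root)
    if not base.is_dir():
        return info  # not on a Linux system with PCI devices
    for dev in sorted(base.iterdir()):
        speed = dev / "current_link_speed"
        width = dev / "current_link_width"
        if speed.is_file() and width.is_file():
            info[dev.name] = (speed.read_text().strip(),
                              width.read_text().strip())
    return info

print(pcie_link_info())
```

On my Pi this reports the Hailo device at "8.0 GT/s PCIe" width "1" both before and after the crash, so the link itself appears to stay up.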
dmesg after crash:
Nothing useful - no PCIe errors, no kernel warnings. The device just silently drops the connection from the Hailo side.
Has anyone else seen this? Is there a known issue with sustained inference on the 10H, or should I be looking at an RMA?
Cheers