Hailo-10H throughput degrades irreversibly within minutes of continuous use (125 → 41 fps), only host reboot recovers

A Hailo-10H module on a Raspberry Pi 5 with the official AI Hat 2 starts at the
published 125 fps (yolo26s_hailo10h.hef, hailortcli benchmark) immediately
after host reboot, but throughput degrades monotonically and irreversibly
during continuous use. After ~3 minutes of back-to-back benchmarks (12 runs
× 15 s), fps starts dropping at ~3 fps/run, reaching ~82 fps at run 25. With
extended use we have observed it drop as low as 41 fps. The chip never
recovers in the same boot, not after cooldown, not after driver reload, not
after PCIe rescan, not after hailortcli fw-control reset --reset-type chip.
Only a full host reboot restores the published 125 fps.

The degradation is accompanied by cma_alloc: linux,cma: alloc failed,
req-size: 256 pages, ret: -16 errors (-16 is -EBUSY) in the chip’s onboard
Yocto runtime log, suggesting CMA exhaustion or fragmentation inside the
chip-side Linux.

Affected device

| Field | Value |
| --- | --- |
| Chip | Hailo-10H |
| Carrier | Raspberry Pi 5 + official AI Hat 2 |
| Active cooling | Yes (official Pi 5 active cooler) |
| Chip serial | FBFBBDDFC4DEF768D6CEF1F9 |
| SoC ID | 8CD33CE2AA4ABFFE8EAD7E1FCF27CEC6D819C55DADEFEB9FF8C9184DAE93ECCB |
| Board SKU-ID | 6 |
| LCS | 5 |

Software versions

| Component | Version |
| --- | --- |
| HailoRT-CLI | 5.3.0 |
| Firmware | 5.3.0 (release, app) |
| Driver | hailo1x_pci 5.3.0 (srcversion 91195A5A35A8DAAA5717B7D) |
| Host kernel | 6.12.75+rpt-rpi-2712 (Debian 13 Trixie) |
| Chip-side OS | Linux 5.15.32-yocto-standard-01685-g4cd9cfd0e6e7 |
| HEF | yolo26s_hailo10h.hef, md5 07f0dd9d2f44834f75c123af243e7ec3, 13,815,808 bytes |

PCIe state

LnkCap:  Speed 8GT/s, Width x4
LnkSta:  Speed 8GT/s, Width x1 (downgraded — RPi 5 M.2 slot limitation)
DevCtl:  MaxPayload 256 bytes, MaxReadReq 512 bytes

The Hailo-10H reaches its published 125 fps over this same x1 link when fresh,
so PCIe x1 bandwidth is not the bottleneck.
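
A rough sanity check of that claim can be sketched as below. It assumes a 640×640×3 uint8 input tensor, which is a guess (the HEF's real input shape is not stated in this post), and it counts only host-to-device input frames, ignoring output tensors and protocol overhead.

```shell
#!/bin/sh
# Back-of-envelope check: input traffic at 125 fps vs. PCIe Gen3 x1 capacity.
# ASSUMPTION: 640x640x3 uint8 input frames; the HEF's real input shape is not
# given in this post. Output tensors and protocol overhead are ignored.

input_mb_per_s() {   # args: fps width height channels
  awk -v fps="$1" -v w="$2" -v h="$3" -v c="$4" \
    'BEGIN { printf "%.1f\n", fps * w * h * c / 1e6 }'
}

gen3_lane_mb_per_s() {   # 8 GT/s with 128b/130b encoding, one lane
  awk 'BEGIN { printf "%.1f\n", 8e9 * 128 / 130 / 8 / 1e6 }'
}

echo "input traffic: $(input_mb_per_s 125 640 640 3) MB/s"
echo "Gen3 x1 raw:   $(gen3_lane_mb_per_s) MB/s"
```

Under those assumptions the input stream needs about 154 MB/s against roughly 985 MB/s of raw lane capacity, which is consistent with the x1 link not being the limiting factor.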

Reproduction

sudo systemctl stop tetherbox tetherbox-ai     # stop our app, no other Hailo clients
sudo reboot
# wait for boot, then immediately:
for i in $(seq 1 25); do
  sudo hailortcli benchmark /path/to/yolo26s_hailo10h.hef
done
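
For anyone who wants a per-run log from the loop above, a minimal sketch follows. The exact output format of hailortcli benchmark is an assumption here: extract_fps simply takes the last decimal number on the last line mentioning "FPS", so the pattern may need adjusting. The function names and the log layout are illustrative, and temperature capture is omitted because this post does not show how it was read.

```shell
#!/bin/sh
# Logged variant of the reproduction loop: one "run N: fps=..." line per run.
# ASSUMPTION: the benchmark prints a line containing "FPS" and a decimal
# number; extract_fps is a hypothetical helper, adjust it to the real output.

extract_fps() {   # reads benchmark output on stdin, prints the last FPS value
  grep -i 'fps' | grep -Eo '[0-9]+\.[0-9]+' | tail -n 1
}

run_logged() {   # args: hef_path run_count
  hef="$1"; runs="$2"; i=1
  echo "started at: $(date)"
  while [ "$i" -le "$runs" ]; do
    fps=$(sudo hailortcli benchmark "$hef" 2>&1 | extract_fps)
    echo "run $i: fps=${fps:-unknown}"
    i=$((i + 1))
  done
  echo "done at: $(date)"
}
```

Usage would be something like run_logged /path/to/yolo26s_hailo10h.hef 25, run immediately after boot. Nothing executes until run_logged is called.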

Measured data — degradation curve (single boot, no other workload)

=== H10 degradation repro: 25 runs at default 15s, no sleep between ===
started at: Thu 30 Apr 05:08:01 BST 2026
run  1: fps=124.78 temp=60.70
run  2: fps=124.76 temp=62.82
run  3: fps=124.90 temp=64.36
run  4: fps=124.81 temp=65.70
run  5: fps=124.89 temp=66.90
run  6: fps=124.81 temp=67.96
run  7: fps=124.80 temp=68.94
run  8: fps=124.89 temp=69.83
run  9: fps=124.99 temp=70.69
run 10: fps=124.81 temp=71.29
run 11: fps=124.89 temp=71.96
run 12: fps=124.87 temp=72.60       ← last stable run
run 13: fps=122.80 temp=73.13       ← onset of degradation
run 14: fps=121.38 temp=73.61
run 15: fps=118.53 temp=73.96
run 16: fps=108.49 temp=73.99       ← temperature plateaus here
run 17: fps=101.51 temp=73.88
run 18: fps=98.34 temp=73.82
run 19: fps=96.64 temp=73.87
run 20: fps=94.04 temp=73.90
run 21: fps=91.14 temp=73.82
run 22: fps=88.55 temp=73.68
run 23: fps=85.65 temp=73.58
run 24: fps=83.99 temp=73.53
run 25: fps=81.60 temp=73.43
done at: Thu 30 Apr 05:14:42 BST 2026

Key observation: after run 16, chip temperature plateaus at 73-74 °C and
falls slightly, while throughput continues to drop monotonically.

Once thermal steady state is reached, throughput no longer tracks temperature.
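
The onset and slope can be pulled out of a log in the "run N: fps=X temp=Y" format with a short awk sketch. The thresholds are illustrative choices, not part of the original measurement: baseline is the mean of the first 5 runs, and onset is the first run more than 1 fps below that baseline.

```shell
#!/bin/sh
# Summarise a "run N: fps=X temp=Y" log: baseline, onset run, mean drop per run.
# ASSUMPTIONS: baseline = mean of the first 5 runs; onset = first run more than
# 1 fps below baseline. Runs are assumed to appear in order, one per line.

summarise_runs() {   # reads log lines on stdin
  awk '
    /^run/ {
      n++
      v = $0
      sub(/.*fps=/, "", v)      # drop everything up to the fps value
      sub(/[^0-9.].*/, "", v)   # drop everything after the fps value
      fps[n] = v + 0
      if (n <= 5) sum += fps[n]
    }
    END {
      if (n < 6) { print "need at least 6 runs"; exit 1 }
      base = sum / 5
      for (i = 6; i <= n; i++)
        if (fps[i] < base - 1.0) { onset = i; break }
      if (!onset) { printf "baseline=%.2f no degradation\n", base; exit 0 }
      # mean drop per run from the last stable run to the final run
      drop = (fps[onset - 1] - fps[n]) / (n - onset + 1)
      printf "baseline=%.2f onset_run=%d mean_drop=%.2f fps/run\n", base, onset, drop
    }'
}
```

Fed the 25-run log above, this should report onset at run 13 and a mean drop of roughly 3.3 fps per run, in line with the ~3 fps/run figure quoted earlier.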

In a separate session with extended use the chip degraded as far as 41 fps
(33 % of published) and stayed there until reboot.

Falsified theories — what the cause is NOT

All of the following were measured on this device and ruled out:

| Theory | Evidence against |
| --- | --- |
| Thermal throttling | Chip plateaus at 73-74 °C from run 16 onwards; throughput continues to drop. After 4 min idle the chip cools, but throughput stays degraded (see cooldown experiment below). |
| Host RPi 5 SoC throttling | vcgencmd get_throttled = 0x0 |
| PCIe ASPM L1 / D3hot sleep | policy=performance, mid-bench device state confirmed D0 |
| PCIe x1 link cap | Same x1 link delivers 125 fps when chip is fresh |
| Chip warming up (cache cold) | Runs 1-12 at 124-125 fps with rising temp 60→72 °C; degradation begins after warmup |
| --power-mode ultra_performance | Same degraded throughput |
| --batch-size 8 | Same degraded throughput |
| --time-to-run 60 | Same degraded throughput |
| HailoRT 4.x driver issues | Reproduces on HailoRT 5.3.0 + driver 5.3.0 (latest), Debian 13 Trixie (the officially supported OS for HailoRT 4.23+) |
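
For readers checking the host-throttling row on their own board, the get_throttled bitmask can be decoded with a small helper. The bit meanings follow the Raspberry Pi documentation for vcgencmd; decode_throttled itself is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Decode the bitmask printed by "vcgencmd get_throttled" (e.g. throttled=0x0).
# Bit meanings per the Raspberry Pi vcgencmd documentation; the function name
# is illustrative.

decode_throttled() {   # arg: value such as 0x0 or 0x50005
  val=$(($1))
  if [ "$val" -eq 0 ]; then echo "no flags set"; return 0; fi
  for entry in \
    "0:under-voltage detected" "1:ARM frequency capped" \
    "2:currently throttled" "3:soft temperature limit active" \
    "16:under-voltage has occurred" "17:ARM frequency capping has occurred" \
    "18:throttling has occurred" "19:soft temperature limit has occurred"
  do
    bit=${entry%%:*}; label=${entry#*:}
    if [ $(( (val >> bit) & 1 )) -eq 1 ]; then echo "bit $bit: $label"; fi
  done
  return 0
}
```

A typical call would be decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)". A result of "no flags set" matches the 0x0 reading reported here.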

Smoking gun — chip-side runtime log

sudo hailortcli logs runtime returns logs from the chip’s onboard Yocto Linux
(hostname hailo10, kernel 5.15.32-yocto). On every benchmark invocation:

hailo10 user.info HailoRT-Server: [device.cpp:54] [Device] OS Version: Linux 5.15.325.15.32-yocto-standard-01685-g4cd9cfd0e6e7 #1 SMP PREEMPT Wed Feb 18 10:02:32 UTC 2026 aarch64
hailo10 user.info HailoRT-Server: [control.cpp:89] [control__parse_identify_results] firmware_version is: 5.3.0
hailo10 kern.err kernel: cma: cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -16
hailo10 user.info HailoRT-Server: [vdevice.cpp:474] [configure] Configuring HEF on VDevice took 45.79156 milliseconds

ret: -16 is -EBUSY. Each hailortcli benchmark invocation does an
open → configure → run → close cycle. Every configure attempts a 256-page
contiguous DMA buffer allocation (1 MiB, assuming 4 KiB pages), and these
allocations fail. The runtime appears to fall back to non-contiguous
allocations (so inference continues), but it seems to accumulate state that
throttles throughput as the chip-side CMA region fragments further.

The trigger correlates with the number of configure cycles, not just
inference time — fresh boot to first degradation = 12 benchmark invocations
(every hailortcli benchmark is a fresh open + configure).
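
One way to test that correlation is to count CMA failures and configure events in a saved runtime log. The sketch below matches on the wording of the log lines quoted above; other firmware versions may phrase them differently, and the function names are illustrative.

```shell
#!/bin/sh
# Count chip-side CMA allocation failures and configure events in a runtime log.
# Patterns are taken from the log lines quoted in this post; they are an
# assumption for any other firmware version.

count_cma_failures() {   # reads runtime log on stdin
  grep -c 'cma_alloc: .*alloc failed'
}

count_configures() {     # reads runtime log on stdin
  grep -c 'Configuring HEF on VDevice took'
}
```

After saving the output of sudo hailortcli logs runtime to a file, each function can be fed that file on stdin. If the failure count tracks the configure count one-for-one across the degradation curve, that would support the per-configure trigger described above.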

Recovery experiments — what does NOT recover throughput

All of the following were tried on the degraded device with no recovery:

| Action | Result | Evidence |
| --- | --- | --- |
| Wait for cooldown (4 min idle) | No recovery | See cooldown experiment below |
| sudo modprobe -r hailo1x_pci && sudo modprobe hailo1x_pci | No recovery | dmesg: SOC Firmware batch was already loaded, Firmware loaded in 0 ms (chip retains state) |
| echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove + echo 1 > /sys/bus/pci/rescan | No recovery | Same Firmware loaded in 0 ms |
| hailortcli fw-control reset --reset-type chip | Bricks the chip until host reboot | Timeout waiting for vDMA boot data completion, Failed writing firmware files over vDMA. err -110, Failed activating board -110 |

hailortcli fw-control reset --reset-type chip — separate bug

The chip reset disconnects the device successfully, but the driver then
fails to re-upload firmware over vDMA (err -110 is -ETIMEDOUT). Full dmesg:

hailo1x 0001:01:00.0: Firmware batch programming completed for stage 3
hailo1x 0001:01:00.0: Timeout waiting for vDMA boot data completion
hailo1x 0001:01:00.0: Failed writing firmware files over vDMA. err -110
hailo1x 0001:01:00.0: Failed writing SOC firmware on stage 3
hailo1x 0001:01:00.0: SCU log could not be read from device
hailo1x 0001:01:00.0: Firmware load failed
hailo1x 0001:01:00.0: Failed activating board -110
hailo1x 0001:01:00.0: probe with driver hailo1x failed with error -110

The device is unrecoverable in this state until a full host reboot.

Cooldown experiment — proves degradation is not thermal

After the 25-run degradation test (chip at ~52 fps, temp ~58 °C):

[FILLED FROM /tmp/h10_cooldown.log AFTER COMPLETION]

Recovery — only host reboot restores throughput

After sudo reboot, the very first benchmark returns to the published
performance and stays stable for at least 12 runs:

=== uptime ===
 04:59:52 up 1 min,  2 users,  load average: 0.79, 0.48, 0.19
=== fresh boot bench 1 ===  yolo26s: FPS: 124.90  temp mean=59.73
=== fresh boot bench 2 ===  yolo26s: FPS: 125.01  temp mean=61.34
=== fresh boot bench 3 ===  yolo26s: FPS: 124.79  temp mean=62.38
=== fresh boot bench 4 ===  yolo26s: FPS: 124.30  temp mean=63.26
=== fresh boot bench 5 ===  yolo26s: FPS: 124.64  temp mean=64.16
=== fresh boot bench 6 ===  yolo26s: FPS: 124.90  temp mean=64.93

Reboot is the only known recovery path. We have observed the same chip in the
same boot deliver everything from 41 fps to 125 fps depending on cumulative
use.

Asks

  1. Identify and fix the chip-side CMA leak. Each open/configure cycle
    appears to leak (or fragment) the chip’s contiguous-memory pool. After
    ~12 cycles the leak is large enough to throttle inference throughput
    monotonically. Cooldown does not free the leaked memory; only a full chip
    power cycle does.

  2. Provide a working in-band recovery path that does not require a host
    reboot. hailortcli fw-control reset --reset-type chip looks like the
    intended path but currently leaves the device in an unrecoverable state
    (vDMA boot timeout err -110). Either fix the post-reset firmware
    re-upload, or document an alternative supported procedure.

  3. Document any chip-side configuration knobs for the onboard CMA region
    size that we could tune from the host (e.g., via the SCU firmware files
    uploaded on probe). If none exist, please consider adding them.

Reproducer scripts

The exact reproducer is included verbatim above (for i in $(seq 1 25)). The
chip-side runtime log evidence is captured via sudo hailortcli logs runtime.

Happy to run any further diagnostics Hailo would like. We have permanent
access to the device and can capture additional logs (full dmesg, SCU logs,
runtime logs at any point in the degradation curve) on request.

Hi @RGaufman,

Thanks for sharing - we will look into it.

Thanks,
Michael.


Please let me know, as we currently have to reboot after a few runs of our test suite, which slows down testing and iteration.

Hi @RGaufman,

We are still investigating. I’ll keep you updated here.

Hi @RGaufman,

Unfortunately we can’t reproduce this. We only see the expected degradation to ~111 FPS once thermal throttling sets in.

We would suggest cleaning all Hailo installations and reinstalling the entire runtime SW suite: Raspberry Trixie Error with Guide (Pi 5, AI Hat 2) - #18 by Michael

Thanks,
Michael.