A Hailo-10H module on a Raspberry Pi 5 with the official AI Hat 2 starts at the
published 125 fps (yolo26s_hailo10h.hef, hailortcli benchmark) immediately
after host reboot, but throughput degrades monotonically and irreversibly
during continuous use. After 12 back-to-back benchmark runs of 15 s each
(~3 minutes), fps starts dropping by ~3 fps per run, reaching ~82 fps at run 25.
With extended use we have observed it drop as low as 41 fps. The chip never
recovers within the same boot: not after cooldown, not after driver reload, not
after PCIe rescan, and not after hailortcli fw-control reset --reset-type chip.
Only a full host reboot restores the published 125 fps.
The degradation is accompanied by
`cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -16` (-EBUSY)
errors in the chip’s onboard Yocto runtime log, suggesting CMA exhaustion or
fragmentation inside the chip-side Linux.
Affected device
| Field | Value |
|---|---|
| Chip | Hailo-10H |
| Carrier | Raspberry Pi 5 + official AI Hat 2 |
| Active cooling | Yes (official Pi 5 active cooler) |
| Chip serial | FBFBBDDFC4DEF768D6CEF1F9 |
| SoC ID | 8CD33CE2AA4ABFFE8EAD7E1FCF27CEC6D819C55DADEFEB9FF8C9184DAE93ECCB |
| Board SKU-ID | 6 |
| LCS | 5 |
Software versions
| Component | Version |
|---|---|
| HailoRT-CLI | 5.3.0 |
| Firmware | 5.3.0 (release, app) |
| Driver | hailo1x_pci 5.3.0 (srcversion 91195A5A35A8DAAA5717B7D) |
| Host kernel | 6.12.75+rpt-rpi-2712 (Debian 13 Trixie) |
| Chip-side OS | Linux 5.15.325.15.32-yocto-standard-01685-g4cd9cfd0e6e7 |
| HEF | yolo26s_hailo10h.hef, md5 07f0dd9d2f44834f75c123af243e7ec3, 13,815,808 bytes |
PCIe state
LnkCap: Speed 8GT/s, Width x4
LnkSta: Speed 8GT/s, Width x1 (downgraded — RPi 5 M.2 slot limitation)
DevCtl: MaxPayload 256 bytes, MaxReadReq 512 bytes
The Hailo-10H reaches its published 125 fps over this same x1 link when fresh,
so PCIe x1 bandwidth is not the bottleneck.
Reproduction
sudo systemctl stop tetherbox tetherbox-ai # stop our app, no other Hailo clients
sudo reboot
# wait for boot, then immediately:
for i in $(seq 1 25); do
    sudo hailortcli benchmark /path/to/yolo26s_hailo10h.hef
done
Measured data — degradation curve (single boot, no other workload)
=== H10 degradation repro: 25 runs at default 15s, no sleep between ===
started at: Thu 30 Apr 05:08:01 BST 2026
run 1: fps=124.78 temp=60.70
run 2: fps=124.76 temp=62.82
run 3: fps=124.90 temp=64.36
run 4: fps=124.81 temp=65.70
run 5: fps=124.89 temp=66.90
run 6: fps=124.81 temp=67.96
run 7: fps=124.80 temp=68.94
run 8: fps=124.89 temp=69.83
run 9: fps=124.99 temp=70.69
run 10: fps=124.81 temp=71.29
run 11: fps=124.89 temp=71.96
run 12: fps=124.87 temp=72.60 ← last stable run
run 13: fps=122.80 temp=73.13 ← onset of degradation
run 14: fps=121.38 temp=73.61
run 15: fps=118.53 temp=73.96
run 16: fps=108.49 temp=73.99 ← temperature plateaus here
run 17: fps=101.51 temp=73.88
run 18: fps=98.34 temp=73.82
run 19: fps=96.64 temp=73.87
run 20: fps=94.04 temp=73.90
run 21: fps=91.14 temp=73.82
run 22: fps=88.55 temp=73.68
run 23: fps=85.65 temp=73.58
run 24: fps=83.99 temp=73.53
run 25: fps=81.60 temp=73.43
done at: Thu 30 Apr 05:14:42 BST 2026
Key observation: from run 16 onward, chip temperature plateaus at 73-74 °C and
then falls slightly (73.99 → 73.43 °C), while throughput keeps dropping
monotonically (108.49 → 81.60 fps). Throughput therefore does not track
temperature once thermal steady state is reached.
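To make the onset easier to spot in longer captures, the per-run change in fps can be computed directly from the log. The following is a minimal sketch, assuming log lines in the `run N: fps=X temp=Y` format shown above; the function name `summarise_runs` and the 1 fps onset threshold are our own illustrative choices, not anything provided by HailoRT.

```bash
#!/usr/bin/env bash
# Summarise a degradation log read from stdin. Expected line format
# (as in the measured data above): "run 13: fps=122.80 temp=73.13"
# Trailing annotations after the temp field are ignored.
summarise_runs() {
    awk '
        /^run [0-9]+: fps=/ {
            run = $2; sub(/:/, "", run)          # "13:" -> "13"
            split($3, f, "="); split($4, t, "=") # "fps=122.80" -> 122.80
            fps = f[2] + 0; temp = t[2] + 0
            if (n > 0) {
                delta = fps - prev_fps
                # Assumed threshold: first run losing more than 1 fps
                # relative to the previous run counts as the onset.
                if (onset == "" && delta < -1.0) onset = run
                printf "run %d: fps=%.2f temp=%.2f delta=%+.2f\n", run, fps, temp, delta
            }
            prev_fps = fps; n++
        }
        END { if (onset != "") printf "onset of degradation: run %d\n", onset }
    '
}
```

Usage: save the log to a file and run `summarise_runs < h10_repro.log` (the file name is a placeholder). With the 1 fps threshold, the 25-run log above reports run 13 as the onset.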
In a separate session with extended use the chip degraded as far as 41 fps
(33 % of published) and stayed there until reboot.
Falsified theories — what the cause is NOT
All of the following were measured on this device and ruled out:
| Theory | Evidence against |
|---|---|
| Thermal throttling | Chip plateaus at 73-74 °C from run 16 onwards; throughput continues to drop. After 4 min idle the chip cools, but throughput stays degraded (see cooldown experiment below). |
| Host RPi 5 SoC throttling | vcgencmd get_throttled = 0x0 |
| PCIe ASPM L1 / D3hot sleep | policy=performance, mid-bench device state confirmed D0 |
| PCIe x1 link cap | Same x1 link delivers 125 fps when chip is fresh |
| Chip warming up (cache cold) | Runs 1-12 at 124-125 fps with rising temp 60→72 °C; degradation begins after warmup |
| `--power-mode ultra_performance` | Same degraded throughput |
| `--batch-size 8` | Same degraded throughput |
| `--time-to-run 60` | Same degraded throughput |
| HailoRT 4.x driver issues | Reproduces on HailoRT 5.3.0 + driver 5.3.0 (latest), Debian 13 Trixie (the officially supported OS for HailoRT 4.23+) |
Smoking gun — chip-side runtime log
sudo hailortcli logs runtime returns logs from the chip’s onboard Yocto Linux
(hostname hailo10, kernel 5.15.325-yocto). On every benchmark invocation:
hailo10 user.info HailoRT-Server: [device.cpp:54] [Device] OS Version: Linux 5.15.325.15.32-yocto-standard-01685-g4cd9cfd0e6e7 #1 SMP PREEMPT Wed Feb 18 10:02:32 UTC 2026 aarch64
hailo10 user.info HailoRT-Server: [control.cpp:89] [control__parse_identify_results] firmware_version is: 5.3.0
hailo10 kern.err kernel: cma: cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -16
hailo10 user.info HailoRT-Server: [vdevice.cpp:474] [configure] Configuring HEF on VDevice took 45.79156 milliseconds
ret: -16 is -EBUSY. Each hailortcli benchmark invocation does an
open → configure → run → close cycle. Every configure attempts a 256-page
contiguous DMA buffer allocation (1 MiB, assuming 4 KiB pages), and these
allocations fail. The runtime appears to fall back to non-contiguous
allocations (so inference continues), but it seems to accumulate state that
throttles throughput as the chip-side CMA region fragments further.
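The failure count and the requested size can be checked from a captured runtime log. This is a small sketch under stated assumptions: the helper names `count_cma_failures` and `pages_to_mib` are hypothetical, the log is assumed to arrive as plain text on stdin, and the 4 KiB page size is an assumption because the chip-side kernel's page size does not appear in the log.

```bash
#!/usr/bin/env bash
# Count chip-side CMA allocation failures in a runtime log read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_cma_failures() {
    grep -c 'cma_alloc: .*alloc failed' || true
}

# Convert a page count to MiB. The default page size of 4096 bytes is an
# assumption; pass a second argument to override it.
pages_to_mib() {
    local pages=$1 page_bytes=${2:-4096}
    echo $(( pages * page_bytes / 1024 / 1024 ))
}
```

For example, `sudo hailortcli logs runtime | count_cma_failures` would give the number of failed allocations captured so far, assuming the command writes the log to stdout. Comparing that count before and after each benchmark invocation would show whether it grows by one per configure.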
The trigger appears to correlate with the number of configure cycles rather
than with inference time alone: from a fresh boot, degradation first appeared
after 12 benchmark invocations (every hailortcli benchmark is a fresh open +
configure).
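One way to separate the two variables would be to hold total inference time constant while varying the number of invocations, each regime starting from a fresh boot. The sketch below only prints such a plan; it does not run anything. The function name `plan_runs` is hypothetical, the HEF path is the same placeholder used in the reproduction, and placing `--time-to-run` after the HEF path is an assumption about argument order.

```bash
#!/usr/bin/env bash
# Print (do not execute) a benchmark plan: total_s seconds of inference
# split into invocations of per_run_s seconds each.
plan_runs() {
    local total_s=$1 per_run_s=$2 hef=${3:-/path/to/yolo26s_hailo10h.hef}
    local n=$(( total_s / per_run_s )) i
    for (( i = 1; i <= n; i++ )); do
        echo "sudo hailortcli benchmark $hef --time-to-run $per_run_s"
    done
}

# Regime A: many configure cycles (24 invocations of 15 s).
plan_runs 360 15
# Regime B: few configure cycles (6 invocations of 60 s), same 360 s total.
plan_runs 360 60
```

If the degradation tracks configure cycles, regime A should start degrading around its 13th invocation while regime B should stay near 125 fps for all six. If it tracks inference time instead, both should degrade after roughly the same elapsed time.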
Recovery experiments — what does NOT recover throughput
All of the following were tried on the degraded device with no recovery:
| Action | Result | Evidence |
|---|---|---|
| Wait for cooldown (4 min idle) | No recovery | See cooldown experiment below |
| `sudo modprobe -r hailo1x_pci && sudo modprobe hailo1x_pci` | No recovery | dmesg: SOC Firmware batch was already loaded, Firmware loaded in 0 ms (chip retains state) |
| `echo 1 > /sys/bus/pci/devices/0001:01:00.0/remove` + `echo 1 > /sys/bus/pci/rescan` | No recovery | Same Firmware loaded in 0 ms |
| `hailortcli fw-control reset --reset-type chip` | Bricks the chip until host reboot | Timeout waiting for vDMA boot data completion, Failed writing firmware files over vDMA. err -110, Failed activating board -110 |
hailortcli fw-control reset --reset-type chip — separate bug
The chip reset disconnects the device successfully but the driver then
fails to re-upload firmware over vDMA. Full dmesg:
hailo1x 0001:01:00.0: Firmware batch programming completed for stage 3
hailo1x 0001:01:00.0: Timeout waiting for vDMA boot data completion
hailo1x 0001:01:00.0: Failed writing firmware files over vDMA. err -110
hailo1x 0001:01:00.0: Failed writing SOC firmware on stage 3
hailo1x 0001:01:00.0: SCU log could not be read from device
hailo1x 0001:01:00.0: Firmware load failed
hailo1x 0001:01:00.0: Failed activating board -110
hailo1x 0001:01:00.0: probe with driver hailo1x failed with error -110
The device is unrecoverable in this state until a full host reboot.
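For readers matching these codes against the earlier CMA error, the negative values are standard Linux errno numbers: -110 is ETIMEDOUT, and -16 is EBUSY. A minimal sketch for pulling the trailing codes out of captured dmesg or runtime-log lines follows; the helper names `errno_name` and `decode_errors` are our own, and only the two codes that appear in this report are mapped.

```bash
#!/usr/bin/env bash
# Map the negative errno values seen in this report to their Linux names.
errno_name() {
    case "${1#-}" in            # strip the leading minus sign
        16)  echo EBUSY ;;
        110) echo ETIMEDOUT ;;
        *)   echo "unknown($1)" ;;
    esac
}

# Read log lines on stdin, extract negative integers at end of line,
# and print each distinct code with its name.
decode_errors() {
    grep -oE -- '-[0-9]+$' | sort -u | while read -r code; do
        printf '%s %s\n' "$code" "$(errno_name "$code")"
    done
}
```

Usage: `sudo dmesg | grep hailo1x | decode_errors`.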
Cooldown experiment — proves degradation is not thermal
After the 25-run degradation test (chip at ~52 fps, temp ~58 °C):
[FILLED FROM /tmp/h10_cooldown.log AFTER COMPLETION]
Recovery — only host reboot restores throughput
After sudo reboot, the very first benchmark returns to the published
performance and stays stable for at least 12 runs:
=== uptime ===
04:59:52 up 1 min, 2 users, load average: 0.79, 0.48, 0.19
=== fresh boot bench 1 === yolo26s: FPS: 124.90 temp mean=59.73
=== fresh boot bench 2 === yolo26s: FPS: 125.01 temp mean=61.34
=== fresh boot bench 3 === yolo26s: FPS: 124.79 temp mean=62.38
=== fresh boot bench 4 === yolo26s: FPS: 124.30 temp mean=63.26
=== fresh boot bench 5 === yolo26s: FPS: 124.64 temp mean=64.16
=== fresh boot bench 6 === yolo26s: FPS: 124.90 temp mean=64.93
Reboot is the only known recovery path. We have observed the same chip in the
same boot deliver everything from 41 fps to 125 fps depending on cumulative
use.
Asks
- Identify and fix the chip-side CMA leak. Each open/configure cycle
  appears to leak (or fragment) the chip’s contiguous-memory pool. After
  ~12 cycles the leak is large enough to throttle inference throughput
  monotonically. Cooldown does not free the leaked memory; only a full chip
  power cycle does.
- Provide a working in-band recovery path that does not require a host
  reboot. hailortcli fw-control reset --reset-type chip looks like the
  intended path but currently leaves the device in an unrecoverable state
  (vDMA boot timeout, err -110). Either fix the post-reset firmware
  re-upload, or document an alternative supported procedure.
- Document any chip-side configuration knobs for the onboard CMA region
  size that we could tune from the host (e.g., via the SCU firmware files
  uploaded on probe). If none exist, please consider adding them.
Reproducer scripts
The exact reproducer is included verbatim above (for i in $(seq 1 25)). The
chip-side runtime log evidence is captured via sudo hailortcli logs runtime.
We are happy to run any further diagnostics Hailo would like. We have permanent
access to the device and can capture additional logs (full dmesg, SCU logs,
runtime logs at any point in the degradation curve) on request.