How to force 2-context compilation for custom YOLOv8m-pose? (3-context gives lower FPS than official 2-context HEF)

Hi,

I’m compiling a custom-trained YOLOv8m-pose model (17 keypoints, 1 class - fall detection) using DFC
3.30.0, and the compiler always selects 3 contexts, resulting in 56 FPS on Hailo-8.

However, the official yolov8m_pose.hef from the Model Zoo runs with 2 contexts at 65 FPS — a 16%
performance gap.

Compilation log (key section)

[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%,
max_memory_utilization=60%

Found valid partition to 2 contexts
Found valid partition to 2 contexts, Performance improved by 13.7%
Found valid partition to 2 contexts, Performance improved by 11.2%
Found valid partition to 3 contexts, Performance improved by 4.7%   ← switches to 3 here
…
[info] Partitioner finished after 220 iterations
[info] Applying selected partition to 3 contexts…

The compiler does find valid 2-context partitions, but then switches to 3 contexts because it estimates ~4.7% better theoretical performance. In practice, the context switching overhead makes 3 contexts slower than 2.
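For anyone comparing several compilation runs, here is a minimal sketch that pulls the partition candidates out of a log like the one above. It assumes the lines keep exactly the wording shown in this post; the function name and the sample text are illustrative only and are not part of the DFC.

```python
import re

# Matches partitioner lines such as
# "Found valid partition to 3 contexts, Performance improved by 4.7%"
PARTITION_RE = re.compile(
    r"Found valid partition to (\d+) contexts"
    r"(?:, Performance improved by ([\d.]+)%)?"
)
# Matches the final decision line
APPLIED_RE = re.compile(r"Applying selected partition to (\d+) contexts")


def summarize_partitions(log_text):
    """Return (candidates, applied) parsed from a compilation log.

    candidates is a list of (context_count, reported_gain_percent or None),
    applied is the context count of the partition the compiler selected.
    """
    candidates = []
    applied = None
    for line in log_text.splitlines():
        found = PARTITION_RE.search(line)
        if found:
            gain = float(found.group(2)) if found.group(2) else None
            candidates.append((int(found.group(1)), gain))
            continue
        chosen = APPLIED_RE.search(line)
        if chosen:
            applied = int(chosen.group(1))
    return candidates, applied


# Sample text copied from the log excerpt in this post
sample_log = """\
Found valid partition to 2 contexts
Found valid partition to 2 contexts, Performance improved by 13.7%
Found valid partition to 2 contexts, Performance improved by 11.2%
Found valid partition to 3 contexts, Performance improved by 4.7%
[info] Partitioner finished after 220 iterations
[info] Applying selected partition to 3 contexts...
"""

candidates, applied = summarize_partitions(sample_log)
for contexts, gain in candidates:
    print(contexts, gain)
print("applied:", applied)
```

This only reports what the partitioner logged; it does not change which partition gets selected.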

What I’ve tried

  • context_switch_param(max_control_utilization=0.6, max_compute_utilization=0.6,
    max_memory_utilization=0.6, max_utilization=0.8) — still 3 contexts
  • Setting utilization values higher — compiler still prefers 3 contexts
  • I noticed yolov8n.alls in the Model Zoo uses:
    allocator_param(width_splitter_defuse=disabled, spatial_defuse_legacy=True)
  • But when I try spatial_defuse_legacy=True in my model script, it fails with an error (not recognized in public DFC).

Questions

  1. Is there a supported way to limit max contexts to 2 (e.g., max_contexts=2 parameter)?
  2. Is spatial_defuse_legacy available in public DFC releases, or is it internal-only?
  3. Any other approach to force the compiler to keep the 2-context partition it already found?

Environment

  • DFC version: 3.30.0
  • Target: Hailo-8
  • Model: Custom YOLOv8m-pose (ONNX export from Ultralytics, 17 keypoints, 1 class)
  • Platform: RK3588 + Hailo-8 M.2

Thanks in advance

Hi @user252,

Welcome to the Hailo Community!

To increase the FPS, you can try compiling the model with “performance mode”. This mode forces the compiler to compile the model at different resource utilization levels (e.g. 100%, 90%, …) and select the best solution. If the allocation fails for a certain utilization level, the compiler automatically skips to the next one. For this reason, compilation time will be significantly longer.
You can activate performance mode by adding this line in the model script:

performance_param(compiler_optimization_level=max)

This method does not necessarily imply a lower number of contexts, but it may help increase the FPS.
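If you switch between optimization levels often, a small helper can keep the model script consistent. The sketch below is plain string handling and does not call the DFC; the function name and the sample script are hypothetical, and it assumes one command per line in the script.

```python
def set_optimization_level(script_text, level="max"):
    """Return script_text with exactly one performance_param line at `level`.

    Note: an existing performance_param line is replaced as a whole, so any
    other arguments it carried are dropped (a simplification of this sketch).
    """
    new_line = f"performance_param(compiler_optimization_level={level})"
    out = []
    replaced = False
    for line in script_text.splitlines():
        if line.strip().startswith("performance_param("):
            if not replaced:
                # Keep the command at its original position in the script
                out.append(new_line)
                replaced = True
            continue  # drop duplicates
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Hypothetical starting script with a low optimization level
original = (
    "performance_param(compiler_optimization_level=0)\n"
    "context_switch_param(max_utilization=0.95)\n"
)
updated = set_optimization_level(original)
print(updated)
```

The returned text can then be saved as the model script used for compilation.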

Hi, pierrem

I actually tried the opposite approach first, setting compiler_optimization_level=0 together with a higher max_utilization, and wanted to share what happened, because it wasn’t what I expected.

Attempt 1 — Forcing fewer contexts:

performance_param(compiler_optimization_level=0)
context_switch_param(max_control_utilization=0.85, max_compute_utilization=0.85, max_memory_utilization=0.85, max_utilization=0.95)

This did successfully reduce my yolov8m-pose from 3 contexts down to 2, which I thought would improve throughput. But FPS actually decreased:

compiler_optimization_level=0 + util=0.85 setting model : context(2), FPS : 53
default setting model : context(3), FPS : 56.3

Looking at the compiler log, the reason became clear — the two contexts were highly asymmetric in workload:

context_0: bottleneck fps ~104 ← bottleneck
context_1: bottleneck fps ~271 ← lightly loaded

Since the pipeline is only as fast as the slowest context, cramming everything into 2 contexts created a bottleneck on context_0 and the whole pipeline paid for it. The default 3-context partition balanced the load much better.
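To put numbers on that imbalance, here is a rough sketch that follows the simplified bottleneck reasoning above. The per-context values are the ones from the log; the helper name is made up for illustration. It deliberately ignores context-switch overhead, which is one reason the measured 53 FPS sits well below the ceiling it reports.

```python
def bottleneck_summary(context_fps):
    """Find the slowest context and how unevenly the work is spread.

    context_fps maps a context name to its reported bottleneck FPS.
    Returns (slowest_name, slowest_fps, fastest_fps / slowest_fps).
    """
    slowest = min(context_fps, key=context_fps.get)
    fastest = max(context_fps, key=context_fps.get)
    imbalance = context_fps[fastest] / context_fps[slowest]
    return slowest, context_fps[slowest], imbalance


# Values reported in the compiler log for the 2-context build
reported = {"context_0": 104.0, "context_1": 271.0}
slowest, ceiling, imbalance = bottleneck_summary(reported)
print(f"{slowest} limits throughput to about {ceiling:.0f} FPS")
print(f"imbalance between contexts: {imbalance:.2f}x")
```

An imbalance ratio close to 1 would mean the contexts are evenly loaded; here one context is doing far more of the work than the other.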

Attempt 2 — Your suggestion (compiler_optimization_level=max):
Result: same as the default setting model : context(3), FPS : 56.3

Switching to performance_param(compiler_optimization_level=max) let the compiler explore multiple utilization ratios. In my case it actually ended up with more contexts than the =0 version, but each one was better balanced — and overall FPS went up, not down.

So the takeaway (at least for my model): fewer contexts ≠ faster. Context balance matters more than context count, and compiler_optimization_level=max finds a better balance than forcing =0. Manually dropping the context count via =0 was actually a net loss for me.

Thanks again for pointing me in the right direction!

Your takeaway is correct. That’s why performance mode managed to provide higher throughput.
Also, I noticed that the throughput you obtained is aligned with the performance of the reference yolov8m_pose compiled with DFC 3.30 (Model Zoo v2.14). See this page.
The model’s performance increased to 68 FPS when compiled with more recent DFC versions (for example, in Model Zoo v2.18)
