HAR cannot be recompiled with DFC 5.2.0 due to Context-Partition topology error, blocking LoRA deployment

The qwen2_1.5b_instruct.q.har cannot be recompiled to HEF using DFC 5.2.0, even with zero modifications - and it is the only HAR file that Hailo offers. Running ClientRunner(har=...).compile() on the unmodified HAR fails after ~15 hours with a Context-Partition topology error in the allocator. This blocks any workflow involving custom LoRA adapters, since all documented paths require recompilation.

I’m building an edge AI voice companion on a Raspberry Pi 5 + Hailo-10H, using hailo-ollama with Qwen2-1.5B, and I need to deploy a custom LoRA fine-tune (personality adaptation) to the NPU. The adapter targets the MLP layers only (gate_proj, up_proj, down_proj) at rank 32, alpha 64, which matches the pre-existing LoRA scaffold in the HAR.
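For concreteness, the adapter I trained is the standard LoRA form, W' = W + (alpha/r)·BA. A minimal numpy sketch of the merge for one MLP projection follows; the dimensions are illustrative (typical Qwen2-1.5B-like shapes), not values read from the HAR:

```python
import numpy as np

# Illustrative LoRA merge for one MLP projection (shapes are hypothetical).
hidden, intermediate = 1536, 8960   # Qwen2-1.5B-like gate_proj dimensions
r, alpha = 32, 64                   # rank / alpha used by the adapter

rng = np.random.default_rng(0)
W = rng.standard_normal((intermediate, hidden))  # frozen base weight
A = rng.standard_normal((r, hidden)) * 0.01      # LoRA down-projection
B = np.zeros((intermediate, r))                  # LoRA up-projection (zero-init)

# Merged weight: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

# With B zero-initialised the merge is a no-op, as expected at step 0.
assert np.allclose(W_merged, W)
```

The point of the sketch is that the merge never changes tensor shapes, which is why a rank-32/alpha-64 adapter should slot into the HAR's pre-existing scaffold unchanged.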

To isolate the issue from my LoRA modifications, I compiled the HAR with zero changes:

```python
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har='qwen2_1.5b_instruct.q.har')
# No load_lora_weights(), no weight injection, no model script changes
hef = runner.compile()
```

Result: Fails after ~15 hours (55,472 seconds) with:

```
BackendAllocatorException: Compilation failed: Context-Partition topology error:
base_model/reshape_from_input_layer3_to_block1__ew_mult1 (bucket_10) is in later context than its successor base_model/block1__ew_mult1 (bucket_5).
base_model/reshape_from_input_layer4_to_block1__ew_mult3 (bucket_10) is in later context than its successor base_model/block1__ew_mult3 (bucket_5).
base_model/reshape_from_input_layer5_to_block1__ew_mult2 (bucket_10) is in later context than its successor base_model/block1__ew_mult2 (bucket_5).
base_model/reshape_from_input_layer6_to_block1__ew_mult4 (bucket_10) is in later context than its successor base_model/block1__ew_mult4 (bucket_5).
base_model/reshape_from_input_layer2_to_block1__softmax_mask_matmul1 (bucket_10) is in later context than its successor base_model/block1__softmax_mask_matmul1 (bucket_5).
base_model/reshape_from_input_layer2_to_block1__ew_mult5 (bucket_10) is in later context than its successor base_model/block1__ew_mult5 (bucket_5).
```

All six failing edges are in block1: reshape operations placed in bucket_10 feed element-wise operations in bucket_5, which violates dependency ordering.

Before discovering that the stock HAR itself can’t compile, I spent over a week attempting LoRA integration (my rented-GPU bills are about as much as I paid for the Hailo-10H):

Approach 1: load_lora_weights() + compile()

The documented workflow. Failed with ValueError: 10873 is not in list in precision_splitter.py. load_lora_weights() adds new graph nodes but doesn’t register their indices in PrecisionSplitterLayer._output_indices. After extensive graph surgery (monkey-patching the precision splitter, BFS ancestor walks to fix connectivity across multiple HailoNN graph instances), compilation reached the C++ allocator, which produced the same Context-Partition error.
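The mechanism behind that ValueError is easy to reproduce in isolation: Python's list.index() raises exactly this message when an id was never registered. This is an illustration of the failure mode only, not SDK code; the variable names and ids are hypothetical stand-ins:

```python
# Hypothetical illustration of the precision-splitter failure mode:
# output indices registered before LoRA insertion, then a lookup for a
# node added afterwards (id 10873) that was never registered.
registered_output_indices = [10870, 10871, 10872]   # pre-existing node ids

try:
    registered_output_indices.index(10873)          # new LoRA node, never registered
    message = None
except ValueError as err:
    message = str(err)

print(message)   # 10873 is not in list
```

So the error surfaces at lookup time, well after the graph mutation that actually caused it.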

Approach 2: Direct Weight Injection into Existing Scaffold

Since the HAR’s lora_weights_metadata maps all 84 LoRA positions, I wrote a script that replaces kernel values in the existing base_model/ scaffold layers (no new graph nodes, identical topology). This also reached the allocator and produced the same error.
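The injection script followed this pattern: keep every node intact and only overwrite kernel values in the existing scaffold, refusing any shape change. A simplified numpy sketch, where the dict stands in for the HAR's scaffold and the layer name and shapes are hypothetical:

```python
import numpy as np

def inject_lora_kernels(scaffold: dict, trained: dict) -> None:
    """Overwrite kernel values in place; topology and shapes stay identical."""
    for name, new_kernel in trained.items():
        old = scaffold[name]
        if old.shape != new_kernel.shape:
            raise ValueError(f"shape mismatch for {name}: {old.shape} vs {new_kernel.shape}")
        old[...] = new_kernel   # in-place value copy, no new graph nodes

# Hypothetical scaffold entry for one of the 84 LoRA positions.
scaffold = {"base_model/block1__mlp_gate_proj_lora_a": np.zeros((32, 1536))}
trained = {"base_model/block1__mlp_gate_proj_lora_a": np.ones((32, 1536))}
inject_lora_kernels(scaffold, trained)
```

Because nothing but values changes, any compile failure after injection has to come from the graph that was already in the HAR.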

Approach 3: Recompiling the Stock HAR Unmodified

Confirms the error is inherent to the HAR + DFC 5.2.0 combination, not caused by any LoRA modification.

The HAR metadata shows sdk_version: 5.1.0.dev0, but I’m compiling with DFC 5.2.0. This version mismatch seems the likely root cause: the HAR’s internal graph representation may be incompatible with the 5.2.0 allocator.
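A sanity check I wish I had run before burning 15 compile hours: compare the HAR's recorded SDK version against the installed DFC version up front. The helper below is my own sketch; it assumes an exact major.minor match is required, which is a guess on my part:

```python
def versions_compatible(har_sdk_version: str, dfc_version: str) -> bool:
    """Require matching major.minor, ignoring patch/dev suffixes like '.dev0'."""
    def major_minor(version: str) -> tuple:
        parts = version.split(".")
        return (parts[0], parts[1])
    return major_minor(har_sdk_version) == major_minor(dfc_version)

print(versions_compatible("5.1.0.dev0", "5.2.0"))   # False: expect trouble
print(versions_compatible("5.2.0", "5.2.0"))        # True
```

Failing fast on this mismatch would turn a 15-hour allocator crash into an immediate, readable error.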

My questions for the dev team:

  1. Is there a DFC version that can successfully recompile this HAR - and run the result on the Hailo-10H? If the HAR was built with SDK 5.1.0.dev0, does compilation require that same version, and will I still be able to run hailo-ollama on my Hailo-10H if I take that approach?
  2. Is the HAR intended to be recompiled by users at all? Or was the shipping HEF compiled through internal tooling, with the HAR published only for inspection and runtime LoRA swaps?
  3. What is the supported workflow for deploying a custom LoRA adapter to hailo-ollama on the Hailo-10H? I’ve tried both the documented load_lora_weights() path and direct weight injection, and both are blocked by this compilation failure. I’m open to any approach; the end goal is simply to run a personality-tuned Qwen2-1.5B on the NPU.
  4. Could a HAR built with DFC 5.2.0 be provided? This would eliminate the version mismatch and likely(?) resolve the context-partition error.

Right now LoRA appears to be unsupported on the Hailo-10H, and I’m eager for any path forward you can provide.


Hi @bill_bz ,

Q1: Currently HAR compilation is restricted to the same SDK version, so compiling with DFC v5.2.0 requires the HAR that was released with v5.2.0.

Q2: The HAR is intended for user compilation. The idea is for you to train your own set of LoRA weights and recompile the LLM. Please see our LoRA Tutorial in the DFC for more information.

Q3: The supported workflows for deploying a custom LoRA adapter on Hailo in v5.2.0 are the C++ GenAI API and the Python API (hailo-ollama doesn’t support LoRA adapters yet). For more information, check out our HailoRT tutorials on LLM inference.

Q4: All relevant files for compilation were released in v5.2.0 and can be found in the DFC LoRA tutorial:
http://dev-public.hailo.ai/v5.2.0/blob/qwen2_1.5b_instruct.q.har
http://dev-public.hailo.ai/v5.2.0/blob/qwen2_1.5b_instruct.alls
http://dev-public.hailo.ai/v5.2.0/blob/qwen2_1.5b_instruct_compilation.alls

Thanks,

@Michael Thank you, your post adds a lot of clarity. I think the issue stems from the DFC documentation: it doesn’t actually contain the material you reference. The GenAI and LoRA sections are only a page and a half and are missing substantial instructions, and the tutorial likewise doesn’t contain any of the detail your answer just provided. I have attached a snapshot of the HTML version; comparing it to the PDF version, I suspect the wrong version may have been published, as the changelog in the PDF notes:
”Updated the LoRA tutorial notebook to use an LLM base model quantized with group-wise quantization.”

The only .alls file I had access to was the embedded one I extracted from the HAR on your downloads page (SDK version 5.1.0), which was 28 lines (vs. the 4,480 lines you just provided).

I sincerely appreciate this, thank you. It’s been a week and a half of substantial frustration, and from everything I can see, you’ve just handed me the keys to move forward.


Hi @bill_bz ,

Thank you for the detailed feedback on the documentation - this is very valuable for us!

To confirm your path forward:

  1. Use the v5.2.0 HAR and ALLS files from the links I provided - these replace the v5.1.0 HAR from the downloads page. The ALLS you extracted from the old HAR is not sufficient for compilation; the full qwen2_1.5b_instruct.alls and qwen2_1.5b_instruct_compilation.alls are what you need.
  2. Train your LoRA adapter targeting the MLP layers (gate_proj, up_proj, down_proj) as you’ve already done - rank 32 / alpha 64 should work fine with the scaffold in the HAR.
  3. Compile with DFC v5.2.0 using the v5.2.0 HAR and ALLS files. This should eliminate the Context-Partition topology error you encountered, as the version mismatch between the v5.1.0 HAR and the v5.2.0 allocator was the root cause.
  4. For inference on the Hailo-10H, use either the C++ GenAI API or the Python API with your compiled HEF + LoRA weights. As noted, hailo-ollama does not yet support LoRA adapters, so you’ll need to use one of these APIs directly.
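One small practical addition: before starting another long compile, it's worth a preflight check that all three v5.2.0 artifacts are actually in the working directory. A minimal sketch (the filenames are taken from the links above; the helper itself is just a suggestion):

```python
from pathlib import Path

REQUIRED = [
    "qwen2_1.5b_instruct.q.har",
    "qwen2_1.5b_instruct.alls",
    "qwen2_1.5b_instruct_compilation.alls",
]

def preflight(workdir: str) -> list:
    """Return the required v5.2.0 files that are missing from workdir."""
    root = Path(workdir)
    return [name for name in REQUIRED if not (root / name).is_file()]

missing = preflight(".")
if missing:
    print("missing files:", missing)
```

Failing here takes seconds; failing in the allocator takes hours.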

Thanks,