The qwen2_1.5b_instruct.q.har cannot be recompiled to HEF using DFC 5.2.0, even with zero modifications (and this is the only HAR file that Hailo offers). Running ClientRunner(har=...).compile() on the unmodified HAR fails after ~15 hours with a Context-Partition topology error in the allocator. This blocks any workflow involving custom LoRA adapters, since all documented paths require recompilation.
I’m building an edge AI voice companion on a Raspberry Pi 5 + Hailo-10H, using hailo-ollama with Qwen2-1.5B. I need to deploy a custom LoRA fine-tune (personality adaptation) to the NPU. The adapter targets the MLP layers only (gate_proj, up_proj, down_proj) at rank 32, alpha 64, which matches the pre-existing LoRA scaffold in the HAR.
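For reference, here is how the adapter hyperparameters line up with the HAR's scaffold. The model dimensions (hidden_size 1536, intermediate_size 8960, 28 decoder layers) are my assumptions taken from Qwen2-1.5B's public Hugging Face config, not from the HAR itself:

```python
# Sanity-check that my adapter hyperparameters match the HAR's LoRA scaffold.
# Dimensions are assumed from Qwen2-1.5B's published HF config.
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
RANK, ALPHA = 32, 64
TARGETS = ["gate_proj", "up_proj", "down_proj"]  # MLP-only, as in my adapter

scale = ALPHA / RANK               # effective LoRA scaling applied at merge time
positions = LAYERS * len(TARGETS)  # should match the 84 entries in lora_weights_metadata

# Per-target low-rank factor shapes (A: rank x in_features, B: out_features x rank)
shapes = {
    "gate_proj": ((RANK, HIDDEN), (INTERMEDIATE, RANK)),
    "up_proj":   ((RANK, HIDDEN), (INTERMEDIATE, RANK)),
    "down_proj": ((RANK, INTERMEDIATE), (HIDDEN, RANK)),
}
print(scale, positions)  # 2.0 84
```

28 layers × 3 MLP projections = 84, which is exactly the number of LoRA positions the HAR's lora_weights_metadata maps, so the adapter shape-matches the scaffold.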
To isolate the issue from my LoRA modifications, I compiled the HAR with zero changes:

```python
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har='qwen2_1.5b_instruct.q.har')
# No load_lora_weights(), no weight injection, no model script changes
hef = runner.compile()
```
Result: Fails after ~15 hours (55,472 seconds) with:
```
BackendAllocatorException: Compilation failed: Context-Partition topology error:
base_model/reshape_from_input_layer3_to_block1__ew_mult1 (bucket_10) is in later context
than its successor base_model/block1__ew_mult1 (bucket_5).
base_model/reshape_from_input_layer4_to_block1__ew_mult3 (bucket_10) is in later context
than its successor base_model/block1__ew_mult3 (bucket_5).
base_model/reshape_from_input_layer5_to_block1__ew_mult2 (bucket_10) is in later context
than its successor base_model/block1__ew_mult2 (bucket_5).
base_model/reshape_from_input_layer6_to_block1__ew_mult4 (bucket_10) is in later context
than its successor base_model/block1__ew_mult4 (bucket_5).
base_model/reshape_from_input_layer2_to_block1__softmax_mask_matmul1 (bucket_10) is in later
context than its successor base_model/block1__softmax_mask_matmul1 (bucket_5).
base_model/reshape_from_input_layer2_to_block1__ew_mult5 (bucket_10) is in later context
than its successor base_model/block1__ew_mult5 (bucket_5).
```
All six failing edges are in block1: reshape operations placed in bucket_10 feed element-wise operations in bucket_5, which is a dependency-ordering violation.
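The invariant the allocator is enforcing can be restated simply: a producer layer must not be assigned to a later context (bucket) than its consumer. A toy check over the six edges from the error log (the check itself is my illustration, not Hailo's code; bucket numbers are taken from the log):

```python
# Toy reproduction of the Context-Partition check: an edge producer -> consumer
# is invalid if the producer's bucket index exceeds its consumer's.
# Layer names and bucket assignments are copied from the error log.
edges = [
    ("reshape_from_input_layer3_to_block1__ew_mult1", "block1__ew_mult1"),
    ("reshape_from_input_layer4_to_block1__ew_mult3", "block1__ew_mult3"),
    ("reshape_from_input_layer5_to_block1__ew_mult2", "block1__ew_mult2"),
    ("reshape_from_input_layer6_to_block1__ew_mult4", "block1__ew_mult4"),
    ("reshape_from_input_layer2_to_block1__softmax_mask_matmul1",
     "block1__softmax_mask_matmul1"),
    ("reshape_from_input_layer2_to_block1__ew_mult5", "block1__ew_mult5"),
]
bucket = {producer: 10 for producer, _ in edges}      # every reshape landed in bucket_10
bucket.update({consumer: 5 for _, consumer in edges})  # every consumer is in bucket_5

violations = [(p, c) for p, c in edges if bucket[p] > bucket[c]]
print(len(violations))  # 6 -- every edge in the log violates the ordering
```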
Before discovering that the stock HAR itself can’t compile, I spent over a week attempting LoRA integration (my rented-GPU bills are roughly what I paid for the Hailo-10H):
Approach 1: load_lora_weights() + compile()
The documented workflow. Failed with ValueError: 10873 is not in list in precision_splitter.py — load_lora_weights() adds new graph nodes but doesn’t register their indices in PrecisionSplitterLayer._output_indices. After extensive graph surgery (monkey-patching the precision splitter, BFS ancestor walks to fix connectivity across multiple HailoNN graph instances), compilation reached the C++ allocator, which produced the same Context-Partition error.
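For context, the "BFS ancestor walk" was nothing exotic: given the HAR's layer graph as a successor map, walk predecessor edges breadth-first to collect every ancestor of a node. A minimal stand-in on a toy graph (the real code operated on HailoNN graph instances; `ancestors` and the toy layer names here are my own illustration, not SDK API):

```python
from collections import deque

def ancestors(succ, node):
    """All transitive predecessors of `node` in a successor-map graph."""
    # Invert the successor map into a predecessor map.
    pred = {}
    for src, dsts in succ.items():
        for dst in dsts:
            pred.setdefault(dst, []).append(src)
    # Breadth-first walk over predecessor edges.
    seen, queue = set(), deque(pred.get(node, []))
    while queue:
        cur = queue.popleft()
        if cur in seen:
            continue
        seen.add(cur)
        queue.extend(pred.get(cur, []))
    return seen

# Toy graph shaped like the failing region: input -> reshape -> ew_mult
toy = {"input_layer3": ["reshape1"], "reshape1": ["ew_mult1"]}
print(sorted(ancestors(toy, "ew_mult1")))  # ['input_layer3', 'reshape1']
```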
Approach 2: Direct Weight Injection into Existing Scaffold
Since the HAR’s lora_weights_metadata maps all 84 LoRA positions, I wrote a script that replaces kernel values in the existing base_model/ scaffold layers (no new graph nodes, identical topology). This also reached the allocator and produced the same error.
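The arithmetic behind the injection is the standard LoRA merge, W' = W + (alpha/rank)·B·A. A numpy sketch under assumed Qwen2-1.5B gate_proj dimensions (the step that writes the result back into the HAR's params is SDK-specific and omitted):

```python
import numpy as np

# Assumed Qwen2-1.5B gate_proj dimensions and my adapter's hyperparameters.
HIDDEN, INTERMEDIATE, RANK, ALPHA = 1536, 8960, 32, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((INTERMEDIATE, HIDDEN), dtype=np.float32)  # base kernel
A = rng.standard_normal((RANK, HIDDEN), dtype=np.float32)          # LoRA down-projection
B = np.zeros((INTERMEDIATE, RANK), dtype=np.float32)               # LoRA up-projection (zero-init)

# Standard LoRA merge: W' = W + (alpha/rank) * B @ A
W_merged = W + (ALPHA / RANK) * (B @ A)

# With B zero-initialized the merge is a no-op, i.e. the "zero modifications"
# control case -- which still fails to compile.
assert np.allclose(W_merged, W)
print(W_merged.shape)  # (8960, 1536)
```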
Approach 3: Stock HAR, Zero Modifications
This confirms the error is inherent to the HAR + DFC 5.2.0 combination, not caused by any LoRA modification.
The HAR metadata shows sdk_version: 5.1.0.dev0, but I’m compiling with DFC 5.2.0. This version mismatch seems like the most likely root cause: the HAR’s internal graph representation may be incompatible with the 5.2.0 allocator.
My questions for the Hailo team:
- Is there a DFC version that can successfully recompile this HAR and run the result on the Hailo-10H? If the HAR was built with SDK 5.1.0.dev0, does recompilation require that same version, and would hailo-ollama still run the resulting HEF on my Hailo-10H?
- Is the HAR intended to be recompiled by users at all? Or was the shipping HEF compiled through internal tooling, with the HAR published only for inspection and runtime LoRA swaps?
- What is the supported workflow for deploying a custom LoRA adapter to hailo-ollama on the Hailo-10H? I’ve tried both the documented load_lora_weights() path and direct weight injection, and both are blocked by this compilation failure. I’m open to any approach; the end goal is simply to run a personality-tuned Qwen2-1.5B on the NPU.
- Could a HAR built with DFC 5.2.0 be provided? That would eliminate the version mismatch and likely resolve the context-partition error.
As it stands, custom LoRA deployment is effectively unsupported on the Hailo-10H, and I am eager for any path forward you can provide.