Custom Deeplabv3 segmentation model conversion error

I am using WSL2 Ubuntu 24.04 with DFC 3.31.

First, I convert my pth to onnx using

import torch
import onnx
from onnxsim import simplify

dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
torch.onnx.export(
    model,
    dummy_input,
    STATIC_ONNX,
    export_params=True,
    opset_version=13,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
)

onnx_model = onnx.load(STATIC_ONNX)
simplified_model, check = simplify(onnx_model)
if not check:
    raise RuntimeError("Simplification failed")
simplified_model = onnx.shape_inference.infer_shapes(simplified_model)
onnx.save(simplified_model, SIMPLIFIED_ONNX)

I then convert onnx to har using

from hailo_sdk_client import ClientRunner
import os
from MODEL_ENCODER import MODEL_NAME


if __name__ == "__main__":
    ONNX            = os.path.join("Onnx", MODEL_NAME)
    ONNX_PATH       = os.path.join(ONNX, f"{MODEL_NAME}_simplified.onnx")
    HW_ARCH         = 'hailo8l'
    HAR             = os.path.join("Hailo", MODEL_NAME)
    os.makedirs(HAR, exist_ok=True)
    print(f"Translating {ONNX_PATH} to HAR format...")
    runner = ClientRunner(hw_arch=HW_ARCH)
    hn, npz = runner.translate_onnx_model(
        ONNX_PATH,
        MODEL_NAME
    )

    save_file = os.path.join(HAR, f"{MODEL_NAME}.har")
    runner.save_har(save_file)
    print(f"HAR file saved as {MODEL_NAME}.har in {HAR} directory.")

I then convert har to optimized har using

from hailo_sdk_client import ClientRunner
import os
from MODEL_ENCODER import MODEL_NAME

if __name__ == "__main__":
    # user params
    CALIB_PATH       = "aug_val/calib_set.npy"
    HAR              = os.path.join("Hailo", MODEL_NAME)
    HAR_IN           = os.path.join(HAR, f"{MODEL_NAME}.har")
    HAR_OUT          = os.path.join(HAR, f"{MODEL_NAME}_optimized.har")
    ALLS_FILE        = os.path.join("Hailo", "model_script.alls")

    runner = ClientRunner(har=HAR_IN)

    runner.load_model_script(model_script=ALLS_FILE)

    runner.optimize(calib_data=CALIB_PATH)
    runner.save_har(HAR_OUT)
    print(f"\nOptimized HAR file saved as {HAR_OUT}.")

with the following alls file

normalize1 = normalization([0, 0, 0], [255, 255, 255])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2])
model_optimization_flavor(optimization_level=1)
performance_param(compiler_optimization_level=max)
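
For reference, `aug_val/calib_set.npy` is a single array of preprocessed images. A minimal sketch of how such a file might be built (the sample count, image size, and dtype here are assumptions; replace the random frames with real validation images):

```python
import numpy as np

# Hypothetical sketch: stack N preprocessed RGB frames into one array of
# shape (N, H, W, 3) and save it as the calibration set passed to
# runner.optimize(calib_data=...).
IMAGE_HEIGHT, IMAGE_WIDTH = 512, 512  # assumed model input size
frames = [
    np.random.randint(0, 256, (IMAGE_HEIGHT, IMAGE_WIDTH, 3), dtype=np.uint8)
    for _ in range(8)  # replace with real validation images
]
calib = np.stack(frames)
np.save("calib_set.npy", calib)
print(calib.shape)  # (8, 512, 512, 3)
```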

I compile the optimized har using

from hailo_sdk_client import ClientRunner
import os
from MODEL_ENCODER import MODEL_NAME

if __name__ == "__main__":
    # user params
    HAR             = os.path.join("Hailo", MODEL_NAME)
    HAR_IN          = os.path.join(HAR, f"{MODEL_NAME}_optimized.har")

    runner = ClientRunner(har=HAR_IN)
    hef = runner.compile()
    file_name = os.path.join(HAR, f'{MODEL_NAME}.hef')
    with open(file_name, 'wb') as f:
        f.write(hef)
    print(f"\n HEF file saved as {MODEL_NAME}.hef in {HAR} directory.")

However, I am getting the following error

[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 4m 3s 787ms
[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 23m 23s 541ms
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=100%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=85%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Finding the best partition to contexts...
compiler: ../src/allocator/network_graph_appender.cpp:402: Status<network_graph::NetworkNode*> allocator::NetworkGraphAppender::AddShortcut(std::string, network_graph::NetworkNode&, std::vector<network_graph::NetworkNode*>): Assertion `src_node.output_format() == (*first_succ)->input_format()' failed.
compiler: ../src/allocator/network_graph_appender.cpp:402: Status<network_graph::NetworkNode*> allocator::NetworkGraphAppender::AddShortcut(std::string, network_graph::NetworkNode&, std::vector<network_graph::NetworkNode*>): Assertion `src_node.output_format() == (*first_succ)->input_format()' failed.

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed with unexpected crash

Need help fixing this. Thanks in advance

Hey @user228!

What’s Going On

Basically, your model is getting split across multiple processing contexts, and there’s a format mismatch (NHWC vs NCHW) somewhere that the compiler can’t automatically fix. It usually happens around skip connections or when certain optimizations are applied incorrectly.
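
To make the mismatch concrete, here is an illustrative numpy sketch (not Hailo internals) of the same activation in the two layouts; a shortcut that connects a producer in one layout to a consumer expecting the other fails exactly this kind of format check:

```python
import numpy as np

# The same activation stored in NCHW vs NHWC layout. The element count
# is identical, but the shapes (and memory order) differ, so a direct
# shortcut between the two layouts is invalid without a transpose.
x_nchw = np.zeros((1, 3, 512, 512))          # batch, channels, height, width
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))  # batch, height, width, channels
print(x_nchw.shape)  # (1, 3, 512, 512)
print(x_nhwc.shape)  # (1, 512, 512, 3)
```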

Let’s Fix This Step by Step

First things first - simplify everything:
Strip your .alls file down to just the basics:

normalize1 = normalization([0, 0, 0], [255, 255, 255])
model_optimization_flavor(optimization_level=1)

The usual suspects:

  1. Remove global_avgpool_reduction - This is probably your main culprit! It’s meant for classification models, not segmentation. DeepLabv3 and similar models don’t need it and it often causes exactly this kind of format mismatch.

  2. Turn down the optimization level - Remove performance_param(compiler_optimization_level=max) for now. We can add it back once things are working.

  3. Double-check your layer names - Make sure avgpool1 actually exists in your model:

    import onnx
    model = onnx.load("your_model.onnx")
    print([n.name for n in model.graph.node if "pool" in n.name.lower()])
    

If you’re still stuck, try these:

  • Reduce input size temporarily (like 768→512) to force single-context compilation
  • Add format handling:
    allocator_param(automatic_reshapes=True)
    context_switch_param(mode=allowed, allow_auto_merge_in_multicontext=True)
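
When shrinking the input to coax a single-context compile, keep the new dimensions divisible by the network's output stride; a small helper (the stride value of 32 is an assumption for MobileNetV2-style backbones):

```python
# Hypothetical helper: round a candidate input size down to the nearest
# multiple of the network stride so downsampling still divides cleanly.
def round_to_stride(size: int, stride: int = 32) -> int:
    return max(stride, (size // stride) * stride)

print(round_to_stride(512))  # 512
print(round_to_stride(500))  # 480
```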
    

Turn on debug mode to see what’s happening:

export HAILO_LOG_LEVEL=debug

Bottom Line

Nine times out of ten, removing that global_avgpool_reduction line fixes this error for segmentation models. Start there, and if you’re still having issues, work through the other steps. Once you get a clean compile, you can gradually add back optimizations.

Hope this helps!

Thank you for the quick response

  1. I stripped the .alls file down to just the basics:
normalize1 = normalization([0, 0, 0], [255, 255, 255])
model_optimization_flavor(optimization_level=1)

but then I get this error

raise AccelerasNumerizationError(
hailo_model_optimization.acceleras.utils.acceleras_exceptions.AccelerasNumerizationError: Shift delta in DeepLabV3_mobilenet_v2_d5/avgpool1/avgpool_op is larger than 2 (2.05), cannot quantize. A possible solution is to use a pre-quantization model script command to reduce global average-pool spatial dimensions, please refer to the user guide for more info.
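
(For reference, the reduction the error message suggests is valid because a global average pool can be computed in stages; a numpy sketch of the idea behind `division_factors=[2, 2]`, not DFC internals:)

```python
import numpy as np

# Averaging in two stages -- a 2x2 average pool followed by a global
# average -- equals a single global average when H and W divide by 2.
# This is why global_avgpool_reduction can shrink the pool's spatial
# input without changing the result.
def global_avg(x):
    return x.mean(axis=(0, 1))

def avgpool_2x2(x):
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

x = np.random.rand(16, 16, 8)
assert np.allclose(global_avg(x), global_avg(avgpool_2x2(x)))
```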

I then double check the layers using

import onnx
model = onnx.load("Onnx/DeepLabV3_mobilenet_v2_d5/DeepLabV3_mobilenet_v2_d5_simplified.onnx")
print([n.name for n in model.graph.node if "pool" in n.name.lower()])

the output is:

['/decoder/aspp/convs.4/convs.4.0/AveragePool']

  2. Adding back global_avgpool_reduction into the .alls file allows the optimization step to proceed.

  3. Compilation still fails after adding the format-handling parameters.

  4. Here is the hailo_sdk_core.log:

[2025-09-04 16:29:03.310] [default] [info] Ran from command: "compile_har.py"
[2025-09-04 16:29:03.312] [default] [info] Loading network parameters
[2025-09-04 16:29:03.419] [default] [info] Starting Hailo allocation and compilation flow
[2025-09-04 16:29:03.434] [default] [info] Model name: DeepLabV3_mobilenet_v2_d5
[2025-09-04 16:29:03.465] [default] [info] Building optimization options for network layers...
[2025-09-04 16:33:07.080] [default] [info] Successfully built optimization options - 4m 3s 614ms
[2025-09-04 16:33:07.081] [default] [info] Trying to compile the network in a single context
[2025-09-04 16:33:07.081] [default] [info] Trying to solve in single context
[2025-09-04 16:33:07.401] [default] [info] Single context flow failed: Recoverable single context error
[2025-09-04 16:54:27.020] [default] [info] Building optimization options for network layers...
[2025-09-04 17:17:42.995] [default] [info] Successfully built optimization options - 23m 15s 974ms
[2025-09-04 17:17:42.995] [default] [info] Using Multi-context flow
[2025-09-04 17:17:42.995] [default] [info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%, max_compute_16bit_utilization=60%, max_memory_utilization (weights)=60%, max_input_aligner_utilization=60%, max_apu_utilization=60%
[2025-09-04 17:17:42.996] [default] [info] Finding the best partition to contexts...
[2025-09-04 17:17:43.457] [default] [info] Iteration failed on: Automri finished with too many resources on context_0 
...
[2025-09-04 17:17:49.288] [default] [info] Iteration failed on: Automri finished with too many resources on context_1
...
[2025-09-04 17:18:01.864] [default] [info] Iteration failed on: Automri finished with too many resources on context_2
...

The key error lines show:

Iteration failed on: Automri finished with too many resources on context_0/1/2

What’s happening:
The compiler can’t split your model across the available hardware contexts because each context runs out of resources (memory, compute, etc.) no matter how it tries to partition things.
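
As a toy illustration of the failure mode (a greedy sketch, not the compiler's actual partitioning algorithm): layers with resource costs must be packed into contexts with a fixed per-context budget, and the search fails when no packing fits:

```python
# Toy greedy packer: place layer costs into contexts of a fixed budget.
# Returns the number of contexts used, or None when a single layer
# exceeds the budget and no partition can succeed.
def partition(costs, budget):
    contexts, current = 0, 0
    for c in costs:
        if c > budget:
            return None  # one layer alone overflows every context
        if current + c > budget:
            contexts += 1  # close the full context, start a new one
            current = 0
        current += c
    return contexts + 1

print(partition([30, 40, 50, 20], 60))  # 4
print(partition([70], 60))              # None
```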

Quick fixes to try:

  1. Bump up your resource limits in the .alls file:
performance_param(
    max_control_utilization=90,
    max_compute_utilization=90,
    max_memory_utilization=weights_85,  # try weights_100 if needed
    max_apu_utilization=90
)
  2. Limit the contexts if you’re on a multi-APU device:
hardware_arch_config(contexts=2)
  3. Try a higher optimization level:
model_optimization_flavor(optimization_level=2)

The good news is your model translated fine and the avgpool issue is fixed - this is just the compiler struggling to fit everything within the default resource constraints. Try increasing those limits first; that usually does the trick.

Hope this helps!

  1. I tried using
performance_param(
    max_control_utilization=90,
    max_compute_utilization=90,
    max_memory_utilization=weights_85, 
    max_apu_utilization=90
)

but then

raise AllocatorScriptParserException(f"{next(iter(param))} is not a legal performance parameter.")
hailo_sdk_client.sdk_backend.sdk_backend_exceptions.AllocatorScriptParserException: max_control_utilization is not a legal performance parameter.
  2. The optimization step succeeds with the following .alls:
normalize1 = normalization([0, 0, 0], [255, 255, 255])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2])
model_optimization_flavor(optimization_level=2)
performance_param(compiler_optimization_level=max)
allocator_param(automatic_reshapes=True)
context_switch_param(mode=allowed, allow_auto_merge_in_multicontext=True)

However

[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 4m 8s 476ms
[info] Trying to compile the network in a single context
[info] Single context flow failed: Recoverable single context error
[info] Building optimization options for network layers...
[info] Successfully built optimization options - 23m 24s 130ms
[info] Using Multi-context flow
[info] Resources optimization params: max_control_utilization=120%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=85%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Finding the best partition to contexts...
compiler: ../src/allocator/network_graph_appender.cpp:402: Status<network_graph::NetworkNode*> allocator::NetworkGraphAppender::AddShortcut(std::string, network_graph::NetworkNode&, std::vector<network_graph::NetworkNode*>): Assertion `src_node.output_format() == (*first_succ)->input_format()' failed.
compiler: ../src/allocator/network_graph_appender.cpp:402: Status<network_graph::NetworkNode*> allocator::NetworkGraphAppender::AddShortcut(std::string, network_graph::NetworkNode&, std::vector<network_graph::NetworkNode*>): Assertion `src_node.output_format() == (*first_succ)->input_format()' failed.

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed with unexpected crash

I am just going to give up on converting the DeepLabV3 and DeepLabV3+ models. Thank you for the suggestions.