Converting the ONNX model to a HAR file with hailomz parse succeeds.
However, after quantizing the HAR file with hailomz optimize, the Matmul1 layer ends up with two outputs, and hailomz compile fails with the following error:
[error] Mapping Failed (allocation time: 8m 53s)
Failed to reach required FPS on the following layers:
Compilation failed with exception: More than one output is not supported for layer matmul1
[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Failed to reach required FPS on the following layers:
Compilation failed with exception: More than one output is not supported for layer matmul1
My analysis of this issue is that after optimization, the model’s structure changes so that Matmul1’s output is split into two. If this transformation can be avoided, the compilation should succeed. Is there a way to do this?
Your analysis is spot on. The issue occurs because hailomz optimize transforms your graph so that Matmul1 gets multiple outputs, but our compiler doesn’t support multi-output MatMul operations.
What’s happening:
The optimization process applies graph rewrites that sometimes split outputs to serve multiple consumers. In your case, this creates the unsupported multi-output MatMul scenario.
Solutions:
Limit optimization scope - If your hailomz version supports it, use --only-quantization to skip structural changes:
hailomz optimize model.har --only-quantization
Disable problematic passes - Run with --disable-pass split_outputs or similar (pass names vary by version; check --print-passes, if available, for the exact names)
Preprocess your ONNX - Ensure Matmul1 has a single consumer before parsing, e.g. by inserting Identity nodes or running the model through onnx-simplifier
Try QAT - Quantization Aware Training can avoid post-training structural changes