Parsing ONNX model failed

I am attempting to convert a PP-OCR model to a Hailo HEF model. First, the PP-OCR model is converted to an ONNX model; then the ONNX model is converted to the HEF format. However, I encounter an error when parsing the ONNX model with the DFC:

hailo_sdk_client.model_translator.exceptions.ParsingWithRecommendationException: Parsing failed. The errors found in the graph are:
UnsupportedSoftmaxLayerError in op p2o.Softmax.0: Unsupported softmax
UnsupportedSoftmaxLayerError in op p2o.Softmax.1: Unsupported softmax
Please try to parse the model again, using these end node names: p2o.MatMul.2, p2o.Squeeze.2, p2o.AveragePool.0, p2o.Transpose.0

How can I handle this case?

Hey @davis.bian ,

Welcome to the Hailo Community!

You can use the Netron app to visualize and analyze your ONNX model. It will help you identify the unsupported Softmax layers and plan the necessary modifications.

To resolve the error and make your model compatible with the Hailo Dataflow Compiler (DFC), you have two options:

  1. Remove the Softmax Layers:

    • Use the end node names suggested in the error message (p2o.MatMul.2, p2o.Squeeze.2, etc.) to truncate the model; see the sketch after this list.
    • Use tools like ONNX GraphSurgeon (onnx-graphsurgeon) or the ONNX utilities to remove the Softmax layers and redefine the model’s outputs.
  2. Replace the Softmax Layers:

    • Replace the Softmax layers with equivalent operations that are supported by the Hailo DFC, such as custom normalization layers.
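
A minimal sketch of option 1, assuming the standard DFC Python API (ClientRunner.translate_onnx_model) and a Hailo-8 target; the ONNX path and network name are placeholders for your own files:

    from hailo_sdk_client import ClientRunner
    
    # Parse the ONNX model again, stopping just before the unsupported
    # Softmax ops by passing the end node names the parser suggested.
    runner = ClientRunner(hw_arch="hailo8")  # use your target architecture
    runner.translate_onnx_model(
        "ppocr.onnx",  # placeholder path to your ONNX model
        "ppocr",       # placeholder network name
        end_node_names=[
            "p2o.MatMul.2",
            "p2o.Squeeze.2",
            "p2o.AveragePool.0",
            "p2o.Transpose.0",
        ],
    )
    runner.save_har("ppocr.har")  # save the parsed model for the next steps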

If the removed Softmax layers are essential for inference (e.g., to obtain probability distributions), you can perform the softmax operation during post-processing on the CPU or another supported platform.

  • Add the Softmax function in your application code after retrieving the raw outputs from the Hailo device:

    import numpy as np
    
    def softmax(logits):
        # Subtract the row-wise max before exponentiating for numerical stability.
        exps = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
        return exps / np.sum(exps, axis=-1, keepdims=True)
    
    # Example usage: "hailo_model.run_inference" is a placeholder for your
    # HailoRT inference call; apply softmax to the raw logits it returns.
    output = hailo_model.run_inference(input_data)
    probabilities = softmax(output)
    

By applying these modifications, you should be able to compile your model successfully with the Hailo DFC.

Hey @omria
Thank you very much for the information. I will try those methods and share the results.

2 Likes

Hey @davis.bian ,
I am running into the same problem with ppocr-rec. Could you share how you solved it?

Hello, I would like to know whether this situation can be handled through configuration or some other means so that the Softmax runs on the CPU while remaining part of the HEF file. I think that would be a better solution: if the model is split instead, it may end up cut into many parts, which is not user-friendly.

1 Like

Hey @jinqing.mu ,

Welcome to the Hailo Community!

No, as of the current Hailo Dataflow Compiler (DFC) and SDK versions, you cannot configure a Softmax layer to run on the CPU while keeping it defined inside the HEF.

Here’s how it works:

  • Softmax is supported in limited cases — typically after a dense (fully connected) layer with a rank-2 output — and is executed on the Hailo device itself.
  • You can sometimes add a Softmax manually using a model script (like defining a “logits_layer”), but only if the model meets specific conditions (e.g., rank-2 output, max 3 Softmax layers); see the sketch after this list.
  • There’s no mechanism to selectively offload unsupported layers (like Softmax) to the host CPU while leaving them inside the HEF.
    HEF files are purely hardware-executable — they don’t support runtime delegation of specific layers to the CPU.
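
For completeness, a hedged sketch of the model-script route in Python. It assumes the DFC's change_output_activation model-script command accepts softmax for a layer meeting the rank-2 conditions above; the HAR path and layer name are placeholders, so verify against your DFC version's Model Script reference:

    from hailo_sdk_client import ClientRunner
    
    # Load the HAR produced by a previous (Softmax-free) parsing step.
    runner = ClientRunner(har="ppocr.har")  # placeholder HAR path
    # Ask the compiler to append a softmax activation to an output layer.
    # "fc1" is a hypothetical layer name; support depends on the DFC version
    # and on the layer meeting the rank-2 output conditions noted above.
    runner.load_model_script("change_output_activation(fc1, softmax)\n")
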
1 Like

Thank you very much for your reply. If a model contains two unsupported operators, I have to split it into three parts. In that case, can the three models be loaded onto the device at the same time and inferred? If not, models would have to be repeatedly unloaded from the NPU and reloaded, which incurs a huge time overhead. At the moment, using a Python script, I find that when I load the second model onto the device, it reports that the device is occupied and the load fails. Alternatively, is there any way to eliminate the overhead of unloading and reloading models on the device?

When I used DFC 3.30, I got the same error.
But when I used DFC 3.31 and ran translate_onnx_model(), it worked.
If you are on an old DFC version, upgrading the DFC may solve it.

Hey @jinqing.mu

We actually have a scheduler that handles this: the HailoRT Model Scheduler can run multiple networks on a single device quite efficiently.
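
As a rough sketch, assuming the HailoRT Python API (hailo_platform) with the Model Scheduler enabled; the HEF paths are placeholders:

    from hailo_platform import VDevice, HailoSchedulingAlgorithm
    
    # One virtual device shared by several networks; the scheduler
    # time-slices the Hailo device between them, so the HEFs do not
    # need to be unloaded and reloaded between inferences.
    params = VDevice.create_params()
    params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
    
    with VDevice(params) as vdevice:
        model_a = vdevice.create_infer_model("part_1.hef")  # placeholder
        model_b = vdevice.create_infer_model("part_2.hef")  # placeholder
        # ... configure each infer model and run inference as usual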

Hey Roy,

Welcome to the Hailo Community!

Yeah, each newer version comes with improvements in parsing and compiling.
But be careful: output from DFC 3.31 works only with HailoRT >= 4.21!

Hey @omria
I found a way to run multiple models on one device at the same time. But in testing I found that when two models run in series on one device, the inference time of the second model roughly doubles (17 ms for a single model vs. 33 ms with both models loaded), while the first model's time barely increases. My models are not running in parallel, so there should not be such an increase. I run them on a Raspberry Pi 5 using multiple threads, not multiple processes. Is this normal? Or is there a way to fix it?