Parsing ONNX model failed

I am attempting to convert a PP-OCR model to a Hailo HEF model. First, the PP-OCR model is converted to an ONNX model; then the ONNX model is converted to the HEF format. However, I encounter an error when parsing the ONNX model with the DFC:

hailo_sdk_client.model_translator.exceptions.ParsingWithRecommendationException: Parsing failed. The errors found in the graph are:
UnsupportedSoftmaxLayerError in op p2o.Softmax.0: Unsupported softmax
UnsupportedSoftmaxLayerError in op p2o.Softmax.1: Unsupported softmax
Please try to parse the model again, using these end node names: p2o.MatMul.2, p2o.Squeeze.2, p2o.AveragePool.0, p2o.Transpose.0

How can I handle this case?

Hey @davis.bian ,

Welcome to the Hailo Community!

You can use the Netron app to visualize and analyze your ONNX model. It will help you identify the unsupported Softmax layers and plan the necessary modifications.

To resolve the error and make your model compatible with the Hailo Dataflow Compiler (DFC), you have two options:

  1. Remove the Softmax Layers:

    • Use the end node names suggested in the error message (p2o.MatMul.2, p2o.Squeeze.2, etc.) to truncate the model; see the sketch after this list.
    • Use tools like ONNX GraphSurgeon (onnx-graphsurgeon) or the ONNX utilities to remove the Softmax layers and redefine the model’s outputs.
  2. Replace the Softmax Layers:

    • Replace the Softmax layers with equivalent operations that are supported by the Hailo DFC, such as custom normalization layers.
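
A minimal sketch of option 1, assuming the standard DFC Python API (ClientRunner.translate_onnx_model) and a Hailo-8 target; the ONNX path and network name are placeholders for your own files:

    from hailo_sdk_client import ClientRunner
    
    # Parse the ONNX model again, stopping just before the unsupported
    # Softmax ops by passing the end node names the parser suggested.
    runner = ClientRunner(hw_arch="hailo8")  # use your target architecture
    runner.translate_onnx_model(
        "ppocr.onnx",  # placeholder path to your ONNX model
        "ppocr",       # placeholder network name
        end_node_names=[
            "p2o.MatMul.2",
            "p2o.Squeeze.2",
            "p2o.AveragePool.0",
            "p2o.Transpose.0",
        ],
    )
    runner.save_har("ppocr.har")  # save the parsed model for the next steps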

If the removed Softmax layers are essential for inference (e.g., to obtain probability distributions), you can perform the softmax operation during post-processing on the CPU or another supported platform.

  • Add the Softmax function in your application code after retrieving the raw outputs from the Hailo device:

    import numpy as np
    
    def softmax(logits):
        # Subtract the row-wise max before exponentiating for numerical stability.
        exps = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
        return exps / np.sum(exps, axis=-1, keepdims=True)
    
    # Example usage: "hailo_model.run_inference" is a placeholder for your
    # HailoRT inference call; apply softmax to the raw logits it returns.
    output = hailo_model.run_inference(input_data)
    probabilities = softmax(output)
    

By applying these modifications, you should be able to compile your model successfully with the Hailo DFC.

Hey @omria
Thank you very much for the information. I will try those methods and share the results.

2 Likes

Hey @davis.bian ,
I am running into the same problem with ppocr-rec. Could you share how you solved it?

Hello, I would like to know whether this situation can be handled through configuration or some other means so that the Softmax runs on the CPU while remaining part of the HEF file. I think that would be a better solution: if the model is split instead, it may end up cut into many parts, which is not user-friendly.

1 Like

Hey @jinqing.mu ,

Welcome to the Hailo Community!

No, as of the current Hailo Dataflow Compiler (DFC) and SDK versions, you cannot configure a Softmax layer to run on the CPU while keeping it defined inside the HEF.

Here’s how it works:

  • Softmax is supported in limited cases — typically after a dense (fully connected) layer with a rank-2 output — and is executed on the Hailo device itself.
  • You can sometimes add a Softmax manually using a model script (like defining a “logits_layer”), but only if the model meets specific conditions (e.g., rank-2 output, max 3 Softmax layers); see the sketch after this list.
  • There’s no mechanism to selectively offload unsupported layers (like Softmax) to the host CPU while leaving them inside the HEF.
    HEF files are purely hardware-executable — they don’t support runtime delegation of specific layers to the CPU.
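
For completeness, a hedged sketch of the model-script route in Python. It assumes the DFC's change_output_activation model-script command accepts softmax for a layer meeting the rank-2 conditions above; the HAR path and layer name are placeholders, so verify against your DFC version's Model Script reference:

    from hailo_sdk_client import ClientRunner
    
    # Load the HAR produced by a previous (Softmax-free) parsing step.
    runner = ClientRunner(har="ppocr.har")  # placeholder HAR path
    # Ask the compiler to append a softmax activation to an output layer.
    # "fc1" is a hypothetical layer name; support depends on the DFC version
    # and on the layer meeting the rank-2 output conditions noted above.
    runner.load_model_script("change_output_activation(fc1, softmax)\n")
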
1 Like

Thank you very much for your reply. If a model contains two unsupported operators, I have to split it into three parts. In that case, can the three models be loaded onto the device at the same time and inferred? If not, models would have to be repeatedly unloaded from the NPU and reloaded, which incurs a huge time overhead. At the moment, using a Python script, I find that when I load the second model onto the device, it reports that the device is occupied and the load fails. Alternatively, is there any way to eliminate the overhead of unloading and reloading models on the device?

When I used DFC 3.30, I got the same error.
But when I used DFC 3.31 and ran translate_onnx_model(), it worked.
If you are on an old DFC version, upgrading the DFC may solve it.

Hey @jinqing.mu

We actually have a scheduler that handles this: the HailoRT Model Scheduler can run multiple networks on a single device quite efficiently.
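
As a rough sketch, assuming the HailoRT Python API (hailo_platform) with the Model Scheduler enabled; the HEF paths are placeholders:

    from hailo_platform import VDevice, HailoSchedulingAlgorithm
    
    # One virtual device shared by several networks; the scheduler
    # time-slices the Hailo device between them, so the HEFs do not
    # need to be unloaded and reloaded between inferences.
    params = VDevice.create_params()
    params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
    
    with VDevice(params) as vdevice:
        model_a = vdevice.create_infer_model("part_1.hef")  # placeholder
        model_b = vdevice.create_infer_model("part_2.hef")  # placeholder
        # ... configure each infer model and run inference as usual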

Hey Roy,

Welcome to the Hailo Community!

Yeah, each newer version comes with improvements in parsing and compiling.
But be careful: output from DFC 3.31 works only with HailoRT >= 4.21!

Hey @omria
I found a way to run multiple models on one device at the same time. But in testing I found that when two models run in series on one device, the inference time of the second model roughly doubles (17 ms for a single model vs. 33 ms with both models loaded), while the first model's time barely increases. My models are not running in parallel, so there should not be such an increase. I run them on a Raspberry Pi 5 using multiple threads, not multiple processes. Is this normal? Or is there a way to fix it?