Handling Unsupported Operations with ONNXRuntime

When encountering an unsupported ONNX operation/layer that can’t be parsed using the DFC, consider where the unsupported operation sits relative to the full network:

  • Near the input - try to keep that part as a preprocess and parse the rest into a single HEF.

  • At the net’s middle - crop the problematic block, run it with ONNX Runtime (ORT), and resume on the device,
    thus having 2 HEFs: HEF → ORT → HEF → (optional) CPU postprocess

  • At the net’s end - this is the case this short guide covers.

When the unsupported part is at the end, choose one of two paths:

  • Implement the operation as a native postprocess in Python/C++ (no ORT)
  • Crop the ONNX before the problematic layer and use ONNX Runtime as a postprocess in the app,
    thus creating a pipeline of:
    Hailo HEF → ORT postprocess → (optional) CPU postprocess*

We’ll use YOLOv8-Seg as an example**.

* CPU postprocess is for any operations that are not in the ONNX (e.g. visualization)
** For most YOLO models, Hailo Model Zoo provides Hailo-postprocess — an optimized postprocessing stage that runs on HailoRT inside the HEF. This model is an exception, and so we’ll run the postprocess on ORT.

Step 1 — Locate the crop point

  • Open the model in Netron.

  • Find the first unsupported op

  • Pick end-nodes right before that op

    • When original model’s parsing fails, check the parser’s recommended end-nodes. They’re usually valid cropping points for your pipeline.

    • If there are a few unsupported ops in different branches, the end-node should be the last supported op that feeds all of them.

    • For YOLOv8-Seg, the unsupported (and also last) operations perform the BBOX decoding, so our ORT postprocess will handle just that. In this case the chosen nodes are 9 Conv heads plus the mask-prototype tensor.
      Note: We chose those nodes because the next operations are Concats, which we excluded from the HEF because of quantization considerations:

      • Those tensors can have very different value ranges. When quantized together, one scale has to cover the largest range, which blurs the small values. Doing the Concat in float (ORT/CPU) preserves that precision.
    • This example’s chosen end nodes:

      /model.22/cv2.2/cv2.2.2/Conv
      /model.22/cv3.2/cv3.2.2/Conv
      /model.22/cv4.2/cv4.2.2/Conv
      /model.22/cv2.1/cv2.1.2/Conv
      /model.22/cv3.1/cv3.1.2/Conv
      /model.22/cv4.1/cv4.1.2/Conv
      /model.22/cv2.0/cv2.0.2/Conv
      /model.22/cv3.0/cv3.0.2/Conv
      /model.22/cv4.0/cv4.0.2/Conv
      /model.22/proto/cv3/act/Mul
      
  • Tip: In Netron, click a node to see its name - these are the end-node names we’ll pass to the parser.

Step 2 — Create the postprocess-only ONNX (two options)

  • Option A — Using Hailo DFC parser (recommended)

    hailo parser onnx <original.onnx> --end-node-names <"END_NODE1" "END_NODE2" ..>
    
    • This produces a HAR. Extract it. Inside you’ll find *_postprocess.onnx which starts after your end-nodes.

    • Sanity checks (this is a relatively new Hailo tool, so please review these before deploying):

      • Inputs of *_postprocess.onnx match (name/shape/dtype/order) the tensors you’ll get from the HEF.

      • Outputs match your intended postprocess outputs.

  • Option B — Using standard ONNX tools

    Use standard ONNX graph tooling (e.g., onnx.utils.extract_model) to crop the network so that its inputs are your boundary tensors and its outputs are the last layers ORT should compute, typically the original model’s outputs.
