Handling Unsupported Operations with ONNXRuntime

When encountering an unsupported ONNX operation/layer that can’t be parsed using the DFC, consider where the unsupported operation sits relative to the full network:

  • Near the input - try to keep that part as a preprocess and parse the rest into a single HEF.

  • At the net’s middle - crop the problematic block, run it with ONNX Runtime (ORT), and resume on the device,
    thus having 2 HEFs: HEF → ORT → HEF → (optional) CPU postprocess

  • At the net’s end - this is the case this short guide covers.

When the unsupported part is at the end, choose one of two paths:

  • Implement the operation as a native postprocess in Python/C++ (no ORT)
  • Crop the ONNX before the problematic layer and use ONNX Runtime as a postprocess in the app,
    thus creating a pipeline of:
    Hailo HEF → ORT postprocess → (optional) CPU postprocess*

We’ll use YOLOv8-Seg as an example**.

* CPU postprocess is for any operations that are not in the ONNX (e.g. visualization)
** For most YOLO models, Hailo Model Zoo provides Hailo-postprocess — an optimized postprocessing stage that runs on HailoRT inside the HEF. This model is an exception, and so we’ll run the postprocess on ORT.

Step 1 — Locate the crop point

  • Open the model in Netron.

  • Find the first unsupported op

  • Pick end-nodes right before that op

    • When original model’s parsing fails, check the parser’s recommended end-nodes. They’re usually valid cropping points for your pipeline.

    • If there are a few unsupported ops in different branches, the end-node should be the last supported op that feeds all of them.

    • For YOLOv8-Seg, the unsupported (and also last) operations perform the BBOX decoding, so our ORT postprocess will handle just that. In this case the chosen nodes are 9 Conv heads plus the mask-prototype tensor.
      Note: We chose those nodes because the next operations are Concats, which we excluded from the HEF because of quantization considerations:

      • Those tensors can have very different value ranges. When quantized together, one scale has to cover the largest range, which blurs the small values. Doing the Concat in float (ORT/CPU) preserves that precision.
    • This example’s chosen end nodes:

      /model.22/cv2.2/cv2.2.2/Conv
      /model.22/cv3.2/cv3.2.2/Conv
      /model.22/cv4.2/cv4.2.2/Conv
      /model.22/cv2.1/cv2.1.2/Conv
      /model.22/cv3.1/cv3.1.2/Conv
      /model.22/cv4.1/cv4.1.2/Conv
      /model.22/cv2.0/cv2.0.2/Conv
      /model.22/cv3.0/cv3.0.2/Conv
      /model.22/cv4.0/cv4.0.2/Conv
      /model.22/proto/cv3/act/Mul
      
  • Tip: In Netron, click a node to see its name - these are the end-node names we’ll pass to the parser.

Step 2 — Create the postprocess-only ONNX (two options)

  • Option A — Using Hailo DFC parser (recommended)

    hailo parser onnx <original.onnx> --end-node-names <"END_NODE1" "END_NODE2" ..>
    
    • This produces a HAR. Extract it. Inside you’ll find *_postprocess.onnx which starts after your end-nodes.

    • Sanity checks (this is a relatively new Hailo tool, so please review these before deploying):

      • Inputs of *_postprocess.onnx match (name/shape/dtype/order) the tensors you’ll get from the HEF.

      • Outputs match your intended postprocess outputs.

  • Option B — Using standard ONNX tools

    Use standard ONNX graph tooling (e.g., onnx.utils.extract_model) to crop the network so that its inputs are your boundary tensors and its outputs are the last layers ORT should compute, typically the original model’s outputs.
