Converting object detection models like MaskRCNN or DINO-DETR

Hi !

I want to convert a custom instance segmentation model to a Hailo HAR/HEF file. I am using either a MaskRCNN or a DINO-DETR, and I would prefer not to use a YOLO model, which is what all of the user guides feature. I am also using a non-standard, rectangular input shape. Unfortunately, I have been stuck on the conversion, namely the translate_onnx_model function.
For MaskRCNN, I understand that I should carefully select the end node names so that the NMS is not included, but I am struggling to pick the right nodes. For DINO-DETR, since there is no NMS step, I am not sure why the conversion fails.

I am visualizing the networks with netron.app. I also tried visualizing them with the Dataflow Compiler Studio, but the graph is very large, which makes it hard to pick the right nodes.
I am generating the ONNX graphs with the code at the bottom of this post; I can provide the ONNX files if needed.

For DINO-DETR I am using the snippet below to translate; I tried both the custom output names I gave at export time and the actual names of their parent nodes:

hn, npz = runner.translate_onnx_model(
    onnx_file_path,
    onnx_model_name,
    start_node_names=["input1"],
    end_node_names=['logits', 'boxes', 'masks',
                  'last_hidden_state', 'encoder_last_hidden_state'],
    # end_node_names=['/class_labels_classifier/Add', '/Sigmoid', '/Reshape_4', '/decoder/layernorm/Add_1', '/encoder/layers.5/final_layer_norm/Add_1'],
    net_input_shapes=[1,3,750,1333],
    disable_rt_metadata_extraction=False,
)

And I get this output:

[info] Translation started on ONNX model detr_instance_segmentation
[warning] Large model detected. The graph may contain either a large number of operators, or weight variables with a very large capacity.
[warning] Translation time may be a bit long, and some features may be disabled (e.g. model augmentation, retry simplified model, onnx runtime hailo model extraction, etc.).
[info] Restored ONNX model detr_instance_segmentation (completion time: 00:00:00.92)
Traceback (most recent call last):
  File "/local/workspace/onnx_to_href.py", line 37, in <module>
    hn, npz = runner.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1177, in translate_onnx_model
    parser.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 252, in translate_onnx_model
    raise e from None
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 239, in translate_onnx_model
    parsing_results = self._parse_onnx_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 320, in _parse_onnx_model_to_hn
    return self.parse_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 371, in parse_model_to_hn
    fuser = HailoNNFuser(converter.convert_model(), net_name, converter.end_node_names)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/translator.py", line 83, in convert_model
    self._create_layers()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/edge_nn_translator.py", line 38, in _create_layers
    self._update_vertices_info()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_translator.py", line 316, in _update_vertices_info
    node.update_output_format()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_graph.py", line 500, in update_output_format
    self.update_reshape_output_format(input_format)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_graph.py", line 343, in update_reshape_output_format
    elif len(output_shapes) == len(input_format) == 4:
TypeError: object of type 'NoneType' has no len()

I would be grateful if someone could guide me.
Thank you

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNN, MaskRCNNPredictor
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models import Wide_ResNet101_2_Weights
import torch
from torch.onnx import TrainingMode
from transformers import DetrForSegmentation


### MaskRCNN ###
backbone = resnet_fpn_backbone(
    backbone_name='wide_resnet101_2',
    weights=Wide_ResNet101_2_Weights.DEFAULT,
    trainable_layers=5,
)
model = MaskRCNN(backbone, num_classes=2, min_size=750, max_size=1333)

# Replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(model.roi_heads.box_predictor.cls_score.in_features, 2)

# Replace the mask predictor with a new one
model.roi_heads.mask_predictor = MaskRCNNPredictor(
    model.roi_heads.mask_predictor.conv5_mask.in_channels, 256, 2
)
model.rpn.anchor_generator = AnchorGenerator(
    sizes=(16, 32, 64, 128, 256), aspect_ratios=(0.5, 1.0, 2.0)
)

model.eval()

input_x = torch.rand(1, 3, 750, 1333).to('cpu')
torch.onnx.export(
    model=model,  # model being run
    args=input_x,  # model input (or a tuple for multiple inputs)
    f='maskrcnn.onnx',
    export_params=True,  # store the trained parameter weights inside the model file
    training=TrainingMode.EVAL,
    opset_version=13,
    input_names=['input1'],
    output_names=['boxes', 'labels', 'scores', 'masks'],
    do_constant_folding=True,
    keep_initializers_as_inputs=False,
    dynamic_axes={"boxes": {0: 'nb_instances'},
                  "labels": {0: 'nb_instances'},
                  "scores": {0: 'nb_instances'},
                  "masks": {0: 'nb_instances'}},
    verbose=0)

#### DETR ####

# Load a pre-trained Detr model for instance segmentation
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50", local_files_only=True)

# Set the model to evaluation mode
model.eval()

# Define the input tensor
input_x = torch.rand(1, 3, 750, 1333).to('cpu')

# Export the model to ONNX
torch.onnx.export(
    model=model,  # model being run
    args=(input_x,),  # model input (or a tuple for multiple inputs)
    f='detr_instance_segmentation.onnx',
    export_params=True,  # store the trained parameter weights inside the model file
    opset_version=13,
    input_names=['input1'],
    output_names=['logits', 'boxes', 'masks',
                  'last_hidden_state', 'encoder_last_hidden_state'],
    do_constant_folding=True,
    verbose=0
)

Hey @Anne-Maelle_Barneche ,

Welcome to the Hailo Community!

When you encounter this error during translate_onnx_model():

elif len(output_shapes) == len(input_format) == 4:
TypeError: object of type 'NoneType' has no len()

The problem is that your model contains nodes with undefined output shapes. This typically happens in two scenarios:

  1. Your selected end node doesn’t have fully shape-inferred tensors
  2. The node has dynamic behavior (common around reshape operations and mask outputs in models like MaskRCNN or DETR)

How to Fix the Issue

  1. Use a simplified ONNX version with properly inferred shapes (a short sketch follows this list)
  2. Select end nodes that aren’t connected to post-processing or attention decoder internals
  3. Validate your node names with the hailo parser CLI tool
  4. Consider using Netron to visually explore the graph and verify output dimensions
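
For points 1 and 4, here is a minimal sketch using only the plain onnx package (nothing Hailo-specific; the file name is just taken from your export script). It re-runs shape inference and prints every intermediate tensor whose shape is still unresolved, which are exactly the tensors that trigger the NoneType error:

import onnx
from onnx import shape_inference

# Re-run ONNX shape inference on the exported graph
model = onnx.load("detr_instance_segmentation.onnx")
inferred = shape_inference.infer_shapes(model)

# List intermediate tensors whose shape could not be resolved; these are the
# candidates behind "object of type 'NoneType' has no len()"
for vi in inferred.graph.value_info:
    dims = vi.type.tensor_type.shape.dim
    if len(dims) == 0 or any(not d.HasField("dim_value") for d in dims):
        print(vi.name, [d.dim_value if d.HasField("dim_value") else "?" for d in dims])

# Save the shape-annotated model and point translate_onnx_model at it
onnx.save(inferred, "detr_instance_segmentation_inferred.onnx")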

Best Practices for Model Translation

For MaskRCNN Models:

  • Choose end nodes that come before any dynamic post-processing logic
  • Avoid including NMS, classifiers, or mask heads
  • Instead, select nodes just after feature extraction or ROI alignment layers
  • Good candidates: roi_align outputs or the backbone/FPN feature maps (see the sketch after this list)
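
One concrete way to do that with the torchvision MaskRCNN from your script (a sketch under my assumptions, not an official flow): export only the backbone + FPN, whose shapes are fully static, and keep the RPN, RoIAlign, NMS and mask head as host-side post-processing. The output names below are placeholders for the FPN levels:

import torch

# `model` is the MaskRCNN instance built in your export script
model.eval()
backbone = model.backbone  # BackboneWithFPN built by resnet_fpn_backbone

dummy = torch.rand(1, 3, 750, 1333)
torch.onnx.export(
    backbone,
    dummy,
    "maskrcnn_backbone_fpn.onnx",
    opset_version=13,
    input_names=["input1"],
    # the FPN forward returns an OrderedDict of feature maps; the exporter
    # flattens it to one output per pyramid level (placeholder names below)
    output_names=["p2", "p3", "p4", "p5", "pool"],
    do_constant_folding=True,
)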

For DETR/DINO-DETR Models:

  • Be aware that outputs like masks and last_hidden_state may come from dynamic reshaping
  • Transformer and attention-based models often contain unsupported dynamic operations

Remember that ONNX models must include static shapes for all operators to work with the Hailo compiler.
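
For example, a re-parse of your DETR model that stops at the statically shaped transformer outputs could look like the sketch below. The end node names are taken from the commented-out line in your snippet and are only my best guess; please verify them in Netron before running.

# Sketch: keep only the statically shaped transformer outputs as end nodes and
# implement the box/mask decoding on the host
hn, npz = runner.translate_onnx_model(
    onnx_file_path,
    onnx_model_name,
    start_node_names=["input1"],
    end_node_names=[
        "/decoder/layernorm/Add_1",                  # decoder last_hidden_state
        "/encoder/layers.5/final_layer_norm/Add_1",  # encoder_last_hidden_state
    ],
    net_input_shapes=[1, 3, 750, 1333],
)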

Hi @omria,

Thank you very much for your quick and detailed response.
I will delve deeper into the graphs to try to identify and select nodes that maintain static shapes, as you suggested.
