Converting object detection models like MaskRCNN or DINO-DETR

Hi !

I want to convert a custom instance segmentation model to a Hailo HAR/HEF file. I am using either a MaskRCNN or a DINO-DETR, and I would prefer not to use a YOLO model, which is what all of the user guides feature. I am also using a non-standard, rectangular input shape. Unfortunately, I have been stuck on the conversion, namely the translate_onnx_model function.
For MaskRCNN, I understand that I should carefully select the end node names so that the NMS is not included, but I am struggling to pick the right nodes. For DINO-DETR, since there is no NMS step, I am not sure why the conversion fails.

I am visualizing the networks with netron.app. I also tried visualizing them with the Dataflow Compiler Studio, but the graph is very large, which makes it hard to pick the right nodes.
I am generating the ONNX graphs with the code at the bottom of this post; I can provide the ONNX files if needed.

For DINO-DETR I am using the snippet below to translate; I tried both the custom output names I gave at export time and the actual names of their parent nodes:

hn, npz = runner.translate_onnx_model(
    onnx_file_path,
    onnx_model_name,
    start_node_names=["input1"],
    end_node_names=['logits', 'boxes', 'masks',
                  'last_hidden_state', 'encoder_last_hidden_state'],
    # end_node_names=['/class_labels_classifier/Add', '/Sigmoid', '/Reshape_4', '/decoder/layernorm/Add_1', '/encoder/layers.5/final_layer_norm/Add_1'],
    net_input_shapes=[1,3,750,1333],
    disable_rt_metadata_extraction=False,
)

And I get this output:

[info] Translation started on ONNX model detr_instance_segmentation
[warning] Large model detected. The graph may contain either a large number of operators, or weight variables with a very large capacity.
[warning] Translation time may be a bit long, and some features may be disabled (e.g. model augmentation, retry simplified model, onnx runtime hailo model extraction, etc.).
[info] Restored ONNX model detr_instance_segmentation (completion time: 00:00:00.92)
Traceback (most recent call last):
  File "/local/workspace/onnx_to_href.py", line 37, in <module>
    hn, npz = runner.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1177, in translate_onnx_model
    parser.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 252, in translate_onnx_model
    raise e from None
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 239, in translate_onnx_model
    parsing_results = self._parse_onnx_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 320, in _parse_onnx_model_to_hn
    return self.parse_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 371, in parse_model_to_hn
    fuser = HailoNNFuser(converter.convert_model(), net_name, converter.end_node_names)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/translator.py", line 83, in convert_model
    self._create_layers()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/edge_nn_translator.py", line 38, in _create_layers
    self._update_vertices_info()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_translator.py", line 316, in _update_vertices_info
    node.update_output_format()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_graph.py", line 500, in update_output_format
    self.update_reshape_output_format(input_format)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_graph.py", line 343, in update_reshape_output_format
    elif len(output_shapes) == len(input_format) == 4:
TypeError: object of type 'NoneType' has no len()

I would be grateful if someone could guide me.
Thank you

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNN, MaskRCNNPredictor
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models import Wide_ResNet101_2_Weights
import torch
from torch.onnx import TrainingMode
from transformers import DetrForSegmentation


### MaskRCNN ###
backbone = resnet_fpn_backbone(
    backbone_name='wide_resnet101_2',
    weights=Wide_ResNet101_2_Weights.DEFAULT,
    trainable_layers=5,
)
model = MaskRCNN(backbone, num_classes=2, min_size=750, max_size=1333)

# Replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(model.roi_heads.box_predictor.cls_score.in_features, 2)

# Replace the mask predictor with a new one
model.roi_heads.mask_predictor = MaskRCNNPredictor(
    model.roi_heads.mask_predictor.conv5_mask.in_channels, 256, 2
)
model.rpn.anchor_generator = AnchorGenerator(
    sizes=(16, 32, 64, 128, 256), aspect_ratios=(0.5, 1.0, 2.0)
)

model.eval()

input_x = torch.rand(1, 3, 750, 1333).to('cpu')
torch.onnx.export(
    model=model,  # model being run
    args=input_x,  # model input (or a tuple for multiple inputs)
    f='maskrcnn.onnx',
    export_params=True,  # store the trained parameter weights inside the model file
    training=TrainingMode.EVAL,
    opset_version=13,
    input_names=['input1'],
    output_names=['boxes', 'labels', 'scores', 'masks'],
    do_constant_folding=True,
    keep_initializers_as_inputs=False,
    dynamic_axes={"boxes": {0: 'nb_instances'},
                  "labels": {0: 'nb_instances'},
                  "scores": {0: 'nb_instances'},
                  "masks": {0: 'nb_instances'}},
    verbose=0)

#### DETR ####

# Load a pre-trained Detr model for instance segmentation
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50", local_files_only=True)

# Set the model to evaluation mode
model.eval()

# Define the input tensor
input_x = torch.rand(1, 3, 750, 1333).to('cpu')

# Export the model to ONNX
torch.onnx.export(
    model=model,  # model being run
    args=(input_x,),  # model input (or a tuple for multiple inputs)
    f='detr_instance_segmentation.onnx',
    export_params=True,  # store the trained parameter weights inside the model file
    opset_version=13,
    input_names=['input1'],
    output_names=['logits', 'boxes', 'masks',
                  'last_hidden_state', 'encoder_last_hidden_state'],
    do_constant_folding=True,
    verbose=0
)

Hey @Anne-Maelle_Barneche ,

Welcome to the Hailo Community!

When you encounter this error during translate_onnx_model():

elif len(output_shapes) == len(input_format) == 4:
TypeError: object of type 'NoneType' has no len()

The problem is that your model contains nodes with undefined output shapes. This typically happens in two scenarios:

  1. Your selected end node doesn’t have fully shape-inferred tensors
  2. The node has dynamic behavior (common around reshape operations and mask outputs in models like MaskRCNN or DETR)

How to Fix the Issue

  1. Use a simplified ONNX version with properly inferred shapes (a short sketch follows this list)
  2. Select end nodes that aren’t connected to post-processing or attention decoder internals
  3. Validate your node names with the hailo parser CLI tool
  4. Consider using Netron to visually explore the graph and verify output dimensions
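
For points 1 and 4, here is a minimal sketch using only the plain onnx package (nothing Hailo-specific; the file name is just taken from your export script). It re-runs shape inference and prints every intermediate tensor whose shape is still unresolved, which are exactly the tensors that trigger the NoneType error:

import onnx
from onnx import shape_inference

# Re-run ONNX shape inference on the exported graph
model = onnx.load("detr_instance_segmentation.onnx")
inferred = shape_inference.infer_shapes(model)

# List intermediate tensors whose shape could not be resolved; these are the
# candidates behind "object of type 'NoneType' has no len()"
for vi in inferred.graph.value_info:
    dims = vi.type.tensor_type.shape.dim
    if len(dims) == 0 or any(not d.HasField("dim_value") for d in dims):
        print(vi.name, [d.dim_value if d.HasField("dim_value") else "?" for d in dims])

# Save the shape-annotated model and point translate_onnx_model at it
onnx.save(inferred, "detr_instance_segmentation_inferred.onnx")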

Best Practices for Model Translation

For MaskRCNN Models:

  • Choose end nodes that come before any dynamic post-processing logic
  • Avoid including NMS, classifiers, or mask heads
  • Instead, select nodes just after feature extraction or ROI alignment layers
  • Good candidates: roi_align outputs or the backbone/FPN feature maps (see the sketch after this list)
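
One concrete way to do that with the torchvision MaskRCNN from your script (a sketch under my assumptions, not an official flow): export only the backbone + FPN, whose shapes are fully static, and keep the RPN, RoIAlign, NMS and mask head as host-side post-processing. The output names below are placeholders for the FPN levels:

import torch

# `model` is the MaskRCNN instance built in your export script
model.eval()
backbone = model.backbone  # BackboneWithFPN built by resnet_fpn_backbone

dummy = torch.rand(1, 3, 750, 1333)
torch.onnx.export(
    backbone,
    dummy,
    "maskrcnn_backbone_fpn.onnx",
    opset_version=13,
    input_names=["input1"],
    # the FPN forward returns an OrderedDict of feature maps; the exporter
    # flattens it to one output per pyramid level (placeholder names below)
    output_names=["p2", "p3", "p4", "p5", "pool"],
    do_constant_folding=True,
)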

For DETR/DINO-DETR Models:

  • Be aware that outputs like masks and last_hidden_state may come from dynamic reshaping
  • Transformer and attention-based models often contain unsupported dynamic operations

Remember that ONNX models must include static shapes for all operators to work with the Hailo compiler.
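
For example, a re-parse of your DETR model that stops at the statically shaped transformer outputs could look like the sketch below. The end node names are taken from the commented-out line in your snippet and are only my best guess; please verify them in Netron before running.

# Sketch: keep only the statically shaped transformer outputs as end nodes and
# implement the box/mask decoding on the host
hn, npz = runner.translate_onnx_model(
    onnx_file_path,
    onnx_model_name,
    start_node_names=["input1"],
    end_node_names=[
        "/decoder/layernorm/Add_1",                  # decoder last_hidden_state
        "/encoder/layers.5/final_layer_norm/Add_1",  # encoder_last_hidden_state
    ],
    net_input_shapes=[1, 3, 750, 1333],
)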

Hi @omria,

Thank you very much for your quick and detailed response.
I will delve deeper into the graphs to try to identify and select nodes that maintain static shapes, as you suggested.
