Compiling MaskRCNN DFC 3.25.0

Hi there,
I’m trying to compile a custom MaskRCNN for a Hailo-8.
The device has HailoRT 4.15 on it, which is why I’m using an old version of the DFC.
I successfully converted my model to ONNX; I have both a straight model and a simplified version. Both of them throw errors during the first conversion to .har.
Here is the snippet I’m running:
```python
onnx_model = onnx.load(ONNX_MODEL_PATH)
if start_node is None:
    start_node = onnx_model.graph.node[0]
    print(f"start node:\n{start_node}")
    start_node = start_node.name
if end_node is None:
    end_node = onnx_model.graph.node[-1]
    print(f"end_node:\n{end_node}")
    end_node = end_node.name
try:
    runner = ClientRunner(hw_arch=hardware)
    hn, npz = runner.translate_onnx_model(
        ONNX_MODEL_PATH,
        model_name,
        start_node_names=[start_node],
        end_node_names=[end_node],
    )
```
But the translation is throwing:
```
start node:
input: "backbone.body.layer4.0.bn3.bias"
output: "backbone.body.layer4.0.downsample.1.bias"
name: "Identity_1053"
op_type: "Identity"

end_node:
input: "res_append.7"
input: "/Constant_35_output_0"
output: "masks"
name: "/Unsqueeze_8"
op_type: "Unsqueeze"

[warning] Large model detected. The graph contains a large number of operators and variables, and might take a bit longer to load from ONNX.
[warning] ONNX shape inference failed: Unsupported dynamic shape([-1, 3, 0, 0]) found on input node input. Please use net_input_shapes, see documentation for additional info.
[info] Found a net_name that starts with a digit, added a prefix. New net_name is net_20k_longlong
[info] Attempting to retry parsing on a simplified model, using onnx simplifier
[warning] ONNX shape inference failed: Unsupported dynamic shape([-1, 3, 0, 0]) found on input node input. Please use net_input_shapes, see documentation for additional info.
[info] Found a net_name that starts with a digit, added a prefix. New net_name is net_20k_longlong
Failed to convert onnx to har: Can't find vertex Identity_1053 in graph
```

I visualized the graph and found the first node to be a "/Split" immediately below the input, so I fed that as start_node. I also added the option net_input_shapes=[-1,480,640,3] to resolve the dynamic-input warning; [-1,480,640,3] is the shape of the images I feed to the PyTorch model.
The output changes to:
```
[warning] Large model detected. The graph contains a large number of operators and variables, and might take a bit longer to load from ONNX.
[warning] ONNX shape inference failed: [ONNXRuntimeError] : 1 : FAIL : Node (/transform/Sub) Op (Sub) [ShapeInferenceError] Incompatible dimensions
[info] Found a net_name that starts with a digit, added a prefix. New net_name is net_20k_longlong
[info] Attempting to retry parsing on a simplified model, using onnx simplifier
[warning] ONNX shape inference failed: [ONNXRuntimeError] : 1 : FAIL : Node (/transform/Sub) Op (Sub) [ShapeInferenceError] Incompatible dimensions
[info] Found a net_name that starts with a digit, added a prefix. New net_name is net_20k_longlong
Failed to convert onnx to har: list index out of range
```

What can I do? Are there any good practices to adopt when dealing with big models?
I tried the same code with a different model and successfully compiled it.

Hey @saverio.taliani,

Welcome to the Hailo Community!

I see that your device is currently running an older version, but you can upgrade to version 4.18 to take advantage of the latest updates in both DFC and HailoRT (versions 3.28 and 4.18, respectively). You can find more information and detailed instructions in this thread: Update Firmware on Hailo-8. The DFC has received significant improvements since version 3.25, which was released about a year ago.
Then please re-run the model compilation and check whether the problem persists!

Regards


Hi, thanks for your reply. I tried converting my network with the latest Docker release; unfortunately, that did not solve the problem.

Again, if I ask ONNX for the first node of the graph and feed it to translate_onnx_model, I get "Can't find vertex Identity_1053 in graph".

If instead I manually open the ONNX model, inspect it, and select the first layer I see after the input, feeding that to the translation function gives the following error: "[warning] ONNX shape inference failed: [ONNXRuntimeError] : 1 : FAIL : Node (/transform/Sub) Op (Sub) [ShapeInferenceError] Incompatible dimensions"

Hey @saverio.taliani

It seems like you are encountering issues related to ONNX model conversion, specifically when trying to feed the first node into the translate_onnx_model function and running into the error about missing vertices and incompatible dimensions.

Here are a few suggestions and steps to troubleshoot the problem:

1. ONNX Shape Inference Error:

The error ShapeInferenceError: Incompatible dimensions suggests that there might be a mismatch in the tensor shapes between layers in your model. ONNX performs shape inference to check the compatibility of tensor operations, and this error indicates that the output shape of one layer is not compatible with the input shape of the next.

a. Check Input and Output Shapes:

  • Manually inspect the input and output shapes of the layers before and after the node where the error occurs (in this case, the Sub node).
  • Use ONNX tools like Netron to visualize the model graph and confirm if the shapes match at each layer.

Example:

```python
import onnx
from onnx import shape_inference

# Load and perform shape inference on the model
model = onnx.load('model.onnx')
inferred_model = shape_inference.infer_shapes(model)
onnx.save(inferred_model, 'inferred_model.onnx')
```

This will help you identify if there’s any shape mismatch before feeding the model to the translate_onnx_model function.

b. Fixing the Shape Mismatch:

  • If the shape mismatch is found, you may need to modify the model to ensure the dimensions align. This could involve adding reshape layers, or verifying that the input sizes are consistent throughout the model.

2. Vertex Identity Error:

The error Can't find vertex Identity_1053 in graph indicates that the node you are trying to reference does not exist or is incorrectly named. This could be due to several reasons:

  • Graph Structure: ONNX models have a directed graph structure, where each node is assigned a unique name. If the node Identity_1053 doesn’t exist, it could be that the node has a different name, or it may have been renamed during optimization.

  • ONNX Runtime Inconsistencies: Different versions of ONNX runtime may handle graphs and node names slightly differently. Ensure that you are using the correct version of ONNX and ONNX runtime compatible with your model.

  • Inspecting the ONNX Graph: You can manually inspect the ONNX graph to check the node names and find the correct references. Use the following Python script to inspect the node names:

```python
import onnx

# Load ONNX model
model = onnx.load("model.onnx")

# Print all node names
for node in model.graph.node:
    print(node.name)
```

3. Manually Fixing ONNX Nodes:

Since you mentioned that opening and inspecting the ONNX model manually helped you avoid the first error, you can try to selectively modify the nodes that cause the issue by:

  • Manually editing the model: You can remove or replace problematic nodes, or use reshaping operations to ensure the tensor dimensions match.
  • Simplifying the model: If you are able to identify problematic parts of the model, you can simplify it by exporting only the functional portions, testing with translate_onnx_model step by step.

4. Additional Debugging with ONNX Runtime:

ONNX Runtime can sometimes provide more detailed error messages. You can run the model using ONNX Runtime to see if it throws any additional information about the shape mismatch:

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")

# get_inputs()/get_outputs() return NodeArg objects with .name and .shape;
# dynamic dimensions show up as strings (e.g. 'batch') or None.
for inp in session.get_inputs():
    print(f"Input name: {inp.name}, shape: {inp.shape}")
for out in session.get_outputs():
    print(f"Output name: {out.name}, shape: {out.shape}")
```

5. Ensure Compatibility with Hailo’s Translator:

If you’re using a Hailo translator for ONNX models, make sure your model’s format is fully supported by the latest version of the Hailo SDK. Check the Hailo documentation for any specific requirements or limitations related to model conversion.

Good luck! Let me know how it goes.
