Conversion Issue with ConvTranspose2d Layer from ONNX to HAR

Dear Hailo community,

We are facing an issue with the conversion of ConvTranspose2d from ONNX to HAR, which is similar to the topic "Wrong dimensions in har file from onnx containing torch.nn.ConvTranspose2d layers", and would like to ask for your support.

Problem description

We have an ONNX model that runs inference successfully with onnxruntime. The library versions we use to export/run the model are:

onnx==1.16.0
onnxruntime-gpu==1.18.0
onnxscript==0.1.0.dev20241030
torch==2.3.1+cu121

The model contains a ConvTranspose layer with a 1x112x23x40 input and a 1x40x45x80 output in the onnx file. This ConvTranspose layer has the following attributes:

dilations = (1, 1)
group = 1
kernel_shape = (3, 3)
output_padding = (0, 1)
pads = (1, 1, 1, 1)
strides = (2, 2)

According to the formulas given in the PyTorch documentation, the output dimensions given in the onnx file are correct.
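
For reference, PyTorch's output-size formula for ConvTranspose2d, out = (in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1, reproduces exactly the shape stored in the onnx file:

# Height axis: output_padding = 0
(23 - 1) * 2 - 2 * 1 + 1 * (3 - 1) + 0 + 1  # = 45
# Width axis: output_padding = 1
(40 - 1) * 2 - 2 * 1 + 1 * (3 - 1) + 1 + 1  # = 80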

When we convert the onnx model to the HAR format, the call to ClientRunner.translate_onnx_model() throws a hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError. The problem is that the output of the ConvTranspose has the dimensions [-1, 46, 80, 40] in the HAR, which does not match the dimensions of this layer in the onnx file ([-1, 45, 80, 40]). As a consequence, a subsequent concat layer, which consumes the output of the ConvTranspose, fails.

For the conversion we are using the Hailo AI SW Suite version 2024-10_docker.

Debugging steps

To get a better understanding of what happens during the onnx-to-HAR conversion, we tried to convert only the first part of the network, from the input node up to the ConvTranspose layer. This conversion runs through successfully, and we can visualize the resulting HAR model:

From the screenshot one can see that the ConvTranspose layer was turned into a deconv layer with the output size mentioned above.
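
For completeness, the partial parse was invoked roughly as follows (a sketch of the DFC call; the hw_arch value and the node names are placeholders rather than our exact values):

from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")
# Parse only the subgraph from the network input up to the ConvTranspose node.
hn, npz = runner.translate_onnx_model(
    "model.onnx",
    "partial_net",
    start_node_names=["input"],
    end_node_names=["/ConvTranspose"],
)
runner.save_har("partial_net.har")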

The attributes of the deconv layer in the HAR model are the following:

activation = linear
batch_norm = false
dilations = (1, 1, 1, 1)
elementwise_add = false
groups = 1
input_disparity = 1
kernel_shape = (3, 3, 112, 40)
layer_disparity = 1
padding = DECONV
strides = (1, 2, 2, 1)

Currently, we suspect that the asymmetric output padding we are using (0, 1) might cause this issue; notably, applying output_padding = 1 on the height axis as well would yield an output height of 46, which is exactly the value we see in the HAR. Could you therefore please let us know your ideas on how to convert the model to HAR with correct shapes?

Hi @therrmann,

As you pointed out, the output_padding used in your model may be responsible for the issue at parsing time. Indeed, ConvTranspose (or deconv, in Hailo naming) operations with output padding are not officially supported, and may lead to differences with respect to the original model.
Could you please add a snapshot of the ONNX graph that shows how you concatenate the output of this operation with another tensor? Also, could you please share the entire parsing log?

Hi @pierrem,

Thanks for your answer!

To better explain this issue, we've prepared an MWE in the Python script below. The script creates an onnx model which concatenates two signals. When converting this model into the Hailo format (see the conversion sketch after the script), the above-mentioned dimension error arises. Does this information help?

import logging

import onnxruntime as ort
import torch
from torch import nn


logger = logging.getLogger(__name__)


class HailoDemoModel(nn.Module):
    """
    Model for demonstrating the behavior of ConvTranspose2d.

    The model first downsamples the input tensor by a factor of 2 using Conv2d,
    and then upsamples it back to the original size using ConvTranspose2d.
    Based on the equations in the torch docs, the parameters provided to the
    ConvTranspose2d layer should result in the output tensor having the same
    size as the input tensor, hence concatenating them in the forward method
    should be successful.

    To demonstrate the Hailo export failure, use an input shape of
    (1, 1, 45, 80).

    """

    def __init__(self):
        """
        Initialize the model.

        Note that the Conv2d and ConvTranspose2d layers are initialized with
        the same `kernel_size`, `stride`, and `padding` parameters. This,
        together with the `output_padding` parameter of the ConvTranspose2d
        layer, should result in the upsampled shape matching the shape of the
        input exactly (1, 1, 45, 80).

        Specifying the output padding in the ConvTranspose2d layer is necessary
        to reach the exact size of the tensor before the Conv2d layer. Output
        padding is generally necessary because, with stride > 1, multiple
        different input shapes to the Conv2d result in the same downsampled
        shape. (E.g. inputs of shape (1, 1, 45, 80) and (1, 1, 46, 80) both
        result in (1, 1, 23, 40) with the current parameterization of the
        Conv2d layer.) To be able to choose which of the multiple input shapes
        to reconstruct, the output padding has to be specified.
        """
        super().__init__()

        # Scales an input of (1, 1, 45, 80) to (1, 1, 23, 40).
        self._conv_2d = nn.Conv2d(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            stride=2,
            padding=(1, 1),
        )

        # Scales an input of (1, 1, 23, 40) to (1, 1, 45, 80).
        self._transposed_conv_2d = nn.ConvTranspose2d(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            stride=2,
            padding=(1, 1),
            output_padding=(0, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self._conv_2d(x)  # (1, 1, 23, 40)
        x2 = self._transposed_conv_2d(x1)  # (1, 1, 45, 80)

        return torch.cat([x, x2], dim=1)  # (1, 2, 45, 80)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    torch.manual_seed(0)

    device = torch.device("cpu")
    execution_provider = ["CPUExecutionProvider"]

    # Constructing the model and a dummy input to export with.
    model = HailoDemoModel().to(device)
    x = torch.rand(1, 1, 45, 80).to(device)

    # Exporting the model to onnx. Since a file path is passed,
    # torch.onnx.export writes the model to disk directly (the default
    # exporter in torch 2.3 returns None, so there is nothing to save
    # manually).
    model_save_path = "hailo_demo.onnx"
    torch.onnx.export(model, x, model_save_path)
    logger.info(f"Successfully exported the model to {model_save_path}.")

    # Creating an onnx runtime session and running the exported model.
    ort_session = ort.InferenceSession(
        model_save_path,
        providers=execution_provider,
    )

    ort_inputs = ort_session.get_inputs()
    ort_input = {ort_inputs[0].name: x.cpu().numpy()}

    ort_out = ort_session.run(None, ort_input)
    logger.info("Successfully ran the exported model with onnx runtime.")

    # Validating the output of the exported model against native pytorch.
    with torch.no_grad():
        gt_out = model(x)
    if not torch.allclose(gt_out, torch.tensor(ort_out[0], device=device)):
        raise RuntimeError(
            "Output mismatch between native pytorch and the onnx exported"
            " model."
        )

    logger.info("Successfully validated the exported model.")