Conversion Issue with ConvTranspose2d Layer from ONNX to HAR

Dear Hailo community,

We are facing an issue with the conversion of a ConvTranspose2d layer from ONNX to HAR, similar to the existing topic "Wrong dimensions in har file from onnx containing torch.nn.ConvTranspose2d layers", and would like to ask for your support.

Problem description

We have an ONNX model that runs inference successfully with onnxruntime. The library versions we are using to export/run the model are:

```
onnx==1.16.0
onnxruntime-gpu==1.18.0
onnxscript==0.1.0.dev20241030
torch==2.3.1+cu121
```

The model contains a ConvTranspose layer with a 1x112x23x40 input and a 1x40x45x80 output in the ONNX file. This ConvTranspose layer has the following attributes:

```
dilations = (1, 1)
group = 1
kernel_shape = (3, 3)
output_padding = (0, 1)
pads = (1, 1, 1, 1)
strides = (2, 2)
```

According to the formulas given in the PyTorch documentation, the output dimensions stored in the ONNX file are correct.
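Concretely, the PyTorch formula for each spatial axis is out = (in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1, which gives (23 - 1) * 2 - 2 + 2 + 0 + 1 = 45 for the height and (40 - 1) * 2 - 2 + 2 + 1 + 1 = 80 for the width. The expected shape can be reproduced with a minimal PyTorch snippet (a sketch built from the attributes above, not our actual export code):

```python
import torch
import torch.nn as nn

# The ConvTranspose layer with the parameters from the ONNX attributes
# above; PyTorch's padding=1 corresponds to pads=(1, 1, 1, 1).
deconv = nn.ConvTranspose2d(112, 40, kernel_size=3, stride=2,
                            padding=1, output_padding=(0, 1))
x = torch.randn(1, 112, 23, 40)
print(deconv(x).shape)  # torch.Size([1, 40, 45, 80])
```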

When we convert the ONNX model to the HAR format, the call ClientRunner.translate_onnx_model() throws a hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError. The underlying problem is that the output of the ConvTranspose layer has shape [-1, 46, 80, 40] in the HAR graph, which does not match its shape in the ONNX file ([-1, 45, 80, 40]). As a consequence, a subsequent concat layer that consumes the ConvTranspose output fails.
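For completeness, our conversion call looks roughly like this (a sketch; "model.onnx", "model", and the hw_arch value are placeholders for our actual setup):

```python
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")
# Raises hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError:
# the deconv output is translated as [-1, 46, 80, 40] instead of the
# expected [-1, 45, 80, 40], so the subsequent concat cannot be wired up.
runner.translate_onnx_model("model.onnx", "model")
```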

For the conversion we are using the Hailo AI SW Suite version 2024-10_docker.

Debugging steps

To better understand what happens during the ONNX-to-HAR conversion, we tried converting only the first part of the network, from the input node up to the ConvTranspose layer. This partial conversion succeeds, and we can visualize the resulting HAR model:

From the screenshot one can see that the ConvTranspose layer was translated into a deconv layer with the output size mentioned above.
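The partial translation was done along these lines (again a sketch; the end node name "/decoder/ConvTranspose" is a hypothetical placeholder for the actual node in our graph):

```python
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")
# Translate only up to the ConvTranspose node; this step succeeds.
runner.translate_onnx_model(
    "model.onnx", "model_head",
    end_node_names=["/decoder/ConvTranspose"],
)
runner.save_har("model_head.har")
```

We then visualized model_head.har with the SW Suite's visualizer tool.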

The attributes of the deconv layer in the HAR model are the following:

```
activation = linear
batch_norm = false
dilations = (1, 1, 1, 1)
elementwise_add = false
groups = 1
input_disparity = 1
kernel_shape = (3, 3, 112, 40)
layer_disparity = 1
padding = DECONV
strides = (1, 2, 2, 1)
```

We currently suspect that the asymmetric output_padding we are using, (0, 1), might be causing this issue. Could you please share your ideas on how to translate the model to HAR with the correct shapes?
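One observation that supports this suspicion (a quick check on our side, not a confirmed explanation of the parser's behavior): if the output_padding of 1 were applied to both axes instead of only to the width, the resulting shape would be exactly the one we see in the HAR file:

```python
import torch
import torch.nn as nn

# Same layer as above, but with output_padding applied symmetrically,
# as if the per-axis values were ignored. The height becomes 46,
# exactly the (wrong) value in the HAR file.
deconv = nn.ConvTranspose2d(112, 40, kernel_size=3, stride=2,
                            padding=1, output_padding=(1, 1))
x = torch.randn(1, 112, 23, 40)
print(deconv(x).shape)  # torch.Size([1, 40, 46, 80])
```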