Converting Vision Transformer Tracker to HEF

Hi community, first of all, thank you for your product!
In our company we are going to test object tracking models on an RPi5, specifically the ViT tracker (ViTT) model.
We have found an .onnx model on GitHub, published as a single .onnx file.

Now I am trying to convert it to the .HEF format so that I can then run it on the RPi AI Kit.

As the very first step, I successfully translated the model to .HAR:

import os
from hailo_sdk_client import ClientRunner

# hw_arch assumed here: the RPi AI Kit carries a Hailo-8L
runner = ClientRunner(hw_arch='hailo8l')
hn, npz = runner.translate_onnx_model(
    os.path.join(models_dir, models_file),
    models_output,
    start_node_names=['template', 'search'],
    end_node_names=['Transpose_76', 'Concat_34'],
)

I have visualized the network structure, but unfortunately I cannot attach it to the post for reference.

However, I am facing issues with the quantization step.
I am using the same calibration dataset as in the tutorial (DFC_2_Model_Optimization_Tutorial), with the input dimensions changed to 128 (to match the input_layer_1 node).
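
For context, my optimization step looks roughly like this (a sketch: the calibration-set shape, the input-layer names, and the use of a dict for the two inputs are my assumptions, not verified code):

import numpy as np

# Sketch only: random data standing in for the tutorial's calibration
# set. Layer names and the second input's resolution are assumptions;
# each entry must match the shape of its corresponding input node.
calib_data = {
    'input_layer_1': np.random.rand(64, 128, 128, 3).astype(np.float32),
    'input_layer_2': np.random.rand(64, 128, 128, 3).astype(np.float32),
}
runner.optimize(calib_data)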

However, during optimization, layer const_input_1 raises an exception in const_op.py → _compute_output_shape: 'int' object is not subscriptable. In other words, the input shape inside this function is an int rather than an array, as expected.
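
To be concrete, the failure mode looks like plain integer indexing (a toy reproduction outside any Hailo code; the shape value is made up):

# Indexing an int where a shape tuple was expected raises exactly
# this TypeError.
shape = 128            # an int where something like (1, 128) was expected
try:
    shape[0]
except TypeError as e:
    print(e)           # 'int' object is not subscriptable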

Can you please point out what I am doing wrong?

Hi @dmytro.pekach, could you please send us a link to the repo where you got the ONNX from?

Hello, thank you for your reply. Here is the link to the repo.
Ideally, I would like to be able to run this tracker on a Hailo device, but I have not found any relevant examples regarding single-object tracking. Can you please also point out what GStreamer pipeline needs to be built in order to run this tracker properly?

Moreover, as a second step, I would also like to convert OpenCV's TrackerNano, which consists of two files (backbone.onnx and head.onnx), to the .hef format and run it as well.
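
For the two-file case, I was planning something along these lines (a sketch; the file paths and net names are placeholders, and I have not checked which start/end nodes these models need):

from hailo_sdk_client import ClientRunner

# Translate each ONNX file into its own HAR, to be optimized and
# compiled separately (paths and net names are placeholders).
for onnx_path, net_name in [('backbone.onnx', 'nano_backbone'),
                            ('head.onnx', 'nano_head')]:
    runner = ClientRunner(hw_arch='hailo8l')
    runner.translate_onnx_model(onnx_path, net_name)
    runner.save_har(f'{net_name}.har')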

The easiest solution would be to use our already implemented JDE tracker (hailo_tracker).
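
A minimal sketch of how it slots into a pipeline (the detector HEF and post-process .so paths are placeholders, and exact element properties depend on your TAPPAS version):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
# A detection network feeds hailotracker, which assigns persistent IDs;
# a real pipeline also needs scaling/caps to match the network input.
pipeline = Gst.parse_launch(
    'v4l2src device=/dev/video0 ! videoconvert ! '
    'hailonet hef-path=/path/to/detector.hef ! '
    'hailofilter so-path=/path/to/libpostprocess.so ! '
    'hailotracker name=hailo_tracker ! '
    'hailooverlay ! videoconvert ! autovideosink'
)
pipeline.set_state(Gst.State.PLAYING)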

Would that be good enough for your use case?

I appreciate the suggestion of the JDE tracker. Unfortunately, it is not sophisticated enough for our use case: we have already tested this tracker and it did not show promising results. Moreover, one of our requirements is to be able to fine-tune the trackers and to select and combine tracker outputs (an ensembling idea).
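
To illustrate the kind of combination we have in mind (a toy sketch, not tied to any Hailo API):

def combine_boxes(box_a, conf_a, box_b, conf_b):
    # Toy ensembling: confidence-weighted average of two (x, y, w, h)
    # boxes; a real system would also handle strong disagreement,
    # e.g. by falling back to the more confident tracker.
    total = conf_a + conf_b
    return tuple((a * conf_a + b * conf_b) / total
                 for a, b in zip(box_a, box_b))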

We took a look at the ViT tracker, and it seems that a few of its operations are currently unsupported. This means the model code would most likely have to be altered.

Another tracker we already have experience with is ByteTrack. It can also easily be integrated into a GStreamer pipeline, so let us know if it fits your requirements.

Thanks a lot for your efforts.
I have spoken with our managers, and for now we will settle on conducting experiments with the YOLO architecture.
However, would it be possible to include both the Nano and ViT architectures in a future release?