Training YOLO models with imgsz > 640px

Hi,
I am facing an issue when training YOLO models with imgsz values above 640px. My detections are correct, but the bounding boxes are shifted.

I came across this post, where someone faced the same problem: Trouble running custom yolo.hef models with imgz = 1088 - #2 by omria

Honestly, I don’t understand what is happening here, what @omria tried to explain, or how I would apply it to the dataset I use (WIDERFACE), whose images have wildly different dimensions.

What I understand so far:
According to Ultralytics, using higher imgsz values shouldn’t be a problem. During training, the imgsz parameter only fixes one side of the model’s input size. For example, imgsz=1024 simply means that the width is fixed (e.g. 1024 x XXXX), so the aspect ratio of the training images is preserved and letterboxing/padding are applied automatically.
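To make the coordinate-mapping part of this concrete, here is a minimal sketch (plain NumPy/OpenCV, not Ultralytics- or Hailo-specific, and the 1024x1024 target size is just an assumption) of letterbox preprocessing and the inverse mapping that post-processing has to undo. If the post-processing assumes a different scale or padding offset than the one actually used, e.g. values computed for a 640x640 input while the model really ran at 1024x1024, every box comes out shifted:

```python
import cv2
import numpy as np

def letterbox(img, new_shape=(1024, 1024), pad_value=114):
    """Resize while keeping the aspect ratio, then pad to new_shape (h, w)."""
    h, w = img.shape[:2]
    scale = min(new_shape[0] / h, new_shape[1] / w)
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    pad_h = new_shape[0] - resized.shape[0]
    pad_w = new_shape[1] - resized.shape[1]
    top, left = pad_h // 2, pad_w // 2
    padded = cv2.copyMakeBorder(resized, top, pad_h - top, left, pad_w - left,
                                cv2.BORDER_CONSTANT, value=(pad_value,) * 3)
    return padded, scale, (left, top)

def boxes_to_original(boxes_xyxy, scale, pad):
    """Map boxes predicted on the letterboxed image back to the original image.
    Using the wrong scale/pad here is exactly what produces shifted boxes."""
    left, top = pad
    boxes = boxes_xyxy.astype(np.float32).copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - left) / scale
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - top) / scale
    return boxes
```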

This is my standard training command:

!yolo task=detect mode=train model=yolov8n.pt data=/root/datasets/datasets/UPSCALED_WIDERFACE_YOLO/yolodataset.yaml epochs=15 batch=4 imgsz=1024 plots=True device=0,1
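For reference, the equivalent call through the Ultralytics Python API, with the same parameters and the dataset path kept as in the CLI command above:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="/root/datasets/datasets/UPSCALED_WIDERFACE_YOLO/yolodataset.yaml",
    epochs=15,
    batch=4,
    imgsz=1024,
    plots=True,
    device=[0, 1],  # two GPUs, same as device=0,1 in the CLI
)
```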

My questions are:

  • Why do these issues (shifted bounding boxes) only become apparent when using imgsz values above 640px?
  • Is it possible that this discrepancy stems from a toolchain configuration that I could change?
  • From my understanding, there can’t be a one-size-fits-all ratio or padding value for varying training data like WIDERFACE.
    • Why should this be considered in the compilation or inference process in the first place with a model trained on such varied data?
    • And why does inference work correctly for a model trained with imgsz=640 on the same wildly varied training data, where padding and letterboxing also seem to be applied during training (standard Ultralytics configuration)?
  • Has anyone encountered and resolved this issue by modifying the compile-time configuration or the post-processing pipeline?

I don’t get it…

The good news is that I could live with the status quo, because simply resizing the higher-resolution camera feed to the model’s expected input dimensions is sufficient for now.
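For completeness, that workaround is just a plain resize of each camera frame to the network input before inference; a minimal OpenCV sketch (the camera index and the 1024x1024 input size are assumptions):

```python
import cv2

cap = cv2.VideoCapture(0)     # camera index assumed
model_input = (1024, 1024)    # assumed network input size as (width, height)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Plain resize (no letterbox): distorts the aspect ratio, but matches the
    # input size the compiled model expects.
    resized = cv2.resize(frame, model_input)
    # ... run inference on `resized` here ...
```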

But still: I would appreciate clarification on what is happening here. Any material is welcome.

Thank you!

Solved. I found out that I had to change the model-specific nms_config in the Model Zoo. Thanks, @Omer.
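For anyone hitting the same thing: the change is in the model-specific NMS post-processing config shipped with the Hailo Model Zoo. A hedged sketch of patching the network input size in that JSON file; the path and the "image_dims" field name are assumptions based on the stock yolov8 configs and may differ in your Model Zoo version:

```python
import json
from pathlib import Path

# Assumed location of the model-specific NMS config; adjust to your install.
cfg_path = Path("hailo_model_zoo/cfg/postprocess_config/yolov8n_nms_config.json")

cfg = json.loads(cfg_path.read_text())
cfg["image_dims"] = [1024, 1024]  # match the imgsz the model was trained/compiled with
cfg_path.write_text(json.dumps(cfg, indent=4))
```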
