Improving Small Object Detection in 4K Drone Footage Using Tiling with Hailo AI Processors

Detecting small objects in high-resolution images—especially 4K drone video—is a common challenge when using standard convolutional neural network (CNN) models like the YOLO series. Even powerful models often miss tiny targets when operating on full-frame 4K images.

So, how can we solve this problem?

In this post, I’ll share how we leveraged tiling techniques combined with the Hailo AI chip to dramatically improve small-object detection performance in 4K and even higher-resolution footage.


The Problem with Small Object Detection in 4K

In a typical 4K (3840x2160) frame, small objects occupy very few pixels relative to the whole image. Models like YOLOv8 typically resize the whole frame down to a much smaller network input before inference, so small targets can shrink to a handful of pixels and get lost in downsampling and feature extraction.


The Tiling Solution

Our approach divides the 4K image into smaller, overlapping tiles. By processing each tile individually, we preserve the relative size of small objects in the input to the detector, improving detection accuracy.

In our tests, we used 4 tiles (2x2 grid), each 1920x1080 (1080p), with a 50-pixel overlap between neighboring tiles to avoid missing objects at the tile edges.
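To make the size argument concrete, here is a minimal sketch of the arithmetic. It assumes an aspect-preserving (letterbox) resize to a square 640x640 network input; that input size and the helper name scaled_object_size are illustrative assumptions, not values from our setup.

```cpp
#include <algorithm>

// Side length (in pixels) of an object after the source image is resized,
// aspect ratio preserved, to fit a square network input (letterbox resize).
// net_size is an assumed model input size, not a value taken from the post.
double scaled_object_size(double object_px, int src_w, int src_h, int net_size) {
    const double scale = std::min(static_cast<double>(net_size) / src_w,
                                  static_cast<double>(net_size) / src_h);
    return object_px * scale;
}
```

Under these assumptions, a 30-pixel object shrinks to about 5 pixels when the full 3840x2160 frame is resized, and to about 10 pixels when a single 1920x1080 tile is resized, so the detector sees it at twice the linear size.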

Example Tiling Parameters (C++ OpenCV Example):

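#include <opencv2/core.hpp>
#include <vector>

// Build the 2x2 grid of overlapping tiles for one frame.
// makeTiles is an illustrative wrapper name.
// Note: with a 50-pixel origin offset, the leftmost 50 columns and topmost
// 50 rows of a 3840x2160 frame are not covered by any tile.
std::vector<cv::Rect> makeTiles(const cv::Mat& frame) {
    // Tile configuration
    const int offset_x = 50;   // x origin of the first tile
    const int offset_y = 50;   // y origin of the first tile
    const int tile_width = 1920;
    const int tile_height = 1080;
    const int overlap = 50;    // pixels shared by neighboring tiles

    std::vector<cv::Rect> tiles;
    for (int row = 0; row < 2; ++row) {
        for (int col = 0; col < 2; ++col) {
            const int x = offset_x + col * (tile_width - overlap);
            const int y = offset_y + row * (tile_height - overlap);
            // Keep only tiles that lie fully inside the frame
            if (x + tile_width <= frame.cols && y + tile_height <= frame.rows) {
                tiles.emplace_back(x, y, tile_width, tile_height);
            }
        }
    }
    return tiles;
}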
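The fixed 2x2 grid above is tied to a 3840x2160 frame. If you need the tiles to cover an arbitrary frame edge to edge, one possible alternative is to compute tile start positions per axis and clamp the last tile to the frame border. This is only a sketch of that idea, not what we ran; the helper name tile_starts is hypothetical.

```cpp
#include <vector>

// Start offsets of tiles along one axis so that the whole axis is covered.
// Tiles advance by (tile_len - overlap); the last tile is clamped so that it
// ends exactly at the frame edge. If the frame is shorter than one tile, a
// single start at 0 is returned and the caller must pad or shrink the tile.
std::vector<int> tile_starts(int frame_len, int tile_len, int overlap) {
    std::vector<int> starts;
    if (tile_len <= 0 || overlap < 0 || overlap >= tile_len) {
        return starts;  // invalid configuration: no tiles
    }
    if (frame_len <= tile_len) {
        starts.push_back(0);
        return starts;
    }
    const int stride = tile_len - overlap;
    for (int pos = 0;; pos += stride) {
        if (pos + tile_len >= frame_len) {
            starts.push_back(frame_len - tile_len);  // clamp last tile to the edge
            break;
        }
        starts.push_back(pos);
    }
    return starts;
}
```

For a 3840-pixel axis with 1920-pixel tiles and a 50-pixel overlap this gives starts at 0, 1870 and 1920, so full coverage with overlap costs a third column (and a third row on the 2160-pixel axis). With zero overlap it gives 0 and 1920. Combining the x and y start lists yields the tile rectangles.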

Each tile is passed independently through the YOLO model running on the Hailo AI accelerator, and the four tiles of a frame can be submitted together as a batch.
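Because each tile is inferred on its own, the boxes that come back are in tile coordinates and have to be shifted into full-frame coordinates before they can be drawn or merged. A minimal sketch follows, assuming each tile was stretched to the network input without letterbox padding; the Box struct and the to_frame_coords helper are hypothetical names, not part of any Hailo API.

```cpp
// A detection box: top-left corner plus size, with score and class.
struct Box {
    float x, y, w, h;
    float score;
    int class_id;
};

// Map a box from network-input coordinates of one tile to full-frame pixels.
// scale_x = tile_width / network_input_width (use 1.0f if the tile was not
// resized), scale_y likewise. tile_x and tile_y are the tile's top-left
// corner in the frame. Assumes a plain stretch resize with no padding.
Box to_frame_coords(const Box& det, int tile_x, int tile_y,
                    float scale_x, float scale_y) {
    Box out = det;
    out.x = det.x * scale_x + static_cast<float>(tile_x);
    out.y = det.y * scale_y + static_cast<float>(tile_y);
    out.w = det.w * scale_x;
    out.h = det.h * scale_y;
    return out;
}
```

For example, a box at (100, 50) in an unscaled tile whose origin is (1920, 1080) ends up at (2020, 1130) in the frame.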


Detection Results

We compared detection results at the same confidence threshold for full-frame inference and for the tiled approach.

[Comparison images not available.]

The results clearly show that tiling dramatically improves the detection of small objects, even when using the lighter YOLOv8s model. Many targets missed in the full-frame tests were successfully detected in the tiled approach.


Conclusion

If you are working on small object detection in high-resolution scenarios like drones, security cameras, or industrial inspection, tiling + efficient batch inference is a simple but powerful technique.

This approach enables small models to achieve excellent performance in high-resolution environments, reducing both compute load and power consumption—critical factors for edge deployments.


Happy to discuss further optimizations, such as dynamic tiling, adaptive overlap, or post-processing strategies to merge detections from tiles.
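As one example of such a post-processing step, objects inside the overlap band are usually reported by two tiles. A common way to collapse them is class-aware non-maximum suppression (NMS) over all detections once they are in full-frame coordinates. The sketch below is a generic greedy NMS, not necessarily what we run; the Detection struct, the function names, and the IoU threshold are illustrative assumptions.

```cpp
#include <algorithm>
#include <vector>

// Box corners in full-frame pixels, plus score and class.
struct Detection {
    float x1, y1, x2, y2;
    float score;
    int class_id;
};

// Intersection over union of two boxes; 0 if they do not overlap.
float iou(const Detection& a, const Detection& b) {
    const float iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (iw <= 0.0f || ih <= 0.0f) return 0.0f;
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (area_a + area_b - inter);
}

// Greedy NMS: keep the highest-scoring box, drop same-class boxes that
// overlap a kept box by more than iou_threshold.
std::vector<Detection> merge_detections(std::vector<Detection> dets,
                                        float iou_threshold) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) {
                  return a.score > b.score;
              });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool duplicate = false;
        for (const Detection& k : kept) {
            if (k.class_id == d.class_id && iou(k, d) > iou_threshold) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) kept.push_back(d);
    }
    return kept;
}
```

One limitation to keep in mind: an object cut by a tile border yields a partial box in one tile and a full box in the other, and their IoU may fall below the threshold, so plain NMS can leave both. Wider overlap or an intersection-over-smaller-area test are possible mitigations.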

Out of curiosity, are you using a UDP RTSP stream? I was looking into the C++ API but it doesn’t support TCP out of the box and I’m not sure the Hailo team is open to PRs in that regard.

The host handles the streaming part. The Hailo-8/8L is used only for model inference.

How do you handle duplicate detections occurring in the overlap areas?