Help with bounding boxes not drawn properly

Hello! I’m ack again, my account was deleted :sweat_smile:.
I recently developed a C++ app integrated with LVGL to have my own graphical interface and show predictions on video input. At the moment I’m testing on the walking-people clip “example_640.mp4” using yolov6n.hef.
It works and I see predictions, but I’m not able to draw the bounding boxes correctly on the source video.


What could be the problem? I’m using a GStreamer pipeline; here is the code I use for grabbing frames and drawing the boxes:

#define FRAME_WIDTH 640
#define FRAME_HEIGHT 640

void draw_detection_overlay(uint8_t *rgb888, int width, int height) {
    for (int i = 0; i < g_num_detections; ++i) {
        Detection det = g_detections[i];

        // Coordinates normalized over the whole image
        int x0 = (int)((det.x - det.w / 2.0f) * width);
        int y0 = (int)((det.y - det.h / 2.0f) * height);
        int x1 = (int)((det.x + det.w / 2.0f) * width);
        int y1 = (int)((det.y + det.h / 2.0f) * height);

        // Clamp
        x0 = MY_CLAMP(x0); y0 = MY_CLAMP(y0);
        if (x1 > width) x1 = width;
        if (y1 > height) y1 = height;

        // Color based on the class
        uint8_t r = 255, g = 0, b = 0;
        if (det.class_id < num_class_colors) {
            r = class_colors[det.class_id][0];
            g = class_colors[det.class_id][1];
            b = class_colors[det.class_id][2];
        }

        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                if (x == x0 || x == x1 - 1 || y == y0 || y == y1 - 1) {
                    int idx = (y * width + x) * 3;
                    rgb888[idx + 0] = r;
                    rgb888[idx + 1] = g;
                    rgb888[idx + 2] = b;
                }
            }
        }

        printf("Detection: class=%d box=(%.2f %.2f %.2f %.2f)\n",
       det.class_id, det.x, det.y, det.w, det.h);

    }
}

int gstreamer_grab_frame(lv_color_t *dst_buffer) {
    GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(appsink));
    if (!sample) return -1;

    GstBuffer *buffer = gst_sample_get_buffer(sample);
    GstMapInfo map;
    if (!gst_buffer_map(buffer, &map, GST_MAP_READ)) {
        gst_sample_unref(sample);
        return -1;
    }

    const uint8_t *rgb_data = map.data;
    uint8_t rgb_copy[FRAME_WIDTH * FRAME_HEIGHT * 3];
    memcpy(rgb_copy, rgb_data, FRAME_WIDTH * FRAME_HEIGHT * 3);

    draw_detection_overlay(rgb_copy, FRAME_WIDTH, FRAME_HEIGHT);
    rgb888_to_rgb565(rgb_copy, dst_buffer, FRAME_WIDTH * FRAME_HEIGHT);

    gst_buffer_unmap(buffer, &map);
    gst_sample_unref(sample);
    return 0;
}


int main(int argc, char **argv) {
    (void)argc;
    (void)argv;

    lv_init();
    hal_init();

    if (gstreamer_init() < 0) {
        return 1;
    }

    ui_init();
    lv_img_set_src(ui_VideoFrame, &webcam_img);

    while (1) {
        pthread_mutex_lock(&buffer_mutex);
        lv_color_t *dst_buffer = (current_buffer == webcam_buffer_a) ? webcam_buffer_b : webcam_buffer_a;

        if (gstreamer_grab_frame(dst_buffer) == 0) {
            current_buffer = dst_buffer;
            webcam_img.data = (const uint8_t *)current_buffer;
            lv_img_cache_invalidate_src(&webcam_img);
            lv_obj_invalidate(ui_VideoFrame);
        }

        pthread_mutex_unlock(&buffer_mutex);

        lv_timer_handler();
        if (gstreamer_grab_frame_and_infer() != 0) {
            break;
        }
        usleep(1000);
    }

    return 0;
}

Yes, there could well be something strange in the code! This is just a first attempt, but for the next step, finalizing the prototype, I need the bounding boxes drawn correctly… could someone help?

Hey @Andrew92,

Sorry for the inconvenience - I’ll check out what happened there.

I’d recommend using our post-process GStreamer element since it’s built specifically for drawing on top of frames. You can just grab the postprocess function from the detection example and adapt it.
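For reference, the detection example’s pipeline is along these lines (element properties and the postprocess .so name differ between TAPPAS versions, so treat them as placeholders):

gst-launch-1.0 filesrc location=example_640.mp4 ! decodebin ! videoconvert ! \
    hailonet hef-path=yolov6n.hef ! \
    hailofilter so-path=libyolo_hailortpp_post.so qos=false ! \
    hailooverlay ! videoconvert ! autovideosink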

Quick question - does the YOLOv6n model you’re using have HailoTPP (postprocess on chip) or NMS enabled?
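If you’re not sure, you can check from the command line:

hailortcli parse-hef yolov6n.hef

The output lists the network’s input/output layers, so an NMS postprocess baked into the HEF will show up there.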

Regarding your coordinate issue:

When you’re drawing boxes by taking the normalized <x,y,w,h> from the detector and multiplying by your hardcoded 640×640, you’re likely mixing up two different coordinate spaces:

1. Your detections are normalized on the pre-processed input, not your raw display frame
The HailoDetection bbox values are in [0,1] relative to whatever ROI was fed into the network. If your GStreamer pipeline is doing letterboxing or padding to maintain aspect ratio, those black bars mess up the mapping back to your original frame.

Here’s how to fix it:

  • Get the actual input dimensions and letterbox padding the plugin used (check the DSP’s Crop & Resize params or GStreamer caps/metadata)
  • Calculate the scale and offset:
float scale = MIN(net_w / (float)frame_w, net_h / (float)frame_h); // forward letterbox scale
float pad_x = (net_w - frame_w * scale) * 0.5f; // horizontal padding, in net pixels
float pad_y = (net_h - frame_h * scale) * 0.5f; // vertical padding, in net pixels

// denormalize to net coords, then undo the padding and scaling to map to frame:
float cx = det.x * net_w;
float cy = det.y * net_h;
float bw = det.w * net_w;
float bh = det.h * net_h;

int x0 = (int)((cx - bw / 2 - pad_x) / scale);
int y0 = (int)((cy - bh / 2 - pad_y) / scale);
int x1 = (int)((cx + bw / 2 - pad_x) / scale);
int y1 = (int)((cy + bh / 2 - pad_y) / scale);

If you’re using HailoDSP’s letterbox API (dsp_crop_and_resize_letterbox()), you can pull the exact pad_x, pad_y, and scale parameters for perfect mapping.
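If you’d rather keep this logic in one reusable place, here’s a minimal helper sketch (the Detection layout comes from your snippet; MIN is glib’s, which you already get via GStreamer):

// Map a detection, normalized on the net input, back to frame pixels.
// Sketch only: assumes the frame was letterboxed into a net_w x net_h input.
static void unletterbox_box(const Detection *det,
                            int net_w, int net_h,
                            int frame_w, int frame_h,
                            int *x0, int *y0, int *x1, int *y1)
{
    float scale = MIN(net_w / (float)frame_w, net_h / (float)frame_h);
    float pad_x = (net_w - frame_w * scale) * 0.5f;
    float pad_y = (net_h - frame_h * scale) * 0.5f;

    float cx = det->x * net_w, cy = det->y * net_h;
    float bw = det->w * net_w, bh = det->h * net_h;

    *x0 = (int)((cx - bw / 2 - pad_x) / scale);
    *y0 = (int)((cy - bh / 2 - pad_y) / scale);
    *x1 = (int)((cx + bw / 2 - pad_x) / scale);
    *y1 = (int)((cy + bh / 2 - pad_y) / scale);
    // Clamp the results to the frame bounds before drawing, as you already do.
}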

2. You’re grabbing two different frames per loop
In your while loop you call gstreamer_grab_frame(dst_buffer) for drawing, then later gstreamer_grab_frame_and_infer() to update detections. Each call pulls a fresh sample from the appsink, so the boxes you draw always come from a different frame than the one you display, and every other frame is consumed without ever being shown.

Fix: Pull one GstSample per iteration, map it once, copy the data, run inference on that same copy, then draw the overlay and push to LVGL. Don’t call gst_app_sink_pull_sample twice in one loop.
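A sketch of a unified version (run_inference_on_rgb() is an assumed name standing in for whatever gstreamer_grab_frame_and_infer() does once it has the pixels; everything else reuses your identifiers):

// One pull per iteration: inference and overlay happen on the SAME copy.
int grab_infer_and_draw(lv_color_t *dst_buffer) {
    GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(appsink));
    if (!sample) return -1;

    GstBuffer *buffer = gst_sample_get_buffer(sample);
    GstMapInfo map;
    if (!gst_buffer_map(buffer, &map, GST_MAP_READ)) {
        gst_sample_unref(sample);
        return -1;
    }

    // static: ~1.2 MB is too large to keep on the stack
    static uint8_t rgb_copy[FRAME_WIDTH * FRAME_HEIGHT * 3];
    memcpy(rgb_copy, map.data, sizeof(rgb_copy));
    gst_buffer_unmap(buffer, &map);
    gst_sample_unref(sample);

    run_inference_on_rgb(rgb_copy);  // assumed: updates g_detections
    draw_detection_overlay(rgb_copy, FRAME_WIDTH, FRAME_HEIGHT);
    rgb888_to_rgb565(rgb_copy, dst_buffer, FRAME_WIDTH * FRAME_HEIGHT);
    return 0;
}

Then your main loop calls only this one function per iteration.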

3. Watch out for stride != width*3
If your appsink buffer has row padding (stride > width*3), indexing with y*width*3 + x*3 will distort your image and box positions.

int row_stride = /* get from GstVideoInfo or the buffer's GstVideoMeta */;
int idx = y * row_stride + x * 3;
rgb888[idx + 0] = r; // etc.
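For example, you can read the stride from the sample’s caps via GstVideoInfo (needs gst/video/video.h):

#include <gst/video/video.h>

GstVideoInfo vinfo;
GstCaps *caps = gst_sample_get_caps(sample);
if (caps && gst_video_info_from_caps(&vinfo, caps)) {
    int row_stride = GST_VIDEO_INFO_PLANE_STRIDE(&vinfo, 0);
    // use row_stride instead of width * 3 when indexing rows
}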

4. Query the real frame size instead of hardcoding
After pulling the sample:

GstCaps *caps = gst_sample_get_caps(sample);
GstStructure *s = gst_caps_get_structure(caps, 0);
int width = 0, height = 0;
gst_structure_get_int(s, "width", &width);
gst_structure_get_int(s, "height", &height);

By unifying your frame grab (so inference and drawing happen on the same image), reading the real dimensions and stride from GStreamer, and compensating for letterbox padding, your red rectangles should line up perfectly with your detections.

Let me know if you need any clarification on this!