hailo-CLIP: how can I create a bounding box around detected objects?

I am running the hailo-CLIP examples. Now I would like to mark detected objects with a bounding box, similar to how it is done in the hailo-rpi5-examples.

In the file clip_application.py the bounding box coordinates appear to be determined for every detection:

    # Parse the detections
    for detection in detections:
        track = detection.get_objects_typed(hailo.HAILO_UNIQUE_ID)
        track_id = None
        label = None
        confidence = 0.0
        for track_id_obj in track:
            track_id = track_id_obj.get_id()
        if track_id is not None:
            string_to_print += f'Track ID: {track_id} '
        classifications = detection.get_objects_typed(hailo.HAILO_CLASSIFICATION)
        if len(classifications) > 0:
            string_to_print += ' CLIP Classifications:'
            for classification in classifications:
                label = classification.get_label()
                confidence = classification.get_confidence()
                string_to_print += f'Label: {label} Confidence: {confidence:.2f} '
            string_to_print += '\n'
        if isinstance(detection, hailo.HailoDetection):
            label = detection.get_label()
            bbox = detection.get_bbox()
            confidence = detection.get_confidence()
            string_to_print += f"Detection: {label} {confidence:.2f}\n"
    if string_to_print:
        print(string_to_print)
    return Gst.PadProbeReturn.OK

How can I now draw the box using the retrieved bbox coordinates?
Thanks for your support!

Hey @schiwo1,

You can try something like this:

label = detection.get_label()
bbox = detection.get_bbox()
confidence = detection.get_confidence()
string_to_print += f"Detection: {label} {confidence:.2f}\n"

# Draw the bounding box on the frame (needs `import cv2` at the top).
# HailoBBox coordinates are normalized to [0, 1], so scale them by the
# frame dimensions before drawing.
x_min = int(bbox.xmin() * width)
y_min = int(bbox.ymin() * height)
x_max = int(bbox.xmax() * width)
y_max = int(bbox.ymax() * height)
color = (0, 255, 0)  # green box (BGR)
thickness = 2

cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color, thickness)
if label:
    cv2.putText(frame, f"{label} {confidence:.2f}", (x_min, y_min - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

Haven’t tested this exact code, but the idea is there—should get you started!
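One thing to watch out for: Hailo bbox coordinates come back normalized to [0, 1] rather than as pixel values, so you have to scale them by the frame size before handing them to cv2. The scaling step can be sketched standalone like this (`bbox_to_pixels` is a hypothetical helper for illustration, not part of the Hailo API):

```python
def bbox_to_pixels(xmin, ymin, xmax, ymax, frame_w, frame_h):
    """Convert normalized [0, 1] bbox corners to integer pixel coordinates,
    clamped to the frame so cv2.rectangle never draws out of bounds."""
    def clamp(v, hi):
        return max(0, min(int(v), hi - 1))
    return (clamp(xmin * frame_w, frame_w),
            clamp(ymin * frame_h, frame_h),
            clamp(xmax * frame_w, frame_w),
            clamp(ymax * frame_h, frame_h))

# e.g. a detection covering the central half of a 640x480 frame:
print(bbox_to_pixels(0.25, 0.25, 0.75, 0.75, 640, 480))  # (160, 120, 480, 360)
```

The clamping matters because a detection right at the frame edge can round to a coordinate one past the last pixel.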

Hi @omria, thanks for getting back.

Yes, this is how I did my screen annotations in the hailo_rpi5_examples code.
However, in hailo-CLIP I don’t know how to get the frame data the way the hailo_rpi5_examples code does:

        # Get video frame
        frame = get_numpy_from_buffer(buffer, format, width, height)

and how then to output the annotated frame to the screen like:

        # output frame to screen
        user_data.set_frame(frame)

Maybe I am missing something, but I could not find where in the hailo-CLIP code the on-screen annotation (the classification label and confidence on the output screen) is drawn, or whether cv2 is used for this. It seems to me that the screen annotation and output handling are done differently from the hailo_rpi5_examples code.
Please let me know. Thanks, Stephan