Single object tracking

"I’m using YOLOv8 on the Hailo AI Kit for object detection, and it’s working well. We are using an external camera and streaming video via RTSP. Now, I want to implement single object tracking (SOT) where a user can click on a detected object (using a mouse) to start tracking it in real time.

How can I integrate user interaction for selecting a specific object to track? Specifically:

  • Does the Hailo SDK support using OpenCV UI elements like cv2.setMouseCallback() for mouse-based ROI selection?
  • What is the recommended way to implement tracking (e.g., Deep SORT, ByteTrack, or OpenCV trackers) after detection on the Hailo platform?
  • Is there an example or best practice for implementing real-time single object tracking or ROI tracking after detection in a Hailo-compatible pipeline?

I’d also like to know if it’s feasible to overlay detection results and user-selected tracking information on the RTSP stream output."

While we do provide example applications in our GitHub repositories

GitHub - Hailo Application Code Examples
GitHub - Hailo RPi5 Examples
GitHub - Hailo AI Tappas

it is up to our users to create their own applications. Our goal is to help you run the compute-intensive part, the inference of an NN model, efficiently and with high performance.

The user interaction, the user interface, and the choice of framework are up to you.
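For example, OpenCV's UI functions (including cv2.setMouseCallback() and cv2.selectROI()) and its CPU trackers run entirely on the host, independent of the Hailo SDK, so you are free to combine them with Hailo inference in any way. A minimal ROI-tracking sketch, assuming opencv-contrib-python is installed for the legacy trackers (the RTSP URL is a placeholder):

import cv2

# placeholder RTSP URL - replace with your camera's stream
cap = cv2.VideoCapture("rtsp://192.168.1.10:554/stream")

ok, frame = cap.read()
if not ok:
    raise RuntimeError("Could not read from video source")

# let the user draw a box around the object to track
roi = cv2.selectROI("Select object", frame, showCrosshair=True)
cv2.destroyWindow("Select object")

# CSRT is a CPU tracker from opencv-contrib; it runs on the host,
# completely independent of Hailo inference
tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(frame, roi)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok, box = tracker.update(frame)
    if ok:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()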

Hi @vtol.abhijeet ,

To solve your problem you can use the DeGirum PySDK and DeGirum Tools packages.
The script below does exactly what you want: it applies an object detection model to a video stream, applies BYTETrack to track the detected objects, handles mouse clicks via OpenCV's mouse callback mechanism so that only the clicked object is tracked, and finally displays the live video with the selected object. The script is well commented, so you can follow what is going on.

You may need to adjust some parameters at the beginning of the script to select the inference location, model zoo, models, etc. If you use the cloud zoo, you will also need a token
(you may paste it in place of the degirum_tools.get_token() expression or put it into an env.ini file next to the script).
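For instance, pasting a token directly (the value below is a placeholder) would look like:

# placeholder token pasted in place of the degirum_tools.get_token() expression
zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token="<your cloud API token>",
)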

This example works out of the box using cloud inference (hw_location = "@cloud"), assuming you provided a token. If you have a Hailo device installed locally, you may switch to local inference by assigning hw_location = "@local".

The example uses an mp4 video of walking people. You may change the source by assigning video_source an integer index to select a local camera, an RTSP camera URL, or a local path to an mp4 file, as sketched below.
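For example (the camera index, URL, and path are placeholders):

# pick one of the following:
video_source = 0                                       # local camera by index
video_source = "rtsp://user:pass@192.168.1.10/stream"  # RTSP camera URL
video_source = "/path/to/video.mp4"                    # local mp4 file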

The full documentation of PySDK is available here: Overview | DeGirum Docs

The full documentation of DeGirum Tools is here: Overview | DeGirum Docs

You may also look at our GitHub example repo for Hailo: DeGirum/hailo_examples: DeGirum PySDK with Hailo AI Accelerators

import degirum as dg, degirum_tools, cv2


# adjust all these parameters to your needs
hw_location = "@cloud"
model_zoo_url = "degirum/models_hailort"
person_model_name = "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8_1"
video_source = "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/WalkingPeople2.mp4"
display_name = "Object Selector Example"

# connect to the model zoo
zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
)

# load person/pose detection model
person_model = zoo.load_model(person_model_name, overlay_line_width=1)

# create a context to store detections and selected track_id
context = dict(detections=None, track_id=None)


def point_in_rect(x, y, rect):
    """Check if point (x, y) is inside rectangle [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2


def is_object_selected(obj, result):
    """Return 1 when object has track_id matching the context, otherwise return 0."""
    # store detections in context for use in mouse callback
    context["detections"] = result.results
    sel_track_id = context.get("track_id")
    track_id = obj.get("track_id")
    return int(
        sel_track_id is not None and track_id is not None and track_id == sel_track_id
    )


def mouse_callback(event: int, x: int, y: int, flags: int, context: dict):
    """Mouse callback to set the context for object selection"""
    if event == cv2.EVENT_LBUTTONDOWN:
        detections = context.get("detections")
        if detections is not None:
            # look for the object that contains clicked point
            for obj in detections:
                # check if the clicked point is inside the bounding box of the object
                track_id = obj.get("track_id")
                if track_id is not None and point_in_rect(x, y, obj["bbox"]):
                    # if so, remember the track_id in context
                    context["track_id"] = track_id
                    break
            else:
                context["track_id"] = None


# create object tracker analyzer to track objects
tracker = degirum_tools.ObjectTracker(
    track_thresh=0.35,
    match_thresh=0.9999,
    anchor_point=degirum_tools.AnchorPoint.CENTER,
    show_overlay=False,
)

# create object selector analyzer to select clicked person
selector = degirum_tools.ObjectSelector(
    top_k=0,
    selection_strategy=degirum_tools.ObjectSelectionStrategies.CUSTOM_METRIC,
    # use custom metric to select the object of interest: object with highest metric value is selected
    custom_metric=is_object_selected,
    metric_threshold=0.5,
    use_tracking=False,
    show_overlay=False,
)

# attach object tracker and object selector analyzers to person detection model
degirum_tools.attach_analyzers(person_model, [tracker, selector])

# open display window
with degirum_tools.Display(display_name) as display:
    # perform streaming inference on video source
    for i, result in enumerate(
        degirum_tools.predict_stream(person_model, video_source)
    ):
        # show the result on the display
        display.show(result)
        # set mouse callback only once and only when the display is opened
        if i == 0:
            cv2.setMouseCallback(display_name, mouse_callback, context)

@vtol.abhijeet ,

Regarding RTSP streaming from Python: GStreamer is typically used for that purpose. It is a well-known technique that is straightforward to implement. Please refer to the following links:
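As a rough illustration of the idea (a sketch under assumptions, not a complete RTSP server): the snippet below uses OpenCV's GStreamer-backed VideoWriter to push annotated frames out as an RTP/H.264 stream over UDP. It assumes OpenCV was built with GStreamer support; the pipeline string, host, and port are placeholders, and serving true RTSP would additionally require gst-rtsp-server.

import cv2

# placeholder pipeline: encode frames to H.264 and send them as RTP over UDP
gst_pipeline = (
    "appsrc ! videoconvert ! "
    "x264enc tune=zerolatency speed-preset=ultrafast ! "
    "rtph264pay config-interval=1 pt=96 ! "
    "udpsink host=127.0.0.1 port=5000"
)

fps, width, height = 30, 640, 640  # match your annotated frame size and rate
writer = cv2.VideoWriter(gst_pipeline, cv2.CAP_GSTREAMER, 0, fps, (width, height))

# inside the inference loop, push each annotated frame, e.g.:
#   writer.write(result.image_overlay)
# and release the writer when done:
#   writer.release()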