Single object tracking

"I’m using YOLOv8 on the Hailo AI Kit for object detection, and it’s working well. We are using an external camera and streaming video via RTSP. Now, I want to implement single object tracking (SOT) where a user can click on a detected object (using a mouse) to start tracking it in real time.

How can I integrate user interaction for selecting a specific object to track? Specifically:

  • Does the Hailo SDK support using OpenCV UI elements like cv2.setMouseCallback() for mouse-based ROI selection?
  • What is the recommended way to implement tracking (e.g., Deep SORT, ByteTrack, or OpenCV trackers) after detection on the Hailo platform?
  • Is there an example or best practice for implementing real-time single object tracking or ROI tracking after detection in a Hailo-compatible pipeline?

I’d also like to know if it’s feasible to overlay detection results and user-selected tracking information on the RTSP stream output."

While we do provide example applications in our GitHub repositories

GitHub - Hailo Application Code Examples
GitHub - Hailo RPi5 Examples
GitHub - Hailo AI Tappas

it is up to our users to create their own applications. Our goal is to help you run the compute-intensive part, the inference of an NN model, efficiently and with high performance.

The user interaction, the user interface, and the choice of framework are up to you.
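For example, OpenCV's UI functions (including cv2.setMouseCallback() and cv2.selectROI()) and its CPU trackers run entirely on the host, independent of the Hailo SDK, so you are free to combine them with Hailo inference in any way. A minimal ROI-tracking sketch, assuming opencv-contrib-python is installed for the legacy trackers (the RTSP URL is a placeholder):

import cv2

# placeholder RTSP URL - replace with your camera's stream
cap = cv2.VideoCapture("rtsp://192.168.1.10:554/stream")

ok, frame = cap.read()
if not ok:
    raise RuntimeError("Could not read from video source")

# let the user draw a box around the object to track
roi = cv2.selectROI("Select object", frame, showCrosshair=True)
cv2.destroyWindow("Select object")

# CSRT is a CPU tracker from opencv-contrib; it runs on the host,
# completely independent of Hailo inference
tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(frame, roi)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok, box = tracker.update(frame)
    if ok:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()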

Hi @vtol.abhijeet ,

To solve your problem you can use the DeGirum PySDK and DeGirum Tools packages.
The script below does exactly what you want: it applies an object detection model to a video stream, applies BYTETrack to track the detected objects, handles mouse clicks via OpenCV's mouse callback mechanism so that only the clicked object is tracked, and finally displays the live video with the selected object. The script is well commented, so you can follow what is going on.

You may need to adjust some parameters at the beginning of the script to select the inference location, model zoo, models, etc. If you use the cloud zoo, you will also need a token
(you may paste it in place of the degirum_tools.get_token() expression or put it into an env.ini file next to the script).
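For instance, pasting a token directly (the value below is a placeholder) would look like:

# placeholder token pasted in place of the degirum_tools.get_token() expression
zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token="<your cloud API token>",
)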

This example works out of the box using cloud inference (hw_location = "@cloud"), assuming you provided a token. If you have a Hailo device installed locally, you may switch to local inference by assigning hw_location = "@local".

The example uses an mp4 video of walking people. You may change the source by assigning video_source an integer index to select a local camera, an RTSP camera URL, or a local path to an mp4 file, as sketched below.
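For example (the camera index, URL, and path are placeholders):

# pick one of the following:
video_source = 0                                       # local camera by index
video_source = "rtsp://user:pass@192.168.1.10/stream"  # RTSP camera URL
video_source = "/path/to/video.mp4"                    # local mp4 file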

The full documentation of PySDK is available here: Overview | DeGirum Docs

The full documentation of DeGirum Tools is here: Overview | DeGirum Docs

You may also look at our GitHub example repo for Hailo: DeGirum/hailo_examples: DeGirum PySDK with Hailo AI Accelerators

import degirum as dg, degirum_tools, cv2


# adjust all these parameters to your needs
hw_location = "@cloud"
model_zoo_url = "degirum/models_hailort"
person_model_name = "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8_1"
video_source = "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/WalkingPeople2.mp4"
display_name = "Object Selector Example"

# connect to the model zoo
zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
)

# load person/pose detection model
person_model = zoo.load_model(person_model_name, overlay_line_width=1)

# create a context to store detections and selected track_id
context = dict(detections=None, track_id=None)


def point_in_rect(x, y, rect):
    """Check if point (x, y) is inside rectangle [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2


def is_object_selected(obj, result):
    """Return 1 when object has track_id matching the context, otherwise return 0."""
    # store detections in context for use in mouse callback
    context["detections"] = result.results
    sel_track_id = context.get("track_id")
    track_id = obj.get("track_id")
    return int(
        sel_track_id is not None and track_id is not None and track_id == sel_track_id
    )


def mouse_callback(event: int, x: int, y: int, flags: int, context: dict):
    """Mouse callback to set the context for object selection"""
    if event == cv2.EVENT_LBUTTONDOWN:
        detections = context.get("detections")
        if detections is not None:
            # look for the object that contains clicked point
            for obj in detections:
                # check if the clicked point is inside the bounding box of the object
                track_id = obj.get("track_id")
                if track_id is not None and point_in_rect(x, y, obj["bbox"]):
                    # if so, remember the track_id in context
                    context["track_id"] = track_id
                    break
            else:
                context["track_id"] = None


# create object tracker analyzer to track objects
tracker = degirum_tools.ObjectTracker(
    track_thresh=0.35,
    match_thresh=0.9999,
    anchor_point=degirum_tools.AnchorPoint.CENTER,
    show_overlay=False,
)

# create object selector analyzer to select clicked person
selector = degirum_tools.ObjectSelector(
    top_k=0,
    selection_strategy=degirum_tools.ObjectSelectionStrategies.CUSTOM_METRIC,
    # use custom metric to select the object of interest: object with highest metric value is selected
    custom_metric=is_object_selected,
    metric_threshold=0.5,
    use_tracking=False,
    show_overlay=False,
)

# attach object tracker and object selector analyzers to person detection model
degirum_tools.attach_analyzers(person_model, [tracker, selector])

# open display window
with degirum_tools.Display(display_name) as display:
    # perform streaming inference on video source
    for i, result in enumerate(
        degirum_tools.predict_stream(person_model, video_source)
    ):
        # show the result on the display
        display.show(result)
        # set mouse callback only once and only when the display is opened
        if i == 0:
            cv2.setMouseCallback(display_name, mouse_callback, context)

@vtol.abhijeet ,

Regarding RTSP streaming from Python: GStreamer is typically used for that purpose. It is a well-known technique that is straightforward to implement. Please refer to the following links:
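As a rough illustration of the idea (a sketch under assumptions, not a complete RTSP server): the snippet below uses OpenCV's GStreamer-backed VideoWriter to push annotated frames out as an RTP/H.264 stream over UDP. It assumes OpenCV was built with GStreamer support; the pipeline string, host, and port are placeholders, and serving true RTSP would additionally require gst-rtsp-server.

import cv2

# placeholder pipeline: encode frames to H.264 and send them as RTP over UDP
gst_pipeline = (
    "appsrc ! videoconvert ! "
    "x264enc tune=zerolatency speed-preset=ultrafast ! "
    "rtph264pay config-interval=1 pt=96 ! "
    "udpsink host=127.0.0.1 port=5000"
)

fps, width, height = 30, 640, 640  # match your annotated frame size and rate
writer = cv2.VideoWriter(gst_pipeline, cv2.CAP_GSTREAMER, 0, fps, (width, height))

# inside the inference loop, push each annotated frame, e.g.:
#   writer.write(result.image_overlay)
# and release the writer when done:
#   writer.release()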