Hi @vtol.abhijeet,
To solve your problem you can use the DeGirum PySDK and DeGirum Tools packages.
The script below does exactly what you want: it applies an object detection model to a video stream, applies BYTETrack to track the detected objects, handles mouse click events via the OpenCV mouse callback mechanism so that only the clicked object is tracked, and displays the live video with the selected object. The script is well commented, so you can follow what is going on.
You may need to adjust some parameters at the beginning of the script to select the inference location, model zoo, model name, etc. If you use the cloud zoo, you will also need a token (you may paste it in place of the degirum_tools.get_token() expression or put it into an env.ini file next to the script).
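For instance, if you prefer to hard-code the token, the connection call in the script would look like the sketch below (the token string is just a placeholder you replace with your own):

zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token="<your cloud API token>",  # pasted directly in place of degirum_tools.get_token()
)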
This example works out of the box using cloud inference (hw_location = "@cloud"), assuming you provided a token. If you have a Hailo device installed locally, you may switch to local inference by assigning hw_location = "@local".
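In other words, switching between the two inference modes is a one-line change:

hw_location = "@cloud"  # inference on the DeGirum cloud farm (token required)
hw_location = "@local"  # inference on a locally installed Hailo device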
The example uses an mp4 video with walking people. You may change it by assigning video_source: an integer index to select a local camera, an RTSP camera URL, or a local path to an mp4 file.
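For example, any of the following assignments would work (the URL and file path below are placeholders):

video_source = 0  # integer index of a local camera
video_source = "rtsp://user:password@192.168.0.100/stream"  # RTSP camera URL (placeholder)
video_source = "path/to/video.mp4"  # local mp4 file (placeholder)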
The full documentation of PySDK is available here: Overview | DeGirum Docs
The full documentation of DeGirum Tools is here: Overview | DeGirum Docs
You may also look at our GitHub example repo for Hailo: DeGirum/hailo_examples: DeGirum PySDK with Hailo AI Accelerators
import degirum as dg, degirum_tools, cv2
# adjust all these parameters to your needs
hw_location = "@cloud"
model_zoo_url = "degirum/models_hailort"
person_model_name = "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8_1"
video_source = "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/WalkingPeople2.mp4"
display_name = "Object Selector Example"
# connect to the model zoo
zoo = dg.connect(
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
)
# load person/pose detection model
person_model = zoo.load_model(person_model_name, overlay_line_width=1)
# create a context to store detections and selected track_id
context = dict(detections=None, track_id=None)
def point_in_rect(x, y, rect):
    """Check if point (x, y) is inside rectangle [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2
def is_object_selected(obj, result):
    """Return 1 when object has track_id matching the context, otherwise return 0."""
    # store detections in context for use in mouse callback
    context["detections"] = result.results
    sel_track_id = context.get("track_id")
    track_id = obj.get("track_id")
    return int(
        sel_track_id is not None and track_id is not None and track_id == sel_track_id
    )
def mouse_callback(event: int, x: int, y: int, flags: int, context: dict):
    """Mouse callback to set the context for object selection"""
    if event == cv2.EVENT_LBUTTONDOWN:
        detections = context.get("detections")
        if detections is not None:
            # look for the object that contains the clicked point
            for obj in detections:
                # check if the clicked point is inside the bounding box of the object
                track_id = obj.get("track_id")
                if track_id is not None and point_in_rect(x, y, obj["bbox"]):
                    # if so, remember the track_id in context
                    context["track_id"] = track_id
                    break
            else:
                # loop finished without break: no object was clicked, clear the selection
                context["track_id"] = None
# create object tracker analyzer to track objects
tracker = degirum_tools.ObjectTracker(
    track_thresh=0.35,
    match_thresh=0.9999,
    anchor_point=degirum_tools.AnchorPoint.CENTER,
    show_overlay=False,
)
# create object selector analyzer to select clicked person
selector = degirum_tools.ObjectSelector(
    top_k=0,
    selection_strategy=degirum_tools.ObjectSelectionStrategies.CUSTOM_METRIC,
    # use custom metric to select the object of interest: object with highest metric value is selected
    custom_metric=is_object_selected,
    metric_threshold=0.5,
    use_tracking=False,
    show_overlay=False,
)
# attach object tracker and object selector analyzers to person detection model
degirum_tools.attach_analyzers(person_model, [tracker, selector])
# open display window
with degirum_tools.Display(display_name) as display:
    # perform streaming inference on the video source
    for i, result in enumerate(
        degirum_tools.predict_stream(person_model, video_source)
    ):
        # show the result on the display
        display.show(result)
        # set mouse callback only once and only after the display window is opened
        if i == 0:
            cv2.setMouseCallback(display_name, mouse_callback, context)