How to sync metadata with video frames? How to edit metadata timestamps?

Hi there,
I’m using a setup with multiple RPi 5s with camera modules and an RPi 5 with a Hailo-8L, all connected on a local network.
I’m using GStreamer to stream the video, a GStreamer Hailo pipeline to receive the stream and run inference, and the hailoexportzmq element to stream the metadata.

I have another rpi5 on the network connected to a display, which is receiving and recording the video stream (from the pi with camera), and also receiving and recording the metadata zmq stream (from the pi with hailo).

This setup might seem complicated but it satisfies my constraints.

My requirement:
I want to draw the bboxes on the frames once I receive them on the Pi with the display, and also perform basic frame-by-frame analysis based on the metadata.

My problem:
I’m having trouble associating the correct metadata with the correct frame. Timestamps aren’t helpful because there are time/network delays in between. Frame numbers aren’t helpful because there is a mismatch.

Can someone please help me come up with a solution?
If I know where the timestamp is added to the buffer metadata, maybe I can replace it with the video frame’s creation timestamp, which is already in the buffer, so that the timestamp stays constant and can easily be associated with the frame regardless of delays.
Could you please suggest which cpp/hpp files to edit so I can recompile the .so files if needed?
Could you help me locate where the timestamp, buffer_offset, and stream_id are created/added to the buffer so I can edit them?

Thank you, looking forward to your ideas!

Hey @user113,

To synchronize video frames and metadata in your setup, make the following changes:

1. Modify Camera Pi (Video Source)

  • Attach a unique frame identifier (frame_id) to each frame in the GStreamer pipeline
  • Ensure this ID stays consistent throughout the pipeline

2. Modify Hailo Inference Pi

  • Preserve frame_id in Hailo’s inference pipeline
  • Configure hailoexportzmq to transmit the frame_id along with the inference metadata

3. Modify Display Pi

  • Extract frame_id from the video stream when it arrives
  • Match frame_id with metadata from ZMQ before overlaying bounding boxes
  • Implement a frame-metadata association mechanism (dictionary or queue) to handle network delays
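
The association mechanism from step 3 could look roughly like the Python sketch below. It assumes every video frame and every metadata message already carries the same integer frame_id; how that ID gets attached depends on your pipelines and is not shown here. The class name FrameMetadataMatcher, its methods, and the max_pending parameter are hypothetical names for illustration, not part of TAPPAS or GStreamer. Whichever half arrives first is held until its partner shows up, and the oldest unmatched entries are dropped once max_pending is exceeded, so a lost packet cannot grow memory without bound.

```python
from collections import OrderedDict


class FrameMetadataMatcher:
    """Pairs frames and metadata by frame_id, whichever arrives first."""

    def __init__(self, max_pending=120):
        self.max_pending = max_pending
        self._frames = OrderedDict()    # frame_id -> frame, waiting for metadata
        self._metadata = OrderedDict()  # frame_id -> metadata, waiting for frame

    def _store(self, pending, frame_id, item):
        pending[frame_id] = item
        # Evict the oldest unmatched entries (e.g. the other half was lost).
        while len(pending) > self.max_pending:
            pending.popitem(last=False)

    def add_frame(self, frame_id, frame):
        """Return (frame, metadata) if the metadata is already here, else None."""
        if frame_id in self._metadata:
            return frame, self._metadata.pop(frame_id)
        self._store(self._frames, frame_id, frame)
        return None

    def add_metadata(self, frame_id, metadata):
        """Return (frame, metadata) if the frame is already here, else None."""
        if frame_id in self._frames:
            return self._frames.pop(frame_id), metadata
        self._store(self._metadata, frame_id, metadata)
        return None


matcher = FrameMetadataMatcher(max_pending=3)
matcher.add_frame(1, "frame-1")                  # metadata has not arrived yet
pair = matcher.add_metadata(1, {"bboxes": []})   # completes the pair
```

On the display Pi you would call add_frame from the video receive path and add_metadata from the ZMQ receive path, and draw the bboxes whenever either call returns a pair. If the two paths run on different threads, guard the calls with a lock.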

4. Ensure Proper Data Flow

  • Verify frame_id propagates correctly in all pipelines
  • Validate by logging frame_id at each stage
  • Tune buffering if necessary to account for network latencies
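
To validate the logged IDs from step 4, one option is to collect the frame_id values seen at each stage and compare them offline. The sketch below is only an illustration: the function name missing_per_stage, the stage names, and the sample IDs are made up, and it assumes you have already parsed your logs into plain lists of integers.

```python
def missing_per_stage(logged_ids):
    """Report which frame_ids from the first stage never reached later stages.

    logged_ids: dict mapping stage name -> list of frame_ids, in pipeline order.
    """
    stages = list(logged_ids)
    reference = set(logged_ids[stages[0]])
    report = {}
    for stage in stages[1:]:
        seen = set(logged_ids[stage])
        report[stage] = sorted(reference - seen)
    return report


# Made-up sample data: frame 3 is lost before inference, frame 5 before display.
logs = {
    "camera": [1, 2, 3, 4, 5],
    "hailo": [1, 2, 4, 5],
    "display": [1, 2, 4],
}
report = missing_per_stage(logs)
```

A stage that keeps losing IDs is a hint about where to tune buffering first.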

This approach associates frames and metadata by ID rather than by timestamp, so network latency and buffer mismatches no longer affect the pairing, and the display Pi can render bounding boxes in real time with the correct metadata for each frame.

For more info, please check out our TAPPAS documentation, pages 114-128.
Let me know if you need anything else!