Looking for more info to work with 2 streams and 2 networks

Hi, I’m interested in learning more detail about which approach I should prefer. I am working with 2 cameras on an RPi5, trying to detect and estimate a precise point on an object in video.

I am currently experimenting with a pose model, but I assume it will not be very accurate for overlapping objects. So I am thinking of using two-stage inference. The first network will be OBB (it seems the best fit for my use case; maybe segmentation could also help). Then I want to crop the detected objects (max 4 objects), make a tile of the detected objects, and push it to a YOLOv8 pose detection network, which should detect my required point precisely. 15 FPS is ideal for my purpose, and a little delay (not more than 300 ms) is acceptable.

I am new to ML, so I have little in-depth knowledge of models.
From my perspective, what I have to use and understand is:

  1. Training (know a little bit and can do)
  2. Conversion to ONNX (know a little bit and can do)
  3. Conversion to HAR (know a little bit and can do)
  4. Quantization and optimization (know a little bit and can do)
  5. Compile to hef (know a little bit and can do)
  6. How to write post-processing steps (I am completely unsure how to start)
  7. Custom GStreamer plugin (I tried but could not get it to register (unable to load); I also tried following the TAPPAS plugin design)
  8. Object tracker (I can use HailoTracker), but I want to maintain the timing of the first detection (so I’m not sure what I should do)

So my questions are,

  1. Is there anything else I need to know other than what’s mentioned above?
  2. What are good resources for creating GStreamer plugins? As I explained, I have to make tiles by cropping and masking detections, so I will have to write code. I want to build it in C++ to make the process faster.
  3. Any other recommendations on my approach?
  4. How exactly should I use hailonet for 2 HEFs? I have found the following:

    tappas/docs/pipelines/parallel_networks.rst at master · hailo-ai/tappas · GitHub
    tappas/docs/pipelines/single_network.rst at master · hailo-ai/tappas · GitHub

From what I know, Hailo internally manages the interface to access the device and allows us to use multiple instances of the same device virtually. Correct me if I am wrong.

I am interested in merging the OBB and pose HARs into a single HEF (mentioned in the image above). Should I prefer the merge option, or should I keep 2 HEFs?

So far, I have managed to get just the YOLOv8 pose part working (it works, but I am unable to post-process my points correctly in the GStreamer pipeline; see the issue here: Hey I want to build my own custom postprocessing .so - #7 by saurabh).

Hey @saurabh

Your project using two cameras on a Raspberry Pi 5 for a two-stage inference pipeline with object detection and pose estimation sounds fascinating. Let’s dive into the details of your approach and address your questions.

Your Approach:

  1. Two-Stage Inference:
    • First, use an oriented bounding box (OBB) detection model or segmentation to detect objects (you mentioned detecting up to 4 objects).
    • Crop the detected objects and tile them into a composite image.
    • Feed the cropped objects into a YOLOv8 pose detection network for precise point detection.

This approach is solid, and the idea of breaking it into two stages (OBB → Cropping → Pose Estimation) is good for dealing with overlapping objects.


Addressing Your Specific Questions:

1. Is there anything else I need to know?

Based on your current progress, here are a few additional points you might want to consider:

  • Synchronization Between Cameras: Ensure that the frames from both cameras are synchronized properly to avoid discrepancies in object detection between streams (see the pairing sketch after this list).
  • Model Calibration and Quantization: If you are new to quantization, be aware that post-training quantization can help you speed up inference on edge devices like the Raspberry Pi without sacrificing much accuracy.
  • Latency Optimization: Use hardware acceleration to ensure that your processing pipeline can achieve 15 FPS with minimal delay (within your acceptable 300ms window).
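For illustration, here is a minimal sketch of timestamp-based frame pairing for the two cameras: buffer frames per stream and emit a pair when their PTS values fall within a tolerance. The queue handling and handle_pair are placeholders; in a real pipeline this logic would live in pad probes or appsink callbacks.

from collections import deque

TOL_NS = 20_000_000  # 20 ms pairing tolerance, in nanoseconds

left, right = deque(), deque()

def handle_pair(frame_a, frame_b):
    pass  # placeholder: run the two-stage inference on the matched pair

def push(own, other, frame, pts):
    """Call as push(left, right, ...) for camera A, push(right, left, ...) for B."""
    own.append((pts, frame))
    while own and other:
        (pa, fa), (pb, fb) = own[0], other[0]
        if abs(pa - pb) <= TOL_NS:
            own.popleft(); other.popleft()
            handle_pair(fa, fb)
        elif pa < pb:
            own.popleft()    # drop the older unmatched frame
        else:
            other.popleft()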

2. Post-Processing Steps:

Post-processing is crucial for both stages of your inference. Here’s a guide to help:

  • For OBB Detection: After detecting objects, you will need to crop the images and tile them together before passing them to the next network. This involves extracting bounding boxes from the first network’s output and dynamically creating a composite image (see the sketch after this list).
  • For Pose Detection: The pose detection will give you key points that you need to correctly map to your original frame. If you use YOLOv8, you can leverage the model’s built-in keypoint format, but you’ll need to carefully re-scale and position the keypoints in the original image space.
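Here is a minimal sketch of the crop-and-tile step and the inverse keypoint mapping, assuming axis-aligned boxes in integer pixel coordinates (rotated OBB boxes would first need to be warped upright, e.g. with cv2.warpAffine). The function names are illustrative:

import numpy as np
import cv2

def crop_and_tile(frame, boxes, tile_size=(192, 192), grid=(2, 2)):
    """Crop up to grid[0]*grid[1] detections and pack them into one image.

    frame: HxWx3 uint8 image; boxes: (x_min, y_min, x_max, y_max) ints.
    Returns the composite plus per-tile info to map keypoints back.
    """
    rows, cols = grid
    th, tw = tile_size
    composite = np.zeros((rows * th, cols * tw, 3), dtype=np.uint8)
    tiles = []
    for i, (x0, y0, x1, y1) in enumerate(boxes[: rows * cols]):
        resized = cv2.resize(frame[y0:y1, x0:x1], (tw, th))
        r, c = divmod(i, cols)
        composite[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = resized
        # Store the crop origin, scale factors, and tile offset.
        tiles.append((x0, y0, (x1 - x0) / tw, (y1 - y0) / th, c * tw, r * th))
    return composite, tiles

def keypoint_to_frame(kx, ky, tile):
    """Map a keypoint from composite coordinates back to the source frame."""
    x0, y0, sx, sy, ox, oy = tile
    return x0 + (kx - ox) * sx, y0 + (ky - oy) * sy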

3. GStreamer Plugin Development:

Creating a GStreamer plugin to handle the custom tiling of cropped objects is a good idea, and you can certainly speed up processing by using C++. Below are steps and resources to help with GStreamer plugin development:

Key Steps:

  • Start with the TAPPAS framework examples (as you mentioned). Follow their structure for plugin registration and ensure that your plugin is correctly compiled and registered with GStreamer.
  • Look into existing GStreamer plugins like videomixer or compositor that handle video streams. You can learn from these examples to understand how to tile images in GStreamer.


4. Object Tracking:

Since you plan to maintain timing from the first detection (with a HailoTracker), I recommend you synchronize the timestamp of the object detection with the tracker. You could pass the timestamp through your GStreamer pipeline and ensure that the tracker keeps track of detected objects over time, using the first detection as a reference.
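As a minimal sketch, you could record the first-detection timestamp per track with an identity pad probe placed after hailotracker. The hailo API calls follow the pattern used in hailo-rpi5-examples (HAILO_DETECTION, HAILO_UNIQUE_ID); verify them against your TAPPAS version:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import hailo

first_seen = {}  # track id -> buffer PTS (ns) at first detection

def probe(pad, info):
    buffer = info.get_buffer()
    roi = hailo.get_roi_from_buffer(buffer)
    for det in roi.get_objects_typed(hailo.HAILO_DETECTION):
        ids = det.get_objects_typed(hailo.HAILO_UNIQUE_ID)
        if ids:
            # setdefault keeps the timestamp of the first sighting only
            first_seen.setdefault(ids[0].get_id(), buffer.pts)
    return Gst.PadProbeReturn.OK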

5. Multiple HEFs vs. Merged HEF:

Whether to use two separate HEFs or merge them into one HEF depends on your specific use case and hardware resources:

Separate HEFs:

  • This allows for flexibility. You can run the two networks in parallel, but keep in mind that managing two streams and two networks can be resource-heavy.

Merged HEF:

  • Merging both OBB detection and pose estimation into a single HEF can simplify your pipeline and reduce overhead. However, it will depend on how well the two networks can be merged and optimized together.

  • Recommendation: Start by keeping the HEFs separate until you are sure the process works smoothly, and then consider merging them for optimization purposes. If latency becomes an issue with two separate HEFs, merging might help.

6. Using HailoNet for Multiple HEFs:

The Hailo system supports multiple instances of the same device, allowing you to manage different networks virtually. You can run the two models simultaneously on a single Hailo chip by using parallel inference pipelines, which Hailo supports natively. Here’s an outline:

  • Use Hailo’s parallel inference capability by assigning one network (OBB detection) to one virtual instance and the second network (pose detection) to another virtual instance.
  • Use the TAPPAS parallel networks example to manage two networks. It provides a detailed explanation of how to manage two networks on a single Hailo device.
  • Use the Hailo Scheduler to manage network execution efficiently (a pipeline sketch follows).
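For illustration, a hedged sketch of a pipeline running two HEFs back to back on one device. All paths are placeholders, the cropper/aggregator stages you would add for tiling are omitted, and property names such as vdevice-key differ between HailoRT/TAPPAS versions, so verify what your installation exposes with gst-inspect-1.0 hailonet:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# 'vdevice-key' lets both hailonet instances share one physical device
# so the scheduler can time-multiplex the two models.
pipeline = Gst.parse_launch("""
    libcamerasrc ! videoconvert !
    hailonet hef-path=/path/to/obb.hef vdevice-key=1 !
    hailofilter so-path=/path/to/libobb_post.so qos=false !
    hailonet hef-path=/path/to/pose.hef vdevice-key=1 !
    hailofilter so-path=/path/to/libpose_post.so qos=false !
    videoconvert ! autovideosink
""")
pipeline.set_state(Gst.State.PLAYING)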

7. Building Your Custom GStreamer Post-Processing Plugin:

For the post-processing tile creation:

  • Write a GStreamer plugin in C++ that takes the bounding boxes from the OBB network and crops the corresponding parts of the video stream.
  • After cropping, tile the objects and pass them into the pose detection model.
  • This will require you to handle multiple outputs from the GStreamer pipeline, so check the documentation for handling multiple sinks and sources.

Final Recommendations:

  • Tiling Cropped Objects: You’ll need to manually handle the output from the first network (OBB detection) to crop objects and create a composite image.
  • Synchronization Between Networks: Ensure that the two models run efficiently, and consider running the inference for both networks in parallel.
  • Profiling: Profile your pipeline to ensure you meet the 15 FPS requirement. Utilize the Raspberry Pi’s hardware acceleration and Hailo optimizations to keep inference latency under 300ms.

I hope this provides the detailed information you need to move forward. Let me know if you need more help or specific examples, especially with GStreamer plugin development or post-processing steps!

Best regards,
Omri

@omria
Thank you for your detailed explanations and clarifications. I was a little unsure about a few things, but it’s clear now.

Although I have a few issues: there is no OBB model in the examples whose post-processing I could reuse. How can I write the post-processing part for the YOLO OBB model? I don’t know where to start.
Also, I am not sure whether you are going to add this post-processing in the near future or not.

And I am even struggling to get the pose post-processing working for 2 points.

I am currently testing my basic idea in Python using object detection and pose estimation.
Due to the post-processing issues, I am unable to use OBB and pose together. The pose post-processing .so issue is also restricting me to working only with the Python API, and I am currently unable to use GStreamer because of this.

Hey @saurabh

I’m glad to hear that the explanations were helpful for you! It’s great that you’re ready to dive into implementing the post-processing steps.

First, let’s talk about Oriented Bounding Box (OBB) post-processing for YOLO. The key difference with OBB is that you’re dealing with rotated bounding boxes, so the model output includes an additional angle parameter. Here’s a simple example of how you can implement the post-processing:

def yolo_obb_postprocess(output, confidence_threshold=0.5):
    """Filter raw OBB detections by confidence.

    Assumes each detection row is laid out as
    [x_center, y_center, width, height, confidence, angle].
    """
    obb_detections = []

    for detection in output:
        confidence = detection[4]
        if confidence > confidence_threshold:
            x_center, y_center, width, height = detection[0:4]
            angle = detection[5]
            obb_detections.append((x_center, y_center, width, height, angle))

    return obb_detections

The key steps are:

  1. Extract the bounding box center, width, height, and angle from the model output.
  2. Filter detections based on a confidence threshold.
  3. Return the processed OBB detections.
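Because the boxes are rotated, cropping usually starts from the four corner points. Here is a small sketch of the center/size/angle to corners conversion; note that the angle convention (radians and counter-clockwise here) depends on how your model was exported:

import numpy as np

def obb_to_corners(x_c, y_c, w, h, angle):
    """Return the 4 corner points of a rotated box, shape (4, 2)."""
    c, s = np.cos(angle), np.sin(angle)
    # Half-extent offsets of the upright box, then rotate about the center.
    dx = np.array([-w, w, w, -w]) / 2.0
    dy = np.array([-h, -h, h, h]) / 2.0
    xs = x_c + c * dx - s * dy
    ys = y_c + s * dx + c * dy
    return np.stack([xs, ys], axis=1)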

Now, let’s move on to pose estimation post-processing. Here, you’ll be extracting keypoints (like body joints) from the model output. Each keypoint consists of an (x, y) coordinate pair. Here’s an example of how you can process the pose output:

def pose_postprocess(output, num_keypoints=2):
    """Extract (x, y) keypoint pairs from a flat output buffer.

    Assumes the output is laid out as [x0, y0, x1, y1, ...].
    """
    keypoints = []
    for i in range(num_keypoints):
        x = output[i * 2]
        y = output[i * 2 + 1]
        keypoints.append((x, y))

    return keypoints

The main steps are:

  1. Iterate over the model output to extract (x, y) coordinate pairs for each keypoint.
  2. Return the list of keypoints.

Now, the tricky part is integrating these post-processing steps with the Hailo SDK and GStreamer. Since you mentioned that using the Hailo GStreamer elements is the preferred approach, let’s focus on that.

You can use the hailonet element to perform inference on the Hailo device, and then use the hailofilter element to apply your custom post-processing logic. Here’s a rough outline of how your GStreamer pipeline might look:

pipeline = Gst.parse_launch("""
    filesrc location=input.mp4 ! decodebin ! videoconvert !
    hailonet model=/path/to/compiled/model.hef !
    hailofilter script-path=/path/to/postprocess_script.py !
    videoconvert ! autovideosink
""")

The key steps are:

  1. Use the hailonet element to specify your compiled Hailo model.
  2. Use the hailofilter element and provide the path to your Python post-processing script.

In your post-processing script, you can include the yolo_obb_postprocess and pose_postprocess functions we discussed earlier, and apply them to the inference results.

I hope this clarifies things a bit and gives you a starting point for tackling the post-processing integration with the Hailo SDK and GStreamer.

If you have any more questions or need further assistance, don’t hesitate to ask. I’m here to help!

@omria
Thank you so much. But it seems this will not help much: since the end nodes are changed, we will have to write all of the model’s post-processing operations ourselves, which does not seem straightforward.

But one interesting thing I saw in your explanation is that you provided a Python script to hailofilter.
I believe this is incorrect? From what I can see here: tappas/docs/elements/hailo_filter.rst at master · hailo-ai/tappas · GitHub

The correct GStreamer element for Python processing should be tappas/docs/elements/hailo_python.rst at master · hailo-ai/tappas · GitHub?

I will search for resources about post-processing and will update.
Thank you :slight_smile:


Hi @saurabh,

Thanks for pointing that out! Let’s break down your questions and address them specifically.

1. End Nodes Changed:

You mentioned that the end nodes of your model have changed, which means you’ll need to rewrite the post-processing operations from scratch. This is indeed not straightforward but necessary when working with custom models, especially if their output structure (bounding boxes, keypoints, etc.) differs from the standard models in the examples.

To handle this:

  • You need to understand the exact structure of your model’s output (using tools like Netron to visualize the ONNX model).
  • Once you identify the new output format, you can write custom post-processing functions that extract the necessary information (bounding boxes, keypoints, or angles).

This can be a bit tricky, but once you know the output structure, it’s just a matter of translating that into a post-processing function.
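For example, a quick way to dump output names and shapes from the ONNX export (using the onnx Python package; the file path is a placeholder) before writing any post-processing:

import onnx

model = onnx.load("model.onnx")
for output in model.graph.output:
    # dim_value is 0 for symbolic dims, so fall back to dim_param
    dims = [d.dim_value or d.dim_param for d in output.type.tensor_type.shape.dim]
    print(output.name, dims)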

2. hailofilter vs hailopython:

You’re absolutely right—hailofilter is not designed for running Python post-processing scripts. I mistakenly mentioned it in my earlier response.

For Python-based post-processing, you should indeed use hailopython, as it allows you to run Python scripts directly in the GStreamer pipeline.

Correct Usage:

  • hailopython is specifically designed for executing Python scripts during inference, whereas hailofilter is used for compiled C++ post-processing libraries (.so files).
  • Your pipeline should use hailopython for Python post-processing:
hailopython module=/path/to/postprocess_script.py function=run

This element will run your Python script, allowing you to handle the custom output nodes and any post-processing operations you need to apply.

Conclusion:

  • End Nodes Changed: Yes, you’ll need to rewrite the post-processing for your new output nodes, but once you identify the structure, the custom code can be written accordingly.
  • Correct GStreamer Element: You’re absolutely correct—hailopython is the element for Python post-processing, not hailofilter.

Thanks for bringing this up, and I hope this clears things up. Let me know if you need any more guidance with the custom post-processing or GStreamer setup!

Best regards,
Omri

@omria
I am trying to work with the hailopython GStreamer element, but I am not sure how to implement the callback function that will be invoked by hailopython, or with what arguments it will be invoked.

I need an example.

I am using this code to get the raw output tensors… but it is always blank. No items.

    tensors = roi.get_tensors()
    has_tensors = roi.has_tensors()
    print(tensors, has_tensors)

just after this line.
hailo-rpi5-examples/basic_pipelines/pose_estimation.py at 123e6755d88583ccee6ebd4123de41b6868f1239 · hailo-ai/hailo-rpi5-examples · GitHub.

I get the same issue when I try to access them inside the hailopython GStreamer element function.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib
import hailo

def run(frame, **args):
    print('--------HAILO PYTHON STARTED------------------')
    print(frame)
    buffer = frame.buffer
    print(buffer)
    roi = hailo.get_roi_from_buffer(buffer)
    print(roi)
    # detections = roi.get_objects_typed(hailo.HAILO_DETECTION)
    tensors = roi.get_tensors()
    has_tensors = roi.has_tensors()
    objects = roi.get_objects()
    print(tensors, has_tensors)
    print(objects)
    
    print('+++++++++++++++++++HAILO PYTHON END++++++++++++++++++++++')
    return Gst.FlowReturn.OK

C++ Lib:
In HailoMainObject these methods exist… and the HailoROI class inherits from HailoMainObject…


Found the solution to the blank-tensors issue. I can now get the tensors list.

We can’t use hailopython after hailofilter: I think hailofilter removes the tensors metadata from the buffer after processing.

So, to use hailopython, it’s important to keep the hailopython gst element before hailofilter, or not use hailofilter at all.

I noticed hailopython does not remove the tensors metadata, because my later hailofilter element works fine.

Not possible (tensors cannot be retrieved):

hailofilter function-name={post_function_name} so-path={postprocess_so} qos=false !  
hailopython module={module_path} function=run qos=false ! 

Possible (tensors can be retrieved):

hailopython module={module_path} function=run qos=false !
hailofilter function-name={post_function_name} so-path={postprocess_so} qos=false !   

I think this is the line in hailofilter that removes the tensors metadata.

So, I just want to understand: does removing the tensors metadata from the buffer have any major significance?

I think keeping the tensors metadata would be great, as we would have the flexibility to perform any operations later on, either inside the hailopython gst element or in an identity Python callback.

And for cascaded networks, the tensors metadata will be overwritten automatically.

@omria
I am not sure how to convert the raw data of the tensors (List[hailo.HailoTensor]) to a NumPy representation in the hailopython Python script…

def get_numpy_tensor(tensor):
    """
    Adapt a HailoTensorPtr to a numpy array (quantized).

    Args:
        tensor (HailoTensorPtr): Tensor object containing data, size, and shape.

    Returns:
        numpy.ndarray: A numpy array adapted from the tensor.
    """
    # Assuming tensor.data() returns the data as a byte buffer or similar,
    # tensor.size() gives the total number of elements,
    # and tensor.shape() returns the shape of the tensor as a tuple.
    data = np.frombuffer(tensor.data(), dtype=np.uint8, count=tensor.size())
    return data.reshape(tensor.shape())

def run(frame, **args):
    """
    Run Processing Steps on Detections.
    
    Resources:
        https://github.com/hailo-ai/tappas/blob/master/core/hailo/plugins/python/hailo_python_api_sanity.py
    
    Args:
        frame: gsthailo.video_frame.VideoFrame 
            (https://github.com/hailo-ai/tappas/blob/master/core/hailo/python/gsthailo/video_frame.py).

    Returns:
        Gst.FlowReturn.OK    
    """
    
    print('--------CALLBACK STARTED------------------')
    
    post_processing = PoseEstPostProcessing(
        max_detections=20,
        score_threshold=0.2,
        nms_iou_thresh=0.65,
        regression_length=15,
        strides=[8, 16, 32]
    )

    # roi = frame.roi
    # buffer = frame.buffer
    # video_info = frame.video_info
    # tensors = roi.get_tensors()
    # objects = roi.get_objects()
    
    roi = frame.roi
    tensors = roi.get_tensors()
    objects = roi.get_objects()
    
    for tensor in tensors:
        print(tensor.name())
        print(tensor.width())
        print(tensor.height())
        height = tensor.height()
        width = tensor.width()
        data = tensor.data()
        print(data)
        print(tensor.shape())
        # print(get_numpy_tensor(data))
        # vstream_info = tensor.vstream_info
    
    print('+++++++++++++++++++CALLBACK ENDED++++++++++++++++++++++')
    return Gst.FlowReturn.OK

A second issue: when I call tensor.vstream_info() it throws an error, but it is defined here: tappas/core/hailo/plugins/python/hailo_python_api.cpp at daffd36ecab5110d47107255fd7ec4c779758f2e · hailo-ai/tappas · GitHub
I cannot see the exact error trace, because GStreamer just reports errors with no clear information.

Hey @saurabh

1. Implementing the Callback Function with HailoPython

Below is an example of how to implement the callback function to properly handle the frame data, extract tensors, and convert them to NumPy arrays.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib
import hailo
import numpy as np

def get_numpy_tensor(tensor):
    """
    Convert a Hailo tensor to a numpy array.
    """
    data = np.frombuffer(tensor.data(), dtype=np.uint8, count=tensor.size())
    return data.reshape(tensor.shape())

def run(frame, **args):
    """
    Callback function invoked by the HailoPython GStreamer element.
    Processes tensors and detected objects from the frame.
    """
    print('--------CALLBACK STARTED------------------')

    # Extract ROI (Region of Interest) from the frame
    roi = frame.roi

    # Extract tensors and detected objects
    tensors = roi.get_tensors()
    objects = roi.get_objects()

    print(f"Tensors Found: {len(tensors)}, Objects Found: {len(objects)}")

    # Iterate over tensors and convert them to numpy arrays
    for tensor in tensors:
        print(f"Tensor Name: {tensor.name()}")
        print(f"Dimensions: {tensor.width()}x{tensor.height()}")
        np_tensor = get_numpy_tensor(tensor)
        print(f"Converted Tensor Data:\n{np_tensor}")

    print('+++++++++++++++++++CALLBACK ENDED++++++++++++++++++++++')
    return Gst.FlowReturn.OK  # hailopython expects a Gst.FlowReturn

2. Key Changes and Clarifications

2.1 Metadata Management Issue

  • Problem: When using both HailoPython and HailoFilter in the same GStreamer pipeline, tensor metadata is removed by HailoFilter after processing, making it unavailable for subsequent elements.

  • Solution:

    • Place HailoPython before HailoFilter in the pipeline to ensure the tensors are available for processing.
    • Alternatively, if you don’t need HailoFilter, you can remove it from the pipeline to preserve tensor metadata.

2.2 Correct Pipeline Configuration

To avoid the loss of tensor metadata, structure the GStreamer pipeline as shown below:

  • Correct:
    hailopython module={module_path} function=run qos=false ! hailofilter function-name={post_function_name} so-path={postprocess_so} qos=false

  • Incorrect:
    hailofilter function-name={post_function_name} so-path={postprocess_so} qos=false ! hailopython module={module_path} function=run qos=false

By following this order, HailoPython will have access to the tensors, and HailoFilter will not interfere with metadata retrieval.


3. Handling Tensor Conversion to NumPy

The updated code extracts tensors correctly and converts them into NumPy arrays using the following logic:

data = np.frombuffer(tensor.data(), dtype=np.uint8, count=tensor.size())
np_tensor = data.reshape(tensor.shape())

This ensures you can easily manipulate tensor data for further processing or analysis.


4. Debugging Tensor Data Access and Errors

If you encounter issues such as:

  • Tensor data being unavailable
  • Errors when calling vstream_info()

Use the GST_DEBUG=5 environment variable to enable detailed logging for the GStreamer pipeline. This will provide more insight into the root cause of errors and help identify any interruptions in data flow.

GST_DEBUG=5 python your_gstreamer_script.py

I hope this helps resolve the issue. If you need any further assistance, please don’t hesitate to ask.

Best regards,
Omri

@omria
I have tried all of the above-mentioned code… Unfortunately, it does not work.

The correct way to access tensors is mentioned in the docs, which I found later on:

tensor = roi.get_tensor('layer_name')
tensor_np = np.array(tensor)
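For reference, a slightly expanded version of that pattern; 'layer_name' is a placeholder, and the real output layer names can be listed from roi.get_tensors() or with hailortcli parse-hef:

import numpy as np

# 'layer_name' is a placeholder; print(t.name()) for t in roi.get_tensors()
# to list the actual output layers of the HEF.
tensor = roi.get_tensor('layer_name')
tensor_np = np.array(tensor)  # converts the HailoTensor directly
print(tensor_np.shape, tensor_np.dtype)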

Although I have different Issue. Can you please take a look on that…