Now I feel you're having the same issue. I would suggest removing 16-bit precision for the keypoints, then rebuilding and trying in a new environment.
You can make everything 16-bit, but the Python API still does not handle those outputs properly; I believe they have not looked at this issue yet. I used C++ post-processing for 16-bit, with some extra customization.
FYI, I used this post-processing: Hailo-Application-Code-Examples/runtime/python/pose_estimation/pose_estimation_utils.py at main · hailo-ai/Hailo-Application-Code-Examples · GitHub
I replaced 17 (the number of body keypoints) → my custom number of keypoints,
and 51 (17 × 3) → my custom number of keypoints × 3.
And it worked.
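The substitution above can be sketched roughly as follows. Note this is a hedged illustration, not the actual contents of pose_estimation_utils.py: the constant names and the reshape helper are illustrative stand-ins for the hard-coded 17 and 51 in that file.

```python
import numpy as np

# Illustrative stand-ins for the hard-coded constants in the example
# postprocessor: 17 body keypoints, and 51 = 17 x 3 values per detection.
NUM_KEYPOINTS = 4                  # your custom keypoint count (was 17)
KPT_VALUES = NUM_KEYPOINTS * 3     # x, y, score per keypoint (was 51)

def reshape_keypoints(raw_kpts):
    # raw_kpts: flat keypoint output of shape (num_detections, KPT_VALUES)
    return raw_kpts.reshape(-1, NUM_KEYPOINTS, 3)

out = reshape_keypoints(np.zeros((2, KPT_VALUES)))
```

The point is simply that every place the postprocessor assumes 17 (or 51) must be updated consistently to your own keypoint count.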
One more point: the default models built and provided by Hailo are fully in 8-bit precision mode. You can check their pose model using the CLI.
@rosslote
YOLO models detect objects at three scales: large, medium, and small. If you look at the bounding-box outputs, you can see there are three of them: 20x20x1 (large), 40x40x1 (medium), and 80x80x1 (small). These output sizes are related to the input size by the stride parameter. In your case the input size is 640x640, and YOLOv8 uses 3 strides: 32, 16, and 8. Hence the outputs are 20x20, 40x40, and 80x80. Hope this clarifies.
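The relationship between input size, stride, and output grid size can be checked in two lines:

```python
# Output grid size = input size / stride, for each of YOLOv8's 3 strides.
input_size = 640
strides = (32, 16, 8)
grids = [input_size // s for s in strides]
print(grids)  # → [20, 40, 80]
```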
The way you performed dequantization is correct, but since the keypoints are all negative, I suspect they are not actually in UINT16 format, despite what the output info prints. You are getting all-negative points because the zero point applied to the keypoints is a large number. We just need to find the correct zero point for the keypoints.
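To illustrate the symptom: with standard affine dequantization, real = (quant − zero_point) × scale, so applying a large zero point (e.g. one taken from UINT16 metadata) to values that actually live in a smaller range pushes every result negative. A minimal sketch, with illustrative scale/zero-point values:

```python
import numpy as np

# Standard affine dequantization: real = (quant - zero_point) * scale.
def dequantize(quant, scale, zero_point):
    return (np.asarray(quant, dtype=np.float32) - zero_point) * scale

# Small raw values combined with a large (wrong) zero point:
# every dequantized value comes out strongly negative.
print(dequantize([100, 200], scale=0.01, zero_point=32768))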
For the C++ postprocessor, would it be enough to just change the values to match my own model?
That is, changing the 17 to 4 and adjusting JOINT_PAIRS?
That’s exactly what I have done.
Yes, you can do the same thing in the C++ post-process, but you will have to build the model in fully 8-bit precision mode.
I would say first try 8-bit; if you find you need more accuracy, then you can try fully 16-bit. (For 16-bit you will have to do some more customization, but it is easy.)
Are you working on a Raspberry Pi 5?
Yes. It's likely I will need more accuracy, as the use case requires as much precision as possible.
Does this mean you got 16-bit working with Python, or was that premature?
Do you know of a way to make 16-bit work with Python, or should I just compile the model to 8-bit?
No, I was able to use 16-bit only in the C++ post-process. In Python, only 8-bit.
This is something you can look at for 16-bit C++…
I have not looked at it in a long time; I might have to dig into my project for more details…
This 16-bit approach does work with Python processing, but I cannot use it due to speed and higher CPU usage (I have tested this).
… I don't remember exactly, but I think it would then be fully 16-bit, with HAILO_FORMAT_TYPE_UINT16 passed in the GStreamer pipeline.
@rosslote
Since you were able to run your HEF file with our PySDK, it means there is a way to make it work. However, I am not sure I know an easy way to debug your code. We use Hailo's runtime in C++ and get the quantization parameters there. We have not experimented with their Python bindings, so we do not know how to retrieve these parameters and verify them against the postprocessing code.
@saurabh @shashi
I've managed to get it working properly now with the model compiled to 8-bit. I'd like to try 16-bit with a C++ postprocessor. Currently I need to run this and a standard pose model on the same frame, so performance is a bit of an issue. The end goal is to train a YOLO pose model that detects both the human pose and the bed corners as two classes in the same model, but this will have to do for now.
Good to hear.
I had also initially planned something similar: building body keypoint and custom object keypoint detection into the same model.
But I am still at a beginner level in ML and not yet aware of many things, so I ended up taking a different approach and built a GStreamer plugin using OpenCV.
But if you are unable to find a proper guide for this,
then I can recommend another way to build it…
Build 2 separate pose models (one for the body and another for the bed).
Split your source into 2 separate sub-pipelines and apply Hailo inference independently in parallel (one model each). Get the results back into Python, and you can do whatever you want with the points.
But if you want to view the points in a single image frame, instead of using hailooverlay you can fetch the bboxes and keypoints in Python and build your own rendering (it depends on what and how you're building). You may also find something useful in the Hailo GStreamer plugins: tappas/core/hailo/plugins at master · hailo-ai/tappas · GitHub
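The "build your own rendering" idea above can be sketched as below. This is a minimal, numpy-only illustration with made-up names; in a real pipeline you would likely use cv2.circle / cv2.rectangle on the mapped GStreamer buffer instead.

```python
import numpy as np

# Hypothetical sketch: mark keypoints directly on a frame array, as a
# stand-in for hailooverlay. frame is an HxWx3 uint8 image; keypoints is
# an iterable of (x, y) pixel coordinates.
def draw_keypoints(frame, keypoints, radius=2, color=(0, 0, 255)):
    h, w = frame.shape[:2]
    for x, y in keypoints:
        x, y = int(round(x)), int(round(y))
        # Clamp the square marker to the frame bounds.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        frame[y0:y1, x0:x1] = color
    return frame

frame = draw_keypoints(np.zeros((64, 64, 3), dtype=np.uint8), [(10, 10), (50, 30)])
```

Doing this in your own identity-callback gives you full control over which model's points are drawn and how the two result sets are combined in one frame.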
Yeah, it sounds like we’re going down the same road.
My gstreamer pipeline currently looks something like this:
"""
...
! queue name=hailo_pre_split leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0
! tee name=splitter hailomuxer name=muxer
! queue name=hailo_draw0 leaky=no max-size-buffers=30 max-size-bytes=0 max-size-time=0
! hailooverlay qos=false
! queue name=identity_callback_q leaky=no max-size-buffers=3 max-size-bytes=0 max-size-time=0
! identity name=identity_callback
! queue name=queue_videoconvert leaky=no max-size-buffers=3 max-size-bytes=0 max-size-time=0
! videoconvert n-threads=3 qos=false
! queue name=queue_hailo_display leaky=no max-size-buffers=3 max-size-bytes=0 max-size-time=0
! fpsdisplaysink name=display video-sink="xvimagesink" sync=true
splitter.
{ POSE_DETECTION_PIPELINE }
! muxer.
splitter.
{ BED_DETECTION_PIPELINE }
! muxer.
"""
We actually don’t need the overlay other than to verify that the models are working correctly.
We're in the process of annotating a load of video frames in a tool called CVAT to attempt a pose model with two classes. We originally used Roboflow to annotate the bed, but when we trained the model we ended up losing the original human pose class, so we realised we also need to annotate all of the human poses in the frames. This is why we moved to CVAT: it has a feature that lets you auto-annotate frames using existing models. This way we can run auto-annotation on all of the frames for human pose and then just focus on training the classes we're adding.
We’re also totally new to this but it’s pretty engaging stuff and the community is really helping a lot.
I think when I get the entire thing running I’ll write up a full tutorial covering all of the issues I faced.
I also used CVAT initially during tests, but after some time I moved to a local setup with Label Studio (https://labelstud.io/), both for cost and to keep the data private. It also offers partial automation.
I can build a model → annotate new frames using that model → rebuild the model → annotate more new frames, and so on.
We had to put the project on hold after testing for various reasons, but in the coming months we will start again.
I also learned a lot from the Hailo examples. One big thing for me was GStreamer: I learned how to build plugins, and I now use this in other projects (without Hailo). It solved all my performance issues compared with Python + OpenCV.