Running YOLO Pose with NMS postprocessing on Hailo8

I have been trying to run the yolov8_pose.hef model on a Raspberry Pi 5 with a Hailo8 device for inference. Currently, all post-processing is done on the CPU from the model's raw output tensors, which turns out to be quite costly: the post-processing takes longer than the inference itself. This is in contrast to the object detection counterpart, yolov8m.hef, which integrates the NMS post-processing step into the HEF file and therefore runs the whole pipeline much faster. I am running this in a loop on a live camera feed, so fast end-to-end computation is important.
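For reference, this is roughly how I am timing the split between inference and post-processing (a minimal sketch; `run_inference` and `postprocess_pose` are placeholders standing in for my actual HailoRT call and decode/NMS code, not real API names):

```python
import time

def run_inference(frame):
    # Placeholder for the actual HailoRT inference call (illustrative only).
    time.sleep(0.01)
    return frame

def postprocess_pose(raw_outputs):
    # Placeholder for the CPU-side decode + NMS code (illustrative only).
    time.sleep(0.02)
    return raw_outputs

frame = object()  # stands in for a camera frame
t0 = time.perf_counter()
raw = run_inference(frame)
t1 = time.perf_counter()
poses = postprocess_pose(raw)
t2 = time.perf_counter()
print(f"inference:       {(t1 - t0) * 1e3:.1f} ms")
print(f"post-processing: {(t2 - t1) * 1e3:.1f} ms")
```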

I see the line `nms_postprocess("../../postprocess_config/yolov8m_nms_config.json", meta_arch=yolov8, engine=cpu)` in the yolov8m.alls file in hailo_model_zoo used for compilation, but I do not understand how the same can be applied to the pose estimation model, since the two models have different structures and output tensor shapes. Putting this step onto the Hailo device would likely roughly double the overall speed of pose estimation.

Please let me know if there are steps I am missing, or if NMS is simply unsupported on the Hailo architecture for pose estimation at the moment. It feels like it should be transferable to pose estimation, though. Thanks for any help!

Welcome to the Hailo Community!

As you can see from the `engine=cpu` parameter, the post-processing will still run on the host, but under the control of HailoRT.

For the pose model, you will need to run the post-processing under the control of your application, not HailoRT. The feature is not designed to be extended.
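In practice, application-side post-processing for pose mostly amounts to filtering low-confidence candidates and running greedy NMS over the boxes, carrying the keypoints along. As a rough illustration (this is not code from the Model Zoo), here is a minimal NumPy sketch; it assumes already-decoded candidates of shape [N, 56] per the common YOLOv8-pose layout (4 box coordinates as x1, y1, x2, y2, one confidence score, then 17 keypoints × (x, y, score)); that layout is an assumption and must match your own decode step:

```python
import numpy as np

def nms_pose(candidates: np.ndarray, conf_thr: float = 0.5, iou_thr: float = 0.65):
    """Greedy NMS over pose candidates.

    candidates: [N, 56] array, assumed row layout:
      [x1, y1, x2, y2, score, kpt1_x, kpt1_y, kpt1_s, ..., kpt17_s]
    Returns the surviving rows, highest score first.
    """
    # Drop low-confidence candidates first so few boxes enter the O(N^2) loop.
    candidates = candidates[candidates[:, 4] >= conf_thr]
    # Sort by score, descending.
    candidates = candidates[np.argsort(-candidates[:, 4])]

    x1, y1, x2, y2 = candidates[:, 0], candidates[:, 1], candidates[:, 2], candidates[:, 3]
    areas = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)

    keep = []
    suppressed = np.zeros(len(candidates), dtype=bool)
    for i in range(len(candidates)):
        if suppressed[i]:
            continue
        keep.append(i)
        # Vectorized IoU of box i against all lower-scoring boxes.
        xx1 = np.maximum(x1[i], x1[i + 1:])
        yy1 = np.maximum(y1[i], y1[i + 1:])
        xx2 = np.minimum(x2[i], x2[i + 1:])
        yy2 = np.minimum(y2[i], y2[i + 1:])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[i + 1:] - inter + 1e-9)
        suppressed[i + 1:] |= iou > iou_thr

    return candidates[keep]
```

On a Pi-class CPU, the biggest win is usually applying the confidence threshold before NMS and keeping the IoU computation vectorized, rather than looping over box pairs in pure Python.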

Thanks for your response! This makes sense, and is roughly what I was assuming. Does this mean that if I compiled the yolov8m object detection model without the nms_postprocess line in the .alls file, the inference call on the Hailo would return much faster? So the inference itself would take only a short amount of time, with the rest spent on CPU post-processing? That would put detection in pretty much the same situation as pose estimation, though I don't understand why the NMS integration is implemented for detection but not for pose.

I am trying to work out what gives the best performance, but it sounds like there is no real way to configure the pose model for this, or to optimize the costly post-processing, which has become the bottleneck now that the Hailo inference itself is so fast.

Yes, you would get the results faster, but you would need to do the post-processing in the application. The time difference depends on the host CPU performance.

The post-processing time depends on the host CPU performance and on the implementation, e.g. a Python version is usually slower than a C++ version.
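If a full C++ rewrite is not practical, one middle ground (a suggestion, not an official recommendation) is to call a C++-backed NMS from Python, for example OpenCV's `cv2.dnn.NMSBoxes`, which takes boxes in [x, y, w, h] form:

```python
import cv2
import numpy as np

# Toy candidates: boxes as [x, y, w, h] with one confidence score each.
boxes = [[10, 10, 100, 200], [12, 14, 98, 196], [300, 50, 80, 160]]
scores = [0.9, 0.8, 0.7]

# Greedy NMS implemented in C++ inside OpenCV; arguments are
# (boxes, scores, score_threshold, nms_threshold) and it returns
# the indices of the boxes to keep.
keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.65)
print(np.asarray(keep).flatten())  # e.g. [0 2]
```

For pose, you would then use the returned indices to select the matching keypoint rows from your decoded output.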

That is a development resource question. The YOLO object detection models are a lot more popular, so we implemented the NMS for them to make them even easier to use.