Running YOLO Pose with NMS postprocessing on Hailo8

I have been trying to run the yolov8_pose.hef model on a Raspberry Pi 5 with a Hailo8 device for inference. Currently, all post-processing is done on the CPU from the model's raw output tensors, which turns out to be quite costly: the post-processing takes longer than the inference itself. This is in contrast to the object detection counterpart, yolov8m.hef, which integrates the NMS post-processing step into the HEF file and therefore runs the whole pipeline much faster. I am running this in a loop on a live camera feed, so fast end-to-end computation is important.
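For reference, this is roughly how I am timing the split between inference and post-processing (a minimal sketch; `run_inference` and `postprocess_pose` are placeholders standing in for my actual HailoRT call and decode/NMS code, not real API names):

```python
import time

def run_inference(frame):
    # Placeholder for the actual HailoRT inference call (illustrative only).
    time.sleep(0.01)
    return frame

def postprocess_pose(raw_outputs):
    # Placeholder for the CPU-side decode + NMS code (illustrative only).
    time.sleep(0.02)
    return raw_outputs

frame = object()  # stands in for a camera frame
t0 = time.perf_counter()
raw = run_inference(frame)
t1 = time.perf_counter()
poses = postprocess_pose(raw)
t2 = time.perf_counter()
print(f"inference:       {(t1 - t0) * 1e3:.1f} ms")
print(f"post-processing: {(t2 - t1) * 1e3:.1f} ms")
```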

I see the line `nms_postprocess("../../postprocess_config/yolov8m_nms_config.json", meta_arch=yolov8, engine=cpu)` in the yolov8m.alls file in hailo_model_zoo used for compilation, but I do not understand how the same can be applied to the pose estimation model, since the two models have different structures and output tensor shapes. Putting this step onto the Hailo device would likely roughly double the overall speed of pose estimation.

Please let me know if there are steps I am missing, or if NMS is simply unsupported on the Hailo architecture for pose estimation at the moment. It feels like it should be transferable to pose estimation, though. Thanks for any help!

Welcome to the Hailo Community!

As you can see from the `engine=cpu` parameter, the post-processing will still run on the host, but under the control of HailoRT.

For the pose model, you will need to run the post-processing under the control of your application, not HailoRT. The feature is not designed to be extended.
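In practice, application-side post-processing for pose mostly amounts to filtering low-confidence candidates and running greedy NMS over the boxes, carrying the keypoints along. As a rough illustration (this is not code from the Model Zoo), here is a minimal NumPy sketch; it assumes already-decoded candidates of shape [N, 56] per the common YOLOv8-pose layout (4 box coordinates as x1, y1, x2, y2, one confidence score, then 17 keypoints × (x, y, score)); that layout is an assumption and must match your own decode step:

```python
import numpy as np

def nms_pose(candidates: np.ndarray, conf_thr: float = 0.5, iou_thr: float = 0.65):
    """Greedy NMS over pose candidates.

    candidates: [N, 56] array, assumed row layout:
      [x1, y1, x2, y2, score, kpt1_x, kpt1_y, kpt1_s, ..., kpt17_s]
    Returns the surviving rows, highest score first.
    """
    # Drop low-confidence candidates first so few boxes enter the O(N^2) loop.
    candidates = candidates[candidates[:, 4] >= conf_thr]
    # Sort by score, descending.
    candidates = candidates[np.argsort(-candidates[:, 4])]

    x1, y1, x2, y2 = candidates[:, 0], candidates[:, 1], candidates[:, 2], candidates[:, 3]
    areas = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)

    keep = []
    suppressed = np.zeros(len(candidates), dtype=bool)
    for i in range(len(candidates)):
        if suppressed[i]:
            continue
        keep.append(i)
        # Vectorized IoU of box i against all lower-scoring boxes.
        xx1 = np.maximum(x1[i], x1[i + 1:])
        yy1 = np.maximum(y1[i], y1[i + 1:])
        xx2 = np.minimum(x2[i], x2[i + 1:])
        yy2 = np.minimum(y2[i], y2[i + 1:])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[i + 1:] - inter + 1e-9)
        suppressed[i + 1:] |= iou > iou_thr

    return candidates[keep]
```

On a Pi-class CPU, the biggest win is usually applying the confidence threshold before NMS and keeping the IoU computation vectorized, rather than looping over box pairs in pure Python.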

Thanks for your response! This makes sense, and is roughly what I was assuming. Does this mean that if I compiled the yolov8m object detection model without the nms_postprocess line in the .alls file, the inference call on the Hailo would return much faster? So the inference itself would take only a short amount of time, with the rest spent on CPU post-processing? That would put detection in pretty much the same situation as pose estimation, though I don't understand why the NMS integration is implemented for detection but not for pose.

I am trying to work out what gives the best performance, but it sounds like there is no real way to configure the pose model for this, or to optimize the costly post-processing, which has become the bottleneck now that the Hailo inference itself is so fast.

Yes, you would get the results faster, but you would need to do the post-processing in the application. The time difference depends on the host CPU performance.

The post-processing time depends on the host CPU performance and on the implementation, e.g. a Python version is usually slower than a C++ version.
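If a full C++ rewrite is not practical, one middle ground (a suggestion, not an official recommendation) is to call a C++-backed NMS from Python, for example OpenCV's `cv2.dnn.NMSBoxes`, which takes boxes in [x, y, w, h] form:

```python
import cv2
import numpy as np

# Toy candidates: boxes as [x, y, w, h] with one confidence score each.
boxes = [[10, 10, 100, 200], [12, 14, 98, 196], [300, 50, 80, 160]]
scores = [0.9, 0.8, 0.7]

# Greedy NMS implemented in C++ inside OpenCV; arguments are
# (boxes, scores, score_threshold, nms_threshold) and it returns
# the indices of the boxes to keep.
keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.65)
print(np.asarray(keep).flatten())  # e.g. [0 2]
```

For pose, you would then use the returned indices to select the matching keypoint rows from your decoded output.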

That is a development resource question. The YOLO object detection models are a lot more popular, so we implemented the NMS for them to make them even easier to use.