Pose estimation example on Hailo-8R

I ran the pose estimation example from the Hailo GitHub repository (Hailo-Application-Code-Examples/tree/main/runtime/python/pose_estimation) on a board with the Hailo-8R. When there was only one person in the image, I observed that multiple bounding boxes and sets of keypoints were drawn around that person.
After changing the nms_max_output_per_class value to 1, only one bounding box was drawn and the keypoints were all correctly located. However, when I tested the script on an image containing multiple people, only one person was detected with a bounding box and keypoints.

Could you please advise on how to modify the Python script so that it outputs one bounding box and the corresponding keypoints for each person in the image?

Hey @MaHG,

Welcome to the Hailo Community!


The issues you describe can be addressed by tweaking the post-processing steps in the script. Here are some recommendations to improve detection and keypoint estimation for multiple people:

  1. Adjust NMS Parameters: Instead of limiting nms_max_output_per_class to 1, use a higher value so that multiple detections per class can survive. Balance this with the other NMS settings to avoid duplicate boxes.

  2. Modify the NMS Function: The current NMS implementation might be suppressing detections too aggressively. It can be fine-tuned to better accommodate multiple people in the frame (see the IoU sketch right after this list).

  3. Adjust Confidence Thresholds: Fine-tuning the confidence thresholds for both detections and keypoints helps strike a balance between detecting all individuals and avoiding false positives.
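
To make concrete what nms_iou_thresh controls, here is a minimal sketch of greedy IoU-based NMS as it is commonly implemented (the helper name greedy_nms and the (x1, y1, x2, y2) box layout are illustrative, not the repo's exact code); the lower the threshold, the more aggressively overlapping boxes are suppressed:

import numpy as np

def greedy_nms(boxes, scores, iou_thres=0.45):
    """Keep the highest-scoring boxes; drop any box whose IoU with a kept box exceeds iou_thres."""
    order = scores.argsort()[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thres]  # survivors below the overlap threshold
    return np.array(keep, dtype=int)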

Here’s a modified version of the non_max_suppression function in yolov8_pose_utils.py:

import numpy as np

def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45,
                        max_det=300, n_kpts=17):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections."""
    # ... (existing code until the NMS part)

    output = []
    for xi, x in enumerate(prediction):  # image index, per-image predictions
        # ... (existing code until the NMS part; this is where `preds`,
        # the candidate boxes and scores fed to NMS, is prepared)

        # Apply NMS, then cap the number of surviving detections
        keep = nms(preds, iou_thres)
        if keep.shape[0] > max_det:
            keep = keep[:max_det]

        out = x[keep]
        scores = out[:, 4]                        # per-detection confidence
        boxes = out[:, :4]                        # (x1, y1, x2, y2)
        kpts = out[:, 6:]                         # flattened keypoints
        kpts = np.reshape(kpts, (-1, n_kpts, 3))  # (num_det, n_kpts, (x, y, conf))

        out = {'bboxes': boxes,
               'keypoints': kpts,
               'scores': scores,
               'num_detections': int(scores.shape[0])}

        output.append(out)
    return output
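
Each entry of the returned list is a per-image dict, so downstream code can consume it along these lines (an illustrative sketch, not the repo's exact drawing code; `prediction` stands for the decoded model output):

# Illustrative consumer of the per-image result dicts
results = non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45)
for det in results:                      # one dict per image
    for i in range(det['num_detections']):
        box = det['bboxes'][i]           # (x1, y1, x2, y2)
        score = det['scores'][i]
        kpts = det['keypoints'][i]       # (n_kpts, 3): x, y, confidence
        # draw box, score, and kpts with the script's existing helpers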

Next, let’s make some adjustments to the yolov8_pose_inference.py script:

  1. Update the kwargs Dictionary:
kwargs = {
    'classes': 1,
    'nms_max_output_per_class': 100,  # Raised from 1 so multiple people can be kept
    'anchors': {'regression_length': 15, 'strides': [8, 16, 32]},
    'score_threshold': 0.25,  # Raised from 0.001 to filter out low-confidence detections
    'nms_iou_thresh': 0.45,  # Lowered from 0.7 to suppress overlapping duplicates more aggressively
    'meta_arch': 'nanodet_v8',
    'device_pre_post_layers': None
}
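In most YOLOv8-style pipelines these settings interact: score_threshold prunes low-confidence candidates before NMS runs, nms_iou_thresh decides how much overlap two surviving boxes may have, and nms_max_output_per_class only caps the final count, so it just needs to be comfortably larger than the number of people you expect in a frame.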
  2. Modify the visualize_pose_estimation_result Function Call in the Main Loop (detection_threshold controls which detections are drawn, joint_threshold which individual keypoints):
image = Image.fromarray(
    cv2.cvtColor(
        visualize_pose_estimation_result(
            results, processed_image,
            detection_threshold=0.25, joint_threshold=0.2,
            **kwargs),
        cv2.COLOR_BGR2RGB))

These adjustments should enhance the detection and visualization of multiple individuals in the image. The key changes include:

  1. Increasing nms_max_output_per_class to permit more detections.
  2. Adjusting the score_threshold to filter out low-confidence detections.
  3. Fine-tuning the nms_iou_thresh to balance detecting closely positioned people and avoiding duplicates.
  4. Adding detection_threshold and joint_threshold parameters to the visualization function for better control over which detections and keypoints are displayed.

Note: I haven’t tested these changes myself, but this approach should help. Give these modifications a try and see if they improve the results for both single-person and multi-person images. You may need to further tweak the thresholds based on your specific use case and the characteristics of your images.
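
If tweaking these by hand gets tedious, a quick grid sweep over the two thresholds on a few representative images makes the trade-off easy to judge by eye. A hypothetical harness (run_and_save stands in for the script's existing inference-plus-visualization path, and kwargs is the dictionary from step 1 above):

import itertools

# Hypothetical sweep: re-run with each threshold pair and save the annotated
# output so the best combination can be picked visually.
for score_t, iou_t in itertools.product((0.2, 0.25, 0.3), (0.4, 0.45, 0.5)):
    kwargs['score_threshold'] = score_t
    kwargs['nms_iou_thresh'] = iou_t
    # run_and_save(image_path, out_path=f'pose_s{score_t}_i{iou_t}.jpg', **kwargs)  # placeholder
    print(f'tried score_threshold={score_t}, nms_iou_thresh={iou_t}')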


Best Regards

Thank you for your quick and detailed response.

Adjusting score_threshold, nms_iou_thresh, detection_threshold, and joint_threshold does improve the final detection results.
However, the results are still not ideal. For example, the model sometimes mistakes a person's forearm for the arm of a second, non-existent person, and then infers and draws the remaining keypoints for that phantom. The confidence score for this non-existent person can reach 0.92, only 0.02 below the score (0.94) of the real person who shares the same arm. The larger the nms_iou_thresh value, the more false keypoints are generated.
I'm using yolov8s_pose_mz.hef and suspect this might be a limitation of the model's detection capability.