I’ve managed to train a custom yolov8s_pose model that detects the 4 corners of a bed. I’ve compiled it to HEF and it’s running successfully on the Hailo-8 on a Raspberry Pi.
I’m running it through GStreamer and using a Python script for the post-processing.
I found this post and I managed to extract the keypoint data.
The thing is that this runs really slowly, around 10 FPS. I managed to get it to around 20 FPS with some improvements to the NMS function, but it’s still not fast enough.
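For reference, this is roughly the direction I went with the NMS: replacing per-box Python loops with numpy array ops. It isn’t my exact code, just a simplified sketch of a vectorized NMS (the box layout and IoU threshold are placeholders):

    import numpy as np

    def nms_numpy(boxes, scores, iou_thresh=0.6):
        # boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,)
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]  # indices sorted by descending score
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # IoU of the current best box against all remaining boxes at once
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
            order = order[1:][iou <= iou_thresh]  # drop overlapping boxes
        return keep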
Is this just a limitation of Python or is there a better way to do this?
Ultimately I’d like to detect the bed and the person with the same model, so I’ll need a way to do the post-processing with multiple classes, but I’ll figure that out later.
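My rough understanding of the multi-class case, in case it changes the answer: each class branch would have num_classes channels instead of 1, and the class per anchor would come from a max over that axis. A sketch under those assumptions (the shape and names are my guesses, not taken from the Hailo code):

    import numpy as np

    def split_classes(class_branch):
        # Assumes a (1, H, W, num_classes) score tensor per branch --
        # my guess at the multi-class layout, not confirmed.
        scores = class_branch.reshape(-1, class_branch.shape[-1])  # (H*W, num_classes)
        class_ids = scores.argmax(axis=-1)   # best class index per anchor
        confidences = scores.max(axis=-1)    # score of that class
        return class_ids, confidences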
The run function, with the imports it uses, looks like this:

    import numpy as np
    from gi.repository import Gst
    # VideoFrame and extract_pose_estimation_results come from the rest of
    # the script, which follows the Hailo example.

    def run(video_frame: VideoFrame):
        class_num = 1
        regression_length = 15

        # Pull every output tensor from the ROI and index the layers by shape
        raw_detections_keys = [tensor.name() for tensor in video_frame.roi.get_tensors()]
        raw_detections = {tensor.name(): np.expand_dims(np.array(tensor), axis=0)
                          for tensor in video_frame.roi.get_tensors()}
        layer_from_shape = {raw_detections[key].shape: key for key in raw_detections_keys}

        detection_output_channels = (regression_length + 1) * 4  # (regression_length + 1) * num_coordinates
        keypoints = 12  # 3 * number of corners of the bed

        # Group the 20x20, 40x40 and 80x80 branches as (boxes, scores, keypoints)
        endnodes = [
            raw_detections[layer_from_shape[1, 20, 20, detection_output_channels]],
            raw_detections[layer_from_shape[1, 20, 20, class_num]],
            raw_detections[layer_from_shape[1, 20, 20, keypoints]],
            raw_detections[layer_from_shape[1, 40, 40, detection_output_channels]],
            raw_detections[layer_from_shape[1, 40, 40, class_num]],
            raw_detections[layer_from_shape[1, 40, 40, keypoints]],
            raw_detections[layer_from_shape[1, 80, 80, detection_output_channels]],
            raw_detections[layer_from_shape[1, 80, 80, class_num]],
            raw_detections[layer_from_shape[1, 80, 80, keypoints]],
        ]

        predictions_dict = extract_pose_estimation_results(endnodes, 640, 640, class_num)

        return Gst.FlowReturn.OK
Everything else is basically as it was in here, except for the keypoint counts and dropping self, since it’s now a plain function rather than a class method.
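In case it helps with the diagnosis: my next step was going to be timing the individual stages to see whether the tensor copies, the decoding, or the NMS dominates. A quick sketch of how I’d measure that (the labels and split points are just illustrative):

    import time

    def timed(label, fn, *args, **kwargs):
        # Print how long a single post-processing stage takes, in milliseconds
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{label}: {(time.perf_counter() - t0) * 1000:.1f} ms")
        return result

    # e.g. inside run():
    # predictions_dict = timed("decode", extract_pose_estimation_results,
    #                          endnodes, 640, 640, class_num)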