How can I decrease CPU utilization while running pose-estimation on TI's TDA4?

I am running the Tappas pose-estimation demo on TI’s TDA4VM and am seeing high CPU utilization.
Is this demo optimized for the TDA4? Is there any way to offload work from the CPU to reduce its utilization?

The Tappas pose-estimation demo, like the other demos, is not optimized for a specific hardware platform, meaning that its GStreamer elements run on the CPU rather than on the platform's DSP or GPU.
On the TDA4, some elements can be offloaded from the CPU to the TDA4 DSP: for example, tiovxcolorconvert can replace videoconvert, and tiovxmultiscaler can replace videoscale.
Here is an example of how it can be done:
gst-launch-1.0 v4l2src device=/dev/video2 name=src_0 ! \
    video/x-raw,format=YUY2,width=1280,height=720,framerate=30/1 ! \
    queue leaky=downstream max-size-buffers=5 max-size-bytes=0 max-size-time=0 ! \
    tiovxcolorconvert ! video/x-raw,format=NV12 ! \
    tiovxmultiscaler ! video/x-raw,width=640,height=640 ! \
    tiovxcolorconvert ! video/x-raw,pixel-aspect-ratio=1/1,format=RGBx ! \
    queue ! videoconvert name=pre_hailonet_videoconvert n-threads=2 qos=false ! \
    queue ! hailonet hef-path=/home/root/apps/pose_estimation/resources/centerpose_regnetx_1.6gf_fpn.hef vdevice-key=1 debug=False is-active=true batch-size=1 ! \
    queue ! hailofilter so-path=/usr/lib/hailo-post-processes/libcenterpose_post.so qos=false function-name=centerpose ! \
    queue ! hailooverlay qos=false ! \
    queue ! videoconvert name=sink_videoconvert n-threads=1 qos=false ! \
    queue ! fpsdisplaysink video-sink=autovideosink name=hailo_display sync=false text-overlay=false
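
To confirm that the offloaded pipeline actually lowers CPU load, it can help to measure overall CPU utilization while each variant is running. The following is a minimal sketch, assuming a Linux target that exposes /proc/stat; the helper names cpu_busy_pct and sample_cpu are made up for this example and are not part of Tappas or TI's SDK.

```shell
#!/bin/sh
# Sketch: overall CPU busy percentage between two /proc/stat samples.
# cpu_busy_pct and sample_cpu are hypothetical helper names.

# Takes two aggregate "cpu ..." lines from /proc/stat and prints the
# busy percentage over the interval between them.
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " "); split(b, y, " ")
        # Fields 2..9: user, nice, system, idle, iowait, irq, softirq, steal.
        for (i = 2; i <= 9; i++) { ta += x[i]; tb += y[i] }
        # Idle time is counted as idle + iowait.
        idle_a = x[5] + x[6]; idle_b = y[5] + y[6]
        dt = tb - ta
        if (dt <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * (1 - (idle_b - idle_a) / dt)
    }'
}

# Samples the aggregate CPU line twice, N seconds apart (default 5),
# and prints the busy percentage for that window.
sample_cpu() {
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    second=$(grep '^cpu ' /proc/stat)
    cpu_busy_pct "$first" "$second"
}
```

For example, source the script and run `sample_cpu 10` in a second terminal while the pipeline is running, once with videoconvert/videoscale and once with the tiovx elements, then compare the two numbers.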

We recommend checking TI’s GStreamer plugin documentation for limitations and correct usage before using these elements (e.g. tiovxmultiscaler can only downscale the resolution).
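
Alongside the documentation, the elements can be inspected directly on the target. The sketch below assumes gst-inspect-1.0 is installed and that it exits with a non-zero status when an element is not found; check_elements and the GST_INSPECT override are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: report which GStreamer elements are installed on the target.
# check_elements and GST_INSPECT are hypothetical names for this example.

check_elements() {
    # Allow the inspect command to be overridden; defaults to gst-inspect-1.0.
    inspect="${GST_INSPECT:-gst-inspect-1.0}"
    missing=0
    for element in "$@"; do
        # Assumption: the inspect command fails when the element is unknown.
        if "$inspect" "$element" >/dev/null 2>&1; then
            echo "found: $element"
        else
            echo "missing: $element"
            missing=1
        fi
    done
    # Non-zero return if at least one element was missing.
    return "$missing"
}
```

For example, `check_elements tiovxcolorconvert tiovxmultiscaler` reports whether both elements are available. Running `gst-inspect-1.0 tiovxmultiscaler` on its own then prints the element's pad templates and properties, which shows the formats and settings it accepts.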