16-bit quantization on final layers

Hello,

I want to use the precompiled HEF file from the Model Zoo with 16-bit quantization on the final layers, instead of the 8-bit one that ships with hailo-rpi5-examples, to reduce jitter:

(16-bit HEF format):
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s_pose, Multi Context - Number of contexts: 4
Network name: yolov8s_pose/yolov8s_pose
VStream infos:
Input yolov8s_pose/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s_pose/conv70 UINT8, FCR(20x20x64)
Output yolov8s_pose/conv71 UINT8, NHWC(20x20x1)
Output yolov8s_pose/conv72 UINT16, FCR(20x20x51)
Output yolov8s_pose/conv57 UINT8, FCR(40x40x64)
Output yolov8s_pose/conv58 UINT8, NHWC(40x40x1)
Output yolov8s_pose/conv59 UINT16, FCR(40x40x51)
Output yolov8s_pose/conv43 UINT8, FCR(80x80x64)
Output yolov8s_pose/conv44 UINT8, FCR(80x80x1)
Output yolov8s_pose/conv45 UINT16, FCR(80x80x51)

(8-bit HEF format):
(venv_hailo_rpi5_examples) (.openmmlab) obh@raspberrypi:~/hailo-rpi5-examples $ hailortcli parse-hef resources/yolov8s_pose_h8l_pi.hef
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s_pose, Multi Context - Number of contexts: 4
Network name: yolov8s_pose/yolov8s_pose
VStream infos:
Input yolov8s_pose/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s_pose/conv70 UINT8, FCR(20x20x64)
Output yolov8s_pose/conv71 UINT8, NHWC(20x20x1)
Output yolov8s_pose/conv72 UINT8, FCR(20x20x51)
Output yolov8s_pose/conv57 UINT8, FCR(40x40x64)
Output yolov8s_pose/conv58 UINT8, NHWC(40x40x1)
Output yolov8s_pose/conv59 UINT8, FCR(40x40x51)
Output yolov8s_pose/conv43 UINT8, FCR(80x80x64)
Output yolov8s_pose/conv44 UINT8, FCR(80x80x1)
Output yolov8s_pose/conv45 UINT8, FCR(80x80x51)

Because of the different output format (UINT16 instead of UINT8 on conv45, conv59 and conv72), the GStreamer post-processing doesn't work. As a result I can't get the keypoints, just the bounding boxes. Can you help with that?

Thanks

Hi @neoklisv,

For that, you would need to change this file and then re-compile it.
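
To illustrate why the stock post-processing breaks on the keypoint outputs, here is a minimal Python sketch (not the actual post-processing code). It assumes the usual affine dequantization, (q - zero_point) * scale, and a little-endian host. The function name `dequantize` and the scale and zero-point values are made up for illustration; the real values come from the quantization info of each output in the HEF.

```python
import struct


def dequantize(raw, bits, scale, zero_point):
    """Interpret a raw output buffer as unsigned ints of the given width
    and convert to floats with (q - zero_point) * scale."""
    if bits == 8:
        fmt = "B"   # unsigned 8-bit
    elif bits == 16:
        fmt = "H"   # unsigned 16-bit
    else:
        raise ValueError("only 8-bit and 16-bit outputs are handled here")
    count = len(raw) // (bits // 8)
    # "<" = little-endian; an assumption about the host byte order
    values = struct.unpack("<%d%s" % (count, fmt), raw[: count * (bits // 8)])
    return [(q - zero_point) * scale for q in values]


# Three hypothetical UINT16 keypoint values as they would sit in memory.
raw16 = struct.pack("<3H", 1000, 32768, 40000)

# Hypothetical quantization parameters, for illustration only.
SCALE, ZERO_POINT = 0.001, 32768

# Reading the buffer with the correct width gives three values.
correct = dequantize(raw16, 16, SCALE, ZERO_POINT)

# Reading the same bytes as UINT8 (what an 8-bit-only post-process does)
# yields twice as many elements, none of them meaningful.
wrong = dequantize(raw16, 8, SCALE, ZERO_POINT)

print(len(correct), len(wrong))
```

So the post-processing has to read conv45, conv59 and conv72 as 16-bit values (and use their own scale and zero point) before decoding the keypoints, while the box and score outputs stay 8-bit.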

An easier solution would be to apply a more advanced post-quantization algorithm, such as AdaRound, during model optimization.
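
For intuition on why AdaRound can help, here is a toy Python sketch of the underlying idea: choosing per weight whether to round down or up so that the layer output on calibration data stays close to the float output, instead of always rounding to nearest. This is only an illustration. The real algorithm optimizes a continuous relaxation rather than brute-forcing all combinations, and the weights, step size and calibration input below are invented.

```python
import itertools
import math


def output_error(w_float, w_quant, inputs):
    """Sum of squared differences between float and quantized dot products."""
    err = 0.0
    for x in inputs:
        y_f = sum(w * xi for w, xi in zip(w_float, x))
        y_q = sum(w * xi for w, xi in zip(w_quant, x))
        err += (y_f - y_q) ** 2
    return err


def round_nearest(weights, step):
    """Plain round-to-nearest onto a grid with the given step."""
    return [round(w / step) * step for w in weights]


def best_rounding(weights, step, inputs):
    """Try floor/ceil for every weight and keep the combination with the
    lowest output error (brute force, feasible only for tiny examples)."""
    candidates = [
        (math.floor(w / step) * step, math.ceil(w / step) * step)
        for w in weights
    ]
    best = min(
        itertools.product(*candidates),
        key=lambda q: output_error(weights, q, inputs),
    )
    return list(best)


weights = [0.4, 0.4, 0.4]   # made-up float weights
calib = [[1.0, 1.0, 1.0]]   # made-up calibration input
step = 1.0                  # made-up quantization step

nearest = round_nearest(weights, step)
learned = best_rounding(weights, step, calib)

print(output_error(weights, nearest, calib))
print(output_error(weights, learned, calib))
```

Round-to-nearest sends all three weights to zero, whereas rounding one of them up keeps the output much closer to the float result. The same effect at scale is what lets an 8-bit model lose less accuracy without changing the output format.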