16-bit quantization on final layers

neoklisv · August 5, 2024, 6:34pm

Hello,

I want to use the precompiled HEF file from the Model Zoo that uses 16-bit quantization, instead of the 8-bit one that ships with hailo-rpi5-examples, to reduce jitter:

(16-bit HEF):
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s_pose, Multi Context - Number of contexts: 4
Network name: yolov8s_pose/yolov8s_pose
VStream infos:
Input yolov8s_pose/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s_pose/conv70 UINT8, FCR(20x20x64)
Output yolov8s_pose/conv71 UINT8, NHWC(20x20x1)
Output yolov8s_pose/conv72 UINT16, FCR(20x20x51)
Output yolov8s_pose/conv57 UINT8, FCR(40x40x64)
Output yolov8s_pose/conv58 UINT8, NHWC(40x40x1)
Output yolov8s_pose/conv59 UINT16, FCR(40x40x51)
Output yolov8s_pose/conv43 UINT8, FCR(80x80x64)
Output yolov8s_pose/conv44 UINT8, FCR(80x80x1)
Output yolov8s_pose/conv45 UINT16, FCR(80x80x51)
(venv_hailo_rpi5_examples) (.openmmlab) obh@raspberrypi:~/hailo-rpi5-examples $ hailortcli parse-hef resources/yolov8s_pose_h8l_pi.hef

(8-bit HEF):
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s_pose, Multi Context - Number of contexts: 4
Network name: yolov8s_pose/yolov8s_pose
VStream infos:
Input yolov8s_pose/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s_pose/conv70 UINT8, FCR(20x20x64)
Output yolov8s_pose/conv71 UINT8, NHWC(20x20x1)
Output yolov8s_pose/conv72 UINT8, FCR(20x20x51)
Output yolov8s_pose/conv57 UINT8, FCR(40x40x64)
Output yolov8s_pose/conv58 UINT8, NHWC(40x40x1)
Output yolov8s_pose/conv59 UINT8, FCR(40x40x51)
Output yolov8s_pose/conv43 UINT8, FCR(80x80x64)
Output yolov8s_pose/conv44 UINT8, FCR(80x80x1)
Output yolov8s_pose/conv45 UINT8, FCR(80x80x51)
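To see exactly which outputs differ between the two HEFs, the two `parse-hef` listings can be compared mechanically. The sketch below is a hypothetical helper, not part of hailortcli or HailoRT. It assumes the VStream lines keep the `Input|Output <name> <TYPE>, <LAYOUT>(<dims>)` format shown above.

```python
import re

# Matches VStream lines printed by `hailortcli parse-hef`, e.g.
#   Output yolov8s_pose/conv72 UINT16, FCR(20x20x51)
# (line format assumed from the listings in this thread)
VSTREAM_RE = re.compile(r"^(Input|Output)\s+(\S+)\s+(\w+),\s*(\w+)\(([\dx]+)\)")


def parse_vstreams(text):
    """Map each vstream name to (direction, dtype, layout, shape)."""
    streams = {}
    for line in text.splitlines():
        m = VSTREAM_RE.match(line.strip())
        if m:
            direction, name, dtype, layout, dims = m.groups()
            shape = tuple(int(d) for d in dims.split("x"))
            streams[name] = (direction, dtype, layout, shape)
    return streams


def dtype_differences(a_text, b_text):
    """Return {name: (dtype_a, dtype_b)} for vstreams present in both listings whose dtype differs."""
    a, b = parse_vstreams(a_text), parse_vstreams(b_text)
    return {n: (a[n][1], b[n][1]) for n in a if n in b and a[n][1] != b[n][1]}
```

Applied to the two listings above, this flags conv45, conv59 and conv72 (UINT16 vs UINT8). These are the 51-channel outputs, which in YOLOv8-pose carry the 17 keypoints × (x, y, confidence). The box (64-channel) and score (1-channel) outputs are UINT8 in both HEFs.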

Because of the different output format (16-bit instead of 8-bit on conv45, conv59 and conv72), the GStreamer post-processing doesn't work. As a result I can't get the keypoints, just the bounding boxes. Can you help with that?

Thanks

nina-vilela · August 11, 2024, 2:20pm

Hi @neoklisv,

For that, you would need to change this file and then re-compile it.
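The core of such a change is making the post-process read the keypoint outputs with the right element type. A 20x20x51 UINT16 buffer is twice as many bytes as the UINT8 one, so code that assumes 8-bit elements misreads it. The sketch below is not Hailo's post-process code. It assumes per-tensor affine quantization (real = scale × (q − zero_point)), a dense host buffer in native byte order, and a hypothetical function name. The scale and zero-point values in the test are made up; real ones would come from each output's quantization parameters stored in the HEF.

```python
import numpy as np

# Element types as printed by parse-hef, mapped to NumPy dtypes.
_DTYPES = {"UINT8": np.uint8, "UINT16": np.uint16}


def dequantize(raw: bytes, dtype: str, shape, scale: float, zero_point: float) -> np.ndarray:
    """Convert a raw quantized output buffer to float32 (hypothetical helper).

    Assumes per-tensor affine quantization: real = scale * (q - zero_point),
    and a dense buffer of `shape` in native byte order.
    """
    np_dtype = _DTYPES[dtype]
    expected = int(np.prod(shape)) * np.dtype(np_dtype).itemsize
    if len(raw) != expected:
        # Catches reading a UINT16 output as UINT8 (or vice versa).
        raise ValueError(f"buffer has {len(raw)} bytes, expected {expected} for {dtype}{tuple(shape)}")
    q = np.frombuffer(raw, dtype=np_dtype).reshape(shape)
    return (q.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)
```

Choosing the dtype per output, rather than hard-coding 8-bit reads, lets the same decoding path handle both the stock HEF and the 16-bit one.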

An easier solution would be to apply more advanced post-quantization algorithms during model optimization, such as AdaRound. That way the outputs stay 8-bit and the existing post-processing keeps working.
