DFC : v.3.32.0
HailoRT: v4.22.0
Hi Hailo Community,
For weeks I have been trying to solve the following problem: I am compiling a customized version of Xfeat (a feature extractor). The original architecture is not suitable for Hailo quantization, but I managed to handle it with the following steps in my Python toolchain:
- Input: image of dim 1x608x800x1 where H,W are fixed
- Output:
  - Feature tensor (1x76x100x64) (feats, feeds into the heatmap)
  - Keypoints tensor (1x76x100x65) (kpts, directly fed by the input)
  - Heatmap (1x76x100x1)
- Sanitize ONNX (add kernel shape attributes to conv layers)
- Export the sanitized ONNX to HAR and compare to the original ONNX using test data → looks good
- Floating-point optimization of the HAR, adding a normalization layer since I want to feed uint8 images directly into the model (results are good):
model_script_commands = [
"normalization1 = normalization([0.0], [255.0])\n",
]
self._client_runner.load_model_script("".join(model_script_commands))
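As a side note, my understanding (an assumption on my part, not verified against the DFC docs) is that normalization([0.0], [255.0]) applies (x - mean) / std on-chip, so raw uint8 inputs land in [0.0, 1.0]. A quick numpy sanity check of that assumed formula:

```python
import numpy as np

# Assumed semantics of normalization([0.0], [255.0]): out = (in - mean) / std.
mean, std = 0.0, 255.0
u8_pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = (u8_pixels.astype(np.float32) - mean) / std
print(normalized)  # 0.0 ... 1.0
```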
- Optimize (quantize model) with:
Model script commands:
model_optimization_flavor(optimization_level=4, compression_level=0, batch_size=8)
quantization_param(accelerated_features_hailo_inferred_sanitized/output_layer1, precision_mode=a16_w16)
quantization_param(accelerated_features_hailo_inferred_sanitized/output_layer2, precision_mode=a16_w16)
quantization_param(accelerated_features_hailo_inferred_sanitized/output_layer3, precision_mode=a16_w16)
quantization_param(accelerated_features_hailo_inferred_sanitized/conv6, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/avgpool4, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/ew_add1, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/conv13, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/conv17, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/ew_add2, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/conv16, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/conv21, force_range_out=[-10.0, 10.0])
quantization_param(accelerated_features_hailo_inferred_sanitized/ew_add3, force_range_out=[-10.0, 10.0])
model_optimization_config(globals, output_encoding_vector=enabled)
allocator_param(enable_muxer=False)
quantization_param({conv*}, precision_mode=a16_w16)
model_optimization_config(calibration, batch_size=8, calibset_size=256)
pre_quantization_optimization(layer_norm_decomposition, mode=nn_core, eq_consumer=False)
pre_quantization_optimization(activation_clipping, layers={*}, mode=percentile, clipping_values=[0.01, 99.99])
post_quantization_optimization(finetune, policy=enabled, learning_rate=1.0e-4, batch_size=8, epochs=4, dataset_size=2048)
Note that this is a first try; force_range_out=[-10.0, 10.0] was chosen for layers that feed into element-wise add operations, see this post here. Also, I chose per-channel scales via model_optimization_config(globals, output_encoding_vector=enabled).
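My mental model of mode=percentile activation clipping (an assumption based only on the parameter names, not on the DFC internals) is that activations are clamped to the [0.01, 99.99] percentile range observed on the calibration set, roughly like this numpy sketch:

```python
import numpy as np

# Sketch of percentile-based activation clipping (assumed behavior):
# clamp activations to their [0.01, 99.99] percentile range.
rng = np.random.default_rng(0)
activations = rng.normal(size=100_000)  # stand-in for calibration activations
lo, hi = np.percentile(activations, [0.01, 99.99])
clipped = np.clip(activations, lo, hi)
print(lo, hi, clipped.min(), clipped.max())
```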
- Compare original ONNX to quantized HAR → Kind of OK (?)
Here is an output of the first 5 values plus the mean and max absolute error over the entire output tensors:
+-----------+----------------------------------------------------+----------------------------------------------------+-----------+------------+
| Tensor | ONNX first 5 | Hailo first 5 | Max diff | Mean diff |
+-----------+----------------------------------------------------+----------------------------------------------------+-----------+------------+
| features | [ 1.79381, -0.31311, 0.31845, 0.27129, 0.2671 ] | [ 1.56761, -0.38633, 0.5146 , 0.25054, 0.4213 ] | 11.8055 | 0.092895 |
| keypoints | [-1.84859, -3.11717, -1.93641, -2.12238, -1.73115] | [-1.79705, -2.98431, -1.7902 , -1.99766, -1.72035] | 3.40494 | 0.198497 |
| heatmap | [0.14378, 0.14145, 0.13122, 0.11162, 0.09499] | [0.14063, 0.13501, 0.11869, 0.10364, 0.09165] | 0.0654779 | 0.00188323 |
+-----------+----------------------------------------------------+----------------------------------------------------+-----------+------------+
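For transparency, the Max diff / Mean diff columns are plain element-wise absolute errors over the flattened tensors, computed along these lines (minimal numpy sketch with a few toy values from the table, not the full tensors):

```python
import numpy as np

def compare_outputs(reference: np.ndarray, candidate: np.ndarray):
    """Max and mean absolute element-wise error between two tensors."""
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return diff.max(), diff.mean()

# toy example: first 3 elements of the 'features' row above
onnx_vals = np.array([1.79381, -0.31311, 0.31845])
hailo_vals = np.array([1.56761, -0.38633, 0.51460])
max_diff, mean_diff = compare_outputs(onnx_vals, hailo_vals)
print(f"max={max_diff:.5f} mean={mean_diff:.5f}")
```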
Quantization noise of output layers:
Output layers signal-to-noise ratio (SNR): measures the quantization noise (higher is better)
accelerated_features_hailo_inferred_sanitized/keypoints SNR: 22.02 dB
accelerated_features_hailo_inferred_sanitized/heatmap SNR: 31.8 dB
accelerated_features_hailo_inferred_sanitized/features SNR: 21.42 dB
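To double-check these numbers I also compute SNR myself. I do not know Hailo's exact formula, so this is the textbook power-ratio definition and may differ slightly from the tool's:

```python
import numpy as np

def snr_db(reference: np.ndarray, noisy: np.ndarray) -> float:
    """Textbook SNR in dB: 10*log10(signal power / noise power)."""
    noise = reference - noisy
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2)))

ref = np.ones(1000)
deg = ref * 0.9  # uniform 10% error -> 20 dB
print(snr_db(ref, deg))  # ~20.0
```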
I think everything is still problematic, but my focus is on the features. Possibly the heatmap looks better than it actually is, since its outputs come from a sigmoid activation and are therefore in [0.0, 1.0].
Even more problematic: when I run inference with my HEF on actual hardware in C++, similar to the vstreams C++ example (vstreams_example.cpp), I see a significant mismatch in my dequantized output (I leave dequantization to HailoRT by requesting FLOAT32 in NHWC output format):
HAR (expected) : 1.567613 -0.386329 0.514596 0.250543 0.421296
HEF feat (first 5 NHWC flat): -0.907798 -0.0109657 0.377234 -0.0336807 0.33477
HEF feat (first 5 NHWC flat): -0.912227 -0.0194008 0.465392 -0.0357428 0.33477
HEF feat (first 5 NHWC flat): -0.910013 -0.0151832 0.393635 -0.0329934 0.339921
HAR (expected) : -1.797053 -2.984307 -1.790201 -1.997657 -1.720350
HEF kpts (first 5 NHWC flat): -1.79228 -2.97407 -1.77757 -1.98365 -1.71702
HEF kpts (first 5 NHWC flat): -1.79228 -2.97407 -1.77757 -1.98365 -1.71702
HEF kpts (first 5 NHWC flat): -1.79228 -2.97407 -1.77757 -1.98365 -1.71702
HAR (expected) : 0.140629 0.135014 0.118686 0.103641 0.091647
HEF Heatmap (first 5 NHWC flat): 0.146641 0.140934 0.140721 0.139775 0.139439
HEF Heatmap (first 5 NHWC flat): 0.14655 0.140477 0.140477 0.139592 0.13947
HEF Heatmap (first 5 NHWC flat): 0.146062 0.140843 0.140507 0.139592 0.139286
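To rule out a mistake on my side, I also tried reproducing HailoRT's dequantization manually from the raw uint16 buffers. My assumption (please correct me if wrong) is the usual affine mapping float = (q - qp_zp) * qp_scale, with qp_zp/qp_scale taken from the output vstream's quant info; the values below are made-up placeholders for illustration, not from my HEF:

```python
import numpy as np

# Assumed affine dequantization for a UINT16 output vstream:
#   float_value = (uint16_value - qp_zp) * qp_scale
# qp_zp and qp_scale are illustrative placeholders, not from my HEF.
qp_zp, qp_scale = 32768.0, 0.0003
raw_u16 = np.array([30000, 32768, 40000], dtype=np.uint16)
dequantized = (raw_u16.astype(np.float32) - qp_zp) * qp_scale
print(dequantized)
```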
My questions:
- Would you deem the quantization results OK as a first shot? Am I doing anything significantly wrong?
- I would expect very similar results when comparing HEF vs. quantized HAR outputs, independent of whether the HAR is a good quantization. Side info: I observe similar uint16 outputs, but on each inference run the HEF uint16 values are slightly different for the heatmap and features; funnily, not for the keypoints. Is this normal?
In case it helps: output of hailortcli parse-hef:
Architecture HEF was compiled for: HAILO8L
Network group name: accelerated_features_hailo_inferred_sanitized, Multi Context - Number of contexts: 4
Network name: accelerated_features_hailo_inferred_sanitized/accelerated_features_hailo_inferred_sanitized
VStream infos:
Input accelerated_features_hailo_inferred_sanitized/input_layer1 UINT8, NHWC(608x800x1)
Output accelerated_features_hailo_inferred_sanitized/conv9 UINT16, FCR(76x100x65)
Output accelerated_features_hailo_inferred_sanitized/conv27 UINT16, FCR(76x100x1)
Output accelerated_features_hailo_inferred_sanitized/conv24 UINT16, NHWC(76x100x64)
conv9 ~ keypoints (Format FCR?)
conv27 ~ heatmap (Format FCR?)
conv24 ~ features (Format NHWC?)
Sorry for the long post, but I have the feeling this should be possible. As I am not an expert on quantization topics, I may have missed some important details, and I am definitely at a point where I need help.
Cheers,
Konrad