Strange behavior with quantization

Hi
I’m using the Python API to quantize a ViT network.

Here is a portion of the code:

model_script_lines = [f"normalization_rule1 = normalization({mean}, {sdev})\n",
                      "model_optimization_config(calibration, batch_size=1, calibset_size=128)\n",
                      # request 16-bit weights and activations for conv, depthwise conv and fully connected layers
                      "quantization_param({*conv*}, precision_mode=a16_w16)\n",
                      "quantization_param({*dw*}, precision_mode=a16_w16)\n",
                      "quantization_param({*fc*}, precision_mode=a16_w16)\n",
                      ]
runner = ClientRunner(har=model_har_name)
runner.load_model_script(''.join(model_script_lines))
runner.optimize(calib_dataset)

I get an error message:

hailo_model_optimization.acceleras.utils.acceleras_exceptions.AccelerasUnsupportedError: Unsupported layers for the provided optimization target. Review the log to see exact layers and configurations

and some weird messages on the screen:

[info] Loading model script commands to tiny_vit_5m_224_from_microsoft from string
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:02.07)
[info] Starting LayerNorm Decomposition
[info] Using dataset with 128 entries for calibration
[info] Model Optimization Algorithm LayerNorm Decomposition is done (completion time is 00:01:31.70)
[error] Unsupported layers for the target sage: tiny_vit_5m_224_from_microsoft/precision_change14:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change21:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change28:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change35:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change42:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change49:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change56:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change63:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change7:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true
tiny_vit_5m_224_from_microsoft/precision_change70:
bias_mode: single_scale_decomposition
precision_mode: a16_w16_a8
quantization_groups: 1
signed_output: true

The precision mode a16_w16_a8 is unexpected. What should I do to quantize the network correctly?

Hey @fabrice.auzanneau ,

You’re encountering an error because some layers in your ViT network are using a16_w16_a8, which is an unsupported precision mode during quantization.

Supported Precision Modes

According to the Hailo Dataflow Compiler documentation, these are the valid options:

  1. a8_w8 - Standard mode with 8-bit activations and weights
  2. a8_w4 - Memory-optimized mode with 8-bit activations and 4-bit weights
  3. a16_w16 - High-accuracy mode with 16-bit activations and weights (limited to specific layers)

Update Your Quantization Parameters

quantization_param({*conv*}, precision_mode=a16_w16)
quantization_param({*dw*}, precision_mode=a16_w16)
quantization_param({*fc*}, precision_mode=a16_w16)

Layer Compatibility

  • Not all layers support a16_w16 mode
  • Key limitations:
    • Some activation layers are incompatible
    • Special operations like softmax chains have restrictions
  • Solution: Only use a16_w16 on layers that explicitly support it

Remember to remove any mixed precision settings that combine a16_w16 with a8 activations, as this is causing the current error.
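
A practical way to stay within the supported set is to name the 16-bit layers explicitly instead of relying on wildcards. Here is a minimal sketch, assuming a ClientRunner already loaded from your HAR and the same mean/sdev/calib_dataset variables as in your snippet; the layer names come from runner.get_hn_dict(), and the exact quantization_param scope syntax (braces, with or without the model prefix) should be double-checked against the Dataflow Compiler user guide:

# Sketch: apply a16_w16 only to explicitly named layers of types the user guide marks as supported.
graph = runner.get_hn_dict()

supported_types = {'conv', 'dense'}  # adjust to the types listed in the user guide
target_layers = [name for name, layer in graph['layers'].items()
                 if layer['type'] in supported_types]

script_lines = [f"normalization_rule1 = normalization({mean}, {sdev})\n",
                "model_optimization_config(calibration, batch_size=1, calibset_size=128)\n"]
# One explicit command per layer, so no wildcard silently pulls in an unsupported layer.
# Note: names from the HN dict carry the model prefix; strip it if the parser expects short names.
script_lines += [f"quantization_param({{{name}}}, precision_mode=a16_w16)\n"
                 for name in target_layers]

runner.load_model_script(''.join(script_lines))
runner.optimize(calib_dataset)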

Thanks for your answer.
Actually, before choosing the quantization parameters, I looked at the network and printed the various layer types:

# Get the graph of the model
graph = runner.get_hn_dict()
layers = [layer for layer in graph['layers']]
types = [graph['layers'][layer]['type'] for layer in layers]
types_ = list(set(types)) # remove duplicates
print(f'Layers types: {types_}')

which outputs:

Layers types: ['input_layer', 'matmul', 'dense', 'reduce_sum', 'const_input', 'normalization', 'slice', 'ew_add', 'conv', 'ew_mult', 'output_layer', 'ew_sub', 'reduce_max', 'space_to_depth', 'concat', 'activation']

Then, I can print the names of layers of a given type, for example for ‘conv’:

type: conv
layers: ['vit_base_patch16_224_unfilt/conv1', 'vit_base_patch16_224_unfilt/conv_slice1', 'vit_base_patch16_224_unfilt/conv_slice2', 'vit_base_patch16_224_unfilt/conv_slice3', 'vit_base_patch16_224_unfilt/conv3', 'vit_base_patch16_224_unfilt/conv4', 'vit_base_patch16_224_unfilt/conv5', 'vit_base_patch16_224_unfilt/conv_slice4', 'vit_base_patch16_224_unfilt/conv_slice5', 'vit_base_patch16_224_unfilt/conv_slice6', 'vit_base_patch16_224_unfilt/conv7', 'vit_base_patch16_224_unfilt/conv8', 'vit_base_patch16_224_unfilt/conv9', 'vit_base_patch16_224_unfilt/conv_slice7', 'vit_base_patch16_224_unfilt/conv_slice8', 'vit_base_patch16_224_unfilt/conv_slice9', 'vit_base_patch16_224_unfilt/conv11', 'vit_base_patch16_224_unfilt/conv12', 'vit_base_patch16_224_unfilt/conv13', 'vit_base_patch16_224_unfilt/conv_slice10', 'vit_base_patch16_224_unfilt/conv_slice11', 'vit_base_patch16_224_unfilt/conv_slice12', 'vit_base_patch16_224_unfilt/conv15', 'vit_base_patch16_224_unfilt/conv16', 'vit_base_patch16_224_unfilt/conv17', 'vit_base_patch16_224_unfilt/conv_slice13', 'vit_base_patch16_224_unfilt/conv_slice14', 'vit_base_patch16_224_unfilt/conv_slice15', 'vit_base_patch16_224_unfilt/conv19', 'vit_base_patch16_224_unfilt/conv20', 'vit_base_patch16_224_unfilt/conv21', 'vit_base_patch16_224_unfilt/conv_slice16', 'vit_base_patch16_224_unfilt/conv_slice17', 'vit_base_patch16_224_unfilt/conv_slice18', 'vit_base_patch16_224_unfilt/conv23', 'vit_base_patch16_224_unfilt/conv24', 'vit_base_patch16_224_unfilt/conv25', 'vit_base_patch16_224_unfilt/conv_slice19', 'vit_base_patch16_224_unfilt/conv_slice20', 'vit_base_patch16_224_unfilt/conv_slice21', 'vit_base_patch16_224_unfilt/conv27', 'vit_base_patch16_224_unfilt/conv28', 'vit_base_patch16_224_unfilt/conv29', 'vit_base_patch16_224_unfilt/conv_slice22', 'vit_base_patch16_224_unfilt/conv_slice23', 'vit_base_patch16_224_unfilt/conv_slice24', 'vit_base_patch16_224_unfilt/conv31', 'vit_base_patch16_224_unfilt/conv32', 'vit_base_patch16_224_unfilt/conv33', 'vit_base_patch16_224_unfilt/conv_slice25', 'vit_base_patch16_224_unfilt/conv_slice26', 'vit_base_patch16_224_unfilt/conv_slice27', 'vit_base_patch16_224_unfilt/conv35', 'vit_base_patch16_224_unfilt/conv36', 'vit_base_patch16_224_unfilt/conv37', 'vit_base_patch16_224_unfilt/conv_slice28', 'vit_base_patch16_224_unfilt/conv_slice29', 'vit_base_patch16_224_unfilt/conv_slice30', 'vit_base_patch16_224_unfilt/conv39', 'vit_base_patch16_224_unfilt/conv40', 'vit_base_patch16_224_unfilt/conv41', 'vit_base_patch16_224_unfilt/conv_slice31', 'vit_base_patch16_224_unfilt/conv_slice32', 'vit_base_patch16_224_unfilt/conv_slice33', 'vit_base_patch16_224_unfilt/conv43', 'vit_base_patch16_224_unfilt/conv44', 'vit_base_patch16_224_unfilt/conv45', 'vit_base_patch16_224_unfilt/conv_slice34', 'vit_base_patch16_224_unfilt/conv_slice35', 'vit_base_patch16_224_unfilt/conv_slice36', 'vit_base_patch16_224_unfilt/conv47', 'vit_base_patch16_224_unfilt/conv48', 'vit_base_patch16_224_unfilt/conv49']
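
(For reference, this is roughly the snippet I use to print them, reusing the graph dict from above:)

# print the layer names of a given type
wanted_type = 'conv'
names = [name for name in graph['layers']
         if graph['layers'][name]['type'] == wanted_type]
print(f'type: {wanted_type}')
print(f'layers: {names}')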

So I expect that using:
quantization_param({*conv*}, precision_mode=a16_w16)
would quantize all convolutional layers with 16-bit weights and activations.
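
To double-check which layer names such a pattern would cover, I can match it against the names from the graph (assuming here that {*conv*} behaves like a shell-style glob over layer names; the user guide should confirm the exact matching rules):

import fnmatch

# assumption: the {*conv*} scope matches layer names like a shell-style glob
conv_like = fnmatch.filter(graph['layers'].keys(), '*conv*')
print(f'{len(conv_like)} layers match *conv*')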

The Dataflow Compiler user guide says that convolutional, depthwise convolutional, dense and activation layers support the a16_w16 scheme. But the layers that cause the problem are precision_change layers, which I suppose are created by the quantization process, so I don’t think I can manage them directly.

I don’t know how to remove the mixed precision settings that combine a16_w16 with a8 activations. I can force the activation layers to be quantized to 16 bits as well, but I can’t see any activation layers in the graph:

type: activation
layers:

So, how can I do this?

Actually, there are some activation layers, but they are all softmax:

type: activation
layers: ['vit_base_patch16_224_unfilt/ne_activation_ew_sub_softmax1', 'vit_base_patch16_224_unfilt/ne_activation_ew_sub_softmax2', 'vit_base_patch16_224_unfilt/ne_activation_ew_sub_softmax3', 'vit_base_patch16_224_unfilt/ne_activation_ew_sub_softmax4', 'vit_base_patch16_224_unfilt/ne_activation_ew_sub_softmax5']

I’ll try with another line in the model_script:
"quantization_param({*activ*}, precision_mode=a16_w16)\n",