Problem with noise analysis of Python API

fabrice.auzanneau · February 4, 2025, 3:19pm

I’m trying to quantize a ViT network.
I found the onnx file in the model zoo repository.

The accuracy of the original network is acceptable but lower than expected: Top-1 is 61% on ImageNet-1K (80% expected).

When I run runner.optimize(calib_dataset), the accuracy drops to 3.9%, so I tried the noise analysis:
runner.analyze_noise(image_dataset, batch_size=2, data_count=32)

The program then crashes with the following traceback:

[info] Starting Layer Noise Analysis
Full Quant Analysis:   6%|███▎                                                 | 1/16 [00:02<00:32,  2.17s/iterations]
Traceback (most recent call last):
  File "/local/workspace/FA/tests_quant/test.py", line 90, in <module>
    runner.analyze_noise(image_dataset, batch_size=2, data_count=nimg)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2072, in analyze_noise
    self._analyze_noise(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2094, in _analyze_noise
    return self._sdk_backend.run_layer_analysis_tool(data, data_count, batch_size, analyze_mode, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1078, in run_layer_analysis_tool
    analyzer.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 54, in run
    return super().run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 150, in run
    self._run_int()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 86, in _run_int
    self.analyze_full_quant_net()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 208, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filey5gzfqgw.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filevaxh213w.py", line 188, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/__autograph_generated_filevaxh213w.py", line 167, in loop_body_5
    ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/__autograph_generated_filevaxh213w.py", line 163, in if_body_3
    ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
  File "/tmp/__autograph_generated_filevaxh213w.py", line 162, in loop_body_4
    ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
  File "/tmp/__autograph_generated_filevaxh213w.py", line 161, in loop_body_3
    ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
  File "/tmp/__autograph_generated_file0p01gogs.py", line 10, in tf__update_state
    ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)
ValueError: in user code:

    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_filevaxh213w.py", line 188, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/__autograph_generated_filevaxh213w.py", line 167, in loop_body_5
        ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/__autograph_generated_filevaxh213w.py", line 163, in if_body_3
        ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
    File "/tmp/__autograph_generated_filevaxh213w.py", line 162, in loop_body_4
        ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
    File "/tmp/__autograph_generated_filevaxh213w.py", line 161, in loop_body_3
        ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
    File "/tmp/__autograph_generated_file0p01gogs.py", line 10, in tf__update_state
        ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)

    ValueError: Exception encountered when calling layer 'lat_model_1' (type LATModel).

    in user code:

        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 392, in call  *
            metric.update_state(native, numeric, partial_numeric=partial_numeric)
        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 22, in update_state  *
            self.noise_energy.assign_add(tf.reduce_mean((native - numeric) ** 2))

        ValueError: Dimensions must be equal, but are 768 and 780 for '{{node lat_model_1/sub_45}} = Sub[T=DT_FLOAT](lat_model_1/conv_slice2/act_op/Identity, lat_model_1/conv_slice2/output_op/mul)' with input shapes: [2,1,197,768], [2,1,197,780].


    Call arguments received by layer 'lat_model_1' (type LATModel):
      • inputs=tf.Tensor(shape=(2, 224, 224, 3), dtype=float32)

Full Quant Analysis:   6%|▋         | 1/16 [00:07<01:57,  7.81s/iterations]
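
The failing operation is the elementwise subtraction `native - numeric` at layer conv_slice2, where the two tensors have different shapes. As a minimal sketch (the helper name `mismatched_axes` is hypothetical and not part of the Hailo SDK), the offending axis can be located from the two shapes reported in the ValueError:

```python
def mismatched_axes(shape_a, shape_b):
    """Return (axis, size_a, size_b) for every axis where the two shapes differ."""
    if len(shape_a) != len(shape_b):
        raise ValueError(f"rank mismatch: {len(shape_a)} vs {len(shape_b)}")
    return [(axis, a, b)
            for axis, (a, b) in enumerate(zip(shape_a, shape_b))
            if a != b]


# Shapes copied from the ValueError in the traceback above.
native_shape = (2, 1, 197, 768)    # lat_model_1/conv_slice2/act_op/Identity
numeric_shape = (2, 1, 197, 780)   # lat_model_1/conv_slice2/output_op/mul

diffs = mismatched_axes(native_shape, numeric_shape)
for axis, a, b in diffs:
    print(f"axis {axis}: native={a}, quantized={b}, difference={b - a}")
```

Only the last (feature) axis differs: the quantized output has 12 more elements than the native one. The traceback does not say why.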

Can anyone help?
Thank you.

nina-vilela · February 10, 2025, 11:54am

Hi @fabrice.auzanneau,

Thank you for your post.
This is a bug that we are aware of and working on fixing.

In any case, I wouldn’t recommend running the full LAT (layer analysis tool), as it is only useful in very specific cases of advanced degradation debugging.

The simple LAT should be enough if you are interested in SNR, and it is enabled automatically when using optimization level > 1. The results are available in the Accuracy section of the profiler results.
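
For readers unfamiliar with the metric, here is a rough sketch of a signal-to-noise ratio in dB between a full-precision output and its quantized counterpart. The noise energy is taken as the mean squared difference, as in the `update_state` line of the traceback earlier in this thread. The exact formula the tool uses for the reported SNR is an assumption, and `snr_db` is a hypothetical name, not an SDK function.

```python
import math


def snr_db(native, quantized):
    """SNR in dB: 10 * log10(signal energy / noise energy)."""
    if len(native) != len(quantized):
        raise ValueError("outputs must have the same number of elements")
    signal_energy = sum(x * x for x in native) / len(native)
    noise_energy = sum((x - q) ** 2 for x, q in zip(native, quantized)) / len(native)
    if noise_energy == 0.0:
        return math.inf  # identical outputs: no quantization noise at all
    return 10.0 * math.log10(signal_energy / noise_energy)


native = [1.0, -2.0, 3.0, -4.0]
close = [1.1, -1.9, 3.1, -4.1]   # small quantization error
far = [-1.0, 2.0, -3.0, 4.0]     # error larger than the signal itself

print(f"close: {snr_db(native, close):.1f} dB")
print(f"far:   {snr_db(native, far):.1f} dB")
```

Under this definition, a negative SNR means the quantization noise carries more energy than the signal itself.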

What is the model script (alls) that you are using for the quantization?

fabrice.auzanneau · February 10, 2025, 12:35pm

Hi @nina-vilela, thanks for your answer.
I used the following model script:

model_script_lines = [f"normalization_rule1 = normalization({mean}, {sdev})\n",
                      "model_optimization_config(calibration, batch_size=1, calibset_size=128)\n"]

I’ll try to use optimization level > 1:

'model_optimization_flavor(optimization_level=2, compression_level=0, batch_size=8)\n'
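
Put together, the script would look roughly like the sketch below. The `mean` and `sdev` values are placeholders, and the commented-out calls assume that `runner` is an existing ClientRunner, that `calib_dataset` is the calibration set, and that the script is passed as a string through `load_model_script`.

```python
mean = [127.5, 127.5, 127.5]   # placeholder values, replace with your own
sdev = [127.5, 127.5, 127.5]   # placeholder values, replace with your own

model_script_lines = [
    f"normalization_rule1 = normalization({mean}, {sdev})\n",
    "model_optimization_config(calibration, batch_size=1, calibset_size=128)\n",
    "model_optimization_flavor(optimization_level=2, compression_level=0, batch_size=8)\n",
]
model_script = "".join(model_script_lines)
print(model_script)

# Assumed usage, not executed here:
# runner.load_model_script(model_script)
# runner.optimize(calib_dataset)
```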

nina-vilela · February 10, 2025, 1:21pm

You can see that in our ModelZoo we use a number of methods for the ViT_base quantization:

github.com/hailo-ai/hailo_model_zoo, file hailo_model_zoo/cfg/alls/generic/vit_base.alls (master branch):

norm_layer1 = normalization([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])
model_optimization_config(calibration, batch_size=8, calibset_size=1024)
pre_quantization_optimization(equalization, policy=enabled)
pre_quantization_optimization(ew_add_fusing, policy=disabled)
model_optimization_flavor(optimization_level=0, compression_level=0)
pre_quantization_optimization(matmul_correction, layers={matmul*}, correction_type=zp_comp_block)
model_optimization_config(negative_exponent, layers={*}, rank=0)
quantization_param({vit_base/ew_add*}, precision_mode=a16_w16)
quantization_param({vit_base/ew_add1}, precision_mode=a8_w8)

resources_param(strategy=greedy, max_compute_utilization=0.8, max_control_utilization=0.85, max_memory_utilization=0.8)

The exact commands can vary depending on your version of vit, but I would try something similar.
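
One adaptation that is likely needed: the layer selectors in the Model Zoo script are prefixed with the network name `vit_base`, while a network parsed under another name has a different prefix. The sketch below rewrites that prefix. The helper name `retarget_scope` is hypothetical, and whether the `ew_add` layers keep the same names and indices in your parsed network is an assumption to check against your own graph.

```python
import re

# Two selector lines taken from the Model Zoo script above.
ZOO_SCRIPT = """\
quantization_param({vit_base/ew_add*}, precision_mode=a16_w16)
quantization_param({vit_base/ew_add1}, precision_mode=a8_w8)
"""


def retarget_scope(script, old_name, new_name):
    """Replace the network-name prefix inside {...} layer selectors only."""
    pattern = re.compile(r"\{" + re.escape(old_name) + "/")
    return pattern.sub("{" + new_name + "/", script)


adapted = retarget_scope(ZOO_SCRIPT, "vit_base", "vit_base_patch16_224_unfilt")
print(adapted)
```

Commands that use unscoped selectors such as `{matmul*}` or `{*}` are left unchanged by this rewrite.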

fabrice.auzanneau · February 10, 2025, 1:34pm

Thanks.
First, I tried optimization_level=2. I get a lot of log output, which you may understand better than I do:

Optimizing the model
[info] Loading model script commands to vit_base_patch16_224_unfilt from string
[info] Starting Model Optimization
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.59)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 128 entries for calibration
Calibration: 100%|██████████████████████████████████████████████████| 128/128 [00:42<00:00,  3.00entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:44.62)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.22)
[info] No shifts available for layer vit_base_patch16_224_unfilt/conv1/conv_op, using max shift instead. delta=1.8572
[info] No shifts available for layer vit_base_patch16_224_unfilt/conv1/conv_op, using max shift instead. delta=0.9286
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax1/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax2/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax3/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax4/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax5/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax6/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax7/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax8/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax9/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax10/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax11/act_op
[info] activation fitting started for vit_base_patch16_224_unfilt/reduce_sum_softmax12/act_op
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
128/128 [==============================] - 158s 350ms/step - total_distill_loss: 2.5147 - _distill_loss_vit_base_patch16_224_unfilt/fc1: 2.5147
Epoch 2/4
128/128 [==============================] - 45s 348ms/step - total_distill_loss: 3.9074 - _distill_loss_vit_base_patch16_224_unfilt/fc1: 3.9074
Epoch 3/4
128/128 [==============================] - 45s 348ms/step - total_distill_loss: 2.7705 - _distill_loss_vit_base_patch16_224_unfilt/fc1: 2.7705
Epoch 4/4
128/128 [==============================] - 44s 347ms/step - total_distill_loss: 2.6962 - _distill_loss_vit_base_patch16_224_unfilt/fc1: 2.6962
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:04:54.44)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|█████████████████████████████████████████| 16/16 [02:26<00:00,  9.14s/iterations]
[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:02:29.25)
[info] Output layers signal-to-noise ratio (SNR): measures the quantization noise (higher is better)
[info]  vit_base_patch16_224_unfilt/output_layer1 SNR:  -9.065 dB
[info] Model Optimization is done
Duration: 735.9 seconds
Saving quantized model: ./vit_base_patch16_224_unfilt_quantized_model.har
[info] Saved HAR to: /local/workspace/FA/Classification/tests_quant/vit_base_patch16_224_unfilt_quantized_model.har
Compiling the model
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Original names list for layer vit_base_patch16_224_unfilt/conv4 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv8 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv12 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv16 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv20 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv24 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv28 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv32 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv36 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv40 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv44 is too long (>10), Trimming it to 10
[info] Original names list for layer vit_base_patch16_224_unfilt/conv48 is too long (>10), Trimming it to 10
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding collapsed format conversion after const_input2
[info] Adding collapsed format conversion after conv_slice3
[info] Adding collapsed format conversion after conv_slice6
[info] Adding collapsed format conversion after conv_slice9
[info] Adding collapsed format conversion after conv_slice12
[info] Adding collapsed format conversion after conv_slice15
[info] Adding collapsed format conversion after conv_slice18
[info] Adding collapsed format conversion after conv_slice21
[info] Adding collapsed format conversion after conv_slice24
[info] Adding collapsed format conversion after conv_slice27
[info] Adding collapsed format conversion after conv_slice30
[info] Adding collapsed format conversion after conv_slice33
[info] Adding collapsed format conversion after conv_slice36
[info] Finding the best partition to contexts...
[...........................<==>.........] Duration: 00:03:15
Iteration Done
[.....................<==>...............] Duration: 00:00:47
Iteration Done, Performance improved by 16.3%
[...............<==>.....................] Duration: 00:00:24
Iteration Done, Performance improved by 10.8%
[...<==>.................................] Duration: 00:01:15
Iteration Done, Performance improved by 7.7%
[.......<==>.............................] Duration: 00:00:23
Iteration Done, Performance improved by 4.1%
[....................................<==>] Duration: 00:00:34
Iteration Done, Performance improved by 2.0%
[.......<==>.............................] Duration: 00:00:23
Iteration Done, Performance improved by 2.2%
[.......<==>.............................] Elapsed: 00:03:32
[info] Using Multi-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%, max_compute_16bit_utilization=60%, max_memory_utilization (weights)=60%, max_input_aligner_utilization=60%, max_apu_utilization=60%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished


Validating context_1 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished


Validating context_2 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished


Validating context_3 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished


Validating context_4 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished


Validating context_5 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +

● Finished

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/5 Iteration 4: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  V          V          V          V          V          V          X          V          V
 worker1  X          V          X          V          V          V          V          V          V
 worker2  V          V          V          V          V          V          V          V          V
 worker3  V          V          V          X          V          V          V          V          V
Context:1/5 Iteration 4: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  V          V          V          V          V          V          V          V          V
 worker1  V          V          V          V          V          V          V          V          V
 worker2  V          V          V          V          V          V          V          X          V
 worker3  *          *          *          *          *          *          *          *          V
Context:2/5 Iteration 4: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  V          V          V          V          X          V          V          V          V
 worker1  V          V          V          V          V          V          V          V          V
 worker2  *          *          *          *          *          *          *          *          V
 worker3  V          V          V          V          V          V          V          V          V
Context:3/5 Iteration 4: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  *          *          *          *          *          *          *          *          V
 worker1  *          *          *          *          *          *          *          *          V
 worker2  *          *          *          *          *          *          *          *          V
 worker3  V          V          V          V          V          V          V          V          V
Context:4/5 Iteration 4: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  *          *          *          *          *          *          *          *          V
 worker1  V          V          V          V          V          V          V          V          V
 worker2  V          V          V          V          V          V          V          V          V
 worker3  V          V          V          V          V          V          X          V          V
Context:5/5 Iteration 4: Trying parallel splits...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0
 worker1
Context:5/5 Iteration 12: Trying parallel mapping...
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
 worker0  *          *          *          *          *          *          *          *          V
 worker1  V          V          V          V          V          V          V          V          V
 worker2  *          *          *          *          *          *          *          *          V
 worker3  *          *          *          *          *          *          *          *          V
Reverts on pre-mapping validation: 1
  06:48 on split failed: 0
Reverts on cluster mapping: 1iteration: 8: Negative size is: 1
Reverts on inter-cluster connectivity: 26: Negative size is: 1
Reverts on pre-mapping validation: 1on: 7: Negative size is: 1
Reverts on split failed: 4a, iteration: 5: Negative size is: 1

[info] context_0 (context_0):
Iterations: 4
Reverts on cluster mapping: 2
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_1 (context_1):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_2 (context_2):
Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_3 (context_3):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 1
Reverts on pre-mapping validation: 2
Reverts on split failed: 0
[info] context_4 (context_4):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 1
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_5 (context_5):
Iterations: 12
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 2
Reverts on pre-mapping validation: 2
Reverts on split failed: 4
[info] context_0 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 50%                 | 60.9%               | 98.4%              |
[info] | cluster_1 | 75%                 | 53.1%               | 95.3%              |
[info] | cluster_2 | 62.5%               | 23.4%               | 57%                |
[info] | cluster_3 | 75%                 | 39.1%               | 72.7%              |
[info] | cluster_4 | 37.5%               | 59.4%               | 100%               |
[info] | cluster_5 | 43.8%               | 18.8%               | 53.1%              |
[info] | cluster_6 | 75%                 | 59.4%               | 92.2%              |
[info] | cluster_7 | 31.3%               | 15.6%               | 42.2%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 56.3%               | 41.2%               | 76.4%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_1 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 56.3%               | 40.6%               | 73.4%              |
[info] | cluster_1 | 43.8%               | 45.3%               | 85.2%              |
[info] | cluster_2 | 37.5%               | 39.1%               | 71.9%              |
[info] | cluster_3 | 62.5%               | 34.4%               | 51.6%              |
[info] | cluster_4 | 62.5%               | 48.4%               | 96.1%              |
[info] | cluster_5 | 75%                 | 54.7%               | 96.9%              |
[info] | cluster_6 | 25%                 | 26.6%               | 45.3%              |
[info] | cluster_7 | 87.5%               | 51.6%               | 96.9%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 56.3%               | 42.6%               | 77.1%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_2 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 50%                 | 45.3%               | 88.3%              |
[info] | cluster_1 | 43.8%               | 25%                 | 44.5%              |
[info] | cluster_2 | 50%                 | 53.1%               | 97.7%              |
[info] | cluster_3 | 50%                 | 43.8%               | 71.9%              |
[info] | cluster_4 | 56.3%               | 45.3%               | 76.6%              |
[info] | cluster_5 | 50%                 | 54.7%               | 98.4%              |
[info] | cluster_6 | 75%                 | 28.1%               | 50.8%              |
[info] | cluster_7 | 37.5%               | 43.8%               | 82%                |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 51.6%               | 42.4%               | 76.3%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_3 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 62.5%               | 43.8%               | 86.7%              |
[info] | cluster_1 | 6.3%                | 9.4%                | 16.4%              |
[info] | cluster_2 | 50%                 | 51.6%               | 92.2%              |
[info] | cluster_3 | 43.8%               | 50%                 | 87.5%              |
[info] | cluster_4 | 50%                 | 51.6%               | 87.5%              |
[info] | cluster_5 | 62.5%               | 46.9%               | 85.2%              |
[info] | cluster_6 | 68.8%               | 46.9%               | 87.5%              |
[info] | cluster_7 | 37.5%               | 26.6%               | 51.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 47.7%               | 40.8%               | 74.3%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_4 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 62.5%               | 50%                 | 96.9%              |
[info] | cluster_1 | 43.8%               | 45.3%               | 75%                |
[info] | cluster_2 | 100%                | 60.9%               | 95.3%              |
[info] | cluster_3 | 25%                 | 18.8%               | 28.1%              |
[info] | cluster_4 | 50%                 | 14.1%               | 26.6%              |
[info] | cluster_5 | 37.5%               | 42.2%               | 78.9%              |
[info] | cluster_6 | 37.5%               | 56.3%               | 94.5%              |
[info] | cluster_7 | 50%                 | 60.9%               | 97.7%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 50.8%               | 43.6%               | 74.1%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_5 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 68.8%               | 64.1%               | 53.9%              |
[info] | cluster_1 | 81.3%               | 75%                 | 93.8%              |
[info] | cluster_2 | 62.5%               | 71.9%               | 96.1%              |
[info] | cluster_3 | 81.3%               | 67.2%               | 96.9%              |
[info] | cluster_4 | 43.8%               | 29.7%               | 43%                |
[info] | cluster_5 | 18.8%               | 32.8%               | 43.8%              |
[info] | cluster_6 | 56.3%               | 70.3%               | 96.9%              |
[info] | cluster_7 | 56.3%               | 67.2%               | 81.3%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 58.6%               | 59.8%               | 75.7%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 19m 15s)
[info] Compiling context_0...
[info] Compiling context_1...
[info] Compiling context_2...
[info] Compiling context_3...
[info] Compiling context_4...
[info] Compiling context_5...
[info] Bandwidth of model inputs: 1.14844 Mbps, outputs: 0.00762939 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 36.3281 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 40s)
[warning] Unexpected input shapes at matmul layer matmul2 (translated from /blocks/blocks.0/attn/Reshape_1), input_shapes=[[-1, 25, 8, 2364], [-1, 25, 64, 96]] (type=<class 'list'>)
Duration: 1330.7 seconds
Saving compiled model: ./vit_base_patch16_224_unfilt_quantized_model.hef

At the end, this may be a problem:

[warning] Unexpected input shapes at matmul layer matmul2 (translated from /blocks/blocks.0/attn/Reshape_1), input_shapes=[[-1, 25, 8, 2364], [-1, 25, 64, 96]] (type=<class 'list'>)

When I compute the performances, I get very bad results:

Top1 score 26/25000 = 0.104%
Top5 score 117/25000 = 0.468%
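
For reference, a Top-k score like the ones above is the fraction of images whose true label is among the k highest-scoring classes. The following is a minimal pure-Python sketch of that computation; the names `topk_accuracy`, `scores` and `labels` are hypothetical and are not taken from the original test script.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Tiny synthetic example: 3 images, 4 classes.
scores = [
    [0.1, 0.7, 0.2, 0.0],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
labels = [1, 1, 0]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
```

With 1000 classes, random guessing gives about 0.1% Top-1 and 0.5% Top-5, so the reported 0.104% and 0.468% are consistent with a compiled model whose outputs carry essentially no class information.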

When I execute profiling_results = runner.profile(), I get the same warning:

[info] Running profile for vit_base_patch16_224_unfilt in state compiled_model
[warning] Unexpected input shapes at matmul layer matmul2 (translated from /blocks/blocks.0/attn/Reshape_1), input_shapes=[[-1, 25, 8, 2364], [-1, 25, 64, 96]] (type=<class 'list'>)
[info]

I have tried to read the graph of the network, I get the following types of layers:

Layers types: ['ew_sub', 'normalization', 'reduce_sum', 'const_input', 'input_layer', 'dense', 'conv', 'ew_add', 'matmul', 'concat', 'reduce_max', 'space_to_depth', 'ew_mult', 'slice', 'activation', 'output_layer']

I’m now going to try the options you suggested, starting with all of them together to see the difference.

fabrice.auzanneau · February 10, 2025, 8:49pm

I have tried your suggestions, using the complete model script from: hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/vit_base.alls at master · hailo-ai/hailo_model_zoo · GitHub

But after a very long compute time, I finally got:

[info] context_6 (context_6):
Iterations: 0 [ Did not run due to an error ]
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_7 (context_7):
Iterations: 0 [ Did not run due to an error ]
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[error] Mapping Failed (Timeout, allocation time: 3h 36m 7s)
[error] Mapping Failed (Timeout, allocation time: 3h 36m 7s)
Resolver didn’t find possible solution.
Watchdog expired after 1h 0m 0s
Mapping Failed (Timeout, allocation time: 3h 36m 7s)

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Resolver didn’t find possible solution.
Watchdog expired after 1h 0m 0s
Mapping Failed (Timeout, allocation time: 3h 36m 7s)

How can I change the options to improve this?

nina-vilela · February 11, 2025, 12:48pm

This error means that there was a timeout before the compiler converged. To increase the timeout, add the following line to the model script:
--model-script "allocator_param(timeout=100h)"

If that doesn’t solve it, remove:
resources_param(strategy=greedy, max_compute_utilization=0.8, max_control_utilization=0.85, max_memory_utilization=0.8)
This command sets a higher utilization of the device, which might not work for the Hailo device you are using.

If that still doesn’t work, add the following command:
performance_param(compiler_optimization_level=max)
This command enables the optimization flow for the compiler, which will search for the maximum feasible utilization of the device.
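
The three steps above can be sketched as plain text manipulation of the model script. This is only an illustration: the helper `build_model_script` and its arguments are hypothetical, it makes no Hailo SDK calls, and the command strings are copied verbatim from this thread rather than checked against the SDK documentation.

```python
def build_model_script(base_lines, timeout="100h",
                       drop_resources_param=False,
                       max_compiler_optimization=False):
    """Return model-script text with the adjustments suggested in this thread."""
    lines = [ln.strip() for ln in base_lines if ln.strip()]

    # Step 2: optionally drop the high-utilization resources_param command.
    if drop_resources_param:
        lines = [ln for ln in lines if not ln.startswith("resources_param(")]

    # Step 1: replace any existing allocator timeout with the longer one.
    lines = [ln for ln in lines if not ln.startswith("allocator_param(timeout=")]
    lines.append(f"allocator_param(timeout={timeout})")

    # Step 3: optionally let the compiler search for a feasible utilization.
    if max_compiler_optimization:
        lines.append("performance_param(compiler_optimization_level=max)")

    return "\n".join(lines) + "\n"


# Starting point: the resources_param line quoted above.
base = [
    "resources_param(strategy=greedy, max_compute_utilization=0.8, "
    "max_control_utilization=0.85, max_memory_utilization=0.8)",
]

script_step1 = build_model_script(base)
script_step3 = build_model_script(base, drop_resources_param=True,
                                  max_compiler_optimization=True)
```

The resulting text can then be passed to the compiler in whatever way the model script is already being loaded. Applying the steps one at a time, as suggested, makes it easier to tell which change resolved the timeout.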

nina-vilela · February 11, 2025, 12:50pm

By the way, why are you going through the conversion flow yourself instead of downloading the already compiled version from the Model Zoo (MZ)?

fabrice.auzanneau · February 15, 2025, 6:03pm

Thanks for your help. I haven’t had the time to test that yet. To answer your question:
First: I don’t know how to do that.
Second: I want to learn to use the dataflow compiler so that I can later use networks that may not exist in the zoo.

user155 · March 9, 2025, 5:43pm

Hi @nina-vilela, I have the same problem as above and really need to run the advanced noise analysis. Were you able to find a workaround for it?

fabrice.auzanneau · April 1, 2025, 8:37am

I have tried the following model script:

model_script_lines = [f'norm_layer1 = normalization({mean}, {sdev})\n',
      'model_optimization_config(calibration, batch_size=8, calibset_size=1024)\n',
      'pre_quantization_optimization(equalization, policy=enabled)\n',
      'pre_quantization_optimization(ew_add_fusing, policy=disabled)\n',
      'model_optimization_flavor(optimization_level=0, compression_level=0)\n',
      'pre_quantization_optimization(matmul_correction, layers={matmul*}, correction_type=zp_comp_block)\n',
      'model_optimization_config(negative_exponent, layers={*}, rank=0)\n',
      'quantization_param({vit_base_patch16_224_ops17/ew_add*}, precision_mode=a16_w16)\n',
      'quantization_param({vit_base_patch16_224_ops17/ew_add1}, precision_mode=a8_w8)\n',
      'performance_param(compiler_optimization_level=max)\n',]

using

mean = [123.675, 116.28, 103.53]
sdev = [ 58.395,  57.12, 57.375]
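
The `mean` and `sdev` values match the commonly used ImageNet per-channel statistics rescaled from the [0, 1] range to the [0, 255] pixel range. The sketch below shows that arithmetic and how the f-string renders the normalization line. The constant names are hypothetical, and the thread does not state whether this particular ViT checkpoint was trained with these statistics.

```python
# Commonly used ImageNet statistics for inputs scaled to [0, 1] (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Rescale to the [0, 255] pixel range expected for raw uint8 images.
mean = [round(m * 255, 3) for m in IMAGENET_MEAN]
sdev = [round(s * 255, 3) for s in IMAGENET_STD]

# Same formatting as the first entry of model_script_lines above.
norm_line = f'norm_layer1 = normalization({mean}, {sdev})\n'
print(norm_line, end='')
```

If the checkpoint was trained with different statistics, the same two lists are the only place that needs to change.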

I got good performance results, but very poor compression. The full precision network has:

Network statistics:
Number of layers: 216
GOPS: 35.15018064
Size: 86377192.0 bytes (82.376 MB)

and I got after optimization:

Model Details

Operations per Input Tensor 35.75 GOPs
Operations per Input Tensor 17.89 GMACs
Pure Operations per Input Tensor 35.22 GOPs
Pure Operations per Input Tensor 17.63 GMACs
Model Parameters 80.23 M

So the quantization / compression / optimization phase did not work.
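
One caveat when reading these numbers: the post-optimization report lists operation counts and a parameter count (80.23 M), not a size in bytes. Quantization lowers the number of bits stored per weight rather than the number of weights, so similar counts before and after do not by themselves show that it failed. As a rough sketch, assuming every weight is stored at one uniform bit-width (a simplification, since the script above sets some layers to a16_w16), the weight footprint can be estimated as follows. The helper `weights_megabytes` is hypothetical.

```python
def weights_megabytes(num_params, bits_per_weight):
    """Approximate weight storage in MB (10**6 bytes) at a uniform bit-width."""
    return num_params * bits_per_weight / 8 / 1e6


params = 80.23e6  # "Model Parameters 80.23 M" from the report above

fp32_mb = weights_megabytes(params, 32)  # 32-bit floating-point weights
int8_mb = weights_megabytes(params, 8)   # 8-bit quantized weights
ratio = fp32_mb / int8_mb

print(f"float32: {fp32_mb:.2f} MB, 8-bit: {int8_mb:.2f} MB, ratio: {ratio:.1f}x")
```

Under that assumption, comparing bit-widths rather than parameter counts is the more direct way to judge the compression.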
