Advanced Layer Noise Analysis

Hi, I’m trying to run advanced Layer Noise Analysis on a ViT-Base model. I receive the following error when trying to run it. I know this is a known bug; were you able to find a workaround for it?

It’s crucial that I run the advanced analysis, as I want to see which layer introduces noise when all of the others are not quantized.

[info] Current Time: 23:58:50, 03/09/25
[info] CPU: Architecture: x86_64, Model: AMD EPYC 9B14, Number Of Cores: 8, Utilization: 0.3%
[info] Memory: Total: 62GB, Available: 58GB
[info] System info: OS: Linux, Kernel: 6.8.0-1025-gcp
[info] Hailo DFC Version: 3.30.0
[info] HailoRT Version: 4.20.0
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo analyze-noise --data-path /local/shared_with_docker/image_dataset_normalized_200images.npy --analyze-mode advanced /local/shared_with_docker/model_zoo_clip_vit_large_level0_docker2025_checker_cfg_enabled.har`
[info] Starting Layer Noise Analysis

Full Quant Analysis:   0%|          | 0/64 [00:00<?, ?iterations/s]
Full Quant Analysis:   2%|▏         | 1/64 [00:03<03:33,  3.39s/iterations]Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/bin/hailo", line 8, in <module>
    sys.exit(main())
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/cmd_utils/main.py", line 111, in main
    ret_val = client_command_runner.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 64, in run
    ret_val = self._run(argv)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 111, in _run
    return args.func(args)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/hailo_lat_cli.py", line 60, in run
    runner.analyze_noise(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2072, in analyze_noise
    self._analyze_noise(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2094, in _analyze_noise
    return self._sdk_backend.run_layer_analysis_tool(data, data_count, batch_size, analyze_mode, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1078, in run_layer_analysis_tool
    analyzer.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 54, in run
    return super().run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 150, in run
    self._run_int()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 86, in _run_int
    self.analyze_full_quant_net()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 208, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filepzxgdn2x.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 188, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 167, in loop_body_5
    ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 163, in if_body_3
    ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 162, in loop_body_4
    ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 161, in loop_body_3
    ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
  File "/tmp/__autograph_generated_fileq1jh1_9w.py", line 10, in tf__update_state
    ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)
ValueError: in user code:

    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 188, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 167, in loop_body_5
        ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 163, in if_body_3
        ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 162, in loop_body_4
        ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 161, in loop_body_3
        ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
    File "/tmp/__autograph_generated_fileq1jh1_9w.py", line 10, in tf__update_state
        ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)

    ValueError: Exception encountered when calling layer 'lat_model' (type LATModel).
    
    in user code:
    
        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 392, in call  *
            metric.update_state(native, numeric, partial_numeric=partial_numeric)
        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 22, in update_state  *
            self.noise_energy.assign_add(tf.reduce_mean((native - numeric) ** 2))
    
        ValueError: Dimensions must be equal, but are 768 and 780 for '{{node lat_model/sub_45}} = Sub[T=DT_FLOAT](lat_model/conv_feature_splitter1_2/act_op/Identity, lat_model/conv_feature_splitter1_2/output_op/mul)' with input shapes: [1,1,197,768], [1,1,197,780].
    
    
    Call arguments received by layer 'lat_model' (type LATModel):
      • inputs=tf.Tensor(shape=(1, 224, 224, 3), dtype=float32)


Full Quant Analysis:   2%|▏         | 1/64 [00:15<16:35, 15.81s/iterations]

Hey @user155 ,

Issue Analysis and Workaround for hailo analyze-noise on ViT-Base

  • Your error indicates a mismatch in tensor dimensions during noise analysis:
ValueError: Dimensions must be equal, but are 768 and 780 for '{{node lat_model/sub_45}}'
  • This usually happens when:
    • Positional encodings in ViTs are not properly aligned with token embeddings.
    • Some layers were not correctly quantized while others were.
    • Tensor shape inconsistencies due to dynamic input handling.
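For intuition, the metric that fails here accumulates the mean squared error between each layer's float ("native") output and its quantized ("numeric") output, as you can see at the bottom of your traceback. A minimal pure-Python sketch of that comparison (illustrative only, not the SDK code) reproduces the same class of failure when the two widths disagree:

```python
# Illustrative sketch of the failing metric, NOT the SDK implementation:
# the analyzer accumulates mean((native - numeric) ** 2) per layer, which
# requires the float and quantized outputs to have identical shapes.
def noise_energy(native, numeric):
    if len(native) != len(numeric):
        raise ValueError(
            f"Dimensions must be equal, but are {len(native)} and {len(numeric)}"
        )
    return sum((a - b) ** 2 for a, b in zip(native, numeric)) / len(native)

# A 768-wide native output against a 780-wide quantized output fails,
# just like lat_model/sub_45 in your traceback:
try:
    noise_energy([0.0] * 768, [0.0] * 780)
except ValueError as err:
    print(err)  # Dimensions must be equal, but are 768 and 780
```

In other words, the fix has to happen upstream, at the point where the quantized graph produces a 780-wide tensor while the float graph produces a 768-wide one.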

Try any of these workarounds:

A. Modify analyze-noise Run Parameters

  • Try running the analysis with batch size 1 to avoid shape mismatches:
hailo analyze-noise model.har --data-path dataset.npy --batch-size 1
  • Set a limited data count to reduce variance in quantization noise:
hailo analyze-noise model.har --data-path dataset.npy --data-count 64
  • We recommend at least 64 images, but not exceeding 200.

B. Adjust Model Optimization Settings

  • Before running noise analysis, optimize the model with additional constraints:
hailo optimize model.har --data-path dataset.npy --optimize-mode accuracy
  • This forces the compiler to preserve more numerical accuracy, potentially avoiding the tensor mismatch issue.

C. Reconfigure Quantization for Transformer Layers

  • Some Transformer layers may require manual precision settings:
    • Apply 16-bit output on layers known to introduce instability:
quantization_param(layer_name, precision_mode=a16_w16)
  • If the issue occurs in a specific attention block, try offloading activations:
change_output_activation(attn_layer, sigmoid)
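For context, lines like these go into a model script (.alls) file that you pass to `hailo optimize` via `--model-script`. A hedged sketch of such a file, where the layer names are placeholders you would replace with real names from your profiler report (`conv_feature_splitter1_2` is taken from the failing node in the traceback above):

```
# model_script.alls -- layer names below are placeholders; take the real
# ones from the profiler report for your HAR
quantization_param(conv_feature_splitter1_2, precision_mode=a16_w16)
change_output_activation(attn_layer, sigmoid)
```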

D. Use Debug Mode to Identify Failing Layer

  • If the error persists, enable debug mode:
hailo analyze-noise model.har --debug
  • This provides a per-layer breakdown of quantization noise.

Let me know if any of these workarounds help! :rocket:

Hi,
I ran into exactly the same issue and tried out your recommendations:

A

neither helped (batch_size=1, data_count=64)

B

the `hailo optimize` command has no “optimize-mode” argument:

Running `hailo optimize --help`
usage: hailo optimize [-h]
                      [--hw-arch {hailo8,hailo8r,hailo8l,hailo15h,hailo15m,hailo10h}]
                      (--calib-set-path CALIBRATION_SET_PATH | --use-random-calib-set)
                      [--full-precision-only]
                      [--calib-random-max CALIB_RANDOM_MAX]
                      [--use-random-weights] [--work-dir WORK_DIR]
                      [--model-script MODEL_SCRIPT]
                      [--output-har-path OUTPUT_HAR_PATH]
                      [--compilation-only-har]
                      har_path

C

It’s hard to identify the critical layers in a big model … by manually screening the profiler report and looking at the Weight/Activation Histogram and Scatter Plot for each layer?

kind regards,
Martin

Hi @Martin_Grossbichler ,

I solved it by adding a line to the model script that enables checker_cfg during optimization, rather than running the analysis as a separate step. Check out the documentation for checker_cfg, enable it, and set it to advanced mode. Make sure to run it with a low number of images; I ran it with 200 images and it took around 10 hours.
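For anyone landing here later, the line looks something along these lines (parameter names from memory, so double-check them against the checker_cfg section of the DFC documentation before using this):

```
# Sketch only -- verify the exact checker_cfg parameter names in the DFC docs
model_optimization_config(checker_cfg, policy=enabled, analyze_mode=advanced)
```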

Hi, thank you, that worked for me!