Advanced Layer Noise Analysis

Hi, I’m trying to run advanced Layer Noise Analysis on a ViT-Base model. I receive the following error when trying to run it. I know this is a known bug; were you able to find a workaround for it?

It’s crucial that I run the advanced analysis, as I want to see which layer introduces noise when all of the others are not quantized.

[info] Current Time: 23:58:50, 03/09/25
[info] CPU: Architecture: x86_64, Model: AMD EPYC 9B14, Number Of Cores: 8, Utilization: 0.3%
[info] Memory: Total: 62GB, Available: 58GB
[info] System info: OS: Linux, Kernel: 6.8.0-1025-gcp
[info] Hailo DFC Version: 3.30.0
[info] HailoRT Version: 4.20.0
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo analyze-noise --data-path /local/shared_with_docker/image_dataset_normalized_200images.npy --analyze-mode advanced /local/shared_with_docker/model_zoo_clip_vit_large_level0_docker2025_checker_cfg_enabled.har`
[info] Starting Layer Noise Analysis

Full Quant Analysis:   0%|          | 0/64 [00:00<?, ?iterations/s]
Full Quant Analysis:   2%|▏         | 1/64 [00:03<03:33,  3.39s/iterations]Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/bin/hailo", line 8, in <module>
    sys.exit(main())
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/cmd_utils/main.py", line 111, in main
    ret_val = client_command_runner.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 64, in run
    ret_val = self._run(argv)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 111, in _run
    return args.func(args)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/hailo_lat_cli.py", line 60, in run
    runner.analyze_noise(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2072, in analyze_noise
    self._analyze_noise(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2094, in _analyze_noise
    return self._sdk_backend.run_layer_analysis_tool(data, data_count, batch_size, analyze_mode, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1078, in run_layer_analysis_tool
    analyzer.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 54, in run
    return super().run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 150, in run
    self._run_int()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 86, in _run_int
    self.analyze_full_quant_net()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 208, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filepzxgdn2x.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 188, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 167, in loop_body_5
    ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 163, in if_body_3
    ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 162, in loop_body_4
    ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
  File "/tmp/__autograph_generated_file_7r4tlty.py", line 161, in loop_body_3
    ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
  File "/tmp/__autograph_generated_fileq1jh1_9w.py", line 10, in tf__update_state
    ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)
ValueError: in user code:

    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 188, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 167, in loop_body_5
        ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 163, in if_body_3
        ag__.for_stmt(ag__.converted_call(ag__.ld(enumerate), (ag__.converted_call(ag__.ld(zip), (ag__.ld(output_native), ag__.ld(output_numeric), ag__.ld(output_partial_numeric)), None, fscope),), None, fscope), None, loop_body_4, get_state_7, set_state_7, (), {'iterate_names': '(i, (native, numeric, partial_numeric))'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 162, in loop_body_4
        ag__.for_stmt(ag__.ld(metrics), None, loop_body_3, get_state_6, set_state_6, (), {'iterate_names': 'metric'})
    File "/tmp/__autograph_generated_file_7r4tlty.py", line 161, in loop_body_3
        ag__.converted_call(ag__.ld(metric).update_state, (ag__.ld(native), ag__.ld(numeric)), dict(partial_numeric=ag__.ld(partial_numeric)), fscope)
    File "/tmp/__autograph_generated_fileq1jh1_9w.py", line 10, in tf__update_state
        ag__.converted_call(ag__.ld(self).noise_energy.assign_add, (ag__.converted_call(ag__.ld(tf).reduce_mean, ((ag__.ld(native) - ag__.ld(numeric)) ** 2,), None, fscope),), None, fscope)

    ValueError: Exception encountered when calling layer 'lat_model' (type LATModel).
    
    in user code:
    
        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 392, in call  *
            metric.update_state(native, numeric, partial_numeric=partial_numeric)
        File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 22, in update_state  *
            self.noise_energy.assign_add(tf.reduce_mean((native - numeric) ** 2))
    
        ValueError: Dimensions must be equal, but are 768 and 780 for '{{node lat_model/sub_45}} = Sub[T=DT_FLOAT](lat_model/conv_feature_splitter1_2/act_op/Identity, lat_model/conv_feature_splitter1_2/output_op/mul)' with input shapes: [1,1,197,768], [1,1,197,780].
    
    
    Call arguments received by layer 'lat_model' (type LATModel):
      • inputs=tf.Tensor(shape=(1, 224, 224, 3), dtype=float32)


Full Quant Analysis:   2%|▏         | 1/64 [00:15<16:35, 15.81s/iterations]

Hey @user155 ,

Issue Analysis and Workaround for hailo analyze-noise on ViT-Base

  • Your error indicates a mismatch in tensor dimensions during noise analysis:
ValueError: Dimensions must be equal, but are 768 and 780 for '{{node lat_model/sub_45}}'
  • This usually happens when:
    • Positional encodings in ViTs are not properly aligned with token embeddings.
    • Some layers were not correctly quantized while others were.
    • Tensor shape inconsistencies due to dynamic input handling.
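For intuition, the metric that fails here accumulates the mean squared error between each layer's float ("native") output and its quantized ("numeric") output, as you can see at the bottom of your traceback. A minimal pure-Python sketch of that comparison (illustrative only, not the SDK code) reproduces the same class of failure when the two widths disagree:

```python
# Illustrative sketch of the failing metric, NOT the SDK implementation:
# the analyzer accumulates mean((native - numeric) ** 2) per layer, which
# requires the float and quantized outputs to have identical shapes.
def noise_energy(native, numeric):
    if len(native) != len(numeric):
        raise ValueError(
            f"Dimensions must be equal, but are {len(native)} and {len(numeric)}"
        )
    return sum((a - b) ** 2 for a, b in zip(native, numeric)) / len(native)

# A 768-wide native output against a 780-wide quantized output fails,
# just like lat_model/sub_45 in your traceback:
try:
    noise_energy([0.0] * 768, [0.0] * 780)
except ValueError as err:
    print(err)  # Dimensions must be equal, but are 768 and 780
```

In other words, the fix has to happen upstream, at the point where the quantized graph produces a 780-wide tensor while the float graph produces a 768-wide one.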

Try any of these workarounds:

A. Modify analyze-noise Run Parameters

  • Try running the analysis with batch size 1 to avoid shape mismatches:
hailo analyze-noise model.har --data-path dataset.npy --batch-size 1
  • Set a limited data count to reduce variance in quantization noise:
hailo analyze-noise model.har --data-path dataset.npy --data-count 64
  • We recommend at least 64 images, but not exceeding 200.

B. Adjust Model Optimization Settings

  • Before running noise analysis, optimize the model with additional constraints:
hailo optimize model.har --data-path dataset.npy --optimize-mode accuracy
  • This forces the compiler to preserve more numerical accuracy, potentially avoiding the tensor mismatch issue.

C. Reconfigure Quantization for Transformer Layers

  • Some Transformer layers may require manual precision settings:
    • Apply 16-bit output on layers known to introduce instability:
quantization_param(layer_name, precision_mode=a16_w16)
  • If the issue occurs in a specific attention block, try offloading activations:
change_output_activation(attn_layer, sigmoid)
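For context, lines like these go into a model script (.alls) file that you pass to `hailo optimize` via `--model-script`. A hedged sketch of such a file, where the layer names are placeholders you would replace with real names from your profiler report (`conv_feature_splitter1_2` is taken from the failing node in the traceback above):

```
# model_script.alls -- layer names below are placeholders; take the real
# ones from the profiler report for your HAR
quantization_param(conv_feature_splitter1_2, precision_mode=a16_w16)
change_output_activation(attn_layer, sigmoid)
```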

D. Use Debug Mode to Identify Failing Layer

  • If the error persists, enable debug mode:
hailo analyze-noise model.har --debug
  • This provides a per-layer breakdown of quantization noise.

Let me know if any of these workarounds help! :rocket:

Hi,
I ran into exactly the same issue and tried out your recommendations:

A

neither helped (batch_size=1, data_count=64)

B

the `hailo optimize` command has no “optimize-mode” argument:

Running `hailo optimize --help`
usage: hailo optimize [-h]
                      [--hw-arch {hailo8,hailo8r,hailo8l,hailo15h,hailo15m,hailo10h}]
                      (--calib-set-path CALIBRATION_SET_PATH | --use-random-calib-set)
                      [--full-precision-only]
                      [--calib-random-max CALIB_RANDOM_MAX]
                      [--use-random-weights] [--work-dir WORK_DIR]
                      [--model-script MODEL_SCRIPT]
                      [--output-har-path OUTPUT_HAR_PATH]
                      [--compilation-only-har]
                      har_path

C

It’s hard to identify the critical layers in a big model … by manually screening the profiler report and looking at the Weight/Activation Histogram and Scatter Plot for each layer?

kind regards,
Martin

Hi @Martin_Grossbichler ,

I solved it by adding a line to the model script that enables checker_cfg during optimization, rather than running the analysis as a separate step. Check out the documentation for checker_cfg, enable it, and set it to advanced mode. Make sure to run it with a low number of images; I ran it with 200 images and it took around 10 hours.
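For anyone landing here later, the line looks something along these lines (parameter names from memory, so double-check them against the checker_cfg section of the DFC documentation before using this):

```
# Sketch only -- verify the exact checker_cfg parameter names in the DFC docs
model_optimization_config(checker_cfg, policy=enabled, analyze_mode=advanced)
```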

Hi, thank you, that worked for me!