Error during quantization of custom model: "Shift delta is larger than 2, cannot quantize"

Hi there,

I’m trying to convert an LRASPP model (LRASPP — Torchvision main documentation, pytorch.org). Here is the description of the ONNX:

The conversion to a first .har file works fine. Our model was trained with data normalization applied to the dataset (a simple division by 255), and indeed I get the same results as in PyTorch when I feed normalized data to the .har file.
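For context, the export and parsing flow looks roughly like this (a simplified sketch based on the DFC tutorial API; not my exact script, and paths/shapes/argument names are placeholders):

import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

# In practice I load my own trained weights; this only illustrates the export step
model = lraspp_mobilenet_v3_large(num_classes=2).eval()
dummy = torch.randn(1, 3, 480, 640)
torch.onnx.export(model, dummy, "lraspp.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# Parse the ONNX into a .har with the Dataflow Compiler (details may differ per DFC version)
from hailo_sdk_client import ClientRunner
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("lraspp.onnx", "lraspp")
runner.save_har("lraspp.har")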

I guess I have to specify a normalization in the model allocation script, for the optimization step.
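My understanding is that the normalization command adds an on-chip layer computing (x - mean) / std, so mean 0 and std 255 should reproduce our /255 preprocessing. A quick numpy check of that equivalence (just illustrative):

import numpy as np

raw = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.float32)
mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
std = np.array([255.0, 255.0, 255.0], dtype=np.float32)
# (x - mean) / std with mean 0 and std 255 is exactly x / 255
assert np.allclose((raw - mean) / std, raw / 255.0)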

If I use this line in the .alls:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

I get this log in the optimization step:

[info] Loading model script to net_20k--lrsapp--b24-pretrain-pretrain from test.alls
[info] Starting Model Optimization
2024-10-02 10:09:22.576777: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1279] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-10-02 10:09:22.655374: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
[info] Using default optimization level of 2
[info] Using default compression level of 1
[info] Model received quantization params from the hn
2024-10-02 10:09:23.264263: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1279] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-10-02 10:09:24.452911: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8900
2024-10-02 10:09:24.712823: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.03)
[info] Starting Stats Collector
[info] Using dataset with 64 entries for calibration
2024-10-02 10:09:27.719083: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
Calibration:   0%|          | 0/64 [00:00<?, ?entries/s]2024-10-02 10:09:27.837390: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
2024-10-02 10:09:27.878200: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  12%|█▎        | 8/64 [00:17<02:05,  2.24s/entries]2024-10-02 10:09:45.693065: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  25%|██▌       | 16/64 [00:18<00:44,  1.08entries/s]2024-10-02 10:09:45.813611: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  38%|███▊      | 24/64 [00:18<00:20,  1.96entries/s]2024-10-02 10:09:45.924852: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  50%|█████     | 32/64 [00:18<00:10,  3.18entries/s]2024-10-02 10:09:46.035461: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  62%|██████▎   | 40/64 [00:18<00:04,  4.85entries/s]2024-10-02 10:09:46.146382: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  75%|███████▌  | 48/64 [00:18<00:02,  7.10entries/s]2024-10-02 10:09:46.255254: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  88%|████████▊ | 56/64 [00:18<00:00, 10.07entries/s]2024-10-02 10:09:46.364852: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration: 100%|██████████| 64/64 [00:18<00:00,  3.43entries/s]
[info] Stats Collector is done (completion time is 00:00:19.09)
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=4.4762826808642195
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=2.238141349472385

File /local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/acceleras/atomic_ops/avgpool_op.py:93, in AvgPoolOp.create_hw_params(self, max_final_accumulator_by_channel)
     91 if shift_delta > 0:
     92     if shift_delta > 2 and self._ignore_hw_limitation_assertion != IgnoreHwLimitationAssertionPolicy.enabled:
---> 93         raise AccelerasNumerizationError(
     94             f'Shift delta in {self.name} is larger than 2 ({shift_delta:.2f}), cannot quantize.'
     95             'A possible solution is to use a pre-quantization model script command to reduce global '
     96             'average-pool spatial dimensions, please refer to the user guide for more info.')
     97     # HW can't provide a shift large enough to avoid final accumulator overflow,
     98     #  we need smaller numeric values by making kernel range wider
     99     shift_delta = np.ceil(shift_delta)

AccelerasNumerizationError: Shift delta in net_20k--lrsapp--b24-pretrain-pretrain/avgpool1/avgpool_op is larger than 2 (3.52), cannot quantize.A possible solution is to use a pre-quantization model script command to reduce global average-pool spatial dimensions, please refer to the user guide for more info.

If I apply this in the model script (to all avgpool layers):

pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[4, 4])
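(My understanding, based on the error message, is that this reduces the global average-pool spatial dimensions by a factor of 4 along each axis before quantization. For the other pooling layers I repeat the same command with only the layer name changed; the name below is just illustrative:)

pre_quantization_optimization(global_avgpool_reduction, layers=avgpool2, division_factors=[4, 4])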

The error shifts to:

[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=4.4762826808642195
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=2.238141349472385
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc8/dense_op, using max shift instead. delta=0.010795199451855808
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc8/dense_op, using max shift instead. delta=0.005397650699886292
/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/acceleras/utils/opt_utils.py:87: RuntimeWarning: divide by zero encountered in log2
  desired_pre_acc_shift = np.log2(expected_max_accumulator / accumulator_max_val) + shift_buffer
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc12/dense_op, using max shift instead. delta=0.45969565700888193
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc12/dense_op, using max shift instead. delta=0.22984791716669406

...
File /local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/algorithms/create_hw_params/create_hw_params.py:180, in CreateHWParamsWithMatch._log_negative_exponent_shift(self, layer, fix_shift)
    177     self._logger.warning(f"{log_msg} (More than half)")
    178 elif layer.activation_atomic_op.assertion_negative_slope():
    179     # Required fix shift is more than the output bits (will zero out the results)
--> 180     raise NegativeSlopeExponentNonFixable(output_bits=output_bits, fix_shift=fix_shift, lname=layer.name)

NegativeSlopeExponentNonFixable: Quantization failed in layer net_20k--lrsapp--b24-pretrain-pretrain/fc13 due to unsupported required slope. Desired shift is 47.0, but op has only 8 data bits. This error raises when the data or weight range are not balanced. Mostly happens when using random calibration-set/weights, the calibration-set is not normalized properly or batch-normalization was not used during training.

Any tips on how to proceed? I also tried compiling the model without any normalization in the .alls, and everything goes smoothly all the way to the .hef file. But the output of the network doesn’t make much sense, even when I feed it normalized data (it was calibrated on unnormalized data).
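For completeness, an equivalent optimization/compilation flow via the DFC Python API looks roughly like this (a sketch following the tutorial notebooks, not my exact script; paths are placeholders):

import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="lraspp.har")
# Calibration images as float32, shape (N, 480, 640, 3); currently these are the raw 0-255 values
calib_data = np.load("calib_images.npy")
# Contents of my test.alls passed as a model script string
alls = "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
runner.load_model_script(alls)
runner.optimize(calib_data)
runner.save_har("lraspp_quantized.har")
hef = runner.compile()
with open("lraspp.hef", "wb") as f:
    f.write(hef)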