Error during quantization of custom model: "Shift delta is larger than 2, cannot quantize"

Hi there,

I’m trying to convert an LRASPP model (LRASPP — Torchvision main documentation, pytorch.org). Here is the description of the ONNX:

The conversion to a first .har file works fine. Our model was trained with data normalization applied to the dataset (a simple division by 255), and indeed I get the same results as in PyTorch when I feed normalized data to the .har file.
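For context, the export and parsing flow looks roughly like this (a simplified sketch based on the DFC tutorial API; not my exact script, and paths/shapes/argument names are placeholders):

import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

# In practice I load my own trained weights; this only illustrates the export step
model = lraspp_mobilenet_v3_large(num_classes=2).eval()
dummy = torch.randn(1, 3, 480, 640)
torch.onnx.export(model, dummy, "lraspp.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# Parse the ONNX into a .har with the Dataflow Compiler (details may differ per DFC version)
from hailo_sdk_client import ClientRunner
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("lraspp.onnx", "lraspp")
runner.save_har("lraspp.har")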

I guess I have to specify a normalization in the model allocation script, for the optimization step.
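My understanding is that the normalization command adds an on-chip layer computing (x - mean) / std, so mean 0 and std 255 should reproduce our /255 preprocessing. A quick numpy check of that equivalence (just illustrative):

import numpy as np

raw = np.random.randint(0, 256, size=(480, 640, 3)).astype(np.float32)
mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
std = np.array([255.0, 255.0, 255.0], dtype=np.float32)
# (x - mean) / std with mean 0 and std 255 is exactly x / 255
assert np.allclose((raw - mean) / std, raw / 255.0)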

If I use this line in the .alls:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

I get this log in the optimization step:

[info] Loading model script to net_20k--lrsapp--b24-pretrain-pretrain from test.alls
[info] Starting Model Optimization
2024-10-02 10:09:22.576777: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1279] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-10-02 10:09:22.655374: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
[info] Using default optimization level of 2
[info] Using default compression level of 1
[info] Model received quantization params from the hn
2024-10-02 10:09:23.264263: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1279] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-10-02 10:09:24.452911: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8900
2024-10-02 10:09:24.712823: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.03)
[info] Starting Stats Collector
[info] Using dataset with 64 entries for calibration
2024-10-02 10:09:27.719083: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
Calibration:   0%|          | 0/64 [00:00<?, ?entries/s]2024-10-02 10:09:27.837390: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]
2024-10-02 10:09:27.878200: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  12%|█▎        | 8/64 [00:17<02:05,  2.24s/entries]2024-10-02 10:09:45.693065: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  25%|██▌       | 16/64 [00:18<00:44,  1.08entries/s]2024-10-02 10:09:45.813611: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  38%|███▊      | 24/64 [00:18<00:20,  1.96entries/s]2024-10-02 10:09:45.924852: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  50%|█████     | 32/64 [00:18<00:10,  3.18entries/s]2024-10-02 10:09:46.035461: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  62%|██████▎   | 40/64 [00:18<00:04,  4.85entries/s]2024-10-02 10:09:46.146382: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  75%|███████▌  | 48/64 [00:18<00:02,  7.10entries/s]2024-10-02 10:09:46.255254: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration:  88%|████████▊ | 56/64 [00:18<00:00, 10.07entries/s]2024-10-02 10:09:46.364852: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype float and shape [8,480,640,3]
	 [[{{node Placeholder/_0}}]]
Calibration: 100%|██████████| 64/64 [00:18<00:00,  3.43entries/s]
[info] Stats Collector is done (completion time is 00:00:19.09)
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=4.4762826808642195
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=2.238141349472385

File /local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/acceleras/atomic_ops/avgpool_op.py:93, in AvgPoolOp.create_hw_params(self, max_final_accumulator_by_channel)
     91 if shift_delta > 0:
     92     if shift_delta > 2 and self._ignore_hw_limitation_assertion != IgnoreHwLimitationAssertionPolicy.enabled:
---> 93         raise AccelerasNumerizationError(
     94             f'Shift delta in {self.name} is larger than 2 ({shift_delta:.2f}), cannot quantize.'
     95             'A possible solution is to use a pre-quantization model script command to reduce global '
     96             'average-pool spatial dimensions, please refer to the user guide for more info.')
     97     # HW can't provide a shift large enough to avoid final accumulator overflow,
     98     #  we need smaller numeric values by making kernel range wider
     99     shift_delta = np.ceil(shift_delta)

AccelerasNumerizationError: Shift delta in net_20k--lrsapp--b24-pretrain-pretrain/avgpool1/avgpool_op is larger than 2 (3.52), cannot quantize.A possible solution is to use a pre-quantization model script command to reduce global average-pool spatial dimensions, please refer to the user guide for more info.

If I apply this in the model script (to all avgpool layers):

pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[4, 4])
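(My understanding, based on the error message, is that this reduces the global average-pool spatial dimensions by a factor of 4 along each axis before quantization. For the other pooling layers I repeat the same command with only the layer name changed; the name below is just illustrative:)

pre_quantization_optimization(global_avgpool_reduction, layers=avgpool2, division_factors=[4, 4])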

The error shifts to:

[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=4.4762826808642195
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/conv1/conv_op, using max shift instead. delta=2.238141349472385
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc8/dense_op, using max shift instead. delta=0.010795199451855808
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc8/dense_op, using max shift instead. delta=0.005397650699886292
/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/acceleras/utils/opt_utils.py:87: RuntimeWarning: divide by zero encountered in log2
  desired_pre_acc_shift = np.log2(expected_max_accumulator / accumulator_max_val) + shift_buffer
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc12/dense_op, using max shift instead. delta=0.45969565700888193
[info] No shifts available for layer net_20k--lrsapp--b24-pretrain-pretrain/fc12/dense_op, using max shift instead. delta=0.22984791716669406

...
File /local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/algorithms/create_hw_params/create_hw_params.py:180, in CreateHWParamsWithMatch._log_negative_exponent_shift(self, layer, fix_shift)
    177     self._logger.warning(f"{log_msg} (More than half)")
    178 elif layer.activation_atomic_op.assertion_negative_slope():
    179     # Required fix shift is more than the output bits (will zero out the results)
--> 180     raise NegativeSlopeExponentNonFixable(output_bits=output_bits, fix_shift=fix_shift, lname=layer.name)

NegativeSlopeExponentNonFixable: Quantization failed in layer net_20k--lrsapp--b24-pretrain-pretrain/fc13 due to unsupported required slope. Desired shift is 47.0, but op has only 8 data bits. This error raises when the data or weight range are not balanced. Mostly happens when using random calibration-set/weights, the calibration-set is not normalized properly or batch-normalization was not used during training.

Any tips on how to proceed? I also tried compiling the model without any normalization in the .alls, and everything goes smoothly all the way to the .hef file. But the output of the network doesn’t make much sense, even when I feed it normalized data (it was calibrated on unnormalized data).
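For completeness, an equivalent optimization/compilation flow via the DFC Python API looks roughly like this (a sketch following the tutorial notebooks, not my exact script; paths are placeholders):

import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="lraspp.har")
# Calibration images as float32, shape (N, 480, 640, 3); currently these are the raw 0-255 values
calib_data = np.load("calib_images.npy")
# Contents of my test.alls passed as a model script string
alls = "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
runner.load_model_script(alls)
runner.optimize(calib_data)
runner.save_har("lraspp_quantized.har")
hef = runner.compile()
with open("lraspp.hef", "wb") as f:
    f.write(hef)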