Big drop in mAP after adding new training images to the model.

Hi guys,

So I have my model training pipeline well established: I've done a few iterations of the model, and each one increased its mAP.

A couple of days ago, as usual, I added some new images and trained my YOLOv8 model. I've evaluated the .pt and .onnx models and they are great. However, after compiling the model to .hef I got awful results. I did not change anything in the .alls scripts, so I was very surprised.

After running a lot of training sessions, I’ve pinpointed the problem to quantization(?).

If I run the model compilation with this script:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv42, sigmoid)
change_output_activation(conv53, sigmoid)
change_output_activation(conv63, sigmoid)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)
model_optimization_flavor(optimization_level=0, compression_level=0)
model_optimization_config(calibration, calibset_size=2500)

I am getting very good mAP but the model is very slow.

If I run this (the only change being optimization_level=2):

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv42, sigmoid)
change_output_activation(conv53, sigmoid)
change_output_activation(conv63, sigmoid)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)
model_optimization_flavor(optimization_level=2, compression_level=0)
model_optimization_config(calibration, calibset_size=2500)

The model is fast but very poor in terms of mAP.

I've tried almost every trick described in the Hailo DFC v3.32.0 manual, in the "Debugging Accuracy" section.

I've also searched the forums and applied quantization_param([conv42, conv53, conv63], force_range_out=[0.0, 1.0]), but it does not seem to help.

I have two text files: one comes from compiling an ONNX model trained before the addition of the new images, and the other from after adding them. Unfortunately the forum does not accept uploading .txt files :expressionless_face:

Maybe someone could point me to the part of the compilation pipeline I should focus on to improve my results?

Regards,

Chuck

Hi @Chuck_Rhodes ,

This looks like a classic quantization sensitivity issue that emerged after retraining. The fact that optimization_level=0 (where the heavier post-quantization algorithms are skipped) works fine but optimization_level=2 doesn't suggests that certain layers in the retrained model have weight/activation distributions that don't quantize well to INT8.

Adding new images likely shifted the weight and activation distributions in some layers. Even though the floating-point model performs well, the new distributions may have:

  • Wider dynamic ranges in certain convolutions, making uniform INT8 quantization lose precision
  • Outlier activations triggered by the new data that skew the calibration statistics
  • Different output ranges on the detection heads (conv42/53/63) that the previous quantization parameters handled fine but now clip or saturate.

The root cause is that the retrained weights have distributions that are less “quantization-friendly” in specific layers. The fix is finding those layers (via the compilation logs diff and numeric emulation) and applying selective higher precision to them.
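As a rough illustration of what "quantization-friendly" means (plain NumPy, not a Hailo API; you would feed it the per-layer weights or calibration activations you extract from your two models): if a few outliers dominate a tensor's max, uniform INT8 wastes most of its bins on values that almost never occur.

```python
import numpy as np

def quantization_sensitivity(tensor, percentile=99.9):
    """Score a tensor by how much of the INT8 range its outliers waste.
    A ratio near 1 means typical values fill the full range; a large
    ratio means most quantization bins cover nearly empty space."""
    t = np.abs(np.asarray(tensor, dtype=np.float64)).ravel()
    full_range = t.max()
    typical_range = np.percentile(t, percentile)
    return full_range / max(typical_range, 1e-12)

rng = np.random.default_rng(0)
well_behaved = rng.normal(0.0, 1.0, 10_000)             # no outliers
with_outliers = np.concatenate([well_behaved, [50.0]])  # one extreme activation

print(quantization_sensitivity(well_behaved))   # close to 1
print(quantization_sensitivity(with_outliers))  # much larger
```

Layers whose score jumped between the old and new model would be the first candidates for higher precision or forced ranges.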

Thanks,

Hi @Michael

Thank you for your answer. What specific information should I look for in the logs? Is there a debug log level for the compilation?

I’ve compared my two logs and the only difference that is somewhat informative for me is here:

good:

[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.0283                                       
[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.0141                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.7938                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=1.2565                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.6282                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.3969                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.3969                                       
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.6282                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=1.0215                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.6166
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.6166
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.3083
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=0.2025
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=0.2025
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.3083
I0000 00:00:1772807354.347536 2542335 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1772807356.705224 2542335 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
E0000 00:00:1772807405.987497 2542335 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inStatefulPartitionedCall/SelectV2_6-2-TransposeNHWCToNCHW-LayoutOptimizer
I0000 00:00:1772807410.685034 2543008 cuda_dnn.cc:529] Loaded cuDNN version 90300
128/128 ━━━━━━━━━━━━━━━━━━━━ 97s 370ms/step - _distill_loss_yolov8s/conv35: 0.1784 - _distill_loss_yolov8s/conv41: 0.1157 - _distill_loss_yolov8s/conv42: 0.0931 - _distill_loss_yolov8s/conv46: 0.1628 - _distill_loss_yolov8s/conv52: 0.1300 - _distill_loss_yolov8s/conv53: 0.2942 - _distill_loss_yolov8s/conv57: 0.2092 - _distill_loss_yolov8s/conv62: 0.1177 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 2.3011
Epoch 2/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 371ms/step - _distill_loss_yolov8s/conv35: 0.5714 - _distill_loss_yolov8s/conv41: 0.3852 - _distill_loss_yolov8s/conv42: 0.3315 - _distill_loss_yolov8s/conv46: 0.5897 - _distill_loss_yolov8s/conv52: 0.4431 - _distill_loss_yolov8s/conv53: 1.2877 - _distill_loss_yolov8s/conv57: 0.5755 - _distill_loss_yolov8s/conv62: 0.4408 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 5.6248
Epoch 3/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 0.6490 - _distill_loss_yolov8s/conv41: 0.4409 - _distill_loss_yolov8s/conv42: 0.3339 - _distill_loss_yolov8s/conv46: 0.7035 - _distill_loss_yolov8s/conv52: 0.4835 - _distill_loss_yolov8s/conv53: 0.9045 - _distill_loss_yolov8s/conv57: 0.6374 - _distill_loss_yolov8s/conv62: 0.4149 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 5.5675
Epoch 4/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 0.5815 - _distill_loss_yolov8s/conv41: 0.3879 - _distill_loss_yolov8s/conv42: 0.2820 - _distill_loss_yolov8s/conv46: 0.6168 - _distill_loss_yolov8s/conv52: 0.4092 - _distill_loss_yolov8s/conv53: 0.7892 - _distill_loss_yolov8s/conv57: 0.5275 - _distill_loss_yolov8s/conv62: 0.3188 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 4.9128
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:04:01.57)

bad:

[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.2283
[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.1141
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.4500
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.1300
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=0.3486
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.3552
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776
I0000 00:00:1772816592.100962  255937 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1772816594.455878  255937 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
E0000 00:00:1772816643.852169  255937 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inStatefulPartitionedCall/SelectV2_6-2-TransposeNHWCToNCHW-LayoutOptimizer
I0000 00:00:1772816648.590331  256740 cuda_dnn.cc:529] Loaded cuDNN version 90300
128/128 ━━━━━━━━━━━━━━━━━━━━ 97s 368ms/step - _distill_loss_yolov8s/conv35: 0.1965 - _distill_loss_yolov8s/conv41: 0.1219 - _distill_loss_yolov8s/conv42: 0.0816 - _distill_loss_yolov8s/conv46: 0.1739 - _distill_loss_yolov8s/conv52: 0.1020 - _distill_loss_yolov8s/conv53: 0.1603 - _distill_loss_yolov8s/conv57: 0.1769 - _distill_loss_yolov8s/conv62: 0.1133 - _distill_loss_yolov8s/conv63: 0.7317 - total_distill_loss: 1.8579
Epoch 2/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 369ms/step - _distill_loss_yolov8s/conv35: 0.4545 - _distill_loss_yolov8s/conv41: 0.2997 - _distill_loss_yolov8s/conv42: 0.2590 - _distill_loss_yolov8s/conv46: 0.4643 - _distill_loss_yolov8s/conv52: 0.3041 - _distill_loss_yolov8s/conv53: 0.7326 - _distill_loss_yolov8s/conv57: 0.4782 - _distill_loss_yolov8s/conv62: 0.3384 - _distill_loss_yolov8s/conv63: 0.8819 - total_distill_loss: 4.2127
Epoch 3/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 0.6585 - _distill_loss_yolov8s/conv41: 0.4197 - _distill_loss_yolov8s/conv42: 0.3516 - _distill_loss_yolov8s/conv46: 0.6161 - _distill_loss_yolov8s/conv52: 0.3891 - _distill_loss_yolov8s/conv53: 15.1853 - _distill_loss_yolov8s/conv57: 0.5916 - _distill_loss_yolov8s/conv62: 0.3872 - _distill_loss_yolov8s/conv63: 0.9723 - total_distill_loss: 19.5715
Epoch 4/4
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 0.7295 - _distill_loss_yolov8s/conv41: 0.4663 - _distill_loss_yolov8s/conv42: 0.4206 - _distill_loss_yolov8s/conv46: 0.6956 - _distill_loss_yolov8s/conv52: 0.3964 - _distill_loss_yolov8s/conv53: 0.8737 - _distill_loss_yolov8s/conv57: 0.6334 - _distill_loss_yolov8s/conv62: 0.4106 - _distill_loss_yolov8s/conv63: 0.9390 - total_distill_loss: 5.5651
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:04:01.23)

Hi @Chuck_Rhodes,

Good model:
conv53: 0.9045 / total_distill_loss: 5.5675

Bad model:
conv53: 15.1853 / total_distill_loss: 19.5715

The distillation loss for conv53 explodes to 15.19 in the bad model, roughly 17x the good model's 0.90. This means the quantized conv53 cannot approximate the float model's output - the QAT fine-tuning tries to recover, but it's too far gone. By epoch 4 the loss drops back to 0.87, but the optimization likely settled in a bad spot.

conv53 is your medium-scale detection head, so this directly tanks detection accuracy.
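For intuition on those _distill_loss numbers, here is a plain NumPy sketch of the general idea (not the DFC's actual loss): a per-layer distillation loss measures how far the quantized layer's output drifts from the float layer's, and a badly sized quantization range inflates it by orders of magnitude.

```python
import numpy as np

def int8_roundtrip(x, scale):
    """Fake-quantize: round onto the int8 grid at the given scale, dequantize back."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

def distill_loss(float_out, quant_out):
    """Normalized MSE between the float 'teacher' output and the quantized
    'student' output, in the spirit of the per-layer losses in the log."""
    return np.mean((float_out - quant_out) ** 2) / (np.mean(float_out ** 2) + 1e-12)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 100_000)   # stand-in for a layer's float output

good_scale = 8.0 / 256    # range sized for the data (~[-4, 4])
bad_scale = 128.0 / 256   # range stretched by a stray outlier (~[-64, 64])

print(distill_loss(x, int8_roundtrip(x, good_scale)))  # small
print(distill_loss(x, int8_roundtrip(x, bad_scale)))   # orders of magnitude larger
```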

Try adding this to your .alls:

quantization_param(conv53, precision_mode=a16_w16)

If that recovers mAP, you can experiment with a8_w16 for conv53 to get some speed back. You might also want to bump the layers feeding into conv53 (conv52, conv46 area) to higher precision if conv53 alone isn’t enough.

Also worth noting - in the “bad” log, conv58 and conv59 shift deltas are noticeably different from the good model too (conv58: 0.35 vs 1.02, conv59: 0.35/0.18 vs 0.62/0.31). These are in the large-scale head path. Might be worth keeping an eye on those if fixing conv53 alone doesn’t fully close the gap.

Hi @Michael

Many thanks for your reply. Unfortunately, adding quantization_param did not help.
First I tried it only with conv53, then I added conv62.

Here are the results for the latest configuration:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv42, sigmoid)
change_output_activation(conv53, sigmoid)
change_output_activation(conv63, sigmoid)
quantization_param(conv53, precision_mode=a16_w16)
quantization_param(conv63, precision_mode=a16_w16)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)
performance_param(compiler_optimization_level=max)
model_optimization_flavor(optimization_level=2, compression_level=0)
model_optimization_config(calibration, calibset_size=2707)

The log:

[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:46.03)                                                                                                                                                                                  
[info] Starting Fix zp_comp Encoding                                                                                                     
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)                                        
[info] Matmul Equalization skipped                                                                                                       
[info] Starting MatmulDecomposeFix                                                                                                       
[info] Model Optimization Algorithm MatmulDecomposeFix is done (completion time is 00:00:00.00)                                          
[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.2283                                       
[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.1141                                       
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.4500                                       
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.1300                                       
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650                                       
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250                                       
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250                                       
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650                                       
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=0.3486
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.3552                                       
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776                                       
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776                                       
I0000 00:00:1773141217.334940 3935362 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1773141219.705404 3935362 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
[info] Finetune encoding skipped                                                                                                         
[info] Bias Correction skipped                                                                                                           
[info] Adaround skipped                                                                                                                  
[info] Starting Quantization-Aware Fine-Tuning                                                                                           
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results                                                                                                                                                                    
[info] Using dataset with 1024 entries for finetune                 
Epoch 1/4                                                           
E0000 00:00:1773141269.518564 3935362 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inStatefulPartitionedCall/SelectV2_6-2-TransposeNHWCToNCHW-LayoutOptimizer
I0000 00:00:1773141274.331861 3936088 cuda_dnn.cc:529] Loaded cuDNN version 90300
128/128 ━━━━━━━━━━━━━━━━━━━━ 98s 369ms/step - _distill_loss_yolov8s/conv35: 0.2298 - _distill_loss_yolov8s/conv41: 0.1431 - _distill_loss_yolov8s/conv42: 0.0886 - _distill_loss_yolov8s/conv46: 0.2101 - _distill_loss_yolov8s/conv52: 0.1249 - _distill_loss_yolov8s/conv53: 0.2937 - _distill_loss_yolov8s/conv57: 0.2162 - _distill_loss_yolov8s/conv62: 0.1470 - _distill_loss_yolov8s/conv63: 0.7031 - total_distill_loss: 2.1564
Epoch 2/4                                                                                                                                
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 0.8965 - _distill_loss_yolov8s/conv41: 0.6011 - _distill_loss_yolov8s/conv42: 0.7316 - _distill_loss_yolov8s/conv46: 0.8779 - _distill_loss_yolov8s/conv52: 0.6166 - _distill_loss_yolov8s/conv53: 53.6770 - _distill_loss_yolov8s/conv57: 0.9190 - _distill_loss_yolov8s/conv62: 0.7432 - _distill_loss_yolov8s/conv63: 53.4102 - total_distill_loss: 112.4730
Epoch 3/4                                                                                                                                
128/128 ━━━━━━━━━━━━━━━━━━━━ 47s 370ms/step - _distill_loss_yolov8s/conv35: 1.0693 - _distill_loss_yolov8s/conv41: 0.6999 - _distill_loss_yolov8s/conv42: 0.8397 - _distill_loss_yolov8s/conv46: 1.0556 - _distill_loss_yolov8s/conv52: 0.7401 - _distill_loss_yolov8s/conv53: 1.0000 - _distill_loss_yolov8s/conv57: 1.0830 - _distill_loss_yolov8s/conv62: 0.7802 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 8.2678
Epoch 4/4                                                                                                                                
128/128 ━━━━━━━━━━━━━━━━━━━━ 48s 371ms/step - _distill_loss_yolov8s/conv35: 1.0630 - _distill_loss_yolov8s/conv41: 0.6860 - _distill_loss_yolov8s/conv42: 0.8072 - _distill_loss_yolov8s/conv46: 1.0515 - _distill_loss_yolov8s/conv52: 0.7365 - _distill_loss_yolov8s/conv53: 1.0000 - _distill_loss_yolov8s/conv57: 1.0717 - _distill_loss_yolov8s/conv62: 0.7515 - _distill_loss_yolov8s/conv63: 1.0000 - total_distill_loss: 8.1674

If anything, the addition of these params made things worse?

And here are stats for my model:

  1. No optimization:
                          Images      Vehicles    BG        mAP50   mAP50-95
Test                      676 (100%)  1887 (100%) 3 (0%)    98.6    87.6
  2. With quantization_param(conv53, precision_mode=a16_w16):
                          Images      Vehicles    BG        mAP50   mAP50-95
Test                      676 (100%)  1887 (100%) 3 (0%)    13.7    11.9
  3. With a16_w16 for conv53, conv62:
                          Images      Vehicles    BG        mAP50   mAP50-95
Test                      676 (100%)  1887 (100%) 3 (0%)    35.6    25.5
  4. Vanilla config (nothing added):
                          Images      Vehicles    BG        mAP50   mAP50-95
Test                      676 (100%)  1887 (100%) 3 (0%)    37.3    26.6
  5. quantization_param([conv42, conv53, conv63], force_range_out=[0.0, 1.0]):
                          Images      Vehicles    BG        mAP50   mAP50-95
Test                      676 (100%)  1887 (100%) 3 (0%)    62.0    47.4

Any other ideas?

Thanks,
Chuck


Hi @Chuck_Rhodes,

Thanks for testing that so thoroughly - this is really useful data.

In hindsight, my initial suggestion to apply a16_w16 on isolated detection heads was not the right approach for your case.

Your force_range_out=[0.0, 1.0] experiment jumping from 37.3 to 62.0 mAP50 suggests the issue is poor range estimation during calibration, not insufficient bit width. Since those layers have sigmoid activations (true range [0, 1]), calibration was estimating a wider range and wasting quantization bins on empty space.
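A self-contained sketch of that effect (plain NumPy, with made-up ranges): quantizing values that truly live in [0, 1] over an over-estimated range wastes most of the 256 bins.

```python
import numpy as np

def fake_quant(x, lo, hi, levels=256):
    """Uniform quantization of x onto `levels` bins over [lo, hi]."""
    scale = (hi - lo) / (levels - 1)
    q = np.clip(np.round((x - lo) / scale), 0, levels - 1)
    return lo + q * scale

rng = np.random.default_rng(2)
logits = rng.normal(0.0, 2.0, 100_000)
probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid outputs, true range (0, 1)

# Calibration over-estimated the range (e.g. skewed by outlier statistics):
loose = fake_quant(probs, -4.0, 4.0)
# Forcing the range to the sigmoid's true [0, 1]:
tight = fake_quant(probs, 0.0, 1.0)

print(np.abs(probs - loose).max())   # worst-case error with the wasted bins
print(np.abs(probs - tight).max())   # roughly 8x smaller here
```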

Here's what I'd suggest next: keep force_range_out and bump to optimization_level=4, because your logs show Bias Correction, Adaround, and Finetune Encoding are all being skipped. Level 4 enables these passes, which should address the range/distribution issues the model is hitting. Combined with the correct forced ranges on the sigmoid heads, this should hopefully close a good chunk of the remaining gap.

quantization_param([conv42, conv53, conv63], force_range_out=[0.0, 1.0])
model_optimization_flavor(optimization_level=4, compression_level=0)

Thanks,

Hi @Michael

It's me who should be grateful for your help. I did try multiple optimization levels, but only 0 and 2 work for me. Here is the error after setting it to 4:

[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.2283
[info] No shifts available for layer yolov8s/conv24/conv_op, using max shift instead. delta=0.1141                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.4500
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.1300                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv56/conv_op, using max shift instead. delta=0.2250
[info] No shifts available for layer yolov8s/conv55/conv_op, using max shift instead. delta=0.0650                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv58/conv_op, using max shift instead. delta=0.3486
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.3552                                                                                                                                                                                 
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776
[info] No shifts available for layer yolov8s/conv59/conv_op, using max shift instead. delta=0.1776                                                                                                                                                                                 
I0000 00:00:1773231826.885957 1259021 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
I0000 00:00:1773231829.252390 1259021 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22337 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
[info] Finetune encoding skipped
[info] Bias Correction skipped                                                                                                                                                                                                                                                     
[warning] Dataset is larger than dataset_size in Adaround. Increasing the algorithm dataset size might improve the results
[info] Starting Adaround                                                                                                                 
[info] The algorithm Adaround will use up to 31.88 GB of storage space                  
[info] Using dataset with 1024 entries for Adaround                                                                                                                                                                                                                                
[info] Using dataset with 64 entries for bias correction
Adaround:   0%|                                                                                                                                                                                             | 0/73 [00:00<?, ?blocks/s, Layers=['yolov8s/normalization1_output_0']]
Traceback (most recent call last):
  File "/home/marcinc/.pyenv/versions/HailoDFC_3.32/bin/hailomz", line 33, in <module>
    sys.exit(load_entry_point('hailo-model-zoo', 'console_scripts', 'hailomz')())
  File "/opt/kiowa/hailo_model_zoo/hailo_model_zoo/main.py", line 122, in main                                                                                                                                                                                                     
    run(args)                       
  File "/opt/kiowa/hailo_model_zoo/hailo_model_zoo/main.py", line 111, in run                                                                                                                                                                                                      
    return handlers[args.command](args)              
  File "/opt/kiowa/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 248, in compile                                                                                                                                                                                           
    _ensure_optimized(runner, logger, args, network_info)
  File "/opt/kiowa/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 91, in _ensure_optimized
    optimize_model(
  File "/opt/kiowa/hailo_model_zoo/hailo_model_zoo/core/main_utils.py", line 353, in optimize_model
    runner.optimize(calib_feed_callback)
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2206, in optimize
    result = self._optimize(
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2025, in _optimize
    checkpoint_info = self._sdk_backend.full_quantization(
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1196, in full_quantization
    new_checkpoint_info = self._full_acceleras_run(
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1434, in _full_acceleras_run
    new_checkpoint_info = self._optimization_flow_runner(optimization_flow, checkpoint_info)
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 2088, in _optimization_flow_runner
    optimization_flow.run()
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 239, in wrapper
    return func(self, *args, **kwargs)
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 357, in run
    step_func()
  File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 140, in parent_wrapper
    raise SubprocessTracebackFailure(*child_messages)
hailo_model_optimization.acceleras.utils.acceleras_exceptions.SubprocessTracebackFailure: Subprocess failed with exception: in user code: 

    File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/hailo_model_optimization/algorithms/block_by_block/block_by_block.py", line 217, in call_block  *
        result = block_model(inputs)
    File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler  **
        raise e.with_traceback(filtered_tb) from None
    File "/home/marcinc/.pyenv/versions/3.10.6/envs/HailoDFC_3.32/lib/python3.10/site-packages/keras/src/layers/layer.py", line 1717, in update_shapes_dict_for_target_fn
        raise ValueError(

    ValueError: For a `build()` method with more than one argument, all arguments should have a `_shape` suffix and match an argument from `call()`. E.g. `build(self, foo_shape, bar_shape)`  For layer 'HailoModel', Received `build()` argument `self`, which does not end in `_shape`.

I am using DFC in version 3.32 if that helps.

Hi @Chuck_Rhodes,

That looks like a potential Keras 3.x compatibility issue in DFC 3.32’s Adaround implementation: the `build()` method signature changed in Keras 3. That would also explain why it works at optimization level 2 (where Adaround is skipped) but crashes at level 4.
While we are checking internally, you could try downgrading Keras in your DFC environment with `pip install "keras<3.0"` and see whether that works.
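For illustration, here is a rough, hypothetical approximation of the rule Keras 3 enforces (the real check lives in `keras/src/layers/layer.py`; the function and argument names below are made up for the sketch):

```python
import inspect

def check_build_signature(build_fn):
    """Hypothetical sketch of Keras 3's rule: when build() takes more
    than one argument besides self, every argument must end in `_shape`
    so it can be matched to an argument of call()."""
    params = [p for p in inspect.signature(build_fn).parameters if p != "self"]
    if len(params) > 1:
        bad = [p for p in params if not p.endswith("_shape")]
        if bad:
            raise ValueError(
                f"build() arguments should have a `_shape` suffix, got: {bad}"
            )

# Keras 2-era signature that Keras 3 rejects:
def legacy_build(self, input_shape, extra_arg):
    pass

# Keras 3-compliant signature:
def new_build(self, input_shape, mask_shape):
    pass
```

A layer written against the Keras 2 convention (like `legacy_build` above) trips this check as soon as Keras 3 is on the path, which matches the traceback you posted.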

Thanks,

Hi @Michael,

I’ve tried downgrading Keras, but the problems are piling up:

(HailoDFC_3.32_test) marcinc@delores:/opt/kiowa/hailo_model_zoo$ pip install "keras<3.0"
Collecting keras<3.0
  Using cached keras-2.15.0-py3-none-any.whl (1.7 MB)
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: keras 3.5.0
    Uninstalling keras-3.5.0:
      Successfully uninstalled keras-3.5.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.18.0 requires keras>=3.5.0, but you have keras 2.15.0 which is incompatible.
hailo-dataflow-compiler 3.32.0 requires keras==3.5.0, but you have keras 2.15.0 which is incompatible.

I also tried downgrading to tensorflow 2.15, but then I get undefined symbol errors when running the DFC, so this looks like a dead end.
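For anyone hitting the same wall: a quick way to see which installed packages pin Keras, and to what, is to read their declared requirements from the installed metadata. A small standard-library-only sketch (the helper name is mine):

```python
from importlib.metadata import PackageNotFoundError, requires

def keras_pins(*packages):
    """Return each installed package's declared Keras requirement lines."""
    pins = {}
    for pkg in packages:
        try:
            reqs = requires(pkg) or []
        except PackageNotFoundError:
            continue  # package not installed in this environment
        pins[pkg] = [r for r in reqs if r.lower().startswith("keras")]
    return pins

# e.g. keras_pins("tensorflow", "hailo-dataflow-compiler") in the DFC env
# should surface the conflicting pins (keras>=3.5.0 vs keras==3.5.0)
```

`pip check` reports the same conflicts, but this lets you target just the packages you care about from a script.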

Hi @Chuck_Rhodes ,

We are investigating a possible solution.
I’ll come back to you here ASAP.

Thanks,