Optimization warnings meaning

While following this guide I get the following messages and warnings at the optimization step:

[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won’t be optimized and compression won’t be used) because there’s no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.64)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:57<00:00, 1.11entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:59.68)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.02)
[info] activation fitting started for detector/reduce_sum_sftmax1/act_op
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
1/512 […]

Can someone please explain the following:

  • [warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU: why is this happening? Can it be fixed somehow? I have an Nvidia GPU with drivers/CUDA/cuDNN installed on the system.
  • [warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results: similarly, is this normal, or should it be fixed somehow?
  • Using dataset with 1024 entries for finetune: I randomly selected 1024 images from the training dataset; is this enough?
  • Epoch 1/4 1/512 [..............................]: what is going on at this step? Why 4 epochs?
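For what it's worth, "1024 randomly selected images" is roughly how I understand a finetune/calibration set to be built anyway. Here is a minimal sketch of that selection step; the image shapes and the `build_calib_set` helper are stand-ins for illustration, not anything from the guide (real code would decode actual training images at the model's input resolution):

```python
import numpy as np

def build_calib_set(images, n=64, seed=0):
    """Randomly pick n entries (without replacement) and stack them
    into a single float32 array, as a calibration/finetune set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n, replace=False)
    return np.stack([images[i] for i in idx]).astype(np.float32)

# 1024 stand-in "images"; in practice these would be decoded files
# from the training dataset.
images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(1024)]
calib = build_calib_set(images, n=64)
print(calib.shape)  # (64, 64, 64, 3)
```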

Thanks!

See the replies further down this thread: the warning is caused by incompatible CUDA drivers, so I reverted to CUDA 11.8 (described later in the thread). The thread also answers your question about generating a calibration dataset.


In my guide*, sorry, I forgot to mention.

Thanks for the reply! Reading your thread further (as well as other threads) was helpful; in the end I decided to go the sw-suite Docker route, and it removed the warnings.

But I still have a few questions mainly to improve my understanding, here’s my current output:

$ python convert.py
[info] Loading model script commands to drone_detector from string
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Using default compression level of 1
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Assigning 4bit weights to layer drone_detector/conv22 with 2359.30k parameters
[info] Assigning 4bit weights to layer drone_detector/conv32 with 2359.30k parameters
[info] Ratio of weights in 4bit is 0.24
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.68)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:35<00:00, 1.82entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:37.62)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.02)
[info] activation fitting started for drone_detector/reduce_sum_softmax1/act_op
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[info] Using dataset with 1024 entries for finetune
Epoch 1/4

What I'd like cleared up:

  • How do [info] Using default optimization level of 2 and [info] Using default compression level of 1 affect the final .hef model's inference? Say I'm interested in retaining as much accuracy/precision as possible, even to the detriment of inference speed: which values should I choose?
  • What exactly do these mean: [info] Assigning 4bit weights to layer drone_detector/conv22 with 2359.30k parameters, [info] Assigning 4bit weights to layer drone_detector/conv32 with 2359.30k parameters? I was under the impression that I was quantizing to INT8, but this sounds like something INT4 (?). Again, I'm concerned about the performance effects of this.
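Judging by the model script commands that appear later in this thread, steering the trade-off toward accuracy seems to come down to a fragment like the one below. This is a sketch based on those commands, not a verified recommendation; whether level 4 and compression 0 suit a given model is something the DFC documentation should confirm:

```
model_optimization_flavor(optimization_level=4, compression_level=0)
quantization_param(conv22, precision_mode=a16_w16)
```

Here `model_optimization_flavor` raises the optimization effort while disabling the 4-bit weight compression, and `quantization_param` forces a specific layer (the layer name is just an example from the log above) to 16-bit activations and weights.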

Thanks!

You can edit the .alls file and change the optimization parameters. Go to the Hailo DFC compiler documentation and search (Ctrl+F) for "optimization" and "4bit"; you should find something there. With the default settings it seems that 0.24 of your model's weights are set to 4 bits; you can set them to 16 bits if you don't care about inference speed, but then compilation will take A LONG time.


Thanks, I've read the PDF and updated my alls config; however, I ran into new issues and would appreciate feedback on those too.

First, my alls config with INT16 added for final detection heads, max optimization level and min compression level:

alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

change_output_activation(conv74, sigmoid)

change_output_activation(conv90, sigmoid)

change_output_activation(conv105, sigmoid)

quantization_param(conv71, precision_mode=a16_w16)

quantization_param(conv87, precision_mode=a16_w16)

quantization_param(conv102, precision_mode=a16_w16)

quantization_param(conv74, precision_mode=a16_w16)

quantization_param(conv90, precision_mode=a16_w16)

quantization_param(conv105, precision_mode=a16_w16)

model_optimization_config(calibration, batch_size=16)

model_optimization_flavor(optimization_level=4, compression_level=0, batch_size=16)

post_quantization_optimization(finetune, policy=enabled, learning_rate=0.00001, dataset_size=4096)

nms_postprocess("nms_layer_config.json", meta_arch=yolov8, engine=cpu)
"""

Once I’ve started the script I saw this output at some point:

Adaround: 1%|▎ | 1/125 [00:16<34:27, 16.67s/blocks, Layers=[‘drone_detector/conv1_output_0’]][warning] DALI is not installed, using tensorflow dataset for layer by layer train. Using DALI will improve train time significantly. To install it use: pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110
[warning] Dataset isn’t shuffled without DALI. To remove this warning add the following model script command: post_quantization_optimization(adaround, shuffle=False)

I assume shuffling is beneficial, and I'd like faster training times, so I ran:


$ pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110

But now I get the following exception:

[info] Using dataset with 1024 entries for Adaround
[info] Using dataset with 64 entries for bias correction
Adaround: 1%|▎ | 1/125 [01:53<3:54:52, 113.65s/blocks, Layers=[‘drone_detector/conv1_output_0’]]

Traceback (most recent call last):
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__MultiDeviceIteratorInit_device/job:localhost/replica:0/task:0/device:CPU:0}} TF device and DALI device mismatch. TF device: CPU, DALI device: GPU for output 0 [Op:MultiDeviceIteratorInit]

Any ideas what do I have to change to fix this?

Hmmm, I haven't encountered this error in my testing; this is odd. Could you try disabling Adaround? It shouldn't impact performance too much. I'm not sure how to get past this otherwise.
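If it helps, disabling Adaround in the .alls script might look like the line below. I'm assuming it accepts the same `policy` keyword as the `post_quantization_optimization(finetune, policy=enabled, ...)` command shown earlier in the thread, so treat the exact syntax as a guess and check the DFC documentation:

```
post_quantization_optimization(adaround, policy=disabled)
```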

But maybe the issue is with the drivers: the package name says
nvidia-dali-cuda110
and who knows which CUDA version you are running. See if there are any newer DALI builds matching your CUDA version… I'm not really proficient enough to help, I'm so sorry :sob:

Hello,
I have the same issue (TF device and DALI device mismatch. TF device: CPU, DALI device: GPU for output 0 [Op:MultiDeviceIteratorInit]).
Did you find a solution (still using adaround)?

I have “solved” it by just using a lower optimization level:

alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv120, sigmoid)
change_output_activation(conv143, sigmoid)
change_output_activation(conv165, sigmoid)

model_optimization_config(calibration, batch_size=16)
model_optimization_flavor(optimization_level=1, compression_level=0, batch_size=16)
nms_postprocess("nms_layer_config.json", meta_arch=yolov8, engine=cpu)
performance_param(compiler_optimization_level=max)
"""

I did this because higher optimization levels (2+) resulted in nothing being detected at all, while this one turned out to detect objects about as well as the base .pt model.