Optimization warnings meaning

While following this guide I get the following messages and warnings at the optimization step:

[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won’t be optimized and compression won’t be used) because there’s no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.64)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:57<00:00, 1.11entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:59.68)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.02)
[info] activation fitting started for detector/reduce_sum_sftmax1/act_op
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
1/512 […]

Can someone please explain the following:

  • [warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU: why is this happening? Can it be fixed somehow? I have an Nvidia GPU with drivers/CUDA/cuDNN installed on the system.
  • [warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results: similarly, is this normal, or should it be fixed somehow?
  • Using dataset with 1024 entries for finetune: I randomly selected 1024 images from the training dataset; is this enough?
  • Epoch 1/4 1/512 [..............................]: what is going on at this step? Why 4 epochs?
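For what it's worth, "1024 randomly selected images" is roughly how I understand a finetune/calibration set to be built anyway. Here is a minimal sketch of that selection step; the image shapes and the `build_calib_set` helper are stand-ins for illustration, not anything from the guide (real code would decode actual training images at the model's input resolution):

```python
import numpy as np

def build_calib_set(images, n=64, seed=0):
    """Randomly pick n entries (without replacement) and stack them
    into a single float32 array, as a calibration/finetune set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n, replace=False)
    return np.stack([images[i] for i in idx]).astype(np.float32)

# 1024 stand-in "images"; in practice these would be decoded files
# from the training dataset.
images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(1024)]
calib = build_calib_set(images, n=64)
print(calib.shape)  # (64, 64, 64, 3)
```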

Thanks!

See the replies further down this thread: the warning is caused by incompatible CUDA drivers, so I reverted to CUDA 11.8 (described later in the thread). The thread also answers your question about generating a calibration dataset.


In my guide*, sorry, I forgot to mention.

Thanks for the reply! Reading your thread further (as well as other threads) was helpful; in the end I decided to go the sw-suite Docker route, and it removed the warnings.

But I still have a few questions mainly to improve my understanding, here’s my current output:

$ python convert.py
[info] Loading model script commands to drone_detector from string
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Using default compression level of 1
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Assigning 4bit weights to layer drone_detector/conv22 with 2359.30k parameters
[info] Assigning 4bit weights to layer drone_detector/conv32 with 2359.30k parameters
[info] Ratio of weights in 4bit is 0.24
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.68)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:35<00:00, 1.82entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:37.62)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.02)
[info] activation fitting started for drone_detector/reduce_sum_softmax1/act_op
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[info] Using dataset with 1024 entries for finetune
Epoch 1/4

What I'd like cleared up:

  • How do [info] Using default optimization level of 2 and [info] Using default compression level of 1 affect the final .hef model's inference? Say I'm interested in retaining as much accuracy/precision as possible, even to the detriment of inference speed: which values should I choose?
  • What exactly do these mean: [info] Assigning 4bit weights to layer drone_detector/conv22 with 2359.30k parameters, [info] Assigning 4bit weights to layer drone_detector/conv32 with 2359.30k parameters? I was under the impression that I was quantizing to INT8, but this sounds like something INT4 (?). Again, I'm concerned about the performance effects of this.
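Judging by the model script commands that appear later in this thread, steering the trade-off toward accuracy seems to come down to a fragment like the one below. This is a sketch based on those commands, not a verified recommendation; whether level 4 and compression 0 suit a given model is something the DFC documentation should confirm:

```
model_optimization_flavor(optimization_level=4, compression_level=0)
quantization_param(conv22, precision_mode=a16_w16)
```

Here `model_optimization_flavor` raises the optimization effort while disabling the 4-bit weight compression, and `quantization_param` forces a specific layer (the layer name is just an example from the log above) to 16-bit activations and weights.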

Thanks!

You can edit the .alls file and change the optimization parameters. Go to the Hailo DFC compiler documentation and search (Ctrl+F) for "optimization" and "4bit"; you should find something there. With the default settings it seems that 0.24 of your model's weights are set to 4 bits; you can set them to 16 bits if you don't care about inference speed, but then compilation will take A LONG time.


Thanks, I've read the PDF and updated my alls config; however, I ran into new issues and would appreciate feedback on those too.

First, my alls config with INT16 added for final detection heads, max optimization level and min compression level:

alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

change_output_activation(conv74, sigmoid)

change_output_activation(conv90, sigmoid)

change_output_activation(conv105, sigmoid)

quantization_param(conv71, precision_mode=a16_w16)

quantization_param(conv87, precision_mode=a16_w16)

quantization_param(conv102, precision_mode=a16_w16)

quantization_param(conv74, precision_mode=a16_w16)

quantization_param(conv90, precision_mode=a16_w16)

quantization_param(conv105, precision_mode=a16_w16)

model_optimization_config(calibration, batch_size=16)

model_optimization_flavor(optimization_level=4, compression_level=0, batch_size=16)

post_quantization_optimization(finetune, policy=enabled, learning_rate=0.00001, dataset_size=4096)

nms_postprocess("nms_layer_config.json", meta_arch=yolov8, engine=cpu)
"""

Once I’ve started the script I saw this output at some point:

Adaround: 1%|▎ | 1/125 [00:16<34:27, 16.67s/blocks, Layers=[‘drone_detector/conv1_output_0’]][warning] DALI is not installed, using tensorflow dataset for layer by layer train. Using DALI will improve train time significantly. To install it use: pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110
[warning] Dataset isn’t shuffled without DALI. To remove this warning add the following model script command: post_quantization_optimization(adaround, shuffle=False)

I assume shuffling is beneficial, and I'd like faster training times, so I ran:


$ pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110

But now I get the following exception:

[info] Using dataset with 1024 entries for Adaround
[info] Using dataset with 64 entries for bias correction
Adaround: 1%|▎ | 1/125 [01:53<3:54:52, 113.65s/blocks, Layers=[‘drone_detector/conv1_output_0’]]

Traceback (most recent call last):
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__MultiDeviceIteratorInit_device/job:localhost/replica:0/task:0/device:CPU:0}} TF device and DALI device mismatch. TF device: CPU, DALI device: GPU for output 0 [Op:MultiDeviceIteratorInit]

Any ideas what do I have to change to fix this?

Hmmm, I haven't encountered this error in my testing; this is odd. Could you try disabling Adaround? It shouldn't impact performance too much. I'm not sure how to get past this otherwise.
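If it helps, disabling Adaround in the .alls script might look like the line below. I'm assuming it accepts the same `policy` keyword as the `post_quantization_optimization(finetune, policy=enabled, ...)` command shown earlier in the thread, so treat the exact syntax as a guess and check the DFC documentation:

```
post_quantization_optimization(adaround, policy=disabled)
```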

But maybe the issue is with the drivers: the package name says
nvidia-dali-cuda110
and who knows which CUDA version you are running. See if there are any newer DALI builds matching your CUDA version… I'm not really proficient enough to help, I'm so sorry :sob:

Hello,
I have the same issue (TF device and DALI device mismatch. TF device: CPU, DALI device: GPU for output 0 [Op:MultiDeviceIteratorInit]).
Did you find a solution (still using adaround)?

I have “solved” it by just using a lower optimization level:

alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv120, sigmoid)
change_output_activation(conv143, sigmoid)
change_output_activation(conv165, sigmoid)

model_optimization_config(calibration, batch_size=16)
model_optimization_flavor(optimization_level=1, compression_level=0, batch_size=16)
nms_postprocess("nms_layer_config.json", meta_arch=yolov8, engine=cpu)
performance_param(compiler_optimization_level=max)
"""

I did this because higher optimization levels (2+) resulted in nothing being detected at all, while this one turned out to detect objects about as well as the base .pt model.