Hailo Quantization Level 4 (AdaRound) fails with unexpected error

Hi, I am trying out the AdaRound algorithm to optimize my YOLOX detection model with the model-script outlined below. Unfortunately, I am getting an unexpected error during optimization. Any ideas why this happens and how to solve it?

model_optimization_config(calibration, batch_size=16, calibset_size=4096)
model_optimization_flavor(optimization_level=4, compression_level=1)
post_quantization_optimization(adaround, policy=enabled, batch_size=8, dataset_size=4096)
nms_postprocess("nms.json", yolox, engine=cpu)
File "/local/workspace/hailo_virtualenv/bin/hailomz", line 8, in <module>loss: 0.5620 - round_loss: 0.0000 - annealing_b: 20.0000]
    sys.exit(main())
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_zoo/main.py", line 122, in main
    run(args)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_zoo/main.py", line 111, in run
    return handlers[args.command](args)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_zoo/main_driver.py", line 227, in optimize
    optimize_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_zoo/core/main_utils.py", line 321, in optimize_model
    runner.optimize(calib_feed_callback)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/runner/client_runner.py", line 2093, in optimize
    self._optimize(calib_data, data_type=data_type, work_dir=work_dir)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/runner/client_runner.py", line 1935, in _optimize
    self._sdk_backend.full_quantization(calib_data, data_type=data_type, work_dir=work_dir)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1045, in full_quantization
    self._full_acceleras_run(self.calibration_data, data_type)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1229, in _full_acceleras_run
    optimization_flow.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/tools/orchestator.py", line 306, in wrapper
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 316, in run
    step_func()
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 113, in parent_wrapper
    raise SubprocessUnexpectedFailure(
hailo_model_optimization.acceleras.utils.acceleras_exceptions.SubprocessUnexpectedFailure: Subprocess step2 failed with unexpected error. exitcode -9

I am using the v3.28 Hailo DFC Docker container.

Hi @stwerner,
It’s difficult to know what exactly the issue is without seeing the contents of nms.json and the ONNX/HAR file you are using.

Can you please provide more details and/or the ONNX you used? Is it the yolox from the Hailo Model Zoo?

Regards,

Hi @Omer,

I cannot share the ONNX file of my model at the moment. It is similar to the yolox_l_leaky model from the Hailo Model Zoo, just with a different number of classes. The NMS and post-processing file is essentially the same as the one used in the Model Zoo.

Note that I’ve already successfully optimized and compiled this model with other model-script configurations (e.g., several using optimization level 2 and compression level 1), so the NMS file and the other configuration options should be fine.

Kind Regards,

Hi @stwerner,
Which configurations did you use that compiled successfully?

Regards,

Hi @Omer,

This is the NMS File:

{
	"nms_scores_th": 0.01,
	"nms_iou_th": 0.65,
	"number_of_detection_heads": 3,
	"image_dims": [
		640,
		640
	],
	"max_proposals_per_class": 100,
	"classes": 1,
	"bbox_decoders": [
		{
			"name": "bbox_decoder_8",
			"stride": 8,
			"reg_layer": "conv95",
			"objectness_layer": "conv96",
			"cls_layer": "conv94"
		},
		{
			"name": "bbox_decoder_16",
			"stride": 16,
			"reg_layer": "conv113",
			"objectness_layer": "conv114",
			"cls_layer": "conv112"
		},
		{
			"name": "bbox_decoder_32",
			"stride": 32,
			"reg_layer": "conv130",
			"objectness_layer": "conv131",
			"cls_layer": "conv129"
		}
	]
}

Hi @stwerner,
Thanks for the info. Which optimization commands did you use with this JSON config that completed successfully?

Regards,

Hi @Omer,

I use the following command:

hailomz optimize --yaml model.yaml --model-script optim.alls --ckpt <...> --calib-path <...>

Here is the model.yaml file:

base:
- networks/yolox_l_leaky.yaml

network:
  network_name: yolox_l_leaky

paths:
  alls_script: null

parser:
  nodes:
  - null
  - - Conv_307
    - Sigmoid_309
    - Sigmoid_310
    - Conv_323
    - Sigmoid_325
    - Sigmoid_326
    - Conv_339
    - Sigmoid_341
    - Sigmoid_342


postprocessing:
  device_pre_post_layers:
    nms: true
  postprocess_config_file: nms.json
  meta_arch: yolox
  hpp: true

evaluation:
  labels_offset: 1
  classes: 1

An optim.alls file that works:

model_optimization_config(calibration, batch_size=16, calibset_size=16384)
post_quantization_optimization(finetune, policy=enabled, dataset_size=16384, batch_size=8, epochs=20, learning_rate=0.0001)
nms_postprocess("nms.json", yolox, engine=cpu)

The optim.alls file that doesn’t work:

model_optimization_config(calibration, batch_size=16, calibset_size=4096)
model_optimization_flavor(optimization_level=4, compression_level=1)
post_quantization_optimization(adaround, policy=enabled, batch_size=8, dataset_size=4096)
nms_postprocess("nms.json", yolox, engine=cpu)

Regards,

Hi @stwerner,
I believe you are getting this error because of resource exhaustion: exitcode -9 means the step2 subprocess was killed with SIGKILL, which on Linux usually comes from the kernel’s out-of-memory killer.
Optimization level 4 activates the AdaRound optimization algorithm, which is quite costly in terms of memory usage.
You can try lowering the batch size. Optimization will take longer to run, but memory usage can drop significantly. Try batch_size=4 or batch_size=2.

By the way, model_optimization_flavor(optimization_level=4) and post_quantization_optimization(adaround, policy=enabled, batch_size=8) do the same thing, so one of them is redundant. You can also define the calibration size in model_optimization_flavor (see the Dataflow Compiler user guide for more info).
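For illustration, here is a minimal sketch of an adjusted optim.alls that combines both suggestions: the redundant optimization_level=4 is dropped in favor of the explicit adaround command (so its batch size can be set), and the AdaRound batch size is lowered to 2. Whether model_optimization_flavor accepts compression_level on its own is an assumption to verify against the DFC user guide:

model_optimization_config(calibration, batch_size=16, calibset_size=4096)
model_optimization_flavor(compression_level=1)
post_quantization_optimization(adaround, policy=enabled, batch_size=2, dataset_size=4096)
nms_postprocess("nms.json", yolox, engine=cpu)

If batch_size=2 still exhausts memory, the calibration batch_size in model_optimization_config can presumably be reduced in the same way.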

Regards,

Hi @Omer,

Thanks for the suggestions! I tested this config on a machine with ~64 GB of RAM, so I am a bit surprised to learn that the AdaRound algorithm is so heavy in terms of memory.

Regards,