Model compilation run-time error

I have a yolov8n model trained on a custom dataset with 1 class label. I am running the following command:

The optimization and calibration process appears to run fine until I encounter the following error:

[info] Fine Tune is done (completion time is 00:04:07.25)
[info] Starting Layer Noise Analysis
Full Quant Analysis:  50%|███████████████████████████████████████████████████                                                   | 1/2 [00:00<00:00,  7.61iterations/s]
Traceback (most recent call last):
  File "/root/hailo/bin/hailomz", line 33, in <module>
    sys.exit(load_entry_point('hailo-model-zoo', 'console_scripts', 'hailomz')())
  File "/root/hailo_model_zoo/hailo_model_zoo/main.py", line 122, in main
    run(args)
  File "/root/hailo_model_zoo/hailo_model_zoo/main.py", line 111, in run
    return handlers[args.command](args)
  File "/root/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 250, in compile
    _ensure_optimized(runner, logger, args, network_info)
  File "/root/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 91, in _ensure_optimized
    optimize_model(
  File "/root/hailo_model_zoo/hailo_model_zoo/core/main_utils.py", line 321, in optimize_model
    runner.optimize(calib_feed_callback)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2093, in optimize
    self._optimize(calib_data, data_type=data_type, work_dir=work_dir)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1935, in _optimize
    self._sdk_backend.full_quantization(calib_data, data_type=data_type, work_dir=work_dir)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1045, in full_quantization
    self._full_acceleras_run(self.calibration_data, data_type)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1229, in _full_acceleras_run
    optimization_flow.run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 306, in wrapper
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 316, in run
    step_func()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 111, in parent_wrapper
    raise SubprocessTracebackFailure(*child_messages)
hailo_model_optimization.acceleras.utils.acceleras_exceptions.SubprocessTracebackFailure: Subprocess failed with traceback

Traceback (most recent call last):
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 73, in child_wrapper
    func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 347, in step3
    self.finalize_optimization()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 405, in finalize_optimization
    self._noise_analysis()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 585, in _noise_analysis
    algo.run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 50, in run
    return super().run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 151, in run
    self._run_int()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 83, in _run_int
    self.analyze_full_quant_net()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 197, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/root/hailo/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filenvpcgmfo.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/root/hailo/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 188, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 167, in loop_body_5
    ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 94, in if_body_3
    n_ancestors = ag__.converted_call(ag__.ld(self)._native_model.flow.ancestors, (ag__.ld(lname),), None, fscope)
  File "/tmp/__autograph_generated_filewueztmji.py", line 12, in tf__ancestors
    retval_ = ag__.converted_call(ag__.ld(nx).ancestors, (ag__.ld(self), ag__.ld(source)), None, fscope)
TypeError: in user code:

    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/root/hailo/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 188, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 167, in loop_body_5
        ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 94, in if_body_3
        n_ancestors = ag__.converted_call(ag__.ld(self)._native_model.flow.ancestors, (ag__.ld(lname),), None, fscope)
    File "/tmp/__autograph_generated_filewueztmji.py", line 12, in tf__ancestors
        retval_ = ag__.converted_call(ag__.ld(nx).ancestors, (ag__.ld(self), ag__.ld(source)), None, fscope)

    TypeError: Exception encountered when calling layer 'lat_model' (type LATModel).
    
    in user code:
    
        File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 340, in call  *
            n_ancestors = self._native_model.flow.ancestors(lname)
        File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/acceleras/model/hailo_model/model_flow.py", line 31, in ancestors  *
            return nx.ancestors(self, source)
    
        TypeError: outer_factory.<locals>.inner_factory.<locals>.tf__func() missing 1 required keyword-only argument: '__wrapper'
    
    
    Call arguments received by layer 'lat_model' (type LATModel):
      • inputs=tf.Tensor(shape=(8, 640, 640, 3), dtype=float32)
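
For readers trying to parse the last line of the traceback: the message itself is ordinary Python behavior when a function with a keyword-only parameter is called without a value for it. The sketch below reproduces only the shape of that message. The names `outer_factory`, `inner_factory`, `tf__func` and `__wrapper` are borrowed from the traceback for illustration, and `add` is a hypothetical stand-in; this is an assumption-laden toy, not the actual TensorFlow or Hailo code, and it does not identify what drops the argument in the real flow.

```python
def add(a, b):
    """Hypothetical stand-in for the callable the wrapper delegates to."""
    return a + b


def outer_factory():
    def inner_factory():
        # `__wrapper` follows *args, so it is keyword-only, and it has no default.
        def tf__func(*args, __wrapper):
            return __wrapper(*args)
        return tf__func
    return inner_factory()


func = outer_factory()

# Calling without the keyword-only argument raises the same kind of TypeError.
try:
    func(1, 2)
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)

# Supplying a keyword-only default afterwards lets the identical call succeed.
func.__kwdefaults__ = {"__wrapper": add}
result = func(1, 2)
print(result)
```

The point of the toy is only that the error concerns a missing keyword-only default on a generated function, which is why the message says little about the underlying environment problem.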

I searched the forum for similar issues and found the post "Problem With Model Optimization" (#31, by klausk), which seemed relevant to my model, since it has only one output class.

I attempted to re-run the steps with a modified yolov8n.alls file but got the same error, so I am not sure how to proceed. I also tried other model sizes and got a similar error to the above.

Sorry, I omitted the command I was running. It is here:

hailomz compile --ckpt aicam-v5n.onnx --calib-path hailo-pkgs/val --yaml hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8n.yaml --hw-arch hailo8l --classes 1 --performance

I should also point out that I ran my converted ONNX file through onnx-simplifier, as recommended in "Dataflow compiler best practice".

Hi, Alex here. Did you try running without --performance?
It may activate some additional tools, one of which throws.

Yes, I tried without --performance and got the same result.
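
A minimal sketch of the two variants that were compared, using only the flags and paths from the command quoted above. The helper name `build_compile_cmd` is hypothetical, and the script only prints the command lines rather than running them.

```python
import shlex
from typing import List


def build_compile_cmd(performance: bool) -> List[str]:
    """Build the hailomz compile command from the thread, with or without --performance."""
    cmd = [
        "hailomz", "compile",
        "--ckpt", "aicam-v5n.onnx",
        "--calib-path", "hailo-pkgs/val",
        "--yaml", "hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8n.yaml",
        "--hw-arch", "hailo8l",
        "--classes", "1",
    ]
    if performance:
        cmd.append("--performance")
    return cmd


# Print both variants so they can be copied into a shell and compared.
for with_performance in (True, False):
    print(shlex.join(build_compile_cmd(with_performance)))
```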

Hi @liamw9534,

We have sometimes seen this happen because of a GPU driver incompatibility.

Are you following the requirements below?

Hello,

I am using a cloud instance with:

  • Ubuntu 22.04 64-bit
  • RTX 3090 GPU
  • 24 GB RAM

My Docker image is derived from nvidia/cuda:11.8.0-devel-ubuntu22.04, with the following additional apt packages:

  • unzip
  • python3.10-venv
  • python3.10-dev
  • graphviz-dev
  • libgl1-mesa-glx
  • libcudnn8=8.9.0.*-1+cuda11.8

I don’t know which GPU driver version is installed by the base docker image. If there is a way I can check then please let me know.

OK, nvidia-smi reports GPU driver version 550. Since this is a cloud instance, I won’t be able to change it. Is driver version 525 the only one that works?
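
For anyone else who needs to check this from inside a container, here is a minimal sketch that reads the driver version through nvidia-smi's query interface. The helper names `parse_driver_major` and `query_driver_version` are hypothetical, and the script assumes nvidia-smi is on the PATH; it returns None otherwise.

```python
import shutil
import subprocess
from typing import Optional


def parse_driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '528.79'."""
    return int(version.strip().split(".")[0])


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; the driver is shared, so take the first.
    return result.stdout.strip().splitlines()[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not found or returned no driver version")
    else:
        print(f"driver {version} (major {parse_driver_major(version)})")
```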

Hello,

I set up a WSL2 instance (because I can’t control the driver version when running in the cloud). I got exactly the same error. Below is my setup:

  • Windows 11 PC x86 64-bit
  • GTX 1050 Ti GPU (Windows Driver Version 31.0.15.2879)
  • Ubuntu 22.04.3 LTS
  • CUDA 11.8
  • cuDNN 8.9.0.131
  • NVIDIA SMI 525.104
  • Driver Version: 528.79 (allows up to CUDA Version: 12.0)
  • Python 3.10
  • DFC 3.28.0

I don’t think this issue is related to the driver version, because I am fairly sure this is a compatible line-up of software based on NVIDIA’s compatibility charts. How should I proceed?
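
To make setups like the one above easier to compare, here is a minimal sketch that reports the installed versions of the Python packages that appear in the traceback and, if TensorFlow can be imported, the GPUs it can see. The package list assumes that `nx` in the traceback is networkx; the helper names `collect_versions` and `visible_gpus` are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional

# Assumption: these are the distributions behind the modules named in the traceback.
PACKAGES = ("tensorflow", "keras", "networkx")


def collect_versions(names: Iterable[str] = PACKAGES) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def visible_gpus() -> Optional[List[str]]:
    """Return the GPU device names TensorFlow can see, or None if it is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
    print(f"GPUs visible to TensorFlow: {visible_gpus()}")
```

An empty GPU list on a machine that has a GPU would be consistent with the driver or CUDA mismatch discussed in this thread, although it does not prove it.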

Hi @liamw9534,

From the cases we’ve seen, this issue is solved once the driver incompatibility is resolved. We know that the error message is currently very unclear and are working on solving it.

I’ll confirm whether the quantization is successful on my end using the model you sent me. Could you please also share your model script commands?