Model compilation run-time error

I have a yolov8n model trained on a custom dataset with a single class label. I am running a hailomz compile command (the exact command is given in a follow-up reply below).

The optimization and calibration process appears to run fine until I encounter the following error:

[info] Fine Tune is done (completion time is 00:04:07.25)
[info] Starting Layer Noise Analysis
Full Quant Analysis:  50%|███████████████████████████████████████████████████                                                   | 1/2 [00:00<00:00,  7.61iterations/s]
Traceback (most recent call last):
  File "/root/hailo/bin/hailomz", line 33, in <module>
    sys.exit(load_entry_point('hailo-model-zoo', 'console_scripts', 'hailomz')())
  File "/root/hailo_model_zoo/hailo_model_zoo/main.py", line 122, in main
    run(args)
  File "/root/hailo_model_zoo/hailo_model_zoo/main.py", line 111, in run
    return handlers[args.command](args)
  File "/root/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 250, in compile
    _ensure_optimized(runner, logger, args, network_info)
  File "/root/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 91, in _ensure_optimized
    optimize_model(
  File "/root/hailo_model_zoo/hailo_model_zoo/core/main_utils.py", line 321, in optimize_model
    runner.optimize(calib_feed_callback)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 2093, in optimize
    self._optimize(calib_data, data_type=data_type, work_dir=work_dir)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1935, in _optimize
    self._sdk_backend.full_quantization(calib_data, data_type=data_type, work_dir=work_dir)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1045, in full_quantization
    self._full_acceleras_run(self.calibration_data, data_type)
  File "/root/hailo/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1229, in _full_acceleras_run
    optimization_flow.run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 306, in wrapper
    return func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 316, in run
    step_func()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 111, in parent_wrapper
    raise SubprocessTracebackFailure(*child_messages)
hailo_model_optimization.acceleras.utils.acceleras_exceptions.SubprocessTracebackFailure: Subprocess failed with traceback

Traceback (most recent call last):
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 73, in child_wrapper
    func(self, *args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 347, in step3
    self.finalize_optimization()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 405, in finalize_optimization
    self._noise_analysis()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, **kwargs)
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/flows/optimization_flow.py", line 585, in _noise_analysis
    algo.run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 50, in run
    return super().run()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 151, in run
    self._run_int()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 83, in _run_int
    self.analyze_full_quant_net()
  File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 197, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/root/hailo/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filenvpcgmfo.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/root/hailo/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 188, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 167, in loop_body_5
    ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/__autograph_generated_fileau2rkomm.py", line 94, in if_body_3
    n_ancestors = ag__.converted_call(ag__.ld(self)._native_model.flow.ancestors, (ag__.ld(lname),), None, fscope)
  File "/tmp/__autograph_generated_filewueztmji.py", line 12, in tf__ancestors
    retval_ = ag__.converted_call(ag__.ld(nx).ancestors, (ag__.ld(self), ag__.ld(source)), None, fscope)
TypeError: in user code:

    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2169, in predict_function  *
        return step_function(self, iterator)
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2155, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2143, in run_step  **
        outputs = model.predict_step(data)
    File "/root/hailo/lib/python3.10/site-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/root/hailo/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 188, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 167, in loop_body_5
        ag__.if_stmt(ag__.not_(continue__1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/__autograph_generated_fileau2rkomm.py", line 94, in if_body_3
        n_ancestors = ag__.converted_call(ag__.ld(self)._native_model.flow.ancestors, (ag__.ld(lname),), None, fscope)
    File "/tmp/__autograph_generated_filewueztmji.py", line 12, in tf__ancestors
        retval_ = ag__.converted_call(ag__.ld(nx).ancestors, (ag__.ld(self), ag__.ld(source)), None, fscope)

    TypeError: Exception encountered when calling layer 'lat_model' (type LATModel).
    
    in user code:
    
        File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 340, in call  *
            n_ancestors = self._native_model.flow.ancestors(lname)
        File "/root/hailo/lib/python3.10/site-packages/hailo_model_optimization/acceleras/model/hailo_model/model_flow.py", line 31, in ancestors  *
            return nx.ancestors(self, source)
    
        TypeError: outer_factory.<locals>.inner_factory.<locals>.tf__func() missing 1 required keyword-only argument: '__wrapper'
    
    
    Call arguments received by layer 'lat_model' (type LATModel):
      • inputs=tf.Tensor(shape=(8, 640, 640, 3), dtype=float32)

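If I'm reading the bottom of the trace correctly, model_flow.ancestors just delegates to networkx, and the '__wrapper' TypeError only shows up once the call goes through the Keras/AutoGraph path inside predict_on_batch. For anyone following along, here is a minimal sketch of what that ancestors call does when invoked as plain Python (toy graph, invented layer names):

import networkx as nx

# Toy digraph standing in for the model's layer flow; the layer names are made up.
flow = nx.DiGraph()
flow.add_edges_from([
    ("input_layer1", "conv1"),
    ("conv1", "conv2"),
    ("conv2", "output_layer1"),
])

# Equivalent of model_flow.ancestors(lname): every layer with a path into the given one.
print(nx.ancestors(flow, "conv2"))  # {'input_layer1', 'conv1'}

Run as plain Python this works fine, so I suspect the failure is in how the wrapped method is converted during TensorFlow tracing rather than in the graph itself, though I may be wrong about that.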
I searched the forum for similar issues and found this post, Problem With Model Optimization - #31 by klausk, which seemed relevant since my model also has only one output class.

I re-ran the steps with a modified yolov8n.alls file but got the same error, so I am not sure how to proceed. I also tried other model sizes and hit a similar error to the one above.

Sorry, I omitted the command I was running. Here it is:

hailomz compile --ckpt aicam-v5n.onnx --calib-path hailo-pkgs/val --yaml hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8n.yaml --hw-arch hailo8l --classes 1 --performance

I should also point out that I ran my converted ONNX file through onnx-simplifier, as recommended here: Dataflow compiler best practice
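For completeness, this is roughly how I ran the simplifier, using the onnxsim Python API (a minimal sketch; in practice the CLI works the same way, and overwriting the original file name is just my choice):

import onnx
from onnxsim import simplify

# Load the exported ONNX, simplify it, and check the result before saving.
model = onnx.load("aicam-v5n.onnx")
simplified, ok = simplify(model)
assert ok, "onnx-simplifier could not validate the simplified model"
# Overwrite in place so the same file name feeds into hailomz compile via --ckpt.
onnx.save(simplified, "aicam-v5n.onnx")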

Hi, Alex here. Did you try running without --performance?
That flag may activate some additional tools, one of which could be the one throwing the error.

Yes, I tried without --performance and got the same result.

Hi @liamw9534,

We have seen this happen occasionally due to a GPU driver incompatibility.

Are you following the requirements below?

Hello,

I am using a cloud instance with:

  • Ubuntu 22.04 64-bit
  • RTX 3090 GPU
  • 24 GB RAM

My Docker image is derived from nvidia/cuda:11.8.0-devel-ubuntu22.04 with the following apt packages added:

  • unzip
  • python3.10-venv
  • python3.10-dev
  • graphviz-dev
  • libgl1-mesa-glx
  • libcudnn8=8.9.0.*-1+cuda11.8

I don’t know which GPU driver version the container ends up using (the base Docker image doesn’t install one, so presumably it comes from the host). If there is a way to check, please let me know.

OK, nvidia-smi reports a GPU driver version of 550. Since this is a cloud instance I won’t be able to change it. Is driver version 525 the only one known to work?
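For anyone else checking, running nvidia-smi directly is enough; a small Python equivalent (assuming nvidia-smi is on PATH inside the container) would be:

import subprocess

# Ask nvidia-smi for only the driver version, one line per GPU.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())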

Hello,

I set up a WSL2 instance (because I can’t control the driver version when running in the cloud) and got exactly the same error. Here is my setup:

  • Windows 11 PC x86 64-bit
  • GTX 1050 Ti GPU (Windows Driver Version 31.0.15.2879)
  • Ubuntu 22.04.3 LTS
  • CUDA 11.8
  • cuDNN 8.9.0.131
  • NVIDIA-SMI 525.104
  • Driver Version: 528.79 (supports up to CUDA 12.0)
  • Python 3.10
  • DFC 3.28.0

I don’t think this issue is related to the driver version, because this looks like a compatible line-up of software according to NVIDIA’s compatibility charts. How should I proceed?