HEF compilation from ONNX (or HAR) gets stuck or takes too long (> 20 hours)

I run the model compilation from ONNX with the following command:

hailomz compile yolov8l --ckpt yolov8l.onnx --hw-arch hailo8l --calib-path calib_data_images --classes 1 --performance

with modified yolov8l.alls:

yolov8l/normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

model_optimization_config(calibration, batch_size=4)
model_optimization_config(compression_params, auto_16bit_weights_ratio=1)
post_quantization_optimization(finetune, batch_size=4, epochs=4, loss_factors=[1, 1, 1, 1, 2, 2, 2, 2, 2, 2], loss_layer_names=[yolov8l/conv97, yolov8l/conv82, yolov8l/conv67, yolov8l/conv25, yolov8l/conv100, yolov8l/conv88, yolov8l/conv73, yolov8l/conv103, yolov8l/conv89, yolov8l/conv74], loss_types=[l2rel, l2rel, l2rel, l2rel, l2rel, l2rel, l2rel, ce, ce, ce], policy=enabled)

model_optimization_flavor(optimization_level=4,compression_level=0)

change_output_activation(yolov8l/conv74,sigmoid)
change_output_activation(yolov8l/conv89,sigmoid)
change_output_activation(yolov8l/conv103,sigmoid)
nms_postprocess("{har}", meta_arch=yolov8, engine=cpu)

Up to the point where the HAR file is saved, it is clear what the tool is doing. After the HAR file was created and saved successfully, it started "Finding the best partition to contexts..." and got stuck:

[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:05:16.52)
[info] Model Optimization is done
[info] Saved HAR to: /local/workspace/hailo_model_zoo/yolov8l.har
[info] Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8l.alls because there is no specific hardware alls
[info] Loading model script commands to yolov8l from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8l.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov8l from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv73
[info] Adding an output layer after conv74
[info] Adding an output layer after conv88
[info] Adding an output layer after conv89
[info] Adding an output layer after conv100
[info] Adding an output layer after conv103
[info] Finding the best partition to contexts…
[.<==>…] Elapsed: 00:00:00

[…<==>…] Elapsed: 15:31:13
So, I have several questions:

  • What does it do?
  • Why does it take so long?
  • Can I see more logs about this operation (Finding the best partition to contexts...)?
  • While compiling to the HAR file it was using the GPU, but while producing the HEF from the HAR it was using only the CPU. Can the GPU be used to speed up HEF compilation from HAR?

Besides, I investigated a little bit and found that while compiling the HEF from the HAR, it runs ../hailo_tools/build/compiler and gets stuck inside that process. I could not find any information about this compiler or about how to make its logging more verbose.

Hey @dmytro.babenko,

Welcome to the Hailo Community!
Let me address each of your questions and provide some suggestions for your current situation.

  1. What Does “Finding the Best Partition to Contexts” Do?

    • This step in the Hailo Dataflow Compiler (DFC) optimally maps the model’s layers to the available compute resources on the target Hailo hardware.
    • For complex models like YOLOv8l, it involves solving a large optimization problem across multiple partitions of the neural network to maximize performance and resource utilization.
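As a rough intuition for why this step is expensive (a toy model only, not the DFC's actual algorithm): even just cutting a linear chain of layers into contiguous contexts already has a combinatorial number of candidates.

```shell
# Toy illustration -- NOT Hailo's algorithm. The number of ways to cut a chain
# of N layers into K contiguous contexts is C(N-1, K-1), which grows very
# fast; the real search also weighs per-context resources, so it is harder.
n_layers=100
n_contexts=5
awk -v n=$((n_layers - 1)) -v k=$((n_contexts - 1)) 'BEGIN {
    c = 1
    for (i = 1; i <= k; i++) c = c * (n - k + i) / i   # binomial coefficient
    printf "%d candidate partitions\n", c
}'
```

With 100 layers and 5 contexts this already yields millions of candidates, before accounting for branching topologies or hardware constraints.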
  2. Why Does It Take So Long?

    • The partitioning process is computationally intensive, especially for larger models like YOLOv8l, and the high optimization level (optimization_level=4) you selected increases complexity.
    • The delay could be due to a large search space for partitioning layers across contexts and limited CPU resources during the HAR-to-HEF compilation step, which currently doesn’t utilize GPU acceleration.
  3. How to Enable More Logs for “Finding the Best Partition to Contexts”?

    • Enable verbose logging in the Dataflow Compiler:
      hailomz compile yolov8l --ckpt yolov8l.onnx --hw-arch hailo8l --calib-path calib_data_images --classes 1 --performance 
      
    • Set the HAILO_LOG_LEVEL environment variable to debug for even more detailed output:
      export HAILO_LOG_LEVEL=debug
      
    • Check the logs for bottlenecks or errors during partitioning.
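One generic way to keep a full copy of the output from a long run is to pipe it through `tee` (plain shell, nothing Hailo-specific; the placeholder command below stands in for the real `hailomz compile` invocation):

```shell
# Sketch: persist all console output of a long-running command to a file.
# COMPILE_CMD is a placeholder -- substitute the actual hailomz compile call.
export HAILO_LOG_LEVEL=debug                  # as suggested above
COMPILE_CMD="echo simulated compiler output"  # placeholder for hailomz compile ...
$COMPILE_CMD 2>&1 | tee compile_debug.log
```

The run still prints to the console as usual, and `compile_debug.log` can be searched afterwards for warnings or the last message before a stall.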
  4. Can GPU Be Used for HAR-to-HEF Compilation?

    • Currently, the HAR-to-HEF compilation process primarily uses CPU resources for allocation and resource mapping rather than GPU-heavy operations.
    • GPU utilization may be enhanced in future releases, but for now, it’s CPU-only.
  5. Information About the Hailo Compiler

    • The hailo_tools/build/compiler is part of the Hailo Dataflow Compiler and translates the HAR file into a HEF file.
    • To understand its internals or investigate issues, refer to the Model Compilation and Optimization section of the Hailo Dataflow Compiler documentation and use the --debug flag with hailomz compile for more insights into the compilation flow.

Here are some tips to speed up compilation:

  1. Reduce Optimization Level:

    • Lower the optimization level in your yolov8l.alls:
      model_optimization_flavor(optimization_level=2, compression_level=0)
      
  2. Simplify Model Configurations:

    • Use smaller batch_size values during calibration or fine-tuning:
      model_optimization_config(calibration, batch_size=2)
      
  3. Hardware Considerations:

    • Compile the HAR file on a machine with more CPU cores and RAM.
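Before starting a long compilation, you can check what the machine offers with standard Linux tools (no Hailo-specific commands involved):

```shell
# Report available CPU cores and total memory before kicking off a long build.
echo "cores: $(nproc)"
grep MemTotal /proc/meminfo
```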
  4. Partition Troubleshooting:

    • If the process continues to hang, review the conv layers being added as output layers in the logs. Complex outputs may increase partitioning time.

Let me know if you need any further clarification! These steps should help reduce the compilation time and debug the process effectively.

Hello @omria, thanks a lot for the quick and detailed response to each of my questions. I would like to clarify more about your suggestions.

3. How to Enable More Logs for “Finding the Best Partition to Contexts”?
I tried to pass the --verbose argument to the hailomz compile command, but it raised an error: unrecognized arguments: --verbose. I could not find where this argument is parsed for the hailomz compile command.

Besides, I set the environment variable HAILO_LOG_LEVEL=debug, but it did not provide any additional information in the log:

(hailo_virtualenv) hailo@remote:/local/workspace/build$ echo $HAILO_LOG_LEVEL
debug
(hailo_virtualenv) hailo@remote:/local/workspace/build$ hailomz compile yolov8l --hw-arch hailo8l --har /local/shared_with_docker/data/models/a_16_w_16_all_layers/yolov8l.har --classes 1 --performance
Start run for network yolov8l …
Initializing the hailo8l runner…
Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8l.alls because there is no specific hardware alls
[info] Loading model script commands to yolov8l from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8l.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov8l from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv73
[info] Adding an output layer after conv74
[info] Adding an output layer after conv88
[info] Adding an output layer after conv89
[info] Adding an output layer after conv100
[info] Adding an output layer after conv103
[info] Finding the best partition to contexts…
[.<==>…] Elapsed: 16:10:34
[info] Finding the best partition to contexts…
[…<==>…] Elapsed: 03:19:01
[…<==>…] Elapsed: 00:50:25

Maybe I need to pass the --verbose argument to the /local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_tools/build/compiler executable?

Hardware Considerations

I re-ran the compilation from HAR to HEF on another machine with 96 CPUs (before, I ran with 8 CPUs) and more RAM, but did not notice any speed improvement. Maybe it is necessary to pass some flag to make it use more cores?
There are many (~96) running …/compiler processes,


but only 1 CPU is loaded to 100%.
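For reference, one way to see the per-process CPU split with standard Linux tooling (nothing Hailo-specific) is:

```shell
# List the top CPU consumers; if only one compiler process is near 100% and
# the rest sit idle, the partition search is effectively single-threaded.
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 10
```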

Hi @dmytro.babenko,

Let me address each of your points:

  1. I apologize for the confusion - the --verbose flag is deprecated. I’ve edited my previous response to reflect this.

  2. For the log level, try using echo instead of export:

echo "debug" > /var/log/hailort.log
  3. Try one of these approaches:
    • Run without the --performance flag
    • Set the optimization level to 0 in your alls file

I’ll look into the compiler’s CPU usage behavior and get back to you on that specific issue.

Best regards,
Omria