Compile yolov10n onnx to hef

I have tested the yolov10n and yolov10s HEF models from the Hailo Model Zoo on a Hailo-8. Now I am trying to compile the yolov10n ONNX model to a HEF myself.

I use the following code:

import numpy as np
import os
from hailo_sdk_client import ClientRunner
#Define model information
model_name = 'yolov10n'
onnx_path = '../onnx_models/yolov10n.onnx'
start_node = 'images'
end_node = ['output0']
input_shape = {'images': [1, 3, 640, 640]}
chosen_hw_arch = 'hailo8'
input_height = 640
input_width = 640
input_ch = 3

alls_lines = [
    'model_optimization_flavor(optimization_level=0, compression_level=1)',
    'resources_param(max_control_utilization=0.8, max_compute_utilization=0.8,max_memory_utilization=0.8)',
    'performance_param(fps=10)'
]

# Join lines with newline separator
# script_content = '\n'.join(alls_lines)

#Parsing
runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(onnx_path, model_name, start_node_names=[start_node], end_node_names=end_node, net_input_shapes=input_shape)
parsed_model_har_path = f'{model_name}_parsed_model.har'
runner.save_har(parsed_model_har_path)
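# Optimize (random data only exercises the flow; real images are needed for meaningful quantization)
# np.random.randint excludes the upper bound, so 256 is needed to cover the full 0..255 range
calibData = np.random.randint(0, 256, (1024, input_height, input_width, input_ch))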
runner.load_model_script('\n'.join(alls_lines))
runner.optimize(calibData)
quantized_model_har_path = f'{model_name}_quantized_model.har'
runner.save_har(quantized_model_har_path)
#Compile
hef = runner.compile()
file_name = f'{model_name}.hef'
with open(file_name, 'wb') as f:
    f.write(hef)
compiled_model_har_path = f'{model_name}_compiled_model.har'
runner.save_har(compiled_model_har_path)

But it reports the following error:

hailo_sdk_client.model_translator.exceptions.ParsingWithRecommendationException: Parsing failed. The errors found in the graph are:
 UnsupportedShuffleLayerError in op /model.23/dfl/Reshape: Failed to determine type of layer to create in node /model.23/dfl/Reshape
 UnsupportedOperationError in op /model.23/GatherElements: GatherElements operation is unsupported
 UnsupportedOperationError in op /model.23/GatherElements_1: GatherElements operation is unsupported
 UnsupportedReduceMaxLayerError in op /model.23/ReduceMax: Failed to create reduce max layer at vertex /model.23/ReduceMax. Reduce max is only supported on the features axis, and with keepdim=True
 UnsupportedShuffleLayerError in op /model.23/Flatten: Failed to determine type of layer to create in node /model.23/Flatten
 UnsupportedOperationError in op /model.23/TopK: TopK operation is unsupported
 UnsupportedGatherLayerError in op /model.23/Gather_3: Can't find index
 UnsupportedOperationError in op /model.23/TopK_1: TopK operation is unsupported
 UnsupportedOperationError in op /model.23/Mod: Mod operation is unsupported
 UnsupportedConcatLayerError in op /model.23/Tile: Unsupported concat over axis batch
 UnsupportedConcatLayerError in op /model.23/Tile_1: Unsupported concat over axis batch
 UnsupportedModelError in op /model.23/Sub: In vertex /model.23/Sub_input the constant value shape (1, 2, 8400) must be broadcastable to the output shape [2, 8400, 1]
 UnsupportedModelError in op /model.23/Add_1: In vertex /model.23/Add_1_input the constant value shape (1, 2, 8400) must be broadcastable to the output shape [2, 8400, 1]

My SDK version is:

(hailo_virtualenv) hailo@simws08:/local/shared_with_docker/compile_test$ hailo --version
[info] Current Time: 16:02:04, 10/14/24
[info] CPU: Architecture: x86_64, Model: Intel(R) Core(TM) i9-10900F CPU @ 2.80GHz, Number Of Cores: 20, Utilization: 1.0%
[info] Memory: Total: 31GB, Available: 28GB
[info] System info: OS: Linux, Kernel: 5.15.0-122-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo --version`
HailoRT v4.19.0
Hailo Dataflow Compiler v3.29.0

  1. Since you provide a compiled yolov10n model in the Hailo Model Zoo, I wonder how you deal with those unsupported operations.
  2. I tried to bypass the unsupported operations by changing the end node to /model.23/Concat_3, but then it reports:
[info] Model Optimization is done
[info] Saved HAR to: /local/shared_with_docker/compile_test/yolov10n_h_quantized_model.har
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[error] Failed to reach fps on following nodes: 
concat18 max reached fps: 0 required_fps: 10
concat18 errors:
	Agent infeasible by resources sanity.: Memory units capacity exceeded (available: 128, required: 150).


It should be a small model, but the compiler reports that the memory units capacity is exceeded. Why, and how can I deal with it?

Hey @desuikong,

Welcome to the Hailo Community!

It looks like you’re running into two key issues when trying to compile your YOLOv10n ONNX model for the Hailo-8: unsupported operations and memory capacity errors. Let’s break these down and explore some solutions:

1. Unsupported Operations in YOLOv10n

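The parser error lists several operations that the Hailo Dataflow Compiler cannot map, such as GatherElements, TopK, and Mod, plus a ReduceMax that is only supported on the features axis with keepdim=True. Every failing node sits under /model.23, the model’s final detection head, so the problem is confined to the post-processing part of the graph rather than the backbone.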

Possible Solutions:

  • Simplify the Model: Modify the ONNX model to remove or replace unsupported operations. For instance, you could try using a simpler YOLO version like YOLOv5n or YOLOv8n, which are known to be compatible with the Hailo architecture. You can also inspect the model using a tool like Netron to identify and remove problematic nodes.

  • Use a Different Export Configuration: When exporting your model from a framework like PyTorch, try adjusting the export configuration to avoid these unsupported operations. Some export flags might help simplify the model or remove unnecessary operations.

  • Leverage Pre-Optimized Models: If the YOLOv10n model from the Hailo Model Zoo meets your needs, you can skip custom model compilation and use these pre-optimized models, which are already tested and optimized for Hailo’s architecture.
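
Before editing the graph, it can help to see where the rejected nodes cluster. Below is a minimal sketch that groups the failing nodes from the parser log by their top-level module. `summarize_parser_errors` is a hypothetical helper written for this post, not part of the Hailo SDK, and the log lines are a shortened excerpt of the error quoted above.

```python
import re
from collections import defaultdict

# Shortened excerpt of the parser output quoted earlier in this thread.
PARSER_LOG = """\
 UnsupportedShuffleLayerError in op /model.23/dfl/Reshape: Failed to determine type of layer to create in node /model.23/dfl/Reshape
 UnsupportedOperationError in op /model.23/GatherElements: GatherElements operation is unsupported
 UnsupportedOperationError in op /model.23/TopK: TopK operation is unsupported
 UnsupportedOperationError in op /model.23/Mod: Mod operation is unsupported
 UnsupportedConcatLayerError in op /model.23/Tile: Unsupported concat over axis batch
"""

# Matches "<SomethingError> in op <node name>:" at the start of a log line.
ERROR_LINE = re.compile(r"^\s*(?P<error>\w+Error) in op (?P<node>\S+):")


def summarize_parser_errors(log: str) -> dict:
    """Group failing node names by their top-level ONNX module, e.g. '/model.23'."""
    by_module = defaultdict(list)
    for line in log.splitlines():
        match = ERROR_LINE.match(line)
        if match is None:
            continue  # skip lines that are not per-node errors
        node = match.group("node")
        module = "/" + node.strip("/").split("/")[0]
        by_module[module].append(node)
    return dict(by_module)


summary = summarize_parser_errors(PARSER_LOG)
for module, nodes in summary.items():
    print(f"{module}: {len(nodes)} failing nodes")
```

If all failing nodes fall under a single module, as they do here, that module is the natural place to look for an earlier end node.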

2. Memory Capacity Exceeded Error

When trying to compile the model, you received a memory-related error stating:

Memory units capacity exceeded (available: 128, required: 150).

This means the compiler could not fit the layer concat18 into the memory units available on the device (150 required, 128 available), even though the model as a whole is relatively small.

Possible Solutions:

  • Reduce Input Resolution: You can try lowering the input resolution from 640x640 to something smaller like 320x320 or 416x416. Reducing the input size shrinks every feature map and should help the model fit within the device’s memory constraints.

  • Simplify the Model Architecture: If lowering the resolution isn’t enough, try reducing the number of layers in the model or switching to a lighter version like YOLOv5n or YOLOv8n.

  • Tune Model Optimization Parameters: You can experiment with stricter optimization parameters, such as further reducing memory and compute utilization. Here’s an example of how you can adjust the optimization script to reduce memory requirements:

    alls_lines = [
        'model_optimization_flavor(optimization_level=1, compression_level=2)',
        'resources_param(max_control_utilization=0.6, max_compute_utilization=0.6, max_memory_utilization=0.6)',
        'performance_param(fps=5)'
    ]
    

This will help manage memory usage more aggressively and could resolve the memory limit issue.
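
To try several settings without hand-editing the list each time, the script text can be generated. This is only a sketch: `build_model_script` is a hypothetical helper, and the three script commands are taken from the snippet above, nothing more.

```python
def build_model_script(fps: int, utilization: float,
                       optimization_level: int = 0,
                       compression_level: int = 1) -> str:
    """Return model-script text using the three commands shown in this thread.

    The same utilization cap is applied to control, compute and memory.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    lines = [
        f"model_optimization_flavor(optimization_level={optimization_level}, "
        f"compression_level={compression_level})",
        f"resources_param(max_control_utilization={utilization}, "
        f"max_compute_utilization={utilization}, "
        f"max_memory_utilization={utilization})",
        f"performance_param(fps={fps})",
    ]
    return "\n".join(lines)


script = build_model_script(fps=5, utilization=0.6,
                            optimization_level=1, compression_level=2)
print(script)
```

The resulting string can be passed to `runner.load_model_script(...)` in the same way as `'\n'.join(alls_lines)` in the original code.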

3. Handling End Node Changes

You mentioned trying to bypass the unsupported operations by changing the end node to /model.23/Concat_3. While this is a valid approach to bypass certain layers, you still ran into memory capacity issues.

Recommendations:

  • Further Model Simplification: Consider removing additional layers if possible, or switch to a smaller, simpler model.
  • Lower Input Resolution: Reducing the input size will help the model fit into the available memory.
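
To get a feel for how much a smaller input shrinks the detection head, here is a rough calculation of the number of predictions per image. It assumes the usual three YOLO output scales with strides 8, 16 and 32, which is not stated in this thread; the 640x640 result does match the 8400 that appears in the parser error shapes. Whether the reduction is enough to resolve the concat18 error would still have to be tested.

```python
def prediction_count(height: int, width: int, strides=(8, 16, 32)) -> int:
    """Number of grid cells summed over all output scales.

    Assumes one prediction per grid cell and input sizes divisible by each stride.
    """
    total = 0
    for stride in strides:
        if height % stride or width % stride:
            raise ValueError(f"{height}x{width} is not divisible by stride {stride}")
        total += (height // stride) * (width // stride)
    return total


for side in (640, 416, 320):
    print(f"{side}x{side}: {prediction_count(side, side)} predictions")
```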

4. Potential Workaround

If simplifying the model or lowering the input resolution doesn’t work, consider splitting the model into smaller sub-networks and running them separately on the Hailo-8 device.

Conclusion:

  • Unsupported Operations: Simplify the model or switch to a compatible version like YOLOv8n or YOLOv5n to avoid unsupported operations.
  • Memory Capacity Exceeded: Reduce the input resolution or further optimize the model to fit within the memory constraints of the Hailo-8 chip.
  • End Nodes: Changing the end node can help bypass unsupported operations, but you will still need to address the memory issues.

Let me know how it goes, and feel free to ask for further assistance if needed!

Best regards,
Omri

Thanks for your reply!
I am now trying your suggestions. For the Memory Capacity Exceeded error, I changed the model optimization parameters as you wrote, but the same error is reported no matter what I change. I tried many other values, and the compiler always requires 150 memory units:

[error] Failed to reach fps on following nodes: 
concat18 max reached fps: 0 required_fps: 5
concat18 errors:
	Agent infeasible by resources sanity.: Memory units capacity exceeded (available: 128, required: 150).


[error] Mapping Failed (allocation time: 0s)
Failed to reach fps on following nodes: 
concat18 max reached fps: 0 required_fps: 5
concat18 errors:
	Agent infeasible by resources sanity.: Memory units capacity exceeded (available: 128, required: 150).



[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: Failed to reach fps on following nodes:

Besides, I am curious whether I can still run optimization with a calibration dataset if I change the end node or split the model into two smaller models. My guess is that optimization cannot be done then, because the model architecture would be incomplete.

Hey @desuikong

OK, let me expand on the Memory Capacity Exceeded error. It indicates that your model requires more memory units than the device can provide. Here are some strategies that may help resolve the issue:


1. Optimize Model Configuration to Reduce Memory Usage

  1. Use Lower Precision Models:

    • Switch to INT8 quantization to reduce memory demands:
      export HAILO_QUANT_MODE=int8
      
    • Ensure the model uses quantized inference throughout the pipeline.
  2. Disable Non-Essential Layers:

    • If layers like concat18 aren’t critical, try pruning or removing them.
  3. Reduce Batch Size:

    • Set the batch size to 1, as higher batch sizes consume more memory:
      model.compile(batch_size=1)
      
  4. Reorder Model Operations:

    • Adjust the model graph to improve how resources are allocated across nodes. This can make the compilation more memory efficient.
  5. Utilize TAPPAS Framework Optimizations:

    • Verify your configurations in the TAPPAS framework, which provides optimized pipelines for Hailo devices.

2. Splitting the Model and Using Calibration Data

Yes, splitting the model into smaller sub-models is a viable solution. However, it requires careful handling:

  1. Model Splitting:

    • Divide the model at logical points (e.g., before or after concat18).
    • Use the output from the first part as the input for the second part.
  2. Calibration Data for Optimization:

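    • Calibration data is still needed, and can still be used, to optimize each sub-model after splitting. If you only move the end node earlier, the same input images work as before.
    • The first sub-model is calibrated with the original input images; the second must be calibrated with the tensors the first sub-model produces for those images. No retraining is involved.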
  3. Accuracy Consideration:

    • Be aware that splitting a model can slightly affect accuracy, so validate both sub-models separately and together.
  4. Compiling Split Models:

    • You can compile each sub-model individually using the Hailo AI Suite. Detailed steps on model management are available in the Model Zoo User Guide.
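
The calibration point above can be sketched as follows. The function names are hypothetical and `fake_first_part` is only a stand-in: in practice it would be a float-precision inference of the first sub-model (for example from the ONNX file) on each calibration image.

```python
from typing import Callable, List, Sequence

Tensor = List[float]  # placeholder type; real data would be image-sized arrays


def calibration_for_second_part(run_first_part: Callable[[Tensor], Tensor],
                                calib_inputs: Sequence[Tensor]) -> List[Tensor]:
    """Build the second sub-model's calibration set from the first one's outputs."""
    return [run_first_part(sample) for sample in calib_inputs]


def fake_first_part(sample: Tensor) -> Tensor:
    """Stand-in for the first sub-model: just scales pixel values to [0, 1]."""
    return [value / 255.0 for value in sample]


# Calibration set for part 1: the original (here: toy) input samples.
calib_part1 = [[0.0, 127.5, 255.0], [255.0, 0.0, 127.5]]
# Calibration set for part 2: what part 1 emits for those same samples.
calib_part2 = calibration_for_second_part(fake_first_part, calib_part1)
print(calib_part2)
```

Each set is then used in its own optimization run, one per sub-model.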

3. Explore Alternative Solutions

  1. Offload Operations:

    • Consider offloading some tasks to the CPU if the Hailo device alone cannot handle the full workload.
  2. Use a Higher-Capacity Device:

    • If your current device has only 128 memory units, switching to a more capable device like the Hailo-10H might solve the issue.
  3. Leverage Virtual Devices:

    • Use HailoRT virtual devices to distribute processing across multiple devices, helping alleviate memory bottleneck.

4. Check Resource Allocation with Hailo Tools

You can use Hailo’s CLI tools to monitor and profile the model:

hailortcli monitor
hailo profiler model.har

These tools will help identify which nodes are consuming excessive resources, allowing you to make targeted optimizations.

Thanks for your reply again.

I am now using hailomz for all tasks. With the support of the Hailo Model Zoo, my compilation works well now.
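
For readers who land here with the same errors, a sketch of how such a run could be scripted. The exact command used is not given in this thread: the flags below (`--ckpt`, `--hw-arch`, `--calib-path`) and the `./calib_images` directory are assumptions to check against `hailomz compile --help`, and `hailomz_compile_command` is a hypothetical helper.

```python
import shlex


def hailomz_compile_command(model_name: str, onnx_path: str,
                            calib_dir: str, hw_arch: str = "hailo8") -> list:
    """Assemble an argument list for a Model Zoo compile run (flags are assumptions)."""
    return [
        "hailomz", "compile", model_name,
        "--ckpt", onnx_path,        # ONNX file exported from training
        "--hw-arch", hw_arch,       # target device
        "--calib-path", calib_dir,  # directory with real calibration images
    ]


cmd = hailomz_compile_command("yolov10n",
                              "../onnx_models/yolov10n.onnx",
                              "./calib_images")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` inside the Hailo environment.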

regards.