Failing to compile ONNX to HEF

I’m running the following command:

hailomz compile yolov8n --ckpt /home/erez/train/runs/detect/train11/weights/best.onnx \
--hw-arch hailo8 \
--calib-path ../train/datasets/images/train \
--classes 3 --performance

I'm getting the following warnings and errors during the run:

[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[warning] Reducing output bits of yolov8n/conv63 by 7.0 bits (More than half)
[error] Mapping Failed (Timeout, allocation time: 47m 29s)

I'm not using your Docker to compile and run it.
I'm able to train a model with the YOLO CLI and utilize the GPU.

Any advice?

Please review the full log:

hailomz compile yolov8n --ckpt /home/erez/train/runs/detect/train11/weights/best.onnx \
--hw-arch hailo8 \
--calib-path ../train/datasets/images/train \
--classes 3 --performance
<Hailo Model Zoo INFO> Start run for network yolov8n ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.08)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.36)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8n/input_layer1'.
[info] End nodes mapped from original model: '/model.22/cv2.0/cv2.0.2/Conv', '/model.22/cv3.0/cv3.0.2/Conv', '/model.22/cv2.1/cv2.1.2/Conv', '/model.22/cv3.1/cv3.1.2/Conv', '/model.22/cv2.2/cv2.2.2/Conv', '/model.22/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov8n (completion time: 00:00:00.99)
[info] Saved HAR to: /home/erez/hailo_model_zoo/yolov8n.har
<Hailo Model Zoo INFO> Using generic alls script found in /home/erez/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls because there is no specific hardware alls
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to yolov8n from /home/erez/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls
[info] Loading model script commands to yolov8n from string
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.69)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 64/64 [00:40<00:00,  1.60entries/s]
[info] Statistics Collector is done (completion time is 00:00:42.40)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[warning] Reducing output bits of yolov8n/conv63 by 7.0 bits (More than half)
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Quantization-Aware Fine-Tuning skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /home/erez/hailo_model_zoo/yolov8n.har
<Hailo Model Zoo INFO> Using generic alls script found in /home/erez/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls because there is no specific hardware alls
[info] Loading model script commands to yolov8n from /home/erez/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov8n from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv41
[info] Adding an output layer after conv42
[info] Adding an output layer after conv52
[info] Adding an output layer after conv53
[info] Adding an output layer after conv62
[info] Adding an output layer after conv63
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=120%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=120%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 

● Finished                                       

[info] Solving the allocation (Mapping), time per context: 9m 59s
Context:0/0 Iteration 0: Trying parallel splits...   
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0                                                                                                  
 worker1  V          V          V          V          V          V          V          V          V       
 worker2                                                                                                  
 worker3                                                                                                  

  02:40
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0

[info] Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=117.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=117.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=115%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=115%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=112.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=112.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=110%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=110%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=107.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=107.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=105%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=105%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=102.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=102.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=100%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=100%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
                                         
● Finished

[info] Solving the allocation (Mapping), time per context: 9m 59s
Context:0/0 Iteration 4: Trying parallel mapping...  
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  *          *          *          *          *          *          *          *          V       
 worker1  V          V          V          V          V          V          V          X          V       
 worker2  *          *          *          *          *          *          *          *          V       
 worker3                                                                                                  

  18:03
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 1
Reverts on pre-mapping validation: 1
Reverts on split failed: 0
[error] Mapping Failed (Timeout, allocation time: 47m 29s)
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=97.5%, max_compute_utilization=97.5%, max_compute_16bit_utilization=97.5%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=97.5%, max_apu_utilization=97.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=97.5%, max_compute_utilization=97.5%, max_compute_16bit_utilization=97.5%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=97.5%, max_apu_utilization=97.5%
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=95%, max_compute_utilization=95%, max_compute_16bit_utilization=95%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=95%, max_apu_utilization=95%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=95%, max_compute_utilization=95%, max_compute_16bit_utilization=95%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=95%, max_apu_utilization=95%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=92.5%, max_compute_utilization=92.5%, max_compute_16bit_utilization=92.5%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=92.5%, max_apu_utilization=92.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=92.5%, max_compute_utilization=92.5%, max_compute_16bit_utilization=92.5%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=92.5%, max_apu_utilization=92.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=90%, max_compute_utilization=90%, max_compute_16bit_utilization=90%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=90%, max_apu_utilization=90%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=90%, max_compute_utilization=90%, max_compute_16bit_utilization=90%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=90%, max_apu_utilization=90%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=87.5%, max_compute_utilization=87.5%, max_compute_16bit_utilization=87.5%, max_memory_utilization (weights)=87.5%, max_input_aligner_utilization=87.5%, max_apu_utilization=87.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=87.5%, max_compute_utilization=87.5%, max_compute_16bit_utilization=87.5%, max_memory_utilization (weights)=87.5%, max_input_aligner_utilization=87.5%, max_apu_utilization=87.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=85%, max_compute_utilization=85%, max_compute_16bit_utilization=85%, max_memory_utilization (weights)=85%, max_input_aligner_utilization=85%, max_apu_utilization=85%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=85%, max_compute_utilization=85%, max_compute_16bit_utilization=85%, max_memory_utilization (weights)=85%, max_input_aligner_utilization=85%, max_apu_utilization=85%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=82.5%, max_compute_utilization=82.5%, max_compute_16bit_utilization=82.5%, max_memory_utilization (weights)=82.5%, max_input_aligner_utilization=82.5%, max_apu_utilization=82.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=82.5%, max_compute_utilization=82.5%, max_compute_16bit_utilization=82.5%, max_memory_utilization (weights)=82.5%, max_input_aligner_utilization=82.5%, max_apu_utilization=82.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=80%, max_compute_utilization=80%, max_compute_16bit_utilization=80%, max_memory_utilization (weights)=80%, max_input_aligner_utilization=80%, max_apu_utilization=80%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=80%, max_compute_utilization=80%, max_compute_16bit_utilization=80%, max_memory_utilization (weights)=80%, max_input_aligner_utilization=80%, max_apu_utilization=80%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=77.5%, max_compute_utilization=77.5%, max_compute_16bit_utilization=77.5%, max_memory_utilization (weights)=77.5%, max_input_aligner_utilization=77.5%, max_apu_utilization=77.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=77.5%, max_compute_utilization=77.5%, max_compute_16bit_utilization=77.5%, max_memory_utilization (weights)=77.5%, max_input_aligner_utilization=77.5%, max_apu_utilization=77.5%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100%                | 75%                 | 36.7%              |
[info] | cluster_1 | 87.5%               | 79.7%               | 36.7%              |
[info] | cluster_2 | 100%                | 76.6%               | 35.9%              |
[info] | cluster_3 | 100%                | 70.3%               | 50.8%              |
[info] | cluster_4 | 93.8%               | 57.8%               | 49.2%              |
[info] | cluster_5 | 100%                | 73.4%               | 37.5%              |
[info] | cluster_6 | 100%                | 95.3%               | 28.1%              |
[info] | cluster_7 | 100%                | 82.8%               | 25.8%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 97.7%               | 76.4%               | 37.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 17s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 4.29382 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 6s)
[info] Saved HAR to: /home/erez/hailo_model_zoo/yolov8n.har
<Hailo Model Zoo INFO> HEF file written to yolov8n.hef

Any advice?
I'm wondering what the issue is.

Can you assist, please?

Hi @Erez,

I see that you used the --performance parameter when running the hailomz compile command. This means that the compilation will be run in "performance mode", so the compiler will run an exhaustive search to find the solution that guarantees the highest throughput.
In more detail, the compiler will set different resource utilization levels and try to allocate the model many times, and then it will select the best solution. For some of these utilization levels the compilation may fail, and the compiler moves on to the next iteration.
I see from your log that you got a yolov8n.hef at the end of the process, which means that the compilation was successful.

As a side note, I suggest you run the compilation in performance mode only when you really need to optimize the model's performance; it is generally not needed in the earlier phases of development, as it takes much more time than the standard compilation.
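If you just want a working HEF during development, the same command without the flag should be enough (copying the paths from your command above):

    hailomz compile yolov8n --ckpt /home/erez/train/runs/detect/train11/weights/best.onnx \
    --hw-arch hailo8 \
    --calib-path ../train/datasets/images/train \
    --classes 3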

Thanks for the reply.

What is the reason for the first two warnings?
How can I solve them?

I'm also at the stage where I need to compile the most robust HEF file possible… What should my next steps be?

@Erez those warnings indicate that the model optimization level was set to 0 (standard calibration with Equalization), since a GPU was not detected or not set up correctly.
Please look at the DataFlow Compiler User Guide (requires Developer Zone login) to verify the requirements to run optimization algorithms on the GPU.

If you installed only the DataFlow Compiler or the Self-extractable SW Suite, you will need to install the right CUDA dependencies manually.

If you installed the SW Suite using the Docker (see SW Suite User Guide), the CUDA dependencies will be included already.
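As a quick sanity check (a minimal sketch, run from the environment where the DFC is installed), you can ask the DFC's TensorFlow backend whether it sees the GPU:

    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

An empty list means the optimization will fall back to level 0, which is exactly the warning you are seeing.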

I'm trying to work with Docker and following your procedures.
I was able to train a YOLO model correctly with the GPU available to me.

What could be the reason that your code is not able to identify the GPU?

I used your Docker image following the examples you sent me and I'm still getting the following warnings:

[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results

Additional info about my GPU:

nvidia-smi
Wed Jan 15 08:26:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   36C    P8              9W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

If you are able to train a model using your GPU, but not to optimize the model when running the Hailo DFC, the issue is likely a mismatch between the CUDA/cuDNN and driver versions and the TensorFlow version used by the DFC. In fact, the DFC uses TensorFlow (version 2.12.0 for DFC 3.30) as its backend, which has its own requirements (Build from source | TensorFlow).
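To see which CUDA/cuDNN versions that TensorFlow wheel expects, you could run something like the following from inside the DFC environment (a sketch; these keys are reported only for CUDA builds of TensorFlow):

    python3 -c "import tensorflow as tf; i = tf.sysconfig.get_build_info(); print(i.get('cuda_version'), i.get('cudnn_version'))"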

When using the Docker, you can be sure that you have the right CUDA/cuDNN dependencies. If you still have the issue, you can try the following (a sketch of both steps is shown after the list):

  • Install the NVIDIA driver 525 outside the Docker, which offers better compatibility with CUDA 11.8.
  • Make sure to install nvidia-docker2, as explained in the Hailo AI SW Suite User Guide.
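
A minimal sketch of those two steps on Ubuntu, assuming the NVIDIA container toolkit apt repository has already been added (follow NVIDIA's own documentation for that part):

    sudo apt install nvidia-driver-525
    # reboot so the new kernel driver is loaded
    sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker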

Thanks for this info. This explains why I have the issue with my GPU in WSL2. I have the following driver versions:
NVIDIA-SMI 565.77.01
Driver Version: 566.36
CUDA Version: 12.7

In the case of WSL2, is there any alternative besides downgrading the drivers on the Windows side?

Hi @gt-robot,

This is a more complicated scenario.
Here is a link to enable GPU support under WSL (CUDA on WSL).
However, to run optimization on the GPU using the Hailo DataFlow Compiler (installed under WSL), you have to stick to compatible CUDA/cuDNN/driver versions as explained above.
If possible, we recommend installing the suite on a native Ubuntu 20/22 (soon 24) system, since that simplifies the flow and allows you to test other components (like the TAPPAS demos), which are not available in WSL.

Thanks for your reply. I was able to enable GPU support in WSL2, but only for training, using a Docker container (Training - hailo model zoo). I wonder if there's a Dockerfile available to build a container with the DFC and the compatible CUDA version inside it.

Sorry, I just realized that you mentioned this above. I guess even if I use a container, it won’t work.

Hi

I installed NVIDIA driver 525 via

sudo apt install nvidia-driver-525

and got the following nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8              11W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

For some reason I can see that the 535 driver is installed. I'm using Ubuntu 22.04.
Should I aim to see CUDA version 11.8? Is it a must?

I've reinstalled my machine from scratch and followed your instructions step by step, without success.

I'm using your Docker image and still getting the same warnings about no available GPU.

@Erez could you let me know which Ubuntu and DFC versions you are using?
Make sure that you followed the commands in the Hailo AI SW Suite User Guide exactly in the order they are written, without missing anything. It is extremely important that nvidia-docker2 and a driver version >= 525.0 are installed before running the Docker for the first time.
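For reference, a quick way to collect that information (a sketch; the exact pip package name may differ between releases):

    lsb_release -a
    pip show hailo-dataflow-compiler | grep -i version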

I replicated the main steps on an Ubuntu 22 machine; please check them and look for possible mismatches.

  • First, I added my user to the docker group, logged out and logged back in.

    sudo usermod -aG docker ${USER}
    
  • Then, I checked the driver version on my machine outside the Docker. The version is already greater than 525.0, so I stuck with it (even though 525.0 offers better compatibility, according to the CUDA documentation).

  • Installed nvidia-docker2 and restarted the Docker daemon, as per the instructions:

    sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker

  • I ran the hailo_ai_sw_suite_docker_run.sh script to install the Hailo Docker:


    As you can see, the --gpus all argument was used in the container creation. This argument will be used ONLY if the GPU was successfully detected. In more detail, the docker run script checks the output of the following commands:

    lspci | grep "VGA compatible controller: NVIDIA"
    dpkg -l | grep 'nvidia-docker\|nvidia-container-toolkit'
    

    Please check if these commands give a non-empty output when you run them outside the Docker, and also create a new container to verify the arguments passed to the docker run command.
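
Independently of the script, you can also verify that nvidia-docker2 itself works by running a throwaway CUDA container (a sketch; the image tag is just an example):

    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi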

Thanks for the detailed explanation!

I ran the script and saw that Docker is not running with the --gpus all flag.
It looks like your code is not able to identify this PCI entry:

00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

I made a small change in the script to use the --gpus all flag regardless of the GPU verification.
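For reference, a hypothetical variant of that check which would also match a GPU listed as a 3D controller (not the script's actual code, just a sketch):

    lspci | grep -E "(VGA compatible|3D) controller: NVIDIA"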

It looks like I'm able to use the GPU, but I got the following error:

This is the log I have:

<Hailo Model Zoo INFO> Start run for network yolov8n ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.11)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.33)
[info] Simplified ONNX model for a parsing retry attempt (completion time: 00:00:00.84)
[info] According to recommendations, retrying parsing with end node names: ['/model.22/Concat_3'].
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.05)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.22)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8n/input_layer1'.
[info] End nodes mapped from original model: '/model.22/Concat_3'.
[info] Translation completed on ONNX model yolov8n (completion time: 00:00:01.08)
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.04)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.25)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8n/input_layer1'.
[info] End nodes mapped from original model: '/model.22/cv2.0/cv2.0.2/Conv', '/model.22/cv3.0/cv3.0.2/Conv', '/model.22/cv2.1/cv2.1.2/Conv', '/model.22/cv3.1/cv3.1.2/Conv', '/model.22/cv2.2/cv2.2.2/Conv', '/model.22/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov8n (completion time: 00:00:00.88)
[info] Appending model script commands to yolov8n from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /local/workspace/yolov8n.har
<Hailo Model Zoo INFO> Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls because there is no specific hardware alls
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to yolov8n from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls
[info] Loading model script commands to yolov8n from string
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.68)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 64/64 [00:35<00:00,  1.81entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:38.23)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] No shifts available for layer yolov8n/conv60/conv_op, using max shift instead. delta=0.4878
[info] No shifts available for layer yolov8n/conv60/conv_op, using max shift instead. delta=0.2439
[warning] Reducing output bits of yolov8n/conv63 by 6.0 bits (More than half)
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
.
.
.
.
.

128/128 [==============================] - 90s 702ms/step - total_distill_loss: 2.2202 - _distill_loss_yolov8n/conv41: 0.0606 - _distill_loss_yolov8n/conv42: 0.0604 - _distill_loss_yolov8n/conv52: 0.1267 - _distill_loss_yolov8n/conv53: 0.1365 - _distill_loss_yolov8n/conv62: 0.1905 - _distill_loss_yolov8n/conv63: 1.0000 - _distill_loss_yolov8n/conv35: 0.0997 - _distill_loss_yolov8n/conv57: 0.3012 - _distill_loss_yolov8n/conv46: 0.2446
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:08:32.74)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [02:59<00:00, 89.58s/iterations]
[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:03:04.35)
[info] Model Optimization is done
[info] Saved HAR to: /local/workspace/yolov8n.har
<Hailo Model Zoo INFO> Using generic alls script found in /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls because there is no specific hardware alls
[info] Loading model script commands to yolov8n from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8n.alls
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Appending model script commands to yolov8n from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv41
[info] Adding an output layer after conv42
[info] Adding an output layer after conv52
[info] Adding an output layer after conv53
[info] Adding an output layer after conv62
[info] Adding an output layer after conv63
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=120%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=120%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 

● Finished                                       

[info] Solving the allocation (Mapping), time per context: 9m 59s
Context:0/0 Iteration 4: Trying parallel mapping...  
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  *          *          *          *          *          *          *          *          V       
 worker1  V          V          X          V          V          V          V          V          V       
 worker2                                                                                                  
 worker3                                                                                                  

  17:39
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 1
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[error] Mapping Failed (Timeout, allocation time: 21m 7s)
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Running Auto-Merger
[info] Auto-Merger is done
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=117.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=117.5%, max_compute_utilization=100%, max_compute_16bit_utilization=100%, max_memory_utilization (weights)=90%, max_input_aligner_utilization=100%, max_apu_utilization=100%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
                                         
● Finished

The compilation is still running. It looks like

[error] Mapping Failed (Timeout, allocation time: 21m 7s)

took much longer than it should have; according to the log, the allocation should take around 9 minutes per context.

Hi @Erez,

Interesting. This seems to be related to the class of the GPU card. As you mentioned, since the device is listed as a 3D controller, it is not detected by the lspci + grep command used in the Docker script. I will check this with R&D and get back to you. Thanks for running the test; this will help us improve future versions of the SW Suite.
Could you please confirm that the GPU is working correctly and that you can see some activity on it while running optimization? (Open nvtop from another terminal while the optimization is running.)
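
If nvtop is not installed, a simple alternative is to poll nvidia-smi from another terminal:

    nvidia-smi -l 1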