Hello Hailo community,
I have a question regarding ONNX to HEF conversion. I have a custom dataset with a single label (car).
When I convert the model to a HEF file, the total distill loss does not decrease. I suspect this is why I can't
get good accuracy when I run the model on the Hailo-8. Detection performance is pretty good when I run inference directly in PyTorch.
Can you help me with this? How can I solve this issue?
This is how I export my trained yolov8s model:
$: yolo export model=best.pt format=onnx opset=11 imgsz=640
I also use a Docker environment for the HEF conversion. These are the versions of each tool:
$: hailo --version
[info] Current Time: 12:16:11, 12/27/24
[info] CPU: Architecture: x86_64, Model: Intel(R) Xeon(R) W-2195 CPU @ 2.30GHz, Number Of Cores: 36, Utilization: 0.8%
[info] Memory: Total: 125GB, Available: 52GB
[info] System info: OS: Linux, Kernel: 5.19.0-46-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: No Hailo PCIe device was found
[info] Running `hailo --version`
HailoRT v4.19.0
Hailo Dataflow Compiler v3.29.0
This is the model script that I use:
quantization_param([conv42, conv53, conv63], force_range_out=[0.0, 1.0])
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv42, sigmoid)
change_output_activation(conv53, sigmoid)
change_output_activation(conv63, sigmoid)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)
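For reference, here is a minimal Python sketch of what I understand this script to do to the values. It assumes the normalization layer computes (x - mean) / std per channel and that the sigmoid is the standard logistic function; the helper names `normalize_pixel` and `sigmoid` are only for illustration and are not part of any Hailo API.

```python
import math

# Values copied from the model script above.
MEAN = [0.0, 0.0, 0.0]
STD = [255.0, 255.0, 255.0]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Assumed behaviour of the normalization layer: (x - mean) / std per channel."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


def sigmoid(x):
    """Logistic function, as selected by change_output_activation(..., sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


# Raw 8-bit pixels land in [0, 1] after normalization.
print(normalize_pixel([0, 128, 255]))

# Pixels that were already scaled to [0, 1] before calibration would be
# squeezed into roughly [0, 0.004], a very different input distribution.
print(normalize_pixel([0.0, 0.5, 1.0]))

# Sigmoid outputs stay inside (0, 1), which matches force_range_out=[0.0, 1.0].
print([round(sigmoid(v), 4) for v in (-10.0, 0.0, 10.0)])
```

Under that assumption, the calibration images would need to be in the raw 0 to 255 range, because the division by 255 already happens inside the compiled model.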
I use this command to compile:
hailomz compile --ckpt /local/shared_with_docker/best.onnx --calib-path /local/shared_with_docker/images/ --yaml hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8s.yaml --start-node-names images --classes 1 --hw-arch hailo8
<Hailo Model Zoo INFO> Start run for network yolov8s ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model yolov8s
[info] Restored ONNX model yolov8s (completion time: 00:00:00.22)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.77)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8s/input_layer1'.
[info] End nodes mapped from original model: '/model.22/cv2.0/cv2.0.2/Conv', '/model.22/cv3.0/cv3.0.2/Conv', '/model.22/cv2.1/cv2.1.2/Conv', '/model.22/cv3.1/cv3.1.2/Conv', '/model.22/cv2.2/cv2.2.2/Conv', '/model.22/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov8s (completion time: 00:00:01.24)
[info] Saved HAR to: /local/workspace/yolov8s.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to yolov8s from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8s.alls
[info] Loading model script commands to yolov8s from string
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.76)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|██████████| 64/64 [00:26<00:00, 2.39entries/s]
[info] Statistics Collector is done (completion time is 00:00:28.84)
[warning] The force_range command has been used, notice that its behavior was changed on this version. The old behavior forced the range on the collected calibration set statistics, but allowed the range to change during the optimization algorithms.
The new behavior forces the range throughout all optimization stages.
The old method could be restored by adding the flag weak_force_range_out=enabled to the force_range command on the following layers ['yolov8s/conv42', 'yolov8s/conv53', 'yolov8s/conv63']
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
127/128 [============================>.] - ETA: 0s - total_distill_loss: 16.7721 - _distill_loss_yolov8s/conv41: 1.0086 - _distill_loss_yolov8s/conv42: 1.3785 - _distill_loss_yolov8s/conv52: 0.6110 - _distill_loss_yolov8s/conv53: 3.1424 - _distill_loss_yolov8s/conv62: 0.4986 - _distill_loss_yolov8s/conv63: 3.9918 - _distill_loss_yolov8s/conv46: 2.2524 - _distill_loss_yolov8s/conv35: 1.6849 - _distill_loss_yolov8s/conv5
128/128 [==============================] - ETA: 0s - total_distill_loss: 16.7585 - _distill_loss_yolov8s/conv41: 1.0093 - _distill_loss_yolov8s/conv42: 1.3750 - _distill_loss_yolov8s/conv52: 0.6116 - _distill_loss_yolov8s/conv53: 3.1264 - _distill_loss_yolov8s/conv62: 0.5003 - _distill_loss_yolov8s/conv63: 3.9918 - _distill_loss_yolov8s/conv46: 2.2532 - _distill_loss_yolov8s/conv35: 1.6854 - _distill_loss_yolov8s/conv5
128/128 [==============================] - 34s 265ms/step - total_distill_loss: 16.7451 - _distill_loss_yolov8s/conv41: 1.0100 - _distill_loss_yolov8s/conv42: 1.3716 - _distill_loss_yolov8s/conv52: 0.6122 - _distill_loss_yolov8s/conv53: 3.1106 - _distill_loss_yolov8s/conv62: 0.5021 - _distill_loss_yolov8s/conv63: 3.9919 - _distill_loss_yolov8s/conv46: 2.2540 - _distill_loss_yolov8s/conv35: 1.6860 - _distill_loss_yolov8s/conv57: 2.2067
[info] Quantization-Aware Fine-Tuning is done (completion time is 00:09:19.10)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|██████████| 2/2 [02:10<00:00, 65.12s/iterations]
[info] Layer Noise Analysis is done (completion time is 00:02:14.49)
[info] Model Optimization is done
[info] Saved HAR to: /local/workspace/yolov8s.har
[info] Loading model script commands to yolov8s from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8s.alls
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv41
[info] Adding an output layer after conv42
[info] Adding an output layer after conv52
[info] Adding an output layer after conv53
[info] Adding an output layer after conv62
[info] Adding an output layer after conv63
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
Validating context_0 layer by layer (100%)
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
● Finished
[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 4: Trying parallel mapping...
         cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
worker0  *          *          *          *          *          *          *          *          V
worker1  *          *          *          *          *          *          *          *          V
worker2  V          V          V          V          V          V          V          V          V
worker3  V          V          V          V          V          V          V          V          V
00:10
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 2
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100%                | 62.5%               | 57%                |
[info] | cluster_1 | 100%                | 81.3%               | 93%                |
[info] | cluster_2 | 87.5%               | 62.5%               | 33.6%              |
[info] | cluster_3 | 100%                | 87.5%               | 41.4%              |
[info] | cluster_4 | 100%                | 79.7%               | 71.9%              |
[info] | cluster_5 | 31.3%               | 35.9%               | 12.5%              |
[info] | cluster_6 | 75%                 | 87.5%               | 69.5%              |
[info] | cluster_7 | 6.3%                | 6.3%                | 1.6%               |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 75%                 | 62.9%               | 47.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 38s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 4.16565 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 12.5 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 22s)
[info] Saved HAR to: /local/workspace/yolov8s.har
<Hailo Model Zoo INFO> HEF file written to yolov8s.hef
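To put numbers on the loss behaviour, here is a minimal Python sketch that reads the fine-tuning progress output from the log above. The loss values are copied from the log; the helper `parse_losses` and the regular expression are my own and only assume the `name: value` layout seen in those lines. Note that the pasted output only covers the last updates of a single epoch (steps 127/128 and 128/128), so it cannot show the trend across epochs.

```python
import re

# total_distill_loss from the three progress updates shown in the log.
TOTALS = [16.7721, 16.7585, 16.7451]

# Metric fields of the last complete progress update, copied from the log.
FINAL_LINE = (
    "total_distill_loss: 16.7451"
    " - _distill_loss_yolov8s/conv41: 1.0100"
    " - _distill_loss_yolov8s/conv42: 1.3716"
    " - _distill_loss_yolov8s/conv52: 0.6122"
    " - _distill_loss_yolov8s/conv53: 3.1106"
    " - _distill_loss_yolov8s/conv62: 0.5021"
    " - _distill_loss_yolov8s/conv63: 3.9919"
    " - _distill_loss_yolov8s/conv46: 2.2540"
    " - _distill_loss_yolov8s/conv35: 1.6860"
    " - _distill_loss_yolov8s/conv57: 2.2067"
)

METRIC = re.compile(r"(total_distill_loss|_distill_loss_[\w/]+): ([0-9.]+)")


def parse_losses(line):
    """Return a {metric name: value} dict for one progress line."""
    return {name: float(value) for name, value in METRIC.findall(line)}


losses = parse_losses(FINAL_LINE)
total = losses.pop("total_distill_loss")

# Sort the per-layer terms so the largest contributors come first.
per_layer = sorted(losses.items(), key=lambda item: item[1], reverse=True)

relative_drop = (TOTALS[0] - TOTALS[-1]) / TOTALS[0]
print(f"relative drop across the shown updates: {relative_drop:.2%}")
for name, value in per_layer:
    print(f"{name.split('/')[-1]}: {value:.4f} ({value / total:.0%} of total)")
```

In these numbers the per-layer terms add up to the total, and the two largest terms belong to conv63 and conv53, which are two of the three layers that get force_range_out=[0.0, 1.0] and a sigmoid in my model script.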