I am seeing a significant accuracy drop after converting my YOLOv8 model to HEF with the Hailo Dataflow Compiler (DFC). Below are the details of my workflow: training parameters, conversion commands, calibration dataset preparation, and the full tool logs.
Training Details
I trained my model using YOLOv8 with the following parameters:
from ultralytics import YOLO

def train():
    # Start from the pretrained YOLOv8n checkpoint
    model = YOLO("yolov8n.pt")
    model.train(
        data="train.yaml",
        device=0,
        optimizer="RAdam",
        epochs=700,
        patience=100,
        save_period=50,
        save=True,
        val=True,
        plots=True,
        cos_lr=True,
        lr0=0.00091,
        lrf=0.00618,
        momentum=0.79263,
        weight_decay=0.001,
        warmup_epochs=1.24174,
        warmup_momentum=0.69373,
        box=8.93352,
        cls=0.71801,
        dfl=0.69542,
        hsv_h=0.01244,
        hsv_s=0.61256,
        hsv_v=0.27994,
        degrees=0.0,
        translate=0.08731,
        scale=0.14756,
        shear=0.0,
        perspective=0.0,
        flipud=0.0,
        fliplr=0.26356,
        bgr=0.0,
        mosaic=1.0,
        mixup=0.0,
        copy_paste=0.0,
    )
ONNX Export
I exported the trained PyTorch model to ONNX using:
yolo export model=fire.pt imgsz=640 format=onnx opset=11
Conversion to HEF
I followed the Hailo toolchain to convert the model as follows:
- Parse ONNX to HAR
hailo parser onnx yolov8n.onnx --net-name yolov8n --har-path yolov8n.har --start-node-names images --hw-arch hailo8l
- Optimize Model
hailo optimize yolov8n.har --hw-arch hailo8l --calib-set-path calib_set.npy --output-har-path yolov8n_quantized_model.har
- Compile to HEF
hailo compiler yolov8n_quantized_model.har --hw-arch hailo8l --output-dir .
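Since the file names differ between the commands above and the logged runs (fire.pt, yolov8n.onnx, yolo8n.onnx, new.har), it may help to derive every path from a single ONNX file name. The sketch below only assembles the three command lines, using nothing but the flags shown in the commands above. `build_hailo_commands` and its parameters are hypothetical names, not part of the Hailo toolchain, and nothing is executed because the parser step can prompt interactively.

```python
from pathlib import Path


def build_hailo_commands(onnx_path, calib_path, hw_arch="hailo8l", net_name=None):
    """Assemble the parse/optimize/compile commands from one ONNX path.

    Hypothetical convenience helper: it only builds argument lists with the
    flags that appear in the commands above, and does not run them.
    """
    onnx_path = Path(onnx_path)
    net_name = net_name or onnx_path.stem  # e.g. "yolov8n" for "yolov8n.onnx"
    har = f"{net_name}.har"
    quantized_har = f"{net_name}_quantized_model.har"
    return [
        ["hailo", "parser", "onnx", str(onnx_path),
         "--net-name", net_name, "--har-path", har,
         "--start-node-names", "images", "--hw-arch", hw_arch],
        ["hailo", "optimize", har, "--hw-arch", hw_arch,
         "--calib-set-path", str(calib_path),
         "--output-har-path", quantized_har],
        ["hailo", "compiler", quantized_har,
         "--hw-arch", hw_arch, "--output-dir", "."],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and pasted into a shell.
    for cmd in build_hailo_commands("yolov8n.onnx", "calib-set.npy"):
        print(" ".join(cmd))
```

Printing rather than running keeps the y/n prompts of the parser under manual control while still guaranteeing that each step consumes the file the previous step produced.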
Calibration Dataset Details
I used 1200 diverse images for calibration and created the calibration dataset (calib_set.npy) using the following script:
import os
import numpy as np
import cv2

input_dir = "calib-set/"  # Image folder path
output_file = "calib-set.npy"
target_size = (640, 640)  # (width, height) passed to cv2.resize

image_files = sorted([f for f in os.listdir(input_dir) if f.lower().endswith((".png", ".jpg", ".jpeg"))])
image_list = []
for img_name in image_files:
    img_path = os.path.join(input_dir, img_name)
    image = cv2.imread(img_path)  # returns a BGR uint8 array, or None on failure
    if image is None:
        print(f"Skipping {img_name}, unable to read.")
        continue
    image = cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)
    image = np.array(image, dtype=np.uint8)
    image_list.append(image)

if len(image_list) > 0:
    # Stack into a single array of shape (N, 640, 640, 3)
    calib_array = np.stack(image_list, axis=0)
    np.save(output_file, calib_array)
    print(f"Saved {output_file} with shape {calib_array.shape}")
else:
    print("No valid images found!")
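Before handing the array to `hailo optimize`, a few properties may be worth checking: shape, dtype, value range, and the number of images (the optimize log further down reports only 64 entries being used). The following is a minimal sketch, assuming an (N, H, W, C) layout is what the parsed model expects, which I have not confirmed. `summarize_calib_set` is a hypothetical helper, and the synthetic array only stands in for the real file.

```python
import numpy as np


def summarize_calib_set(calib, expected_hw=(640, 640)):
    """Return basic facts about a calibration array of shape (N, H, W, C)."""
    if calib.ndim != 4:
        raise ValueError(f"expected a 4-D array (N, H, W, C), got shape {calib.shape}")
    n, h, w, c = calib.shape
    return {
        "num_images": n,
        "size_ok": (h, w) == tuple(expected_hw),
        "channels": c,
        "dtype": str(calib.dtype),
        "min": float(calib.min()),
        "max": float(calib.max()),
        # Heuristic only: values above 1 suggest unnormalized 0..255 pixels.
        "looks_like_0_255": bool(calib.max() > 1.0),
    }


if __name__ == "__main__":
    # Synthetic stand-in for np.load("calib-set.npy") so the sketch runs anywhere.
    rng = np.random.default_rng(0)
    fake = rng.integers(0, 256, size=(8, 640, 640, 3), dtype=np.uint8)
    print(summarize_calib_set(fake))
```

Ultralytics scales pixels to the 0..1 range before they reach the network, so if `looks_like_0_255` is true it seems worth checking whether the Hailo model includes a matching normalization step. Whether mine does is something I have not verified.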
Logs of the DFC Compiler Commands
(venv) rao@RL-23:$ hailo parser onnx yolo8n.onnx --net-name yolov8n --har-path yolo8n.har --start-node-names images --hw-arch hailo8l
[info] Current Time: 18:45:51, 03/25/25
[info] CPU: Architecture: x86_64, Model: 13th Gen Intel(R) Core(TM) i7-13620H, Number Of Cores: 16, Utilization: 0.2%
[info] Memory: Total: 7GB, Available: 6GB
[info] System info: OS: Linux, Kernel: 5.15.146.1-microsoft-standard-WSL2
[info] Hailo DFC Version: 3.30.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running hailo parser onnx yolo8n.onnx --net-name yolov8n --har-path yolo8n.har --start-node-names images --hw-arch hailo8l
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.08)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.22)
[info] Simplified ONNX model for a parsing retry attempt (completion time: 00:00:00.50)
Parsing failed with recommendations for end node names: ['/model.22/Concat_3'].
Would you like to parse again with the recommendation? (y/n)
y
[info] According to recommendations, retrying parsing with end node names: ['/model.22/Concat_3'].
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.03)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.12)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8n/input_layer1'.
[info] End nodes mapped from original model: '/model.22/Concat_3'.
[info] Translation completed on ONNX model yolov8n (completion time: 00:00:00.45)
Would you like to parse the model again with the mentioned end nodes and add nms postprocess command to the model script? (y/n)
y
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.03)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.14)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8n/input_layer1'.
[info] End nodes mapped from original model: '/model.22/cv2.0/cv2.0.2/Conv', '/model.22/cv3.0/cv3.0.2/Conv', '/model.22/cv2.1/cv2.1.2/Conv', '/model.22/cv3.1/cv3.1.2/Conv', '/model.22/cv2.2/cv2.2.2/Conv', '/model.22/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov8n (completion time: 00:00:00.47)
[info] Appending model script commands to yolov8n from string
[info] Added nms postprocess command to model script.
(venv) rao@RL-23:$ hailo optimize new.har --hw-arch hailo8l --calib-set-path ./calib-set.npy --output-har-path quantized_model.har
[info] Current Time: 15:13:13, 03/25/25
[info] CPU: Architecture: x86_64, Model: 13th Gen Intel(R) Core(TM) i7-13620H, Number Of Cores: 16, Utilization: 0.3%
[info] Memory: Total: 7GB, Available: 6GB
[info] System info: OS: Linux, Kernel: 5.15.146.1-microsoft-standard-WSL2
[info] Hailo DFC Version: 3.30.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running hailo optimize new.har --hw-arch hailo8l --calib-set-path ./calib-set.npy --output-har-path quantized_model.har
[info] For NMS architecture yolov8 the default engine is cpu. For other engine please use the 'engine' flag in the nms_postprocess model script command. If the NMS has been added during parsing, please parse the model again without confirming the addition of the NMS, and add the command manually with the desired engine.
[info] The layer yolov8n/conv41 was detected as reg_layer.
[info] The layer yolov8n/conv42 was detected as cls_layer.
[info] The layer yolov8n/conv52 was detected as reg_layer.
[info] The layer yolov8n/conv53 was detected as cls_layer.
[info] The layer yolov8n/conv62 was detected as reg_layer.
[info] The layer yolov8n/conv63 was detected as cls_layer.
[info] Using the default score threshold of 0.001 (range is [0-1], where 1 performs maximum suppression) and IoU threshold of 0.7 (range is [0-1], where 0 performs maximum suppression).
Changing the values is possible using the nms_postprocess model script command.
[info] The activation function of layer yolov8n/conv42 was replaced by a Sigmoid
[info] The activation function of layer yolov8n/conv53 was replaced by a Sigmoid
[info] The activation function of layer yolov8n/conv63 was replaced by a Sigmoid
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.33)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:19<00:00, 3.32entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:20.27)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Quantization-Aware Fine-Tuning skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /quantized_model.har
(venv) rao@RL-23:$ hailo compiler quantized_model.har --hw-arch hailo8l --output-har-path .
[info] Current Time: 15:15:48, 03/25/25
[info] CPU: Architecture: x86_64, Model: 13th Gen Intel(R) Core(TM) i7-13620H, Number Of Cores: 16, Utilization: 1.1%
[info] Memory: Total: 7GB, Available: 6GB
[info] System info: OS: Linux, Kernel: 5.15.146.1-microsoft-standard-WSL2
[info] Hailo DFC Version: 3.30.0
[info] HailoRT Version: Not Installed
[info] PCIe: No Hailo PCIe device was found
[info] Running hailo compiler quantized_model.har --hw-arch hailo8l --output-har-path .
[info] Compiling network
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv41
[info] Adding an output layer after conv42
[info] Adding an output layer after conv52
[info] Adding an output layer after conv53
[info] Adding an output layer after conv62
[info] Adding an output layer after conv63
[info] Finding the best partition to contexts…
[…<==>] Duration: 00:00:11
Iteration Done
[…<==>…] Duration: 00:00:22
Iteration Done, Performance improved by 1.8%
[..<==>…] Duration: 00:00:07
Iteration Done, Performance improved by 0.4%
[…<==>…] Duration: 00:00:13
Iteration Done, Performance improved by 1.8%
[…<==>…] Duration: 00:00:08
Iteration Done, Performance improved by 21.5%
[…<==>…] Duration: 00:00:42
Iteration Done, Performance improved by 9.9%
[…<==>…] Duration: 00:00:06
Iteration Done, Performance improved by 5.0%
[…<==>…] Duration: 00:00:05
Iteration Done, Performance improved by 0.4%
[…<==>] Elapsed: 00:01:33
[info] Using Multi-context flow
[info] Resources optimization guidelines: Strategy → GREEDY Objective → MAX_FPS
[info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%, max_compute_16bit_utilization=60%, max_memory_utilization (weights)=60%, max_input_aligner_utilization=60%, max_apu_utilization=60%
Validating context_0 layer by layer (100%)
● Finished
Validating context_1 layer by layer (100%)
● Finished
Validating context_2 layer by layer (100%)
● Finished
[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/2 Iteration 4: Trying parallel mapping…
         cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
worker0  V          V          *          *          V          V          *          *          V
worker1  V          V          *          *          V          V          *          *          V
worker2  *          *          *          *          V          *          *          *          V
worker3  V          V          *          *          V          V          *          *          V
Context:1/2 Iteration 4: Trying parallel mapping…
         cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
worker0  V          V          *          *          V          V          *          *          V
worker1  V          V          *          *          V          V          *          *          V
worker2  V          V          *          *          V          V          *          *          V
worker3  V          V          *          *          V          V          *          *          V
Context:2/2 Iteration 4: Trying parallel mapping…
         cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
worker0  V          V          *          *          V          V          *          *          V
worker1  V          V          *          *          V          V          *          *          V
worker2  V          V          *          *          V          V          *          *          V
worker3  V          V          *          *          V          V          *          *          V
00:07
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_0 (context_0):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_1 (context_1):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_2 (context_2):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_0 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 62.5%               | 21.9%               | 25%                |
[info] | cluster_1 | 50%                 | 29.7%               | 40.6%              |
[info] | cluster_4 | 43.8%               | 25%                 | 23.4%              |
[info] | cluster_5 | 87.5%               | 60.9%               | 58.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 60.9%               | 34.4%               | 36.9%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_1 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 87.5%               | 67.2%               | 67.2%              |
[info] | cluster_1 | 43.8%               | 15.6%               | 30.5%              |
[info] | cluster_4 | 25%                 | 21.9%               | 14.1%              |
[info] | cluster_5 | 87.5%               | 68.8%               | 58.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 60.9%               | 43.4%               | 42.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_2 utilization:
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 81.3%               | 87.5%               | 53.9%              |
[info] | cluster_1 | 62.5%               | 29.7%               | 30.5%              |
[info] | cluster_4 | 62.5%               | 82.8%               | 55.5%              |
[info] | cluster_5 | 31.3%               | 25%                 | 18.8%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 59.4%               | 56.3%               | 39.6%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 4m 3s)
[info] Compiling context_0…
[info] Compiling context_1…
[info] Compiling context_2…
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 4.22974 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 32.0312 Mbps (for a single frame)
[info] Compiling context_0…
[info] Compiling context_1…
[info] Compiling context_2…
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 4.22974 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 32.0312 Mbps (for a single frame)
[info] Building HEF…
[info] Successful Compilation (compilation time: 6s)
[info] Compilation complete
[info] Saved HEF to: /yolov8n.hef
Issue Observed
After running inference on the HEF file, the accuracy has dropped drastically compared to the original ONNX model. The detections are either missing or incorrect.
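To put a number on the drop, one option is to match the HEF detections against the ONNX detections on the same images by IoU. This is a minimal sketch, assuming both sets of boxes are `[x1, y1, x2, y2]` in the same pixel coordinates. The function names and the 0.5 threshold are illustrative choices and do not come from either toolchain, and class labels and scores are ignored for simplicity.

```python
import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in [x1, y1, x2, y2] form."""
    a = np.asarray(a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(b, dtype=np.float64).reshape(-1, 4)
    # Corners of the intersection rectangle for every (a, b) pair.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def match_rate(reference_boxes, candidate_boxes, iou_threshold=0.5):
    """Fraction of reference boxes that have a candidate box with IoU >= threshold."""
    ref = np.asarray(reference_boxes, dtype=np.float64).reshape(-1, 4)
    if len(ref) == 0:
        return 1.0  # nothing to find, so nothing was missed
    ious = iou_matrix(ref, candidate_boxes)
    if ious.shape[1] == 0:
        return 0.0  # the candidate model produced no boxes at all
    return float((ious.max(axis=1) >= iou_threshold).mean())


if __name__ == "__main__":
    # Made-up boxes for illustration only, not real model output.
    onnx_boxes = [[10, 10, 50, 50], [100, 100, 200, 200]]
    hef_boxes = [[12, 11, 49, 52]]
    print(match_rate(onnx_boxes, hef_boxes))
```

Running this per image with the ONNX output as reference would separate "boxes are missing" from "boxes are shifted", which may hint at whether the problem is in quantization or in pre/post-processing.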
System Details:
- OS: Linux (WSL2)
- Kernel: 5.15.146.1-microsoft-standard-WSL2
- CPU: 13th Gen Intel(R) Core™ i7-13620H
- Memory: 7GB Total, 6GB Available
- Hailo DFC Version: 3.30.0
- HailoRT: Not Installed
- No GPU available
Questions:
- What could be causing the severe accuracy drop after the HEF conversion?
- Are there any recommended quantization settings or calibration strategies to improve accuracy?
- Is using WSL2 an issue for Hailo’s toolchain in this case?
- Should I modify my calibration dataset or use different preprocessing techniques?
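On the preprocessing question, two differences from my calibration script that might matter are channel order (`cv2.imread` returns BGR) and aspect ratio (a plain resize stretches non-square images, whereas Ultralytics pads them with gray). The sketch below shows both using NumPy only. It uses nearest-neighbor resizing to stay dependency-free, the pad value 114 follows the Ultralytics default letterbox as far as I know, and whether the compiled model actually expects RGB letterboxed input is an assumption I would still need to verify. `letterbox_rgb` is a hypothetical helper name.

```python
import numpy as np


def letterbox_rgb(image_bgr, target=640, pad_value=114):
    """Convert a BGR HxWx3 uint8 image to RGB and letterbox it to target x target."""
    rgb = image_bgr[:, :, ::-1]  # BGR -> RGB by reversing the channel axis
    h, w = rgb.shape[:2]
    scale = min(target / h, target / w)  # keep the aspect ratio
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbor resize via index lookup (cv2.resize would be the usual choice).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = rgb[rows][:, cols]
    # Center the resized image on a gray canvas.
    canvas = np.full((target, target, 3), pad_value, dtype=np.uint8)
    top, left = (target - new_h) // 2, (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


if __name__ == "__main__":
    # Synthetic 480x640 BGR frame standing in for cv2.imread(...) output.
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    frame[..., 2] = 200  # red channel in BGR order
    out = letterbox_rgb(frame)
    print(out.shape, out[0, 0], out[320, 320])
```

If this route is taken, the same function would have to be applied both when building the calibration set and at inference time, so that the quantized model is calibrated on the distribution it later sees.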
Any guidance or troubleshooting steps would be highly appreciated. Thanks!