YOLOv8 conversion to HEF

The detection performance of my self-trained two-class YOLOv8s model is very poor after converting it to HEF. I have included the conversion process below and hope to receive some help. Thank you very much.

My compilation command is as follows:

hailomz compile --ckpt best.onnx --calib-path phone_clothes_val/ --yaml ./hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8s.yaml --classes 2 --hw-arch hailo8

The content of the yolov8s.alls file is as follows:

normalization1 = normalization([127.5, 127.5, 127.5], [128.0, 128.0, 128.0])
change_output_activation(conv42, sigmoid)
change_output_activation(conv53, sigmoid)
change_output_activation(conv63, sigmoid)
model_optimization_flavor(optimization_level=2)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)

The content of the yolov8s.yaml file is as follows:

base:

The content of the yolov8s_nms_config.json file is as follows:

{
  "nms_scores_th": 0.2,
  "nms_iou_th": 0.7,
  "image_dims": [
    640,
    640
  ],
  "max_proposals_per_class": 100,
  "classes": 80,
  "regression_length": 16,
  "background_removal": false,
  "background_removal_index": 0,
  "bbox_decoders": [
    {
      "name": "bbox_decoder41",
      "stride": 8,
      "reg_layer": "conv41",
      "cls_layer": "conv42"
    },
    {
      "name": "bbox_decoder52",
      "stride": 16,
      "reg_layer": "conv52",
      "cls_layer": "conv53"
    },
    {
      "name": "bbox_decoder62",
      "stride": 32,
      "reg_layer": "conv62",
      "cls_layer": "conv63"
    }
  ]
}

The output result is:

Start run for network yolov8s …
Initializing the hailo8 runner…
[info] Translation started on ONNX model yolov8s
[info] Restored ONNX model yolov8s (completion time: 00:00:00.29)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:01.51)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov8s/input_layer1'.
[info] End nodes mapped from original model: '/model.22/cv2.0/cv2.0.2/Conv', '/model.22/cv3.0/cv3.0.2/Conv', '/model.22/cv2.1/cv2.1.2/Conv', '/model.22/cv3.1/cv3.1.2/Conv', '/model.22/cv2.2/cv2.2.2/Conv', '/model.22/cv3.2/cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov8s (completion time: 00:00:02.85)
[info] Appending model script commands to yolov8s from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /home/bbibm/Project/03_mading/hailo_v/yolov8s.har
Preparing calibration data…
[info] Loading model script commands to yolov8s from /home/bbibm/Project/03_mading/hailo_v/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8s.alls
[info] Loading model script commands to yolov8s from string
[info] Starting Model Optimization
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.75)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:39<00:00, 1.64entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:42.67)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
128/128 [==============================] - 327s 1s/step - total_distill_loss: 4.4894 - _distill_loss_yolov8s/conv41: 0.2064 - _distill_loss_yolov8s/conv42: 0.6920 - _distill_loss_yolov8s/conv52: 0.2222 - _distill_loss_yolov8s/conv53: 0.6396 - _distill_loss_yolov8s/conv62: 0.2488 - _distill_loss_yolov8s/conv63: 1.3986 - _distill_loss_yolov8s/conv46: 0.3972 - _distill_loss_yolov8s/conv35: 0.2671 - _distill_loss_yolov8s/conv57: 0.4176
Epoch 2/4
128/128 [==============================] - 161s 1s/step - total_distill_loss: 7.0507 - _distill_loss_yolov8s/conv41: 0.4138 - _distill_loss_yolov8s/conv42: 0.9390 - _distill_loss_yolov8s/conv52: 0.4335 - _distill_loss_yolov8s/conv53: 0.9305 - _distill_loss_yolov8s/conv62: 0.5666 - _distill_loss_yolov8s/conv63: 1.4972 - _distill_loss_yolov8s/conv46: 0.7388 - _distill_loss_yolov8s/conv35: 0.5749 - _distill_loss_yolov8s/conv57: 0.9563
Epoch 3/4
128/128 [==============================] - 161s 1s/step - total_distill_loss: 6.5922 - _distill_loss_yolov8s/conv41: 0.4091 - _distill_loss_yolov8s/conv42: 0.9982 - _distill_loss_yolov8s/conv52: 0.4148 - _distill_loss_yolov8s/conv53: 1.0001 - _distill_loss_yolov8s/conv62: 0.5205 - _distill_loss_yolov8s/conv63: 1.0000 - _distill_loss_yolov8s/conv46: 0.7590 - _distill_loss_yolov8s/conv35: 0.6152 - _distill_loss_yolov8s/conv57: 0.8753
Epoch 4/4
128/128 [==============================] - 161s 1s/step - total_distill_loss: 6.3592 - _distill_loss_yolov8s/conv41: 0.3663 - _distill_loss_yolov8s/conv42: 0.9990 - _distill_loss_yolov8s/conv52: 0.3890 - _distill_loss_yolov8s/conv53: 1.0001 - _distill_loss_yolov8s/conv62: 0.4769 - _distill_loss_yolov8s/conv63: 1.0000 - _distill_loss_yolov8s/conv46: 0.7297 - _distill_loss_yolov8s/conv35: 0.5925 - _distill_loss_yolov8s/conv57: 0.8055
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:13:34.82)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [03:08<00:00, 94.44s/iterations]
[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:03:14.92)
[info] Model Optimization is done
[info] Saved HAR to: /home/bbibm/Project/03_mading/hailo_v/yolov8s.har
[info] Loading model script commands to yolov8s from /home/bbibm/Project/03_mading/hailo_v/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8s.alls
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Adding an output layer after conv41
[info] Adding an output layer after conv42
[info] Adding an output layer after conv52
[info] Adding an output layer after conv53
[info] Adding an output layer after conv62
[info] Adding an output layer after conv63
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy → GREEDY Objective → MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy → GREEDY Objective → MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%

Validating context_0 layer by layer (100%)

● Finished

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 4: Trying parallel mapping…
cluster_0 cluster_1 cluster_2 cluster_3 cluster_4 cluster_5 cluster_6 cluster_7 prepost
worker0 * * * * * * * * V
worker1 V V V V V V V V V
worker2 * * * * * * * * V
worker3 V V V V V V V V V

00:08
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0

[info] Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 1
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 62.5%               | 75%                 | 53.1%              |
[info] | cluster_1 | 56.3%               | 50%                 | 23.4%              |
[info] | cluster_2 | 75%                 | 71.9%               | 61.7%              |
[info] | cluster_3 | 87.5%               | 57.8%               | 39.8%              |
[info] | cluster_4 | 75%                 | 57.8%               | 73.4%              |
[info] | cluster_5 | 62.5%               | 76.6%               | 51.6%              |
[info] | cluster_6 | 100%                | 73.4%               | 50%                |
[info] | cluster_7 | 81.3%               | 40.6%               | 47.7%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 75%                 | 62.9%               | 50.1%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 41s)
[info] Compiling context_0…
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 4.22974 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF…
[info] Successful Compilation (compilation time: 28s)
[info] Saved HAR to: /home/bbibm/Project/03_mading/hailo_v/yolov8s.har
HEF file written to yolov8s.hef

Hi @_2312882685, welcome to Hailo's community.

Your issue—poor detection performance after converting a self-trained YOLOv8s model (with 2 classes) to HEF for Hailo-8—is a known challenge, especially with custom YOLOv8 models trained on a small number of classes or limited data. This often results from quantization issues during model optimization, which can cause some output nodes to be “almost nullified,” leading to poor accuracy on the Hailo device.

Key recommendations:

1. Adjust Quantization Parameters in the Model Script

For YOLOv8 models with few classes, it's recommended to explicitly set the output range of the final convolution layers before NMS. In your .alls script, replace the change_output_activation lines with a quantization_param command as follows (your output layer names may differ):

normalization1 = normalization([127.5, 127.5, 127.5], [128.0, 128.0, 128.0])
quantization_param([conv42, conv53, conv63], force_range_out=[0.0, 1.0])
model_optimization_flavor(optimization_level=2)
nms_postprocess("../../postprocess_config/yolov8s_nms_config.json", meta_arch=yolov8, engine=cpu)

This change helps maintain a proper dynamic range for the outputs, which is especially important for models with a low number of classes. Multiple users have reported that this adjustment significantly improves detection accuracy after conversion to HEF, and it has been confirmed by Hailo support and community members.
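Since your log shows the model script being loaded from hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov8s.alls, editing that file in place and re-running your original command unchanged should be enough for the new quantization_param line to take effect:

hailomz compile --ckpt best.onnx --calib-path phone_clothes_val/ --yaml ./hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8s.yaml --classes 2 --hw-arch hailo8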

2. Update the NMS Config

Make sure your yolov8s_nms_config.json reflects the correct number of classes (in your case, "classes": 2). If it is set to 80 (the COCO default), NMS may not work as expected for your custom model.
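As a minimal sketch, only the classes entry in your posted yolov8s_nms_config.json needs to change; everything else can stay as it is:

"max_proposals_per_class": 100,
"classes": 2,
"regression_length": 16,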

3. Calibration Dataset

Ensure your calibration dataset is representative of your real data and is properly normalized. Poor calibration data can also lead to suboptimal quantization and poor runtime accuracy.
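As a rough sketch (phone_clothes_train/ is just a placeholder for wherever your training images live), you could build a larger, more representative calibration folder by randomly sampling training images, and then pass that folder to --calib-path instead of phone_clothes_val/:

# Copy a few hundred randomly chosen training images into a dedicated calibration folder.
mkdir -p phone_clothes_calib
ls phone_clothes_train/*.jpg | shuf -n 512 | xargs -I{} cp {} phone_clothes_calib/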

4. Additional Notes
  • The rest of your conversion pipeline and YAML configuration appear correct.

  • If you continue to see poor results, try increasing the size and diversity of your calibration set, and double-check that your input preprocessing (normalization) matches what the model expects.
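On the normalization point: the [127.5, 127.5, 127.5] / [128.0, 128.0, 128.0] pair in your model script scales inputs to roughly [-1, 1]. If your ONNX export expects Ultralytics' default preprocessing (pixel values divided by 255, i.e. scaled to [0, 1]), the corresponding model-script line would instead be:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])

This is only a sketch; keep whichever normalization matches the preprocessing actually used during training.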