When I check the output shape of the converted yolov8n.hef model using HailoRT (C++), it is (1, 80, 25, 2000). However, when I check the output shape with the Hailo profiler (model.har), it shows (80, 5, 25).
I have two questions:
Setting aside the reshape, why is memory allocated for 2000 entries?
When applying the NMS config option in yolov8n, how is bbox decoding supposed to be done from the output?
The 2000 you’re seeing is the result of your NMS configuration being set to 80 classes with 25 maximum detections per class (80 × 25 = 2000 total detection slots).
While your JSON shows "classes": 1, the tensor dimensions indicate the model was compiled with the standard 80-class COCO configuration. This suggests either:
The JSON configuration wasn’t properly applied during model compilation
You’re referencing a different model file than intended
The 5 in the profiler shape is the number of attributes per detection: [x_min, y_min, x_max, y_max, confidence_score]
The 2000 in the HailoRT shape is the flattened total allocation (80 classes × 25 detection slots) in the physical memory layout
Verification step: a properly compiled model with classes: 1 and max_proposals_per_class: 25 should yield a detection tensor of 1 × 5 × 25 = 125 elements. The presence of 2000 indicates the compilation used different parameters.
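A quick way to sanity-check this is to recompute the expected slot counts from your config values and compare them against what HailoRT reports. The snippet below is only a minimal arithmetic sketch using the numbers quoted above; the variable names are illustrative and not part of the HailoRT API:

```cpp
#include <cstdio>

int main() {
    // Values from the intended NMS config ("classes": 1) versus what the
    // tensor dimensions imply (standard 80-class COCO). Both are illustrative.
    const int attrs_per_detection     = 5;   // x_min, y_min, x_max, y_max, score
    const int max_proposals_per_class = 25;
    const int intended_classes        = 1;
    const int compiled_classes        = 80;

    // Intended: 1 x 5 x 25 = 125 elements
    std::printf("intended elements: %d\n",
                intended_classes * attrs_per_detection * max_proposals_per_class);
    // Compiled: 80 x 25 = 2000 detection slots
    std::printf("compiled detection slots: %d\n",
                compiled_classes * max_proposals_per_class);
    return 0;
}
```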
Regarding bounding box decoding with NMS enabled:
When NMS is properly configured with bbox_decoders in your compilation settings, no additional decoding is required. The Hailo hardware performs all processing on-chip:
Bounding box coordinates are decoded from anchor/grid format
Non-maximum suppression is applied per class
Results are filtered to your specified max_proposals_per_class
Coordinates are converted to pixel units based on your image_dims configuration
Output tensor structure:
The output follows a [num_classes, attributes_per_detection, max_detections_per_class] format where each detection contains:
Attribute 0: x_min coordinate
Attribute 1: y_min coordinate
Attribute 2: x_max coordinate
Attribute 3: y_max coordinate
Attribute 4: confidence score
Unused detection slots will have confidence scores below your threshold or set to zero.
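If you read the raw output into a flat float buffer on the host, a minimal sketch for walking that layout could look like the following. It assumes a contiguous float32 buffer ordered [num_classes, 5, max_detections_per_class] exactly as described above; the function name parse_nms_output and the 0.3 score threshold are illustrative, not HailoRT API, and the actual buffer format of your compiled model should be confirmed against the vstream info:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

struct Detection {
    int   class_id;
    float x_min, y_min, x_max, y_max;
    float score;
};

// Walk a flat float32 buffer laid out as [num_classes][5][max_dets_per_class],
// keeping only detections whose confidence exceeds score_threshold.
std::vector<Detection> parse_nms_output(const float* data,
                                        int num_classes,
                                        int max_dets_per_class,
                                        float score_threshold) {
    constexpr int kAttrs = 5;  // x_min, y_min, x_max, y_max, confidence
    std::vector<Detection> results;
    for (int c = 0; c < num_classes; ++c) {
        const float* cls =
            data + static_cast<std::size_t>(c) * kAttrs * max_dets_per_class;
        for (int d = 0; d < max_dets_per_class; ++d) {
            // Attribute a of detection d lives at cls[a * max_dets_per_class + d].
            const float score = cls[4 * max_dets_per_class + d];
            if (score < score_threshold) continue;  // unused slot or low confidence
            results.push_back({c,
                               cls[0 * max_dets_per_class + d],
                               cls[1 * max_dets_per_class + d],
                               cls[2 * max_dets_per_class + d],
                               cls[3 * max_dets_per_class + d],
                               score});
        }
    }
    return results;
}

int main() {
    // Example with the shapes discussed above: 80 classes x 5 attrs x 25 detections.
    std::vector<float> buffer(80 * 5 * 25, 0.0f);
    auto dets = parse_nms_output(buffer.data(), 80, 25, 0.3f);
    std::cout << "kept " << dets.size() << " detections\n";
    return 0;
}
```

With a model correctly compiled for classes: 1, the same routine would simply be called with num_classes = 1 and the smaller 1 × 5 × 25 buffer.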
While this isn't a direct solution to your problem, I want to let you know about our cloud compiler. At DeGirum (a software partner of Hailo), we developed a cloud compiler that helps users convert YOLO checkpoints to HEF files: Early Access to DeGirum Cloud Compiler. You can check whether the tool can help you get the compiled model you need.