How do I interpret and process the YOLO SDK_NATIVE model output?

Using the DFC and YOLO11n on a custom VisDrone dataset.

I’m getting fairly poor metrics (AP@50 = 0.39) from my quantized (and FP_OPTIMIZED) models compared to my .onnx model (AP@50 = 0.50). So far I’ve been unable to run the native model on the validation set because of confusion about the model outputs.

I’m using a custom VisDrone dataset with two classes. With the model script defined to include NMS, the quantized model has an expected output shape of (2, 5, 100) (2 classes, xyxy + confidence, 100 detections).

Yet, when I check my validation results using the simply parsed model (no optimization), I get an output shape of (80, 80, 64), which matches the 1x64x80x80 tensor of the single /model.23/cv2.0/cv2.0.2/Conv node (which happens to be the first entry in end_node_names):

runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
    start_node_names=["/model.0/conv/Conv"],
    end_node_names=["/model.23/cv2.0/cv2.0.2/Conv",
                    "/model.23/cv3.0/cv3.0.2/Conv",
                    "/model.23/cv2.1/cv2.1.2/Conv",
                    "/model.23/cv3.1/cv3.1.2/Conv",
                    "/model.23/cv2.2/cv2.2.2/Conv",
                    "/model.23/cv3.2/cv3.2.2/Conv"],
    net_input_shapes={"/model.0/conv/Conv": [1, 3, 640, 640]},
)

How can I process/interpret these outputs so I can compare my .onnx model with the parsed .har model?

Hey @natsayin_nahin,

Your translate_onnx_model() call extracts raw intermediate feature maps from the YOLO detection heads. These tensors don’t go through decoding or post-processing (such as confidence thresholding and NMS). This is expected behavior.
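For reference, the six end nodes correspond to three detection scales (strides 8, 16, 32), each with a box branch (cv2, 64 = 4 sides × 16 DFL channels) and a class branch (cv3, one channel per class). A quick sanity check of the parsed .npz against the shapes I’d expect — my assumptions here: 640×640 input, 2 classes, channels-last layout as the parser reports it:

```python
# Hypothetical expected shapes for the parsed .npz outputs
# (assumes 640x640 input, nc = 2 classes, channels-last layout).
EXPECTED_SHAPES = {
    "/model.23/cv2.0/cv2.0.2/Conv": (80, 80, 64),  # stride 8,  box (4 sides x 16 DFL bins)
    "/model.23/cv3.0/cv3.0.2/Conv": (80, 80, 2),   # stride 8,  class logits
    "/model.23/cv2.1/cv2.1.2/Conv": (40, 40, 64),  # stride 16, box
    "/model.23/cv3.1/cv3.1.2/Conv": (40, 40, 2),   # stride 16, class logits
    "/model.23/cv2.2/cv2.2.2/Conv": (20, 20, 64),  # stride 32, box
    "/model.23/cv3.2/cv3.2.2/Conv": (20, 20, 2),   # stride 32, class logits
}

def check_outputs(npz_outputs):
    """Compare each end-node tensor's shape against the expected head layout."""
    for name, shape in EXPECTED_SHAPES.items():
        actual = tuple(npz_outputs[name].shape)
        assert actual == shape, f"{name}: got {actual}, expected {shape}"
```

Pairing each cv2 box map with its same-scale cv3 class map is the first step before any decoding.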

The quantized .har model, however, executes the full inference pipeline, including post-processing steps defined in your model script, producing outputs shaped like (classes, 5, num_detections).
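As a concrete example of reading that NMS tensor, here is a minimal sketch assuming the common Hailo layout of [y_min, x_min, y_max, x_max, score] per detection slot, with coordinates normalized to [0, 1] and unused slots zero-padded — verify this against your model script and SDK version:

```python
import numpy as np

def parse_hailo_nms(output, score_thresh=0.001, img_size=640):
    """Convert a (num_classes, 5, max_dets) NMS tensor into a detection list.

    Assumes each column is [y_min, x_min, y_max, x_max, score] with
    coordinates normalized to [0, 1] and unused slots zero-padded
    (a common Hailo NMS layout -- confirm for your SDK version).
    Returns tuples of (x1, y1, x2, y2, class_id, confidence) in pixels.
    """
    detections = []
    num_classes = output.shape[0]
    for cls in range(num_classes):
        for det in output[cls].T:  # det = [y1, x1, y2, x2, score]
            y1, x1, y2, x2, score = det
            if score <= score_thresh:
                continue  # skip zero-padded slots and low-confidence hits
            detections.append((x1 * img_size, y1 * img_size,
                               x2 * img_size, y2 * img_size,
                               cls, float(score)))
    return detections
```

With this, the quantized model's output drops straight into the same evaluation script you use for the ONNX predictions.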

To evaluate the parsed .npz output correctly, you need to apply the post-processing logic externally:

  1. Manually decode the YOLO feature maps using Hailo Model Zoo utilities (or implement equivalent logic) to convert raw feature maps into bounding boxes. YOLO11 heads are anchor-free: each box map (e.g., (80, 80, 64)) holds 4 sides × 16 DFL bins per cell, and needs to be reshaped, passed through a softmax over the bins to get expected distances from each grid-cell center, scaled by the stride (8/16/32) to image coordinates, combined with the sigmoid class scores, and filtered with NMS.

  2. Match the post-processing settings from your model script (score threshold, IoU threshold, maximum detections) so both pipelines filter candidates identically; otherwise the comparison will be skewed.
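A minimal NumPy sketch of step 1 for a single scale, assuming the Ultralytics channel layout (the 64 box channels grouped as 4 sides × 16 DFL bins, side-major) and the channels-last tensors the parser emits — treat this as an outline to check against the Model Zoo code, not a drop-in implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_dfl_head(box_map, cls_map, stride, reg_max=16):
    """Decode one YOLO11 scale.

    box_map: (H, W, 4*reg_max) raw box logits, cls_map: (H, W, nc) raw
    class logits (channels-last, as parsed). Assumes side-major channel
    grouping: channels [0:16] are the 16 DFL bins of side 0, and so on.
    Returns (N, 4) xyxy boxes in pixels and (N, nc) sigmoid class scores.
    """
    h, w, _ = box_map.shape
    # Expected distance per side: softmax over bins, weighted by bin index.
    dist = softmax(box_map.reshape(h, w, 4, reg_max), axis=-1) @ np.arange(reg_max)
    # Anchor points = grid-cell centers, in grid units.
    ys, xs = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    l, t, r, b = dist[..., 0], dist[..., 1], dist[..., 2], dist[..., 3]
    # Distances are left/top/right/bottom offsets from the center; the
    # stride converts grid units to pixels.
    boxes = np.stack([xs - l, ys - t, xs + r, ys + b], axis=-1) * stride
    scores = 1.0 / (1.0 + np.exp(-cls_map))  # sigmoid over class logits
    return boxes.reshape(-1, 4), scores.reshape(h * w, -1)
```

Run this per scale (strides 8, 16, 32), concatenate the results, threshold on the max class score, and apply NMS to get detections comparable to the ONNX pipeline.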

Once decoded, format both sets of predictions to match the COCO/VisDrone style: (x1, y1, x2, y2, class_id, confidence) and compute AP@50 across the same validation split using your evaluation script.
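If you'd rather not pull in pycocotools for the comparison, a minimal single-class AP@50 sketch (greedy IoU matching, monotone precision envelope; `preds` and `gts` here are hypothetical inputs in the pixel xyxy format above):

```python
import numpy as np

def iou_xyxy(a, b):
    """IoU between one xyxy box and an (N, 4) array of xyxy boxes."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def ap50(preds, gts, iou_thresh=0.5):
    """AP@50 for one class. preds: [(x1, y1, x2, y2, conf), ...];
    gts: (N, 4) array. Greedy matching by descending confidence."""
    preds = sorted(preds, key=lambda p: -p[4])
    matched = np.zeros(len(gts), dtype=bool)
    tp = np.zeros(len(preds)); fp = np.zeros(len(preds))
    for i, (x1, y1, x2, y2, conf) in enumerate(preds):
        if len(gts) == 0:
            fp[i] = 1
            continue
        ious = iou_xyxy(np.array([x1, y1, x2, y2], dtype=float), gts)
        j = int(np.argmax(ious))
        if ious[j] >= iou_thresh and not matched[j]:
            tp[i], matched[j] = 1, True
        else:
            fp[i] = 1  # duplicate match or IoU below threshold
    recall = np.cumsum(tp) / max(len(gts), 1)
    precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
    # Monotone precision envelope, then integrate over recall.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))
```

Feeding both the ONNX detections and the decoded parsed/quantized detections through the same function is the quickest way to see where the two pipelines diverge.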

This will give you an accurate comparison between your ONNX pipeline and the parsed or compiled models on Hailo.