How to interpret the YOLO outputs?

I’m following the Hailo tutorial notebook DFC_2_Model_Optimization_Tutorial - this ran fine out of the box, but I’ve tried to adapt it for YOLOv11n. The output has shape [80, 5, 100]; based on the documentation, this should be 80 classes, 5 values per detection, and up to 100 detections per class (based on the NMS YAML).

The important bits of code are all included below. If you want to see the notebook, it’s uploaded to Colab; of course, a lot of it won’t run in Colab as-is, since I’m running it locally in a Docker container. Colab was just an easy way to share the notebook.

If I look into the output for a specific class and ignore the many rows of zeros, I get something like this for that one class:

array([[0.39044172, 0.01051593, 0.61553806, 0.24089272, 0.91469216],
       [0.48778102, 0.8696294 , 0.6793776 , 0.9998992 , 0.5478364 ]],
      dtype=float32)
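
(For reference, those two rows were pulled out by slicing one class and dropping the all-zero rows; a minimal sketch, assuming the [80, 5, 100] layout, with out standing in for the raw output of a single image:)

import numpy as np

cls_id = 62
dets = out[cls_id].T                     # (5, 100) -> (100, 5): one row per detection slot
dets = dets[np.any(dets != 0, axis=1)]   # keep only the non-empty slots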

I’m unsure whether I’m just getting garbage predictions or misunderstanding the output format. Is this xywhn + confidence, or something else? For comparison, running model.predict() in Ultralytics (with yolo11n.pt) yields these results for the same image:

model = YOLO('yolo11n.pt')
results = model.predict(sample_dataset[0,:,:,:], imgsz=640, conf=0.2)
# Process results list
for result in results:
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs

boxes[boxes.cls==62]

that will yield this output:

ultralytics.engine.results.Boxes object with attributes:
cls: tensor([62., 62.], device='cuda:0')
conf: tensor([0.9115, 0.2777], device='cuda:0')
data: tensor([[6.1889e+00, 2.4986e+02, 1.5438e+02, 3.9446e+02, 9.1155e-01, 6.2000e+01],
        [5.5910e+02, 3.1260e+02, 6.4000e+02, 4.2976e+02, 2.7771e-01, 6.2000e+01]], device='cuda:0')
id: None
is_track: False
orig_shape: (640, 640)
shape: torch.Size([2, 6])
xywh: tensor([[ 80.2820, 322.1584, 148.1863, 144.6057],
        [599.5480, 371.1809,  80.9040, 117.1644]], device='cuda:0')
xywhn: tensor([[0.1254, 0.5034, 0.2315, 0.2259],
        [0.9368, 0.5800, 0.1264, 0.1831]], device='cuda:0')
xyxy: tensor([[  6.1889, 249.8556, 154.3751, 394.4613],
        [559.0960, 312.5987, 640.0000, 429.7631]], device='cuda:0')
xyxyn: tensor([[0.0097, 0.3904, 0.2412, 0.6163],
        [0.8736, 0.4884, 1.0000, 0.6715]], device='cuda:0')

The code I’ve used is as follows (the notebook is here):

import os

import numpy as np
import torch
import torchvision as tv
from PIL import Image

def preproc(image, output_height=640, output_width=640):
    # Resize only; normalization is added inside the model via the model script below
    preprocess = tv.transforms.Compose([
        tv.transforms.Resize((output_height, output_width)),
    ])

    data = np.array(preprocess(image))

    return data

data_batch_size = 1500
images_path = "../data/coco/images/val2017" 
images_list = [img_name for img_name in os.listdir(images_path) if os.path.splitext(img_name)[1] == ".jpg"]
calib_dataset = np.zeros((data_batch_size, 640, 640, 3))
for idx, img_name in enumerate(sorted(images_list)):
    if idx==data_batch_size:
        break
    img = Image.open(os.path.join(images_path, img_name)).convert('RGB')
    img_preproc = preproc(img)
    calib_dataset[idx, :, :, :] = img_preproc

np.save("calib_set.npy", calib_dataset)

The above just sets up the calibration dataset. I used COCO 2017 for this via manual download (using the Ultralytics YAML), downloading only the val2017 set.
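
As a quick sanity check (my addition, not part of the tutorial), the calibration set should end up as unnormalized 0-255 images in NHWC layout, since normalization is added through the model script below:

print(calib_dataset.shape, calib_dataset.dtype)   # (1500, 640, 640, 3), float64 from np.zeros
print(calib_dataset.min(), calib_dataset.max())   # expected to span roughly 0-255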

# Second, we will load our parsed HAR from the Parsing Tutorial
model_name = "yolo11n"
hailo_model_har_name = f"{model_name}_hailo_model.har"
assert os.path.isfile(hailo_model_har_name), "Please provide valid path for HAR file"
runner = ClientRunner(har=hailo_model_har_name)
# By default it uses the hw_arch that is saved on the HAR. For overriding, use the hw_arch flag.
# Now we will create a model script that tells the compiler to add normalization at the beginning
# of the model (that is why we didn't normalize the calibration set;
# otherwise we would have to normalize it before using it)

# this was taken from the hailo github alls script for yolov11n
alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv54, sigmoid)
change_output_activation(conv65, sigmoid)
change_output_activation(conv80, sigmoid)
nms_postprocess("./yolo11n_nms_config.json", meta_arch=yolov8, engine=cpu)
allocator_param(width_splitter_defuse=disabled)
"""

# Load the model script to ClientRunner so it will be considered on optimization
runner.load_model_script(alls)

# Call Optimize to perform the optimization process
runner.optimize(calib_dataset)

# Save the result state to a Quantized HAR file
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)

I then take this and run a basic inference to check the output:

sample_dataset = np.zeros((2, 640, 640, 3))
SAMPLE_IMAGE_PATH = "../data/coco/images/val2017/000000000139.jpg"
img = Image.open(SAMPLE_IMAGE_PATH).convert('RGB')
img_preproc = preproc(img)
sample_dataset[0, :, :, :] = img_preproc  # preproc() already returns a (640, 640, 3) numpy array

# Notice that we use the original images, because normalization is IN the model
with runner.infer_context(InferenceContext.SDK_FP_OPTIMIZED) as ctx:
    modified_res = runner.infer(ctx, sample_dataset[:1, :, :, :])
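
Before slicing out a single class, it helps to confirm the structure of what infer returns; a small check (my addition), assuming the result comes back as a single array of shape (batch, classes, fields, detections), which is what the indexing below relies on:

print(type(modified_res))
print(np.asarray(modified_res).shape)   # expecting (1, 80, 5, 100)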

If I take a closer look at class_id == 62, the television class, there are two detections:

tv_dets = modified_res[:, 62, :, :].reshape(5,100) # this is classid=62, which is a television
tv_dets.transpose()[:2, :]

"""
this outputs:
array([[0.39044172, 0.01051593, 0.61553806, 0.24089272, 0.91469216],
       [0.48778102, 0.8696294 , 0.6793776 , 0.9998992 , 0.5478364 ]],
      dtype=float32)

compared to the ground truth for this class (62) in the same image:
62 0.127641 0.505153 0.233312 0.2227
62 0.934195 0.583462 0.127109 0.184812
"""

I can’t seem to find any info regarding the output format in the postprocessing docs.

Any suggestions or comments on something I’ve missed?

Since there is no edit functionality here, I’d like to add that if I take the quantized model .har file and feed it into the following:
!hailomz eval yolov11n --har /local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_tutorials/notebooks/yolo11n_quantized_model.har

I’ll get the expected results. My only concern here is that something in the background overrides my .har file and uses a default .har for yolo11n. But if my quantized HAR was actually used, it confirms the output makes sense and I just need additional formatting?

Evaluate annotation type *bbox*
DONE (t=17.04s).
Accumulating evaluation results...
DONE (t=3.40s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.390
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.547
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.424
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.207
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.427
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.571
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.320
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.566
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.332
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.770
<Hailo Model Zoo INFO> Done 5000 images AP=39.024 AP50=54.679

Hi @natsayin_nahin

Here is how to interpret the output. Please let me know if you have questions:

This is your output from Hailo after tv_dets.transpose()[:2, :] (with the confidence column dropped):

x = np.array([[0.39044172, 0.01051593, 0.61553806, 0.24089272],
              [0.48778102, 0.8696294 , 0.6793776 , 0.9998992 ]])

Multiply by the height and width of the model input (640):

x * 640
array([[249.8827008,   6.7301952, 393.9443584, 154.1713408],
       [312.1798528, 556.562816 , 434.801664 , 639.935488 ]])

The x/y pairs are swapped; in other words, the rows come out as [y_min, x_min, y_max, x_max]. For comparison, below are your PyTorch xyxy results:

xyxy: tensor([[  6.1889, 249.8556, 154.3751, 394.4613],
        [559.0960, 312.5987, 640.0000, 429.7631]], device='cuda:0')

Even after this, you still need to map the boxes back to the original image size: the original image is 640x426, but it was resized to 640x640 for the model input. You can then match the result against the ground truth, which is given with respect to the original image.
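
Putting those steps together, a minimal sketch of the conversion could look like this (my code, not an official Hailo utility; it assumes the NMS output rows are [y_min, x_min, y_max, x_max, score] normalized to the 640x640 model input, and that the original image is 640x426 as noted above):

import numpy as np

def hailo_dets_to_xyxy(dets, orig_w, orig_h):
    """Convert rows of [y1, x1, y2, x2, score] (normalized to the model input)
    into [x1, y1, x2, y2, score] in original-image pixel coordinates."""
    y1, x1, y2, x2, score = dets.T
    # The stretch-resize to 640x640 leaves normalized coordinates unchanged,
    # so scaling by the original width/height maps straight back to the source image.
    return np.stack([x1 * orig_w, y1 * orig_h, x2 * orig_w, y2 * orig_h, score], axis=1)

tv_xyxy = hailo_dets_to_xyxy(tv_dets.transpose()[:2, :], orig_w=640, orig_h=426)

The resulting boxes are in the original image's pixel space, so they can be compared directly with the COCO annotations.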