Getting very low accuracies when running inference on yolov8n

I was trying to run inference on a Hailo-8 using a precompiled yolov8n.hef model with the COCO 2017 validation set, which contains 5000 images.
I generated a predictions.json file (the actual predictions), which I evaluated against instances_val2017.json (the ground truth) using pycocotools to measure accuracy.

But the accuracies I got were all 0s:

=== Evaluation with IoU=0.50 threshold ===
Running per image evaluation…
Evaluate annotation type bbox
DONE (t=4.89s).
Accumulating evaluation results…
DONE (t=0.79s).
Average Precision (AP) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.50 | area= large | maxDets=100 ] = 0.000

=== Full Evaluation ===
Running per image evaluation…
Evaluate annotation type bbox
DONE (t=5.20s).
Accumulating evaluation results…
DONE (t=1.03s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

The precompiled yolov8n.hef model has input yolov8n/input_layer1, UINT8, NHWC (640x640x3).

I was following this tutorial: Hailo Application Object Detection.
What I did was take the COCO 2017 val dataset and the precompiled yolov8n.hef file, then modify the tutorial code slightly so that it saves a predictions.json file, building a map of filename → image_id from the instances_val2017.json file that comes with the COCO annotations.
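For reference, a minimal sketch of that mapping step, assuming the standard COCO annotations layout (the file path and variable names are placeholders, not the exact code I ran):

```python
import json

# Build a filename -> image_id map from the COCO annotations file.
with open("annotations/instances_val2017.json") as f:
    annotations = json.load(f)

filename_to_id = {img["file_name"]: img["id"] for img in annotations["images"]}

# e.g. filename_to_id["000000000139.jpg"] -> 139
```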

After generating the predictions.json file, this is what some of its entries look like:

[{"image_id": 139, "category_id": 41, "bbox": [241.0, 332.0, 170.0, 115.0], "score": 0.3782227635383606}, {"image_id": 139, "category_id": 15, "bbox": [0.0, 1.0, 596.0, 366.0], "score": 0.31223785877227783}, {"image_id": 285, "category_id": 8, "bbox": [1.0, 32.0, 380.0, 515.0], "score": 0.7025351524353027}, {"image_id": 632, "category_id": 15, "bbox": [224.0, 26.0, 295.0, 610.0], "score": 0.5166792869567871}, {"image_id": 632, "category_id": 15, "bbox": [97.0, 5.0, 421.0, 516.0], "score": 0.46463966369628906}, …

I then used pycocotools to compute the accuracy, but every metric came out as 0.
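The evaluation step looks roughly like the sketch below (file paths are placeholders; predictions.json is assumed to be in the standard COCO results format):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("predictions.json")         # detections in COCO results format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the AP/AR table shown above
```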

Can someone suggest what I am doing wrong, and how I can fix this issue?

Alternatively, can anyone suggest a better way to measure accuracy using the COCO dataset?

Thank you in advance!

Hi @Jubesh_Joseph

I see a few potential issues here:

  1. I'm not sure how you are getting category_id, but an adjustment is needed to go from the 80 classes predicted by the model to the 91 category IDs in the original annotations (the annotation IDs run from 1 to 90 with gaps, so a lookup table is needed; see the sketch after this list).

  2. Your bbox coordinates must be in the same coordinate system as the ground-truth annotations (COCO ground truth uses absolute pixel [x, y, width, height] with respect to the original image). Since the bboxes returned by inference are with respect to the model input size (640x640), you need to scale them to the original image size by undoing the effects of resizing and letterboxing, as sketched below.
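Here is a minimal sketch of both corrections. It assumes the model was fed an aspect-preserving resize with centered padding (letterbox) to 640x640; if your preprocessing is a plain resize with no padding, drop the pad terms and scale each axis independently. The COCO80_TO_COCO91 table is the standard mapping, but the undo_letterbox helper name and the example values are illustrative, not part of any Hailo API:

```python
# Standard mapping from the model's contiguous 80-class index (0-79)
# to the original COCO annotation category IDs (1-90, with gaps).
COCO80_TO_COCO91 = [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20,
    21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
    59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79,
    80, 81, 82, 84, 85, 86, 87, 88, 89, 90,
]

def undo_letterbox(bbox, img_w, img_h, model_size=640):
    """Map a COCO-style [x, y, w, h] box from 640x640 letterboxed
    coordinates back to original-image pixel coordinates."""
    scale = min(model_size / img_w, model_size / img_h)
    pad_x = (model_size - img_w * scale) / 2.0
    pad_y = (model_size - img_h * scale) / 2.0
    x, y, w, h = bbox
    return [(x - pad_x) / scale, (y - pad_y) / scale, w / scale, h / scale]

# Example: the model predicts class index 39 ("bottle" in the 80-class list)
category_id = COCO80_TO_COCO91[39]  # -> 44, the category_id used in the annotations

# img_w/img_h should come from the "images" section of instances_val2017.json
bbox = undo_letterbox([241.0, 332.0, 170.0, 115.0], img_w=640, img_h=426)
```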

You can try the suggestions in this thread: Yolo evaluation - General - Hailo Community, and see if they help.

Alternatively, if you use our DeGirum PySDK, we can give you a script that does the evaluation for you. FYI, DeGirum PySDK is a Python package designed by DeGirum (a software partner of Hailo) to simplify working with Hailo devices: Simplifying Edge AI Development with DeGirum PySDK and Hailo

Thanks @shashi, I'll check these out and try to resolve the issue.