Can't reproduce evaluation results for detection model

Hello,

I have a tool to calculate mAP, and I'm trying to reproduce the mAP results reported by the eval option:

(.venv_hailo) p20c02@p20c02:~/rendu/hef_storage$ hailomz eval --hef yolov8m.hef --target hardware yolov8m
<Hailo Model Zoo INFO> Start run for network yolov8m ...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.491
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.659
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.536
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.656
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.373
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.610
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.646
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.442
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.706
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.802
<Hailo Model Zoo INFO> Done 5000 images AP=49.105 AP50=65.896
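
For context, my tool follows the standard pycocotools flow, something like this (the two JSON file names are placeholders for the COCO val2017 ground truth and my exported detections):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names: COCO ground truth and detections in COCO results format.
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("hef_predictions.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the same AP/AR table as the hailomz output above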

To do so, I compiled yolov8m.hef using:

hailomz compile --hw-arch hailo8 yolov8m

With the default yolov8m.alls:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
model_optimization_config(calibration, batch_size=2)
change_output_activation(conv58, sigmoid)
change_output_activation(conv71, sigmoid)
change_output_activation(conv83, sigmoid)
post_quantization_optimization(finetune, policy=enabled, learning_rate=0.000025)
nms_postprocess("../../postprocess_config/yolov8m_nms_config.json", meta_arch=yolov8, engine=cpu)

I left everything in the .alls as-is; the only change is in yolov8m_nms_config.json, where I set the score threshold to 0.001, because I want to calculate the Average Precision myself and compare it to the eval results:

{
	"nms_scores_th": 0.001,
	"nms_iou_th": 0.7,
	"image_dims": [
		640,
		640
	],
	"max_proposals_per_class": 100,
	"classes": 80,
	"regression_length": 16,
	"background_removal": false,
	"background_removal_index": 0,
	"bbox_decoders": [
		{
			"name": "bbox_decoder57",
			"stride": 8,
			"reg_layer": "conv57",
			"cls_layer": "conv58"
		},
		{
			"name": "bbox_decoder70",
			"stride": 16,
			"reg_layer": "conv70",
			"cls_layer": "conv71"
		},
		{
			"name": "bbox_decoder82",
			"stride": 32,
			"reg_layer": "conv82",
			"cls_layer": "conv83"
		}
	]
}

However, while running inference with the Hailo code examples for object_detection and simply counting the detections per class, I noticed that I was getting fewer detections: 100 detections with nms_scores_th=0.001 and nms_iou_th=0.7.
Testing with the PyTorch weights directly, with the same score and IoU thresholds, I get 65 more detections (165 detections post-NMS).
I tested on other images, and I consistently get around half the detections with the Hailo-compiled .hef vs. the .pt or .onnx file; the counting code is sketched below.
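
Here is roughly how I count detections on the Hailo side (a minimal sketch of the hailo_platform inference pattern from the code examples; the zero-filled frame is a placeholder for a real preprocessed 640x640 image):

import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, InferVStreams,
                            InputVStreamParams, OutputVStreamParams,
                            HailoStreamInterface, FormatType)

hef = HEF("yolov8m.hef")
with VDevice() as target:
    params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, params)[0]
    in_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
    out_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
    in_name = hef.get_input_vstream_infos()[0].name
    out_name = hef.get_output_vstream_infos()[0].name

    frame = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # placeholder input

    with network_group.activate(network_group.create_params()):
        with InferVStreams(network_group, in_params, out_params) as pipeline:
            results = pipeline.infer({in_name: frame})
            # With on-device NMS, the output per image is a list of 80 per-class
            # arrays of shape (num_dets, 5): [ymin, xmin, ymax, xmax, score].
            per_class = results[out_name][0]
            print("total detections:", sum(len(dets) for dets in per_class))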

Do you know if there are other parameters I'm supposed to play with? Maybe the score/IoU threshold is also set somewhere else, which would explain why half the detections are missing?

Here are the versions I'm working with:

(.venv_hailo) p20c02@p20c02:~/rendu/hailo_model_zoo$ pip list | grep hailo
hailo-dataflow-compiler      3.30.0
hailo-model-zoo              2.14.0               /home/p20c02/rendu/hailo_model_zoo
hailo-tappas-dot-visualizer  3.31.0               /home/p20c02/rendu/tappas/tools/trace_analyzer/dot_visualizer
hailo-tappas-run-apps        3.31.0               /home/p20c02/rendu/tappas/tools/run_app
hailort                      4.20.0

Thank you for your help! :slight_smile:

Cordially,


Hello,

Any idea about this matter?
I tried multiple YOLO models and always see the same, lower number of detections compared to the raw .pt or even .onnx. Half are missing in most cases, resulting in a 5-10 point mAP drop when I calculate it myself vs. hailomz eval; see the sketch below for how I get the .pt counts.
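
For reference, the .pt-side counts come from the ultralytics API with matching thresholds (a sketch; the image path is a placeholder, and max_det is left at the ultralytics default of 300 so nothing is capped):

from ultralytics import YOLO

model = YOLO("yolov8m.pt")
# conf and iou match the nms_scores_th / nms_iou_th values in the HEF config.
results = model.predict("test_image.jpg", imgsz=640, conf=0.001, iou=0.7)
print("detections:", len(results[0].boxes))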

I played with the score and IoU threshold values, basically neutralizing them and keeping only things like the bbox decoders, but that didn't change much. Even though the absolute counts are higher with nms_iou_th=1 and nms_scores_th=0.001, the ratio of missing detections stays the same (about 1/2).

Is there any other place/file I'm supposed to look at in order to reproduce the results of hailomz eval?

Cordially,
