Can't reproduce evaluation results for detection model

Hello,

I have a tool to calculate mAP, and I'm trying to reproduce the mAP results reported by the eval option:

(.venv_hailo) p20c02@p20c02:~/rendu/hef_storage$ hailomz eval --hef yolov8m.hef --target hardware yolov8m
<Hailo Model Zoo INFO> Start run for network yolov8m ...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.491
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.659
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.536
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.656
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.373
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.610
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.646
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.442
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.706
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.802
<Hailo Model Zoo INFO> Done 5000 images AP=49.105 AP50=65.896
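
For context, my tool follows the standard pycocotools flow, something like this (the two JSON file names are placeholders for the COCO val2017 ground truth and my exported detections):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder file names: COCO ground truth and detections in COCO results format.
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("hef_predictions.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the same AP/AR table as the hailomz output above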

To do so, I compiled yolov8m.hef using:

hailomz compile --hw-arch hailo8 yolov8m

With the default yolov8m.alls:

normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
model_optimization_config(calibration, batch_size=2)
change_output_activation(conv58, sigmoid)
change_output_activation(conv71, sigmoid)
change_output_activation(conv83, sigmoid)
post_quantization_optimization(finetune, policy=enabled, learning_rate=0.000025)
nms_postprocess("../../postprocess_config/yolov8m_nms_config.json", meta_arch=yolov8, engine=cpu)

I left everything in the .alls as-is; the only change is in yolov8m_nms_config.json, where I set the score threshold to 0.001, because I want to calculate the Average Precision myself and compare it to the eval results:

{
	"nms_scores_th": 0.001,
	"nms_iou_th": 0.7,
	"image_dims": [
		640,
		640
	],
	"max_proposals_per_class": 100,
	"classes": 80,
	"regression_length": 16,
	"background_removal": false,
	"background_removal_index": 0,
	"bbox_decoders": [
		{
			"name": "bbox_decoder57",
			"stride": 8,
			"reg_layer": "conv57",
			"cls_layer": "conv58"
		},
		{
			"name": "bbox_decoder70",
			"stride": 16,
			"reg_layer": "conv70",
			"cls_layer": "conv71"
		},
		{
			"name": "bbox_decoder82",
			"stride": 32,
			"reg_layer": "conv82",
			"cls_layer": "conv83"
		}
	]
}

However, while running inference with the Hailo code examples for object_detection and simply counting the detections per class, I noticed that I was getting fewer detections: 100 detections with nms_scores_th=0.001 and nms_iou_th=0.7.
Testing with the PyTorch weights directly, with the same score and IoU thresholds, I get 65 more detections (165 detections post-NMS).
I tested on other images, and I consistently get around half the detections with the Hailo-compiled .hef vs. the .pt or .onnx file; the counting code is sketched below.
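
Here is roughly how I count detections on the Hailo side (a minimal sketch of the hailo_platform inference pattern from the code examples; the zero-filled frame is a placeholder for a real preprocessed 640x640 image):

import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, InferVStreams,
                            InputVStreamParams, OutputVStreamParams,
                            HailoStreamInterface, FormatType)

hef = HEF("yolov8m.hef")
with VDevice() as target:
    params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, params)[0]
    in_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
    out_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
    in_name = hef.get_input_vstream_infos()[0].name
    out_name = hef.get_output_vstream_infos()[0].name

    frame = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # placeholder input

    with network_group.activate(network_group.create_params()):
        with InferVStreams(network_group, in_params, out_params) as pipeline:
            results = pipeline.infer({in_name: frame})
            # With on-device NMS, the output per image is a list of 80 per-class
            # arrays of shape (num_dets, 5): [ymin, xmin, ymax, xmax, score].
            per_class = results[out_name][0]
            print("total detections:", sum(len(dets) for dets in per_class))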

Do you know if there are other parameters I'm supposed to play with? Maybe the score/IoU threshold is also set somewhere else, which would explain why half the detections are missing?

Here are the versions I'm working with:

(.venv_hailo) p20c02@p20c02:~/rendu/hailo_model_zoo$ pip list | grep hailo
hailo-dataflow-compiler      3.30.0
hailo-model-zoo              2.14.0               /home/p20c02/rendu/hailo_model_zoo
hailo-tappas-dot-visualizer  3.31.0               /home/p20c02/rendu/tappas/tools/trace_analyzer/dot_visualizer
hailo-tappas-run-apps        3.31.0               /home/p20c02/rendu/tappas/tools/run_app
hailort                      4.20.0

Thank you for your help! :slight_smile:

Cordially,


Hello,

Any idea about this matter?
I tried multiple YOLO models and always see the same, lower number of detections compared to the raw .pt or even .onnx. Half are missing in most cases, resulting in a 5-10 point mAP drop when I calculate it myself vs. hailomz eval; see the sketch below for how I get the .pt counts.
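
For reference, the .pt-side counts come from the ultralytics API with matching thresholds (a sketch; the image path is a placeholder, and max_det is left at the ultralytics default of 300 so nothing is capped):

from ultralytics import YOLO

model = YOLO("yolov8m.pt")
# conf and iou match the nms_scores_th / nms_iou_th values in the HEF config.
results = model.predict("test_image.jpg", imgsz=640, conf=0.001, iou=0.7)
print("detections:", len(results[0].boxes))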

I played with the score and IoU threshold values, basically neutralizing them and keeping only things like the bbox decoders, but that didn't change much. Even though the absolute counts are higher with nms_iou_th=1 and nms_scores_th=0.001, the ratio of missing detections stays the same (about 1/2).

Is there any other place/file I'm supposed to look at in order to reproduce the results of hailomz eval?

Cordially,
