YOLO recall degradation after compiling to HEF

I trained a single-class (birds) model with Ultralytics YOLO; its recall is 0.92 (at a confidence threshold of 0.01).

After compiling to HEF, recall dropped to 0.75. I verified on images that the model sometimes fails to detect objects, especially small ones.
I also noticed that even for detected objects the confidence scores are very different from those of the PyTorch YOLO model.

Is it possible to configure the compilation to use less aggressive optimizations?
Are there ways to debug at which stage the degradation happened?

I haven't made any changes to the hailo_model_zoo config files in the Docker container.
Command: hailomz compile --ckpt /local/shared_with_docker/birds/bird.onnx --calib-path /local/shared_with_docker/birds/birds_frames --yaml hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov11m.yaml --classes 1 --hw-arch hailo8
Full log: https://www.dropbox.com/scl/fi/1hkkyevqg6svmsaxc1aeb/birds.log?rlkey=ze7a4obetwxdqwau43feglt3h&st=km0bbhfh&dl=0

Code for inference:

import numpy as np
from hailo_platform import VDevice, HailoSchedulingAlgorithm, FormatType

params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
with VDevice(params) as vdevice:
    infer_model = vdevice.create_infer_model('/home/bendyna-pi/birds/bird.hef')
    infer_model.set_batch_size(1)
    infer_model.input().set_format_type(FormatType.FLOAT32)
    infer_model.output().set_format_type(FormatType.FLOAT32)
    with infer_model.configure() as configured_infer_model:

        ...
        img = img.astype(np.float32)  # preprocessed frame as float32

        # bind the input frame and an output buffer, then run synchronous inference
        bindings_list = []
        bindings = configured_infer_model.create_bindings()
        bindings.input().set_buffer(img)
        buffer2 = np.empty(infer_model.output().shape).astype(np.float32)
        bindings.output().set_buffer(buffer2)
        bindings_list.append(bindings)

        configured_infer_model.run(bindings_list, timeout_ms)
        buffer = bindings.output().get_buffer()

        # postprocessing helper from the Hailo object detection examples
        det_utils = ObjectDetectionUtils('birds.txt')
        detections = det_utils.extract_detections(buffer, threshold=0.0001)
        boxes = detections['detection_boxes']
        conf = detections['detection_scores']

Hi @Ivan_Bendyna,

Welcome to the Hailo Community!

The degradation you are experiencing is probably due to quantization.
I would recommend taking a look at the Debugging Accuracy section of the Hailo Dataflow Compiler User Guide (requires a Developer Zone login).

Based on the log you shared, I can see a few points that you could check:

  • the YAML file of yolov11m will use this model script for the conversion.
  • compression_level = 1 is applied, since you have more than 1024 images in the calibration set. This means that 25% of the parameters are quantized to 4 bits. Please try forcing compression_level to 0 by adding the following command to the model script (see the sketch after this list):
    model_optimization_flavor(compression_level=0)
  • try setting the model's outputs to 16 bits; this may help with the accuracy (also shown in the sketch below).
  • NMS postprocessing is being added to the model:
    nms_postprocess("../../postprocess_config/yolov11m_nms_config.json", meta_arch=yolov8, engine=cpu)
    Please check that the nms_score_threshold and the iou_threshold specified in the JSON file are consistent with the ones you used when testing the model in PyTorch, or adjust them accordingly.
  • was the model trained on RGB or BGR images? Make sure the preprocessing is correct (Pillow reads images in RGB by default, OpenCV uses BGR).
  • you can use the Layer Analysis Tool to check the activation distributions layer by layer and understand which operators are the most challenging to quantize.
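
For reference, a minimal sketch of the two model script additions mentioned above, assuming the default yolov11m.alls is otherwise unchanged: the first line disables the 4-bit weight compression, the other two keep the model outputs in 16-bit. The names output_layer1/output_layer2 are placeholders (use the actual output layer names reported by the parser for your model), and the exact quantization_param syntax should be double-checked against the Model Optimization chapter of the DFC User Guide:

    model_optimization_flavor(compression_level=0)
    quantization_param(output_layer1, precision_mode=a16_w16)
    quantization_param(output_layer2, precision_mode=a16_w16)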

Thank you for the detailed answer.

The simplest option worked. I converted BGR -> RGB before sending the frame to Hailo, and recall improved from 0.75 to 0.89, very close to the original PyTorch YOLO model.

I am not 100% sure where the problem was, since I use cv2.imread() for both the PyTorch and the Hailo pipelines.
It looks like YOLO converts BGR to RGB internally, so we need to do the same manually.
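
For reference, a minimal sketch of the fix (the file name and the 640x640 input size are placeholders):

import cv2
import numpy as np

img = cv2.imread('frame.jpg')               # OpenCV loads images in BGR order
img = cv2.resize(img, (640, 640))           # resize to the network input size
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB, matching YOLO's internal preprocessing
img = img.astype(np.float32)                # then feed into the Hailo inference code above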


Hi @Ivan_Bendyna,
Yes, YOLO internally converts BGR to RGB in its preprocessor function: ultralytics/ultralytics/engine/predictor.py at main · ultralytics/ultralytics