Impact of Parsing YOLO Models: Does Omitting Final Layers Affect mAP?

Hi everyone,

I have a question regarding the parsing step for YOLO models, specifically YOLOv5m and YOLOv9m. I read the guide *Parsing YOLO models with the Hailo Dataflow Compiler tool* on converting an ONNX YOLO model to a HEF model with Hailo's Dataflow Compiler.

My main concern is whether omitting the final layers of the architecture (such as Gather, Shape, Unsqueeze, Concat, etc.)—and stopping at the convolutional blocks—affects the mAP performance. For example, with YOLOv5, the parsed model only processes up to the convolutional blocks at scales of 80×80, 40×40, and 20×20 (see attached image).

In evaluation, the ONNX model achieves 0.648 mAP, but after parsing (and before quantization) the mAP drops to 0.611, a reduction of about 3.7 percentage points.

I have ensured that:

  • The same validation dataset and preprocessing (normalization and resizing to 640×640) are used.
  • The same NMS filter values are applied (conf_thres=0.25, iou_thres=0.45, scores_thres=0.2).
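For reference, the preprocessing in the first bullet can be sketched as follows. This is a minimal letterbox-style resize plus [0, 1] normalization; the exact resize interpolation and padding details of your pipeline are assumptions here:

```python
import numpy as np

def preprocess(img, size=640):
    """Letterbox-resize an HxWx3 image to size x size and normalize to [0, 1].

    A minimal sketch of the preprocessing described above; your pipeline's
    interpolation and padding color may differ."""
    h, w = img.shape[:2]
    scale = min(size / h, size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize via index sampling (avoids a cv2 dependency)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    canvas = np.full((size, size, 3), 114, dtype=img.dtype)  # gray padding
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas.astype(np.float32) / 255.0
```

The key point is that the parsed model and the ONNX model must see byte-identical inputs, otherwise the comparison measures preprocessing drift rather than parsing loss.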

Thus, the only difference is that the parsed model uses Hailo's NMS filter instead of the final architecture layers. (Correct me if I have forgotten or misunderstood something.)

My questions are:

  • Is the removal of the final architectural layers the likely cause of the mAP degradation?
  • What other factors might contribute to this drop in performance?
  • Are there any strategies or solutions to minimize this degradation?

Thanks in advance for your insights and suggestions!

Hey @daffer.queque ,

Yes, a drop of this size (0.648 → 0.611, about 3.7 mAP points) after parsing is expected when the final YOLO layers are omitted. This occurs because:

  1. Removal of the final layers (Gather, Shape, Unsqueeze, Concat, NMS) changes how boxes are decoded, which affects bounding-box accuracy
  2. Post-processing differences - Hailo moves NMS to the host CPU, and its implementation may not match the original exactly
  3. BatchNorm folding causes slight shifts in activation distributions
  4. Potential quantization effects in later stages (these apply once you quantize, not to the pre-quantization drop you measured)
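To make point 1 concrete: the omitted ONNX layers implement the detection-head decode (sigmoid, grid offsets, anchor scaling), which now has to be reproduced in post-processing. A rough numpy sketch of a typical YOLOv5-style decode for one output scale; the function name and the exact 2·sigmoid−0.5 formulation are assumptions about your head:

```python
import numpy as np

def decode_yolov5_scale(raw, anchors, stride):
    """Decode one raw YOLOv5 output scale, i.e. the work done by the
    omitted final ONNX layers. `raw` is (na, gy, gx, 5 + nc) before
    sigmoid; `anchors` is the per-scale (na, 2) array in pixels.
    Illustrative only -- the decode must match your model's head."""
    na, gy, gx, ch = raw.shape
    p = 1.0 / (1.0 + np.exp(-raw))                       # sigmoid everything
    grid_y, grid_x = np.meshgrid(np.arange(gy), np.arange(gx), indexing="ij")
    # box centers: offset into the grid cell, scaled back to input pixels
    xy = (p[..., 0:2] * 2.0 - 0.5 + np.stack([grid_x, grid_y], -1)) * stride
    # box sizes: anchor-relative
    wh = (p[..., 2:4] * 2.0) ** 2 * anchors[:, None, None, :]
    conf = p[..., 4:5] * p[..., 5:]                      # objectness * class
    return np.concatenate([xy, wh, conf], axis=-1).reshape(-1, ch - 1)
```

Any mismatch between this decode (or Hailo's equivalent) and the original ONNX graph shows up directly as localization error, and hence as an mAP gap even before quantization.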

Mitigation Strategies

  1. Preserve End-Nodes:

    hailomz compile --ckpt yolov5m.onnx --start-node-names Conv_307 Conv_286 Conv_265 --end-node-names Yolo_Output
    
  2. Fine-Tune After Parsing:

    post_quantization_optimization(finetune, policy=enabled, learning_rate=0.0001, epochs=8, dataset_size=4000)
    
  3. Use Higher Precision:

    quantization_param(conv1, precision_mode=a16_w16)
    
  4. Adjust NMS Parameters:

    nms_postprocess(iou_threshold=0.45, score_threshold=0.25, engine=cpu)
    

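When NMS runs on the host, it is worth sanity-checking it against a reference implementation with the same thresholds you used for the ONNX evaluation. A minimal greedy NMS sketch (illustrative only, not Hailo's implementation):

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45, score_thres=0.25):
    """Greedy NMS on (x1, y1, x2, y2) boxes. A reference for verifying
    that host-side filtering matches the thresholds used when the
    original ONNX model was evaluated."""
    mask = scores >= score_thres
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)                 # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the top box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thres]
    return boxes[keep], scores[keep]
```

Running this with the exact conf/IoU thresholds from your ONNX evaluation makes it easy to isolate how much of the gap comes from threshold or implementation differences rather than from the parsing itself.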
With proper configuration, you can usually reduce the mAP drop to under 1 point.
For more info, please check out our DFC documentation (pages 67-77), where we cover this extensively.