How to Parse YOLOv12 Detector Outputs on Hailo?

## Context

I’m working with a YOLOv12 detector model (`detector_v10_m5.hef`) compiled for Hailo. The model has 3 outputs:

- `detector_v10_m5/conv11`: shape `(1, 80, 80, 256)`, dtype `uint8`

- `detector_v10_m5/conv13`: shape `(1, 40, 40, 128)`, dtype `uint8`

- `detector_v10_m5/format_conversion2`: shape `(1, 1, 384, 384)`, dtype `uint8`

## Goal

Parse these outputs to extract bounding boxes `[x1, y1, x2, y2]` and confidence scores for object detection.

## What I’ve Tried

1. **Grid-based parsing** of `conv11`/`conv13`: I treated them as grid feature maps with 5 channels per anchor (x, y, w, h, obj), but the channel counts (256 and 128) don't divide evenly by 5, and I'm unsure of the correct decoding formula for YOLOv12.

2. **format_conversion2 as decoded output**: I tried reshaping the `(1, 1, 384, 384)` tensor (147,456 values) into `(N, 5)` or `(N, 6)` rows, but this yields 24,000+ "detections", which seems wrong.
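For concreteness, here is a minimal sketch of the two attempts above, using zero-filled dummy arrays in place of the real dequantized HEF outputs (tensor names and shapes are taken from the post; everything else is illustrative):

```python
import numpy as np

# Dummy tensors standing in for the real dequantized HEF outputs.
conv11 = np.zeros((1, 80, 80, 256), dtype=np.uint8)
conv13 = np.zeros((1, 40, 40, 128), dtype=np.uint8)
fmt2 = np.zeros((1, 1, 384, 384), dtype=np.uint8)

# Attempt 1: 5 channels (x, y, w, h, obj) per anchor only works if
# the channel count divides evenly by 5 -- here it does not.
for name, t in [("conv11", conv11), ("conv13", conv13)]:
    c = t.shape[-1]
    print(name, c, "divisible by 5:", c % 5 == 0)  # False for 256 and 128

# Attempt 2: flattening format_conversion2 into (N, 6) rows.
flat = fmt2.reshape(-1, 6)  # 384*384 = 147,456 values -> 24,576 rows
print(flat.shape[0], "candidate detections")  # far too many to be final boxes
```

This makes the mismatch explicit: neither head fits a 5-channels-per-anchor layout, and a blind `(N, 6)` reshape of `format_conversion2` produces 24,576 rows.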

## Questions

1. **What is the correct output format for YOLOv12 on Hailo?** Are `conv11`/`conv13` raw grid outputs that need manual decoding, or is there a decoded output I should use?

2. **What does `format_conversion2` contain?** Is it decoded detections, intermediate features, or something else?

3. **What is the correct decoding formula?** For anchor-free YOLOv12, how should I convert the grid cell outputs to bounding box coordinates? Is there documentation or example code for YOLOv12 post-processing?

4. **Channel organization**: With 256 channels on the 80x80 grid and 128 channels on the 40x40 grid, how are these organized? Multiple anchors, or a different structure? I assume the head is anchor-free, but I'm not certain.
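For reference, recent anchor-free YOLO heads (v8/v11) predict, per grid cell, distances to the four box edges via Distribution Focal Loss (DFL): 64 regression channels = 4 sides x 16 bins, softmaxed and reduced to an expected distance, then scaled by the stride. Below is a hedged sketch of that decode, assuming YOLOv12 follows the same scheme; whether this HEF's `conv11`/`conv13` tensors actually use this layout is an assumption, not confirmed:

```python
import numpy as np

def dfl_decode(reg, stride, reg_max=16):
    """Decode a (H, W, 4*reg_max) DFL regression map into xyxy boxes.

    Assumes a YOLOv8-style anchor-free head; whether this HEF's
    conv11/conv13 tensors follow this layout is unverified.
    """
    h, w, _ = reg.shape
    # Softmax over the reg_max bins for each of the 4 sides.
    reg = reg.reshape(h, w, 4, reg_max)
    e = np.exp(reg - reg.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)
    # Expected distance (in grid cells) to left, top, right, bottom edges.
    dist = (prob * np.arange(reg_max)).sum(axis=-1)  # (h, w, 4)
    # Grid-cell centers.
    cx, cy = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    x1 = (cx - dist[..., 0]) * stride
    y1 = (cy - dist[..., 1]) * stride
    x2 = (cx + dist[..., 2]) * stride
    y2 = (cy + dist[..., 3]) * stride
    return np.stack([x1, y1, x2, y2], axis=-1)  # (h, w, 4) in pixels

# 80x80 grid on a 640x640 input -> stride 8.
boxes = dfl_decode(np.zeros((80, 80, 64), dtype=np.float32), stride=8)
print(boxes.shape)  # (80, 80, 4)
```

Note that 80x80 on a 640x640 input implies stride 8, and 40x40 implies stride 16; class scores would come from separate channels (or a separate tensor) and are not modeled here.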

## Model Details

- Model: YOLOv12 (detector_v10_m5)

- Input: 640x640 RGB

- Outputs: See above

- Single-class detection (fish)

Any guidance, documentation links, or example code would be greatly appreciated!


I’m facing the same issue:

```
ValueError: Dimension 1 in both shapes must be equal, but are 40 and 80. Shapes are [8,40,40,64] and [8,80,80,128].
From merging shape 0 with other shapes. for '{{node Postprocessor/transpose/a}} = Pack[N=2, T=DT_FLOAT, axis=0](endnodes, endnodes_1)' with input shapes: [8,40,40,64], [8,80,80,128].
```

How can I update my network YAML to fix this?