## Context
I’m working with a YOLOv12 detector model (`detector_v10_m5.hef`) compiled for Hailo. The model has 3 outputs:
- `detector_v10_m5/conv11`: shape `(1, 80, 80, 256)`, dtype `uint8`
- `detector_v10_m5/conv13`: shape `(1, 40, 40, 128)`, dtype `uint8`
- `detector_v10_m5/format_conversion2`: shape `(1, 1, 384, 384)`, dtype `uint8`
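
Since all three outputs are `uint8`, I assume they are quantized and need to be dequantized before any decoding. The sketch below shows the affine mapping I have in mind, `real = (q - zero_point) * scale`. The `dequantize` helper and the `scale` / `zero_point` values are placeholders of mine, not values read from the HEF; the real parameters would have to come from the model's quantization info.

```python
import numpy as np

def dequantize(raw, scale, zero_point):
    """Map quantized uint8 values to float32 via real = (q - zero_point) * scale."""
    return (raw.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)

# Placeholder tensor with the same shape and dtype as the conv13 output.
raw = np.full((1, 40, 40, 128), 130, dtype=np.uint8)

# scale and zero_point are made-up values for illustration only.
deq = dequantize(raw, scale=0.05, zero_point=128.0)
print(deq.shape, deq.dtype)
```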
## Goal
Parse these outputs to extract bounding boxes `[x1, y1, x2, y2]` and confidence scores for object detection.
## What I’ve Tried
1. **Grid-based parsing** of `conv11`/`conv13`: I treated them as grid feature maps with 5 channels per anchor (x, y, w, h, obj), but neither channel count (256, 128) is divisible by 5, and I’m unsure of the correct decoding formula for YOLOv12.
2. **format_conversion2 as decoded output**: I tried reshaping the 147,456 values of `(1, 1, 384, 384)` into `(N, 5)` or `(N, 6)` rows, but this produces 24,000+ detections, which seems far too many (and 147,456 is not divisible by 5 in any case).
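
To make the arithmetic behind both attempts explicit, here is a quick check of the remainders when the channel counts and total element counts are divided by 5 and 6. The `remainder_report` helper is just something I wrote for this post; the shapes are the ones listed under Context.

```python
outputs = {
    "conv11": (1, 80, 80, 256),
    "conv13": (1, 40, 40, 128),
    "format_conversion2": (1, 1, 384, 384),
}

def remainder_report(shapes, widths=(5, 6)):
    """For each output, report total element count and remainders for each row width."""
    report = {}
    for name, shape in shapes.items():
        channels = shape[-1]
        total = 1
        for dim in shape:
            total *= dim
        report[name] = {
            "total": total,
            "channels_mod": {w: channels % w for w in widths},
            "total_mod": {w: total % w for w in widths},
        }
    return report

report = remainder_report(outputs)
for name, info in report.items():
    print(name, info)
```

Only the `(N, 6)` reshape of `format_conversion2` divides evenly, and it gives 24,576 rows, which matches the 24,000+ "detections" I saw.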
## Questions
1. **What is the correct output format for YOLOv12 on Hailo?** Are `conv11`/`conv13` raw grid outputs that need manual decoding, or is there a decoded output I should use?
2. **What does `format_conversion2` contain?** Is it decoded detections, intermediate features, or something else?
3. **What is the correct decoding formula?** For anchor-free YOLOv12, how should I convert the grid cell outputs to bounding box coordinates? Is there documentation or example code for YOLOv12 post-processing?
4. **Channel organization**: With 256 channels on an 80x80 grid and 128 channels on a 40x40 grid, how are these channels organized? Do they encode multiple anchors, or a different structure? I am assuming the model is anchor-free, but I have not been able to confirm this.
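
To make question 3 concrete, this is the kind of generic anchor-free decoding I would expect if each grid cell predicted four distances (left, top, right, bottom) from its centre, measured in grid units. This is only my assumption about the layout: the `decode_ltrb` helper, the 4-channel input, and the distance convention are all hypothetical, and whether `conv11`/`conv13` follow anything like this is exactly what I am asking. The strides follow from the 640x640 input (640 / 80 = 8, 640 / 40 = 16).

```python
import numpy as np

def decode_ltrb(dist, stride):
    """Decode per-cell (left, top, right, bottom) distances into [x1, y1, x2, y2].

    dist: float array of shape (H, W, 4), distances in grid units (assumed layout).
    stride: input pixels per grid cell, e.g. 8 for an 80x80 grid on a 640x640 input.
    """
    h, w, _ = dist.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Cell centres in input-pixel coordinates.
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride
    x1 = cx - dist[..., 0] * stride
    y1 = cy - dist[..., 1] * stride
    x2 = cx + dist[..., 2] * stride
    y2 = cy + dist[..., 3] * stride
    return np.stack([x1, y1, x2, y2], axis=-1).reshape(-1, 4)

# Dummy distances of one grid unit in every direction, on the 80x80 grid.
dist = np.ones((80, 80, 4), dtype=np.float32)
boxes = decode_ltrb(dist, stride=8)
print(boxes.shape)
```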
## Model Details
- Model: YOLOv12 (detector_v10_m5)
- Input: 640x640 RGB
- Outputs: See above
- Single-class detection (fish)
Any guidance, documentation links, or example code would be greatly appreciated!