Request: YOLOv8m-Seg HEF with Integrated NMS/Post-Processing for Python Pipeline

Creoconcept · April 1, 2026, 6:58am

Hi everyone,

I am currently working on an automated background removal pipeline using a Hailo-8 accelerator. My current Python script expects to read a single NMS tensor and a prototype-mask tensor, then performs the final mask assembly in NumPy.
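For reference, the mask-assembly step described above can be sketched in NumPy roughly as follows. This is a minimal sketch, not the poster's actual script. It assumes N kept detections with mask coefficients of shape (N, 32), a prototype tensor of shape (160, 160, 32) with the batch dimension already removed, and boxes given as x1, y1, x2, y2 in 640x640 input-pixel coordinates. The function name `assemble_masks` is hypothetical.

```python
import numpy as np


def assemble_masks(coeffs, protos, boxes, input_size=640, threshold=0.5):
    """Hypothetical helper: combine mask coefficients with prototype masks.

    coeffs: (N, 32) mask coefficients, one row per kept detection
    protos: (160, 160, 32) prototype tensor (batch dimension removed)
    boxes:  (N, 4) boxes as x1, y1, x2, y2 in input-pixel coordinates
    Returns a boolean array of shape (N, 160, 160).
    Assumes a square input, so one scale factor serves both axes.
    """
    ph, pw, nc = protos.shape
    # Linear combination of prototypes: (N, 32) @ (32, H*W) -> (N, H*W)
    logits = coeffs @ protos.reshape(-1, nc).T
    masks = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    masks = masks.reshape(-1, ph, pw)

    # Keep only pixels inside each box, scaled to prototype resolution
    scale = pw / input_size
    b = boxes * scale
    ys = np.arange(ph)[None, :, None]
    xs = np.arange(pw)[None, None, :]
    inside = (
        (xs >= b[:, 0, None, None]) & (xs < b[:, 2, None, None])
        & (ys >= b[:, 1, None, None]) & (ys < b[:, 3, None, None])
    )
    return (masks > threshold) & inside
```

Upsampling the 160x160 masks back to 640x640 (for example with `np.repeat` along both spatial axes) is left out to keep the sketch short.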

The Issue: The standard yolov8m_seg.hef currently available in the Hailo Model Zoo appears to be a “raw” variant. When I run diagnostics on the output vStreams, I see 10+ raw convolutional output layers (e.g., conv79, conv94, conv60) rather than a single consolidated NMS/metadata output.

Parsing these raw per-stride outputs and their anchor points in Python creates a significant CPU bottleneck and adds unnecessary complexity to the pipeline.
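To illustrate what "parsing the raw strides" involves, here is a minimal sketch of decoding one YOLOv8 box branch with fully vectorized NumPy, which avoids Python-level loops over grid cells. It assumes the output has already been dequantized to float32 (e.g., by configuring the output vStreams for float32), has shape (H, W, 64) with the batch dimension removed, and that the 64 channels are ordered as 4 sides (left, top, right, bottom) x 16 DFL bins, as in the Ultralytics head. The channel ordering of the Hailo outputs is an assumption to verify. The function name `decode_boxes` is hypothetical.

```python
import numpy as np

REG_MAX = 16  # YOLOv8 default number of DFL bins per box side


def decode_boxes(box_raw, stride):
    """Hypothetical helper: decode one stride's YOLOv8 box branch (DFL).

    box_raw: (H, W, 64) float32 output for a single stride,
             assumed ordered as 4 sides x 16 bins (l, t, r, b).
    Returns (H*W, 4) boxes as x1, y1, x2, y2 in input-pixel coordinates.
    """
    h, w, _ = box_raw.shape
    logits = box_raw.reshape(h * w, 4, REG_MAX)
    # Numerically stable softmax over the 16 bins of each side
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Expected distance per side, in grid-cell units: (H*W, 4)
    dist = probs @ np.arange(REG_MAX, dtype=np.float32)

    # Anchor points at the centre of each grid cell (row-major order)
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cx = gx.reshape(-1) + 0.5
    cy = gy.reshape(-1) + 0.5
    boxes = np.stack(
        [cx - dist[:, 0], cy - dist[:, 1], cx + dist[:, 2], cy + dist[:, 3]],
        axis=1,
    )
    return boxes * stride
```

The same function would be called once per stride (8, 16, 32 for a 640x640 input) and the results concatenated. The class-score branch typically needs a sigmoid unless it was already applied on-chip, and the 32-channel mask-coefficient branch is reshaped the same way without any decoding.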

The Request: Does anyone in the community have (or can point me toward) a pre-compiled YOLOv8m-Seg HEF (640x640) that has the NMS and post-processing layers compiled into the HEF itself?

Requirements:

Model: YOLOv8m-Seg (Instance Segmentation)
Input: 640x640x3
Post-processing: Integrated NMS (outputting bounding boxes, scores, and mask coefficients in one consolidated tensor).
Target: Hailo-8

If you have a DFC (Dataflow Compiler) command string or a .hef file that simplifies the output to (1, 100, 38) for detections and (1, 160, 160, 32) for prototypes, it would be a massive help.
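The requested (1, 100, 38) shape works out as 4 box coordinates + 1 score + 1 class ID + 32 mask coefficients per detection. A minimal sketch of unpacking such a tensor is below. The column order and the zero-score padding of empty rows are assumptions for illustration, not a documented Hailo output format, and `split_detections` is a hypothetical name.

```python
import numpy as np


def split_detections(det, score_thresh=0.25):
    """Hypothetical helper: split a (1, 100, 38) detection tensor.

    Assumed column layout: [x1, y1, x2, y2, score, class_id, 32 mask coeffs].
    Padded (empty) rows are assumed to carry score 0 and are dropped
    by the score threshold.
    """
    det = det[0]  # drop batch dimension -> (100, 38)
    det = det[det[:, 4] >= score_thresh]
    boxes = det[:, 0:4]
    scores = det[:, 4]
    class_ids = det[:, 5].astype(np.int64)
    coeffs = det[:, 6:38]
    return boxes, scores, class_ids, coeffs
```

The resulting `boxes` and `coeffs` are what the NumPy mask-assembly step would consume together with the (1, 160, 160, 32) prototype tensor.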

Thanks in advance!

Michael · April 1, 2026, 8:08am

Hi @Creoconcept,

The CPU bottleneck you’re experiencing is likely coming from the manual NumPy decoding of the raw outputs rather than from the HEF format itself.

You might want to take a look at our instance segmentation example. It already handles all 10 raw output layers from YOLOv8m-Seg (bbox decode, NMS, mask assembly) and should work with your existing HEF without any recompilation: hailo-apps/hailo_apps/python/pipeline_apps/instance_segmentation (hailo-ai/hailo-apps on GitHub, main branch)
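For readers curious what the NMS stage amounts to on the host, here is a generic greedy non-maximum suppression in NumPy. It is a minimal sketch, not the hailo-apps implementation. It assumes boxes as x1, y1, x2, y2 and one score per box, and the threshold defaults are illustrative only. The function name `nms` is hypothetical.

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.45, max_det=100):
    """Hypothetical helper: greedy NMS on (N, 4) x1, y1, x2, y2 boxes.

    Returns indices of kept boxes, highest score first.
    """
    order = np.argsort(scores)[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0 and len(keep) < max_det:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Drop boxes that overlap the kept box too much
        order = rest[iou <= iou_thresh]
    return np.array(keep, dtype=np.int64)
```

For class-aware NMS, a common trick is to offset each box by `class_id` times a constant larger than the image size before calling this function, so boxes of different classes never overlap.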

If you find that Python postprocessing is still a throughput bottleneck, you could also try the C++ pipeline with ONNX Runtime (hailo-apps/hailo_apps/cpp/onnxrt_hailo_pipeline in hailo-ai/hailo-apps on GitHub, main branch), which offloads the decode step to an optimized ONNX model.

Please let me know if that works for you.

Thanks,