Retinaface Mobilenet V1 output

Hello everyone.

Based on one of your examples, I was able to run face detection (without GStreamer) with retinaface_mobilenet_v1, lightface_slim, scrfd_500m, scrfd_2.5g or scrfd_10g.

However I’m confused by the output.
For example retinaface_mobilenet_v1:

Architecture HEF was compiled for: HAILO8L
Network group name: retinaface_mobilenet_v1, Multi Context - Number of contexts: 3
    Network name: retinaface_mobilenet_v1/retinaface_mobilenet_v1
        VStream infos:
            Input  retinaface_mobilenet_v1/input_layer1 UINT8, NHWC(736x1280x3)
            Output retinaface_mobilenet_v1/conv41 UINT8, NHWC(92x160x8)
            Output retinaface_mobilenet_v1/conv42 UINT8, NHWC(92x160x4)
            Output retinaface_mobilenet_v1/conv43 UINT8, FCR(92x160x20)
            Output retinaface_mobilenet_v1/conv32 UINT8, NHWC(46x80x8)
            Output retinaface_mobilenet_v1/conv33 UINT8, NHWC(46x80x4)
            Output retinaface_mobilenet_v1/conv34 UINT8, FCR(46x80x20)
            Output retinaface_mobilenet_v1/conv23 UINT8, NHWC(23x40x8)
            Output retinaface_mobilenet_v1/conv24 UINT8, NHWC(23x40x4)
            Output retinaface_mobilenet_v1/conv25 UINT8, FCR(23x40x20)

I guess “retinaface_mobilenet_v1/conv25” is the final output?
What is this shape 23, 40, 20?
It’s not BBoxes or anything I’ve seen before.

Also the numbers do not make sense to me. The input shape is 736, 1280, 3 and one output example looks like this:

[[124 130 128 ... 130 126 132]
  [124 127 126 ... 124 123 125]
  [125 127 127 ... 127 124 128]
  ...
  [109 129 117 ... 143 115 143]
  [112 127 119 ... 133 117 135]
  [118 122 122 ... 128 123 128]]

My face was in the middle of the image, so I guess those numbers are not y, x positions?

Thank you for any help!

So after some more digging:

            lNetworkGroups = self._hailoVDevice.configure(hailoHEF, dHailoCfgParams)

            self._hailoNetGrp = lNetworkGroups[0]  # type: pkHailoPlPY.ConfiguredNetwork
            self._hailoNetParams = self._hailoNetGrp.create_params()

            # lInfos:
            # [0]: 'direction', 'format', 'name', 'network_name', 'nms_shape', 'quant_info', 'shape'
            # [0].shape: tuple
            # [0].format: 'equals' method, 'flags' FormatFlags, 'order' FormatOrder, 'type' FormatType
            # [0].quant_info: 'limvals_max' float, 'limvals_min' float, 'qp_scale' float, 'qp_zp' float
            # pkHailoPl.FormatType: 'AUTO', 'FLOAT32', 'UINT16', 'UINT8'

            sLastOutputName = self._hailoNetGrp.get_sorted_output_names()[-1]

Is my assumption correct that “get_sorted_output_names()[-1]” is the correct layer which should have the detection information?

So “correct” last layer for “scrfd_500m” looks like:

output: scrfd_500m/conv40 (20, 20, 20) FormatType.UINT8 ; 0.04532748833298683 119.0

If I do more guessing 🙂 I would say (20, 20, 20) means up to 20 faces can be detected?
If yes, what's inside (20, 20)?

Hey @chrime,

Glad to hear you’ve got face detection running on multiple models! Let’s clarify the confusion around the output layers and shapes for the retinaface_mobilenet_v1 model.

  1. Output Layer Interpretation:
    The layer ‘retinaface_mobilenet_v1/conv25’ with shape (23, 40, 20) is an output feature map, not the final bounding boxes or keypoints. In face detection models like RetinaFace, outputs typically represent:

    • Location (bbox) predictions
    • Face detection confidence scores
    • Landmark predictions (eyes, nose, mouth, etc.)

    The (23, 40, 20) shape can be seen as a grid: 23x40 is a downscaled spatial map of your original input image, and 20 likely combines information for multiple anchor boxes and features.
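For example, a quick sanity check on the channel counts: assuming the common RetinaFace configuration of 2 anchor boxes per grid cell (an assumption on my part; the HEF does not report this), the 8/4/20 channels of each scale's three outputs fall out of simple arithmetic:

```python
# Sketch assuming 2 anchors per grid cell (the usual RetinaFace/mobilenet0.25
# configuration; the HEF itself does not report this).
num_anchors = 2

bbox_channels  = num_anchors * 4   # (dx, dy, dw, dh) per anchor
score_channels = num_anchors * 2   # (background, face) per anchor
lmk_channels   = num_anchors * 10  # 5 landmarks x (x, y) per anchor

print(bbox_channels, score_channels, lmk_channels)  # 8 4 20
```

Under that reading, conv23 (8 channels) would be the bbox branch, conv24 (4 channels) the score branch and conv25 (20 channels) the landmark branch of the coarsest scale, so conv25 alone is not "the" final output; all nine layers are needed for decoding.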

  2. Post-Processing:
    These raw outputs need post-processing, including:

    • Decoding bounding box coordinates
    • Applying Non-Maximum Suppression (NMS) to filter overlapping detections
    • Interpreting landmark and confidence scores

    The numbers you see are raw feature map values, not direct x, y coordinates for bounding boxes. You’ll need to apply specific post-processing steps (usually found in model documentation or example code) to get final bounding boxes and landmarks.
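To illustrate why the raw UINT8 values cluster around 124-130: each output carries quantization parameters (the qp_scale/qp_zp fields of quant_info mentioned earlier in this thread), and values near the zero-point dequantize to small floats, i.e. near-zero regression offsets rather than pixel coordinates. A sketch with made-up parameters in the same ballpark as the 0.0453/119.0 pair quoted earlier for scrfd_500m; the real values come from each vstream's quant_info:

```python
import numpy as np

# Hypothetical quantization parameters, shaped like the quant_info values
# visible in this thread (qp_scale ~0.045, qp_zp ~119-128); the real values
# come from the output vstream's quant_info in your HEF.
qp_scale = 0.045
qp_zp = 128.0

raw = np.array([124, 130, 128, 109, 143], dtype=np.uint8)

# Standard affine dequantization: float = (quantized - zero_point) * scale
dequant = (raw.astype(np.float32) - qp_zp) * qp_scale
print(dequant)  # small values around zero, not pixel positions
```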

  3. Input vs Output Shape:
    Your input shape (736, 1280, 3) is processed through multiple network layers, which downscale spatial dimensions and increase feature channel depth, resulting in outputs like (23, 40, 20). This downscaling is common in convolutional networks for efficiency and to capture larger features.
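The three scales line up exactly with the output shapes you listed. Assuming the usual RetinaFace feature-pyramid strides of 8, 16 and 32 (an assumption, but consistent with the shapes), the grid sizes follow directly from the 736×1280 input:

```python
# Grid sizes per stride for the 736x1280 input. Strides of 8/16/32 are the
# standard RetinaFace feature-pyramid assumption, not something the HEF reports.
input_h, input_w = 736, 1280
grids = [(stride, input_h // stride, input_w // stride) for stride in (8, 16, 32)]
for stride, h, w in grids:
    print(f"stride {stride:2d} -> {h} x {w}")
# stride  8 -> 92 x 160  (conv41/conv42/conv43)
# stride 16 -> 46 x 80   (conv32/conv33/conv34)
# stride 32 -> 23 x 40   (conv23/conv24/conv25)
```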

Let me know if you need help with post-processing or want more details on the specific outputs!

Best regards,
Omri

I found several classes for face detection post-processing in the GitHub repository hailo_model_zoo: hailo_model_zoo\hailo_model_zoo\core\postprocessing

E.g. hailo_model_zoo.core.postprocessing.face_detection.scrfd.SCRFDPostProc

The method “tf_postproc” looks like it should do all the post-processing?

Thank you for your help

So my current (simplified) flow is:

import numpy
import hailo_platform
from hailo_model_zoo.core.postprocessing.face_detection.scrfd import SCRFDPostProc

hef = hailo_platform.HEF('data/models/hailo8l/face_detection/scrfd_500m.hef')

vd_prms = hailo_platform.VDevice.create_params()
v_device = hailo_platform.VDevice(vd_prms)

cfg_prms = hailo_platform.ConfigureParams.create_from_hef(hef=hef, interface=hailo_platform.HailoStreamInterface.PCIe)

net_grps = v_device.configure(hef, cfg_prms)
net_grp = net_grps[0]
net_grp_prms = net_grp.create_params()

vstr_main_input = net_grp.get_input_vstream_infos()[0]

vstr_main_output_name = net_grp.get_sorted_output_names()[-1]

vstr_main_output = None
vstr_outputs = net_grp.get_output_vstream_infos()
for vstr_info in vstr_outputs:
    if vstr_main_output_name in vstr_info.name:
        vstr_main_output = vstr_info
        
vstr_prms_input = hailo_platform.InputVStreamParams.make(net_grp)
vstr_prms_output = hailo_platform.OutputVStreamParams.make(net_grp)

... some capture and resize stuff ...

# the input dict is keyed by the input vstream name
input_data = { vstr_main_input.name: numpy.expand_dims(np_image_scaled, axis=0) }

with hailo_platform.InferVStreams(net_grp, vstr_prms_input, vstr_prms_output) as vstr_infer:

    # returns a dict with multiple output layer names as key
    # also it's batch/frame based
    results = vstr_infer.infer(input_data)
    
    # should be correct 'main' output for each HEF model?
    result_main = results[vstr_main_output_name]
    
    # load YAML config for model, in this example: hailo_model_zoo/cfg/base/scrfd.yaml
    ... some YAML magic ...

    post_proc = SCRFDPostProc(np_image_scaled.shape, anchors=yaml_data['postprocessing']['anchors'])

    # post-process the first batch/frame
    post_proc.tf_postproc(result_main[0])

However I get an exception in the last line: face detection failed: All branches must have the same number of output nodes

So the number seems to come from the YAML config:

  anchors:
    steps:
    - 8
    - 16
    - 32

However, none of the output layers has 8, 16 or 32 in its dimensions.

Soo … tf_postproc is not what I need?

Hey @chrime

It looks like you’re on the right track with using SCRFDPostProc for post-processing, but the error you’re encountering (“All branches must have the same number of output nodes”) likely comes from a mismatch between the anchors in your YAML config and the actual output layers of your model.

The anchors defined in your YAML file, with steps 8, 16, and 32, are typically linked to specific downsampling rates in the model. If your model’s output layers don’t match these steps, the post-processing function might not align properly. You may need to adjust the anchor settings in the YAML file to fit the spatial dimensions of your model’s outputs.
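One way to see what the error is complaining about: SCRFD post-processing expects one branch per stride, each containing the same set of outputs (scores, bboxes, keypoints). A hedged sketch, assuming a 640×640 input (consistent with the 20×20 conv40 shape quoted earlier in the thread) and hypothetical layer names:

```python
# Sketch: group a model's output shapes into per-stride branches, assuming a
# 640x640 input and the SCRFD strides 8/16/32 from the YAML. Layer names and
# shapes here are illustrative, not read from a real HEF.
input_size = 640
steps = [8, 16, 32]

# (name, (h, w, c)) as get_output_vstream_infos() would report them
outputs = [
    ("conv_a", (80, 80, 2)), ("conv_b", (80, 80, 8)), ("conv_c", (80, 80, 20)),
    ("conv_d", (40, 40, 2)), ("conv_e", (40, 40, 8)), ("conv_f", (40, 40, 20)),
    ("conv_g", (20, 20, 2)), ("conv_h", (20, 20, 8)), ("conv_i", (20, 20, 20)),
]

# one branch per stride: all layers whose grid height matches input_size // stride
branches = {s: [n for n, (h, w, c) in outputs if h == input_size // s] for s in steps}
sizes = [len(v) for v in branches.values()]
print(sizes)  # [3, 3, 3]
```

With all nine outputs present, every branch has the same node count. “All branches must have the same number of output nodes” suggests the post-processor received an uneven split, for example because only a single layer (such as `get_sorted_output_names()[-1]`) was passed in instead of the full set.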

It would also be helpful to verify whether the output layers of your HEF model match the expected anchor box dimensions for SCRFD. If tf_postproc isn’t quite fitting your setup, you might need to modify the post-processing pipeline or try an alternative approach that better aligns with your model’s output.

Hope this helps! Let me know if you need further clarification.

Best regards,
Omri

Hi Omri

Why are the anchors defined in the YAML files hailo_model_zoo/cfg/base/scrfd.yaml and hailo_model_zoo/cfg/base/retinaface.yaml different?
They are from the official Hailo repository.
So my expectation was that hailo_model_zoo.core.postprocessing.face_detection.scrfd.SCRFDPostProc
and hailo_model_zoo.core.postprocessing.face_detection_postprocessing.FaceDetectionPostProc, respectively, should work?

One sec, let me see where I’ve downloaded the SCRFD/RetinaFace models from …

and

Do you have different models for me?

Source refers to this repository: GitHub - biubug6/Pytorch_Retinaface: Retinaface get 80.99% in widerface hard val using mobilenet0.25.
Should I continue looking there?

My guess was that the best place to look for a decoder was your Python package hailo_platform, but I couldn’t find one.
So far hailo_platform seems to contain only hardware-related stuff?

Then I looked into the Tappas repository and found core/hailo/libs/postprocesses/detection/face_detection.cpp,
which looks like it may contain the decoding method I need, but it’s all in C++ (and Tappas seems heavily dependent on GStreamer).

Finally I looked into the hailo_model_zoo repository and found FaceDetectionPostProc / SCRFDPostProc, which seem to rely on a different model configuration?

Can you point me to example code in one of your repositories or in an external, non-Hailo repository?

Also: will utility/helper functions and classes for cases like this be added to hailo_platform some day?

Thank you Omri

Best regards, chrime

Hey @chrime

1. Different Anchor Definitions in YAML Files:

The anchors in hailo_model_zoo/cfg/base/scrfd.yaml and retinaface.yaml are different because each model is trained with different scales and feature maps. These anchor definitions correspond to specific layers and downsampling factors used during training. The difference is expected as each model, SCRFD and RetinaFace, uses different network architectures and hence different anchor configurations.

Since the post-processing steps are tightly coupled with these anchor definitions, it’s essential to ensure that the YAML configuration matches the model you are using. If you switch between models, you’ll need to use the corresponding YAML file for each.

2. Model Sources and Compatibility:

If you’re using models from external sources, like the one from biubug6’s Pytorch RetinaFace repository, there could be differences in architecture or configuration that don’t align perfectly with the models in the Hailo Model Zoo. This could be why you’re experiencing issues when using SCRFDPostProc or FaceDetectionPostProc from the model zoo. These post-processing scripts are designed for models pre-trained and optimized for Hailo hardware.

It’s a good idea to double-check whether the models you’re using (e.g., from GitHub) match the configuration expected by the YAML files in the model zoo. If there’s a mismatch, you may need to adjust the post-processing code or even train a new model aligned with Hailo’s setup.

3. Finding Decoders in Hailo’s Codebase:

You’re correct that the hailo_platform package is more focused on hardware-related functionalities. For decoding face detection outputs, you’re on the right track with looking at the TAPPAS repository, where post-processing functions are implemented in C++. However, if you prefer to work in Python, the model zoo does include Python-based post-processing, which should align with the models from the zoo.

You can also explore the post-processing examples in the Hailo Model Zoo, such as:

  • hailo_model_zoo.core.postprocessing.face_detection.scrfd.SCRFDPostProc
  • hailo_model_zoo.core.postprocessing.face_detection_postprocessing.FaceDetectionPostProc

These are designed to handle the post-processing for the pre-trained models available in the Hailo Model Zoo, and you should be able to adapt them for your specific needs.

4. Utility Functions in hailo_platform:

At the moment, hailo_platform primarily focuses on interfacing with the hardware, and most of the model-specific post-processing logic is handled in the model zoo or TAPPAS. That said, adding utility/helper functions for post-processing is a great suggestion! This feedback can be passed to the Hailo team for consideration in future releases.

Next Steps:

  • Ensure you’re using the correct YAML configuration that matches the model you’re running.
  • For models downloaded from other sources, be prepared to modify the post-processing pipeline or retrain a compatible version for Hailo.
  • You can find most of the post-processing logic in TAPPAS and the hailo_model_zoo, with C++ and Python options available.

Feel free to share more details about the specific models you’re using, and I’d be happy to help further!

Best regards,
Omri

@chrime

Any progress on this? I’m also struggling with postprocessing of scrfd

Sorry, no. I’m not working on this project currently.
If you have any new ideas and/or resources, please let me know too.
Thanks!