Based on one of your examples, I was able to run face detection (without GStreamer) with retinaface_mobilenet_v1, lightface_slim, scrfd_500m, scrfd_2.5g, and scrfd_10g.
However, I'm confused by the output.
For example, retinaface_mobilenet_v1:
Glad to hear you’ve got face detection running on multiple models! Let’s clarify the confusion around the output layers and shapes for the retinaface_mobilenet_v1 model.
Output Layer Interpretation:
The layer ‘retinaface_mobilenet_v1/conv25’ with shape (23, 40, 20) is an output feature map, not the final bounding boxes or keypoints. In face detection models like RetinaFace, outputs typically represent:
Location (bbox) predictions
Face detection confidence scores
Landmark predictions (eyes, nose, mouth, etc.)
The (23, 40, 20) shape can be read as a grid: 23×40 is a downscaled spatial map of your original input image (each cell corresponds to a patch of the input), and the 20 channels most likely pack per-anchor predictions, for example two anchors times ten values each; the exact layout depends on the model configuration.
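To make that concrete, here is a small NumPy sketch of how such a grid can be read. The stride of 32 and the split into 2 anchors with 10 values each are assumptions chosen to fit the numbers; the actual channel layout and anchor count depend on the compiled model, so treat them as illustrative:

```python
import numpy as np

# Hypothetical reading of a (23, 40, 20) output as a stride-32 grid with
# 2 anchors per cell and 10 values per anchor. The real channel layout
# depends on the model; check the model zoo config before relying on it.
STRIDE = 32
NUM_ANCHORS = 2
VALUES_PER_ANCHOR = 10  # e.g., 5 landmarks x (x, y)

feature_map = np.random.rand(23, 40, 20).astype(np.float32)  # stand-in for real output

# Reshape so each grid cell holds NUM_ANCHORS predictions of VALUES_PER_ANCHOR each.
per_anchor = feature_map.reshape(23, 40, NUM_ANCHORS, VALUES_PER_ANCHOR)

# A cell at (row, col) corresponds to a STRIDE x STRIDE patch of the input image.
row, col = 11, 20
x_center = (col + 0.5) * STRIDE  # ~656 px in the 1280-wide input
y_center = (row + 0.5) * STRIDE  # ~368 px in the 736-high input
print(f"cell ({row}, {col}) covers input pixels around ({x_center:.0f}, {y_center:.0f})")
print("anchor 0 raw values:", per_anchor[row, col, 0])
```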
Post-Processing:
These raw outputs need post-processing, including:
Decoding bounding box coordinates
Applying Non-Maximum Suppression (NMS) to filter overlapping detections
Interpreting landmark and confidence scores
The numbers you see are raw feature map values, not direct x, y coordinates for bounding boxes. You’ll need to apply specific post-processing steps (usually found in model documentation or example code) to get final bounding boxes and landmarks.
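The decoding step is model-specific, but NMS itself is generic. Here is a minimal NumPy sketch of the suppression step, purely to illustrate the idea (it is not the model zoo's implementation):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.4):
    """Plain NumPy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the kept box too strongly.
        order = order[1:][iou <= iou_threshold]
    return keep

# Example: box 1 heavily overlaps box 0 and gets suppressed.
boxes = np.array([[100, 100, 200, 200],
                  [105, 105, 205, 205],
                  [400, 300, 480, 380]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # -> [0, 2]
```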
Input vs Output Shape:
Your input shape (736, 1280, 3) is processed through multiple network layers, which downscale the spatial dimensions and increase the feature channel depth, resulting in outputs like (23, 40, 20); here the downscaling factor is 32 (736 / 32 = 23, 1280 / 32 = 40). This downscaling is common in convolutional networks for efficiency and to capture larger features.
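You can check this arithmetic directly. Assuming the usual RetinaFace output strides of 8, 16, and 32 (an assumption based on the standard architecture, not on your compiled model), only stride 32 reproduces the (23, 40) grid for a (736, 1280) input:

```python
# Spatial size of each output map is the input size divided by its stride.
for stride in (8, 16, 32):
    print(stride, 736 // stride, 1280 // stride)
# 8  -> 92 x 160
# 16 -> 46 x 80
# 32 -> 23 x 40   <- matches the (23, 40, 20) layer
```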
Let me know if you need help with post-processing or want more details on the specific outputs!
I found several classes for face detection post-processing in the GitHub repository hailo_model_zoo: hailo_model_zoo/hailo_model_zoo/core/postprocessing