Hello everyone.
Based on one of your examples, I was able to run face detection (without GStreamer) with retinaface_mobilenet_v1, lightface_slim, scrfd_500m, scrfd_2.5g and scrfd_10g.
However, I’m confused by the output.
For example, this is the network info for retinaface_mobilenet_v1:
Architecture HEF was compiled for: HAILO8L
Network group name: retinaface_mobilenet_v1, Multi Context - Number of contexts: 3
Network name: retinaface_mobilenet_v1/retinaface_mobilenet_v1
VStream infos:
Input retinaface_mobilenet_v1/input_layer1 UINT8, NHWC(736x1280x3)
Output retinaface_mobilenet_v1/conv41 UINT8, NHWC(92x160x8)
Output retinaface_mobilenet_v1/conv42 UINT8, NHWC(92x160x4)
Output retinaface_mobilenet_v1/conv43 UINT8, FCR(92x160x20)
Output retinaface_mobilenet_v1/conv32 UINT8, NHWC(46x80x8)
Output retinaface_mobilenet_v1/conv33 UINT8, NHWC(46x80x4)
Output retinaface_mobilenet_v1/conv34 UINT8, FCR(46x80x20)
Output retinaface_mobilenet_v1/conv23 UINT8, NHWC(23x40x8)
Output retinaface_mobilenet_v1/conv24 UINT8, NHWC(23x40x4)
Output retinaface_mobilenet_v1/conv25 UINT8, FCR(23x40x20)
I guess “retinaface_mobilenet_v1/conv25” is the final output?
What does its shape (23, 40, 20) represent?
It’s not BBoxes or anything I’ve seen before.
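One thing I did notice is that the output grids divide the input size evenly, so the nine outputs may be three heads at three strides rather than one final layer. Below is a quick sketch of that arithmetic using only the shapes from the VStream info above. The mapping from channel count to meaning (2 anchors per cell, with 4 box values, 2 score values and 10 landmark values per anchor) is my assumption based on the usual RetinaFace layout and is not confirmed for this HEF.

```python
# Input size (height, width) taken from the VStream info above.
INPUT_H, INPUT_W = 736, 1280

# Output layers grouped by grid size (height, width), with channel counts from the log.
OUTPUTS = {
    (92, 160): {"conv41": 8, "conv42": 4, "conv43": 20},
    (46, 80): {"conv32": 8, "conv33": 4, "conv34": 20},
    (23, 40): {"conv23": 8, "conv24": 4, "conv25": 20},
}

# ASSUMPTION: usual RetinaFace head layout, not confirmed for this HEF.
ANCHORS_PER_CELL = 2
VALUES_PER_ANCHOR = {"boxes": 4, "scores": 2, "landmarks": 10}


def describe(outputs=OUTPUTS):
    """Return (layer, stride_h, stride_w, guessed_kinds) for every output layer."""
    rows = []
    for (grid_h, grid_w), layers in outputs.items():
        # Stride = how many input pixels one grid cell covers.
        stride_h = INPUT_H // grid_h
        stride_w = INPUT_W // grid_w
        for name, channels in layers.items():
            # Match the channel count against anchors * values-per-anchor.
            guesses = [
                kind
                for kind, n in VALUES_PER_ANCHOR.items()
                if n * ANCHORS_PER_CELL == channels
            ]
            rows.append((name, stride_h, stride_w, guesses))
    return rows


rows = describe()
for row in rows:
    print(row)
```

If that reading is right, conv25 would be the landmark head at stride 32, and the boxes and scores would come from the 8-channel and 4-channel outputs, but I would appreciate confirmation.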
Also, the numbers do not make sense to me. The input shape is 736, 1280, 3 and one example output looks like this:
[[124 130 128 ... 130 126 132]
[124 127 126 ... 124 123 125]
[125 127 127 ... 127 124 128]
...
[109 129 117 ... 143 115 143]
[112 127 119 ... 133 117 135]
[118 122 122 ... 128 123 128]]
My face was in the middle of the image, so I assume these numbers are not (y, x) positions?
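My current guess is that these UINT8 values are quantized and have to be mapped back to floats with a scale and a zero point before they mean anything. Here is a minimal sketch of that idea on a few values copied from the printout above. The scale and zero point are made-up placeholders, not values read from this HEF, so the resulting floats are only illustrative.

```python
import numpy as np


def dequantize(raw, scale, zero_point):
    """Map quantized integers to floats: (raw - zero_point) * scale."""
    return (raw.astype(np.float32) - zero_point) * scale


# A few raw values copied from the printout above.
raw = np.array([[124, 130, 128], [118, 122, 122]], dtype=np.uint8)

# HYPOTHETICAL placeholders; the real values would have to come from the model.
HYPOTHETICAL_SCALE = 0.05
HYPOTHETICAL_ZERO_POINT = 125.0

values = dequantize(raw, HYPOTHETICAL_SCALE, HYPOTHETICAL_ZERO_POINT)
print(values)
```

Even after that step, I would expect the floats to be offsets relative to anchors rather than absolute positions, so some further decoding would still be needed.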
Thank you for any help!