Highly accurate model for estimating depth and distance in space

Hello everyone,

I’m working on a 3D scene reconstruction project that uses RGB-D stereoscopic cameras to estimate multi-human poses in space. The skeletons are built in real time so that I can extract (x, y, z) coordinates of points of interest such as the hands and head. The goal is to measure Euclidean distances in space as precisely as possible, and I’m looking for an AI model that can do this.

I’m currently using two RGB-D cameras on a Raspberry Pi 5 with a Hailo8, and I’ve heard of SC_Depth for this, but is it accurate enough for my real-time application? I’ve also tested the following example, but nothing more: Hailo-Application-Code-Examples/runtime/cpp/depth_estimation/scdepthv3 at main · hailo-ai/Hailo-Application-Code-Examples · GitHub

Does anyone have suggestions or documentation for this or another model?

All of my code is in Python, and the models must be compatible with the Hailo8, since I’ll also be running several other models for re-id and human pose estimation.

Hey @Theo_Vioux

About your depth estimation needs, we have a basic example using the SCDepthv3 model running on the Hailo8. You can find it here:

https://github.com/hailo-ai/hailo-rpi5-examples/blob/main/basic_pipelines/depth.py

The model used in that example is available here:
https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/ModelZoo/Compiled/v2.14.0/hailo8/scdepthv3.hef

This pipeline offers a good starting point for real-time monocular depth estimation, achieving 30 FPS on a Raspberry Pi 5 with a Hailo8. It’s a handy solution when you don’t have dedicated depth sensors.

Regarding the Hailo-Application-Code-Examples/runtime/cpp/depth_estimation/scdepthv3 example you mentioned, it’s also based on monocular depth estimation and mainly provides relative depth information. While useful for visual scene understanding, it doesn’t give you the precise metric-scale depth needed for accurate 3D multi-human pose estimation and Euclidean distance calculations – which is what you’re aiming for.
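To illustrate the gap between relative and metric depth: if you have a few trusted metric samples (for example, from your RGB-D sensors), one common workaround is to fit a scale and shift that maps the relative values onto them. The sketch below is only an illustration. The function name `fit_scale_shift` is hypothetical, and it assumes the model output is related to true depth by a single scale and shift, which may not hold well for SCDepthv3 in practice.

```python
from typing import Sequence, Tuple


def fit_scale_shift(relative: Sequence[float],
                    metric: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of metric ~= scale * relative + shift.

    Assumption (not from the thread): relative depth and metric depth
    differ only by one global scale and one global shift.
    """
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0:
        raise ValueError("relative samples must not all be equal")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    rel = [0.10, 0.25, 0.40, 0.80]
    met = [0.70, 1.00, 1.30, 2.10]  # metres, e.g. read from a depth sensor
    s, t = fit_scale_shift(rel, met)
    print(f"scale={s:.3f}, shift={t:.3f}")
```

Even with such a fit, the result is only as good as the assumption behind it, so this is not a substitute for real metric depth.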

For higher accuracy in your specific use case:

Since you’re already working with two RGB-D cameras, I’d strongly recommend considering a stereo-based depth model like StereoNet. Here’s why it’s likely a better approach for you:

  • StereoNet can generate depth maps with actual metric values (in meters or millimeters), which is crucial for accurate 3D pose reconstruction and distance evaluation.

  • It’s better suited for precisely locating joints (like hands and the head) in 3D space.

  • You can still leverage the Hailo8 for fast 2D pose estimation (using models like lightweight_openpose or hrnet-lite) and then use the depth information from your stereo setup at those keypoints to reconstruct full 3D poses.
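As a rough sketch of that last step, the snippet below converts a stereo disparity to metric depth, back-projects 2D keypoints into 3D with a pinhole camera model, and measures the Euclidean distance between two joints. All names (`Intrinsics`, `disparity_to_depth`, `backproject`) and the numeric values are hypothetical. It assumes rectified images, a known baseline, and calibrated intrinsics, and it is not taken from any Hailo example.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (assumed to come from your calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> float:
    """Standard rectified-stereo relation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def backproject(u: float, v: float, depth_m: float, k: Intrinsics) -> Point3D:
    """Map pixel (u, v) with metric depth to camera-frame (x, y, z) in metres."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


if __name__ == "__main__":
    # Illustrative values only: 600 px focal length, 10 cm baseline.
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    z = disparity_to_depth(disparity_px=30.0, focal_px=k.fx, baseline_m=0.10)
    head = backproject(320.0, 240.0, z, k)   # keypoint from a 2D pose model
    hand = backproject(620.0, 240.0, z, k)
    print("head-hand distance (m):", math.dist(head, hand))
```

If your RGB-D cameras already deliver metric depth per pixel, you can skip `disparity_to_depth` and pass the sensor depth at each keypoint straight to `backproject`.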

Hello @omria,

Do you have any documentation for StereoNet? I can’t find any.
I tried the following link: Hailo-Application-Code-Examples/runtime/python/depth_estimation at main · hailo-ai/Hailo-Application-Code-Examples · GitHub, which seems to use stereonet.hef, but there is an incompatibility between Python 3.11 and TensorFlow. In addition, after switching to Python 3.9, I now get the following error:

from hailo_platform import (HEF, Device, VDevice, HailoStreamInterface, ConfigureParams,
ModuleNotFoundError: No module named 'hailo_platform'
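A quick way to narrow down an error like this is to check which interpreter is running and whether it can see the package at all. The snippet below is a minimal diagnostic sketch using only the standard library. The helper name `module_available` is hypothetical, and the guess that the package was installed for a different Python version or virtual environment is an assumption, not something confirmed in this thread.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python is actually being used (system vs. virtual env).
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    for name in ("hailo_platform", "tensorflow"):
        status = "found" if module_available(name) else "NOT found"
        print(f"{name}: {status}")
```

If `hailo_platform` is reported as not found under Python 3.9 but found under the system Python, the package is probably tied to that other interpreter.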

What I’m mainly interested in is a precise 3D reconstruction of a person’s joints, which I can then reproduce in a to-scale 3D scene using Python 3.11.