I’m working on a 3D scene reconstruction project using RGB-D stereoscopic cameras to estimate multi-person poses in space. The skeletons are reconstructed in real time to extract the (x, y, z) coordinates of points of interest such as the hands and head. The goal is therefore to evaluate Euclidean distances in space very precisely, or at least as precisely as possible, and I’d be interested in an AI model suited to this.
Does anyone have any suggestions, documentation for this or another model?
All the code is written in Python, and the models must be compatible with the Hailo8, since I’ll also be running several models for re-identification and human pose estimation.
This pipeline offers a good starting point for real-time monocular depth estimation, achieving 30 FPS on a Raspberry Pi 5 with a Hailo8. It’s a handy solution when you don’t have dedicated depth sensors.
Regarding the Hailo-Application-Code-Examples/runtime/cpp/depth_estimation/scdepthv3 example you mentioned, it’s also based on monocular depth estimation and mainly provides relative depth information. While useful for visual scene understanding, it doesn’t give you the metric-scale depth needed for accurate 3D multi-human pose estimation and Euclidean distance calculations, which is what you’re aiming for.
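To make that limitation concrete, here’s a tiny sketch of the scale ambiguity: with only relative depth, the same reconstruction is consistent with any global scale s, so any Euclidean distance computed from it scales with s too. All numbers below are made up for illustration:

```python
import numpy as np

# Two keypoints reconstructed from a *relative* depth map: the geometry is only
# known up to an unknown global scale s.
p_rel = np.array([0.10, 0.05, 1.0])  # hypothetical "hand" position
q_rel = np.array([0.30, 0.00, 1.5])  # hypothetical "head" position

# Three equally plausible metric scales give three different distances,
# so relative depth alone cannot answer "how far apart are these joints?".
for s in (0.5, 1.0, 2.0):
    print(f"s = {s}: distance = {np.linalg.norm(s * p_rel - s * q_rel):.3f}")
```

With metric depth (from your stereo setup), s is fixed and the distance is unique.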
For higher accuracy in your specific use case:
Since you’re already working with two RGB-D cameras, I’d strongly recommend considering a stereo-based depth model like StereoNet. Here’s why it’s likely a better approach for you:
- StereoNet can generate depth maps with actual metric values (in meters or millimeters), which is crucial for accurate 3D pose reconstruction and distance evaluation.
- It’s better suited for precisely locating joints (such as the hands and head) in 3D space.
- You can still leverage the Hailo8 for fast 2D pose estimation (using models like lightweight_openpose or hrnet-lite) and then use the depth information from your stereo setup at those keypoints to reconstruct full 3D poses.
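Once you have a metric disparity/depth map and 2D keypoints from the pose model, the 3D reconstruction step itself is simple geometry. A minimal Python sketch, assuming a rectified stereo pair with known focal length, principal point, and baseline (every calibration number and keypoint below is a hypothetical placeholder, not output from a real model):

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Metric depth from stereo disparity: Z = f * B / d.
    In real disparity maps, mask out d <= 0 before calling this."""
    return focal_px * baseline_m / disparity_px

def backproject(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth z into camera coords (x, y, z)."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# Hypothetical calibration -- replace with your cameras' actual intrinsics/baseline.
fx = fy = 600.0        # focal length in pixels
cx, cy = 320.0, 240.0  # principal point
baseline = 0.06        # stereo baseline in meters

# Suppose the 2D pose model returned two keypoints with disparities at those pixels.
(hand_u, hand_v), hand_disp = (400, 300), 36.0
(head_u, head_v), head_disp = (330, 120), 30.0

hand_3d = backproject(hand_u, hand_v,
                      disparity_to_depth(hand_disp, fx, baseline), fx, fy, cx, cy)
head_3d = backproject(head_u, head_v,
                      disparity_to_depth(head_disp, fx, baseline), fx, fy, cx, cy)

# Euclidean distance between the two joints, in meters (~0.41 m for these numbers).
dist = np.linalg.norm(hand_3d - head_3d)
print(f"hand-head distance: {dist:.3f} m")
```

In practice you’d sample the depth map in a small window around each keypoint (median or robust mean) rather than at a single pixel, since disparity is noisy exactly at object edges where joints tend to sit.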