Looking for a head-pose-estimation pipeline (C++)

Hi,
as the title says, I am looking for a working example of a head-pose-estimation pipeline that runs on a Hailo 8. The setup is a controlled environment with all parameters known.

What I’ve tried so far:
Since we are limited in what the DFC can compile, I chose this repo because it provides weights for MobileNetV2 and V3:

https://github.com/yakhyo/head-pose-estimation

I converted the model from .pt to .onnx and then compiled the .onnx to a .hef with the DFC:
hailomz compile mobilenet_v2_1.4 --ckpt=mobilenetv2.onnx --hw-arch hailo8 --calib-path 300W_LP-Calibration_images

The corresponding .alls (mobilenet_v2_1.4) applies a normalization that differs from the one used in the original training, which I tried to compensate for.

Within the code, I first infer with an Oak-D/DepthAI pipeline, because I haven't figured out how to run inference with two models on the Hailo 8.

Whatever I try, I can't get the head-pose model to work (there is an output, but it could just be quantization jitter).

I have given up this approach.

If I can't find a solution that compiles with the DFC, I will go Oak-D-only with this example (minus the gaze/feature part, which I don't need; I just need the Euler angles):
https://github.com/luxonis/depthai-experiments/tree/master/gen2-gaze-estimation

Questions:

  1. Do you know a pose estimation pipeline that should definitely work on a Hailo 8?
  2. How can I process two models in series (C++ explanation / example preferred)?

Thanks in advance!

Kind of solved, after trying with an FP16 model converted for the Oak-D/DepthAI:

I found out that the head-pose NN worked better with a less shaky input, and from a newbie-in-CV perspective that somehow makes sense: the NN seems to fixate on features it recognizes in the input image, and when those features wiggle around, it is difficult to keep track of the angles.

If that is so, then the use of quantized models doesn’t make sense in this kind of scenario.

I invite you to enlighten me.

Hey @user115,

For pose estimation with C++, check out this repo:

You might need to tweak the postprocessing to match your specific model.
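If the model turns out to be HopeNet-style (an assumption on my part — that binned formulation is common for networks trained on 300W-LP, but check the repo's own Python postprocessing), each Euler angle comes out as logits over 66 bins of 3° spanning roughly −99° to +99°, and the angle is the softmax expectation over bin centers. A sketch of that decoding:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hedged sketch: HopeNet-style angle decoding (66 bins, 3 degrees each).
// Whether your exported model uses this layout is an assumption; compare
// against the original repo's postprocessing before relying on it.
float bins_to_degrees(const std::vector<float> &logits) {
    // numerically stable softmax over the bin logits
    const float max_l = *std::max_element(logits.begin(), logits.end());
    std::vector<float> p(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp(logits[i] - max_l);
        sum += p[i];
    }
    // expectation over bin indices, then map index -> degrees
    float expectation = 0.0f;
    for (size_t i = 0; i < p.size(); ++i) {
        expectation += static_cast<float>(i) * p[i] / sum;
    }
    return expectation * 3.0f - 99.0f;
}
```

Run one such decode per output head (yaw, pitch, roll).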

For running two models in series (like face detection → head pose estimation), you’ve got two solid options:

Option 1: Manual VDevice Approach

  1. Load both HEF files (hailort::Hef) and configure them on a hailort::VDevice
  2. Set up separate input/output streams for each model
  3. For each frame:
    • Run face detection and get bounding boxes
    • Crop and resize the ROIs using OpenCV
    • Feed those crops into your head pose model
    • Collect all results

This gives you full control but requires more manual management.
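The crop step is where these chains usually break: detection boxes can extend past the frame edges, so clamp them before cropping. A plain-C++ sketch (the `Box` struct is my own placeholder; with OpenCV the same clamp is `box & cv::Rect(0, 0, frame.cols, frame.rows)`):

```cpp
#include <algorithm>

struct Box { int x, y, w, h; };  // top-left corner + size, in pixels

// Clamp a detection box to the frame bounds so the subsequent crop for
// the head-pose model never reads out-of-range pixels.
Box clamp_to_frame(const Box &b, int frame_w, int frame_h) {
    const int x0 = std::clamp(b.x, 0, frame_w);
    const int y0 = std::clamp(b.y, 0, frame_h);
    const int x1 = std::clamp(b.x + b.w, 0, frame_w);
    const int y1 = std::clamp(b.y + b.h, 0, frame_h);
    return {x0, y0, x1 - x0, y1 - y0};
}
```

After clamping, resize the crop to the head-pose model's input resolution and apply its expected normalization.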

Option 2: Using HailoRT Scheduler (Recommended)

The Scheduler handles resource sharing and optimization automatically:

// Create a VDevice; with default parameters, recent HailoRT versions
// enable the model scheduler automatically
auto vdevice_exp = hailort::VDevice::create();
if (!vdevice_exp) { /* handle vdevice_exp.status() */ }
auto vdevice = vdevice_exp.release();

// Load and configure both HEFs on the same VDevice
// (note: the class is hailort::Hef, and configure() returns a
// vector of network groups per HEF)
std::vector<std::shared_ptr<hailort::ConfiguredNetworkGroup>> network_groups;
for (const auto *path : {"face_detection.hef", "head_pose_estimation.hef"}) {
    auto hef = hailort::Hef::create(path);
    if (!hef) { /* handle hef.status() */ }
    auto groups = vdevice->configure(hef.value());
    if (!groups) { /* handle groups.status() */ }
    for (auto &group : groups.value()) {
        network_groups.push_back(group);
    }
}

With the Scheduler, you get:

  • Better resource utilization
  • Automatic stream multiplexing
  • Simplified management of multiple models
  • Optimized execution

You’ll still need to handle the cropping/resizing between models, but the Scheduler takes care of all the resource management.

Let me know if you need more implementation details!