Question about running inference on multiple Hailo models

Hello,

I’m currently working with YOLO-based models on the Hailo-8L accelerator, such as yolov11n for detection and yolov8s-pose for keypoint estimation.

I have a couple of questions:

  1. Is it feasible to load two different .hef models, say one for detection and another for pose estimation, and run inference with both sequentially on the same input on a single device?

  2. Also, could you share any code examples or documentation on how to run compiled .hef models directly on Hailo hardware using HailoRT?

I’m trying to get a better grasp of how to run multiple models and manage the performance trade-offs involved.

Thanks so much for your assistance!

Best regards.

Yes, the HailoRT scheduler allows you to do that.

See the HailoRT User Guide for details.
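On your second question, below is a minimal sketch using the HailoRT Python API (hailo_platform) that configures both HEFs on a single VDevice with the scheduler enabled and runs them one after the other on the same dummy frame. The .hef file names are placeholders, and details such as the default vstream format (and therefore the input dtype) vary between HailoRT releases, so treat this as a starting point and check it against the API reference for your installed version.

import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams,
                            HailoSchedulingAlgorithm, HailoStreamInterface,
                            InferVStreams, InputVStreamParams,
                            OutputVStreamParams)

# Enable the scheduler so multiple network groups can share one device.
params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as target:
    network_groups = []
    for hef_path in ('yolov11n.hef', 'yolov8s_pose.hef'):  # placeholder paths
        hef = HEF(hef_path)
        cfg = ConfigureParams.create_from_hef(
            hef, interface=HailoStreamInterface.PCIe)
        # configure() returns a list of network groups; one per HEF here.
        network_groups.append((hef_path, hef, target.configure(hef, cfg)[0]))

    for hef_path, hef, ng in network_groups:
        in_params = InputVStreamParams.make(ng)
        out_params = OutputVStreamParams.make(ng)
        in_info = hef.get_input_vstream_infos()[0]
        # Dummy batch of one frame; replace with your preprocessed image.
        # The dtype must match the vstream format configured above.
        frame = np.zeros((1, *in_info.shape), dtype=np.float32)
        # Note there is no manual activate() call: with the scheduler
        # enabled, it time-shares the device between the network groups.
        with InferVStreams(ng, in_params, out_params) as pipeline:
            results = pipeline.infer({in_info.name: frame})
            print(hef_path, {name: arr.shape for name, arr in results.items()})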

Our Application Code Examples repository contains a GStreamer-based LPR (license plate recognition) pipeline that uses multiple networks:

GitHub - Hailo Application Code Examples - Multistream LPR

You can use the hailortcli run2 command to test running multiple networks without writing an application:

hailortcli run2 set-net model1.hef set-net model2.hef

Use hailortcli run2 --help to see all the options for tuning the scenario, e.g. fixing the frame rate for a model.
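For example, something like the following caps the first model at 15 FPS while the second runs unconstrained (the --framerate option is taken from recent HailoRT releases; confirm the exact flag name in your version's --help output):

hailortcli run2 set-net model1.hef --framerate 15 set-net model2.hef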

You can use the HailoRT monitor to watch the running networks, similar to htop. In a second terminal, use the following command:

hailortcli monitor

You will need to set the HAILO_MONITOR environment variable before you run your app or the CLI command:

export HAILO_MONITOR=1
hailortcli run2 set-net model1.hef ...