Hi, I've just started using the Hailo-8 and I'm impressed with the variety of models in the Model Zoo. However, the instructions for running a simple inference once you have a .hef model are still either too complex or too vague. The examples in the repo feel tied to their own framework, and if you bring your own .hef you're stuck with no clear way to understand and run your model. I have HailoRT installed on my Raspberry Pi, a ready .hef model from the Model Zoo, and a sample dataset I'd like to run inference on to measure throughput. If anyone has done something similar before, I'd appreciate the help. And to the Hailo team: clearer examples for these situations would really help. Thanks!
Hey @Kevin_Walter,
Welcome to the Hailo Community!
You can definitely test your model using our example repos. For testing, I’d suggest checking out one of these:
- GitHub: hailo-ai/hailo-apps-infra
- GitHub: hailo-ai/Hailo-Application-Code-Examples (runtime/hailo-8/python/object_detection)
The main thing you’ll need to do is figure out what your model’s output looks like and whether it’s using NMS postprocessing on-chip or on the CPU. Based on that, you’ll need to create your own postprocessing logic.
If your model is based on YOLOv8 or YOLOv11 and uses HailoRT's built-in postprocessing (HailortPP), then the first repo (hailo-apps-infra) is probably your best bet, since it already has the detection and simple-detection postprocessing built in.
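If you end up doing the postprocessing on the CPU yourself, the core of it is usually score filtering plus non-maximum suppression. Here's a rough sketch in plain NumPy; the `[x1, y1, x2, y2]` box format and the IoU threshold are assumptions for illustration, not what any particular .hef emits:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # IoU of the top-scoring box against all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one
    return keep
```

Whether you need this at all depends on whether NMS runs on-chip for your model; you can check the output vstream info of your .hef to see what it actually emits.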
Hope this helps!
Thanks, the examples actually helped. I've managed to run one of your Model Zoo models (MobileNetV1 classification). However, I'm getting extremely low prediction scores, even though I'm testing on the ImageNet dataset. Also, how did you measure FPS? I'm getting really low throughput compared to your published results.
Hey, about the FPS issue: make sure you've enabled PCIe Gen 3.0, then check the actual speed by running `hailortcli run model.hef`.
The RPi only has one PCIe lane, while the Model Zoo results are measured over two lanes, so with larger models you might notice a difference in performance.
As for the prediction scores, could you share some of your results with me?
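In the meantime, one common reason for "extremely low" classification scores is treating raw logits as probabilities: some HEFs apply softmax on device, others leave it to the host. A quick host-side sanity check (hypothetical helper, assuming the output is a 1000-class ImageNet score vector):

```python
import numpy as np

def top5(logits):
    """Numerically stable softmax, then top-5 (class index, probability)."""
    z = logits - logits.max()            # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    idx = probs.argsort()[::-1][:5]      # five highest-probability classes
    return [(int(i), float(probs[i])) for i in idx]
```

Also double-check your input preprocessing (resize, channel order, normalization) matches what the model was compiled with; a mismatch there tanks scores just as badly.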
