Custom batch_size for models.

Hi!
I use a YOLOv11m model on a Hailo-8 chip to predict bounding boxes on a video stream from a camera connected to a Raspberry Pi 5. I use the .send() and .recv() methods to pass an image for prediction and fetch the results. Everything is okay.

But now I want to try batch prediction. I want to pass 2 frames at once to the .send() method and get 2 tensors with predictions, one per input frame, from the .recv() method. I just changed my input tensor to [2, H, W, 3] and tried it. After several seconds I get an error like this:

[HailoRT] [error] Got HAILO_TIMEOUT while waiting for input stream buffer yolov11m/input_layer1
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_TIMEOUT(4) - Failed write to stream (device: 0001:01:00.0)

I assume that it is because of an input buffer overflow: each time I push 2 frames but receive a prediction for only one.
How can I adjust this parameter or recompile the model for batch_size=2? I didn’t find any batch_size customization in the Hailo docs.

Thanks!

Welcome to the Hailo Community!

You don’t need to recompile the model. On Hailo devices, the batch size is set at runtime rather than during compilation.
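Here is a minimal sketch of how that looks, assuming the HailoRT async InferModel Python API (hailo_platform package) that your code appears to use; the HEF path is illustrative:

from hailo_platform import VDevice, HailoSchedulingAlgorithm

params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as vdevice:
    infer_model = vdevice.create_infer_model('yolov11m.hef')
    infer_model.set_batch_size(2)  # runtime batch size; no HEF recompilation
    with infer_model.configure() as configured_infer_model:
        # create bindings and run inference as usual
        ...

The rest of your code can stay as it is: one bindings object per frame passed to run_async().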

There are two scenarios:

  • Single-context models: Batch size does not affect throughput (FPS).
  • Multi-context models: Increasing the batch size can improve throughput (FPS) but adds latency. This works because context-switching overhead is reduced. Each model has an upper limit beyond which larger batch sizes no longer improve performance.

To check whether your model is single- or multi-context, run:

hailortcli parse-hef model.hef

To test the throughput of a model with batch size 2, run:

hailortcli run model.hef --batch-size 2

Thanks a lot!

My model has Number of contexts: 4 - does that mean the maximum batch size that still improves performance is 4 (the upper limit)?

As I understand it, the value returned by this command

hailortcli run model.hef --batch-size 2

doesn’t guarantee the exact FPS while processing a video stream, am I right?
I run inference on two frames using bindings, setting the input and output buffers, and calling the .run_async() method. But the inference time (the time the .wait() method takes) almost doubled between a 1-frame and a 2-frame input buffer.

Thank you in advance! I want to clarify these points for myself so that I can use the Hailo module at full power!

Here’s my code for bindings creation:

bindings = list()

for name in self._input_stream_names:
    print(f'reading from {name}')

    # grab the next frame from the stream buffer and prepare it for the model
    frame = pickle.loads(next(self._buffer.receive(name)))
    frame = cv2.resize(frame, (model_input_width, model_input_height))
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # one bindings object per frame, with its own input and output buffers
    bind = self.configured_infer_model.create_bindings()
    bind.input().set_buffer(frame)
    bind.output().set_buffer(np.empty(self.infer_model.output().shape, dtype=np.float32))
    bindings.append(bind)

And the code for inference:

job = self.configured_infer_model.run_async(bindings)
job.wait(self.timeout_ms)
for bind in bindings:
    prediction = bind.output().get_buffer()

This means the model has 4 parts (collections of layers). HailoRT loads the first context onto the Hailo device and then sends the image. It receives an intermediate result. Then HailoRT loads the next context onto the device and sends the intermediate result as input to the second context. This is repeated until the final result is received; in your case, 4 times.
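To see why batching helps here, a rough back-of-the-envelope illustration (the per-switch cost is a made-up number, purely for the arithmetic):

contexts = 4
switch_cost_ms = 1.0  # hypothetical time to load one context
for batch in (1, 2, 4, 8):
    # each context is loaded once per batch, so its cost is shared by all frames in the batch
    per_frame = contexts * switch_cost_ms / batch
    print(f'batch={batch}: {per_frame:.2f} ms of context switching per frame')

The device still runs every frame through all 4 contexts, so the total time for a batch grows, but the switching overhead per frame shrinks. That is the FPS-up, latency-up trade-off mentioned above.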

The maximum batch size is related more to the input size of the network. Just try increasing the batch size until you get a warning from HailoRT CLI. Then select a batch size that is practical for your application and provides the required throughput.
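For example, keep increasing until the CLI warns or the reported FPS stops improving:

hailortcli run model.hef --batch-size 4
hailortcli run model.hef --batch-size 8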

HailoRT CLI will give you the upper limit of FPS that you can achieve with the HEF file. Your application will likely be slower, depending on how you write the code, how much pre- and post-processing your application requires, and the host CPU.
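One common way to get closer to the CLI number is to keep the device busy: queue several async jobs before waiting instead of waiting right after each run_async() call. A hedged sketch based on the code you posted; frame_pairs and make_bindings are hypothetical stand-ins, and each in-flight job needs its own buffers, so bindings must not be reused before their job completes:

jobs = []
for frame_pair in frame_pairs:
    bindings = make_bindings(frame_pair)  # fresh input/output buffers per job
    jobs.append((self.configured_infer_model.run_async(bindings), bindings))

# wait once per job after the queue is filled; device work overlaps host work
for job, bindings in jobs:
    job.wait(self.timeout_ms)
    predictions = [bind.output().get_buffer() for bind in bindings]

This way the .wait() of one job overlaps with the inference of the queued ones, so your throughput is no longer bounded by the full per-batch latency.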

Did you have a look at the Hailo Application Code Examples?

GitHub - Hailo-Application-Code-Examples - Python - Object Detection

Yes, I saw the code examples and wrote my code as suggested there. But I was a bit confused by the huge difference between the FPS rates in the docs and what I got.

Thanks a lot! It is much clearer now!

Hi @Dmytro_Broska

At DeGirum (a SW partner of Hailo), we developed PySDK, a Python package that makes application development with Hailo devices easy. Recently, we added support to make experimenting with batching easier. Please see an example here: hailo_examples/examples/018_eager_batching.ipynb at main · DeGirum/hailo_examples
