Relevance of batch size in the C++ API for HailoRT

Hi guys,

I am trying to understand the way batch size is handled in the HailoRT C++ API.

I am starting from the classifier example in the Hailo-Application-Code-Examples repository (links below).

I learned that I can modify the batch size by setting it on the relevant entry of the configure params. In the configure_network_group function (around line 86 of Hailo-Application-Code-Examples/runtime/cpp/classifier/classifier.cpp at dd6ada9d0d10e8b75660b74ab56ba018165204c0 · hailo-ai/Hailo-Application-Code-Examples · GitHub) I call

configure_params->begin()->second.batch_size = 4;
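
To make sure I am describing it correctly, this is roughly what my setup looks like (a sketch based on the example; configure_with_batch and hef_path are my own illustrative names, and error handling is trimmed):

#include "hailo/hailort.hpp"
#include <memory>
#include <string>
#include <utility>

using namespace hailort;

// Configure the network group from a HEF with a given batch size.
Expected<std::shared_ptr<ConfiguredNetworkGroup>> configure_with_batch(
    VDevice &vdevice, const std::string &hef_path, uint16_t batch_size)
{
    auto hef = Hef::create(hef_path);
    if (!hef) return make_unexpected(hef.status());

    auto configure_params = vdevice.create_configure_params(hef.value());
    if (!configure_params) return make_unexpected(configure_params.status());

    // Set the batch size before configuring, as in the example's configure_network_group
    configure_params->begin()->second.batch_size = batch_size;

    auto network_groups = vdevice.configure(hef.value(), configure_params.value());
    if (!network_groups) return make_unexpected(network_groups.status());

    return std::move(network_groups->at(0));
}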

After doing this, I assumed that I would have to call

input[0].write(MemoryView(batch.data(), batch.size()))

in line 134 (Hailo-Application-Code-Examples/runtime/cpp/classifier/classifier.cpp at dd6ada9d0d10e8b75660b74ab56ba018165204c0 · hailo-ai/Hailo-Application-Code-Examples · GitHub) with a MemoryView four times bigger, i.e. matching the size of the data for a full batch.

However, if I do this, I get:
[HailoRT] [error] CHECK failed - write size 602112 must be 150528

Similarly, output.read fails with:

[HailoRT] [error] CHECK failed - Buffer size is not the same as expected for pool! (16000 != 4000)

Do I also have to configure the InputVStream and OutputVStream somehow in order to pass a full batch?
Or is the batch size handled completely inside the Hailo device, so that regardless of the configured batch size, the data has to be passed image by image?

I am looking forward to hearing from you!

Best regards,
Wolf

Hi @Wolfram_Strothmann
The batch size is handled inside HailoRT. You should keep pushing frames one at a time; we manage the batching for you. Likewise on the output side, you will get your inference results one frame at a time. This way you can use the same code without worrying about the batch size. In addition, HailoRT can automatically fall back to a lower batch size if not enough frames are pushed within a (configurable) timeout.
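
To illustrate, here is a rough sketch of the write and read loops in the style of the classifier example (write_all/read_all and the buffer types are illustrative names, not a drop-in replacement; as in the example, the two loops run concurrently, e.g. via std::async). Note that the loops look exactly the same whether batch_size is 1 or 4:

#include "hailo/hailort.hpp"
#include <cstdint>
#include <vector>

using namespace hailort;

// Producer: push frames one at a time; HailoRT groups them into batches internally.
hailo_status write_all(InputVStream &input, std::vector<std::vector<uint8_t>> &frames)
{
    for (auto &frame : frames) {
        // Each write carries exactly one frame, so frame.size() must equal
        // input.get_frame_size() (150528 in your case), not batch_size times that.
        auto status = input.write(MemoryView(frame.data(), frame.size()));
        if (HAILO_SUCCESS != status) {
            return status;
        }
    }
    return HAILO_SUCCESS;
}

// Consumer: read one result per input frame, again independent of batch_size.
hailo_status read_all(OutputVStream &output, size_t frame_count)
{
    std::vector<uint8_t> result(output.get_frame_size());
    for (size_t i = 0; i < frame_count; i++) {
        auto status = output.read(MemoryView(result.data(), result.size()));
        if (HAILO_SUCCESS != status) {
            return status;
        }
        // ... post-process `result` here ...
    }
    return HAILO_SUCCESS;
}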

How does batch size affect inference latency? Generally, should I try to keep it larger or smaller?

Intuitively, I thought the smaller the better, but in practice batch_size=10 gives much lower latency than batch_size=1.

Also, is the batch_size parameter at inference somehow related to the batch_size parameter used when converting a .pt model to a .hef file?