Inference speed and batch size

Hi Hailo,

The two HEF models attached below were compiled from the same ONNX model but with different model optimization scripts.
However, as the plots below show, their FPS behaves differently when the input batch size is varied through the Python SDK.
The first model shows almost no speed-up as the batch size grows, whereas the second model's FPS increases substantially with larger batch sizes.
Could you explain the potential reasons for this difference?
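For reference, here is a simplified sketch of how the FPS can be measured per batch size with the HailoRT Python API. The structure follows the HailoRT Python tutorials; the `batch_size` field on the configure params, the exact import names, and the `model.hef` path are assumptions and may differ between HailoRT versions:

```python
import time
import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, InferVStreams,
                            InputVStreamParams, OutputVStreamParams,
                            HailoStreamInterface, FormatType)

def measure_fps(hef_path, batch_size, num_frames=256):
    hef = HEF(hef_path)
    with VDevice() as target:
        # Configure the network group with the requested batch size
        # (assumption: batch_size is a field on the per-network-group configure params).
        configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
        for name in hef.get_network_group_names():
            configure_params[name].batch_size = batch_size
        network_group = target.configure(hef, configure_params)[0]
        network_group_params = network_group.create_params()

        # Random input frames with the model's input shape (HWC).
        input_info = hef.get_input_vstream_infos()[0]
        frames = np.random.rand(num_frames, *input_info.shape).astype(np.float32)

        input_params = InputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
        output_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)

        # Time a single inference pass over all frames and convert to FPS.
        with InferVStreams(network_group, input_params, output_params) as pipeline:
            with network_group.activate(network_group_params):
                start = time.perf_counter()
                pipeline.infer({input_info.name: frames})
                elapsed = time.perf_counter() - start
        return num_frames / elapsed

for b in (1, 2, 4, 8, 16):
    print(f"batch={b}: {measure_fps('model.hef', b):.1f} FPS")
```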


Thanks

Hi @roiyim,
Using a larger batch size speeds up a model when it is compiled into multiple contexts - a situation where the model is too large to fit into the Hailo-8 resources as a whole, so it is "broken down" into several contexts. Only one context is loaded onto the device at a time, while the others wait in the host platform's memory.

It might be that the higher optimization level also applies more 4-bit compression, so one of the models is compiled into a single context (or fewer contexts) than the other. That would explain why increasing the batch size makes a much bigger difference for one of the models.
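As a simplified illustration (a toy cost model, not a HailoRT API, with made-up timing numbers): in a multi-context model each batch pays the context-loading cost roughly once, so the per-frame share of that cost shrinks as the batch grows, while a single-context model has little overhead to amortize:

```python
# Toy model: per-batch context-switching overhead amortized over the batch.
def estimated_fps(batch_size, compute_ms_per_frame, context_switch_ms_per_batch):
    batch_latency_ms = batch_size * compute_ms_per_frame + context_switch_ms_per_batch
    return 1000.0 * batch_size / batch_latency_ms

for b in (1, 2, 4, 8, 16):
    single_ctx = estimated_fps(b, compute_ms_per_frame=5.0, context_switch_ms_per_batch=0.0)
    multi_ctx = estimated_fps(b, compute_ms_per_frame=5.0, context_switch_ms_per_batch=20.0)
    print(f"batch={b:2d}  single-context ~{single_ctx:6.1f} FPS  multi-context ~{multi_ctx:6.1f} FPS")
```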

Regards,
