About the maximum of --batch-size

When I was trying to increase throughput by setting --batch-size to 1000, both the run command and the benchmark command gave me the same error: "given batch size (1000) is bigger than max allowed (63)". Why is the maximum set to 63? How should I fix this?

The short answer is that a batch size of 1000 will not increase performance and will likely even decrease it. A larger batch size also increases latency.

There are two scenarios to look at.

The first one is that you have a model that fits into a single Hailo-8 device. In that case the batch-size parameter for running inference is irrelevant: you can achieve the highest throughput even with a batch size of 1. You simply push one image after another into the device. As soon as the previous image has finished processing in the first layer, the next image can be pushed into it.
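To see why streaming with batch size 1 already saturates a pipelined device, here is a back-of-the-envelope model (plain Python, not HailoRT code; the stage count and per-stage time are made-up numbers for illustration):

```python
# Illustrative pipeline model: a device with D fully pipelined layer stages,
# each taking t seconds per image, finishes N streamed images in
# (D + N - 1) * t seconds, so throughput approaches 1/t with no batching.

def pipelined_time(num_images: int, num_stages: int, stage_time_s: float) -> float:
    """Total time for num_images to flow through num_stages pipeline stages."""
    return (num_stages + num_images - 1) * stage_time_s

def throughput_fps(num_images: int, num_stages: int, stage_time_s: float) -> float:
    """Achieved frames per second for a stream of num_images."""
    return num_images / pipelined_time(num_images, num_stages, stage_time_s)

# Assumed numbers: 20 pipeline stages, 0.5 ms per stage.
for n in (1, 100, 10_000):
    print(f"{n:>6} images -> {throughput_fps(n, 20, 0.0005):7.1f} fps")
# As the stream gets longer, fps converges toward 1 / 0.0005 = 2000 fps,
# even though each inference was submitted with an effective batch of 1.
```

The pipeline fill cost (the `D - 1` extra stage times) is paid only once per stream, which is why batching buys nothing in the single-context case.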

The second case is when you have a model that is larger than a single Hailo-8/8L device. In this case the Hailo Dataflow Compiler will divide the network into multiple contexts, and during inference the network is computed one context at a time. In between, the intermediate data is sent back to the host and loaded again when the context is switched.
Here, using a batch of images reduces the switching overhead. However, the gain gets smaller as you increase the batch size, and at the same time HailoRT needs more memory on the host to store the intermediate results.
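The diminishing returns can be sketched with a simple amortization model (all numbers below are assumptions for illustration, not measured HailoRT behavior):

```python
# Illustrative model: a multi-context network pays a fixed context-switch
# cost once per context per batch, so the per-image overhead shrinks
# like 1/batch_size -- large at first, then with diminishing returns.

def fps(batch_size: int,
        compute_s_per_image: float = 0.0007,  # assumed pure compute time
        contexts: int = 3,                    # e.g. a 3-context split
        switch_s_per_context: float = 0.002   # assumed switch + transfer cost
        ) -> float:
    """Throughput when the switch cost is amortized over the batch."""
    per_image = compute_s_per_image + contexts * switch_s_per_context / batch_size
    return 1.0 / per_image

for b in (1, 8, 63, 1000):
    print(f"batch {b:>4}: {fps(b):7.1f} fps")
# The jump from batch 1 to 63 is large; going from 63 to 1000 gains
# comparatively little, while host memory for intermediate results
# keeps growing with the batch size.
```

This is why a cap like 63 costs very little throughput in practice: almost all of the amortization benefit is already realized well before that point.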

I suspect there is no need to fix this. Your application should perform well within the limit set by our engineers.

If you think that is not the case, can you please provide some more details on your use case? Why do you think you need a batch size of 1000?


Really appreciate your response!

I want to set the batch size higher because, in the benchmark given on the official website, the fps of ResNet-50 on a single device can reach more than 1300. When I was testing locally, the network was divided into 3 contexts, and the fps increased rapidly as the batch size increased. At batch size 63 the fps is around 1000 and still going up. So I want to know if there is any method to break the 63 limit, to check whether the device can reach the expected performance in my own environment.

By the way, if I have 2 Hailo-8 devices and I run `hailortcli run .hef --device-count 2`, will device 0 run context 0 and device 1 run context 1 at the same time? Or something along those lines?

Do you have a Hailo-8 or Hailo-8L?

The model will run on both devices separately with context switching.

If you want to run each context on a separate Hailo-8, you would have to do this manually. This can be a solution in some cases, e.g. when you have two contexts and two devices and you know you will not run any other models.
However, when you want to run other models as well, it is better to leave this to the HailoRT scheduler and run the models with context switching. That is much easier to manage.

Yes, I do have one, and it works well on single-context models. So I want to test it by reaching the maximum performance of a multi-context model. Do you have any solution for a larger batch size?

Thanks for sharing your knowledge!

Which one do you have, the Hailo-8 or the Hailo-8L?

This is a limit set in the HailoRT driver.

GitHub - hailort_driver.hpp - See line 46
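For illustration, the check that produces the CLI error can be sketched like this (hypothetical Python sketch; the real limit is a constant in the C++ driver header `hailort_driver.hpp`, and the constant name used here is an assumption):

```python
# Hypothetical sketch of the validation that rejects oversized batches.
# MAX_BATCH_SIZE mirrors the "max allowed (63)" in the error message;
# the actual constant lives in the HailoRT driver header, not here.
MAX_BATCH_SIZE = 63

def validate_batch_size(requested: int) -> None:
    """Raise an error in the same spirit as the hailortcli message."""
    if requested > MAX_BATCH_SIZE:
        raise ValueError(
            f"given batch size ({requested}) is bigger than "
            f"max allowed ({MAX_BATCH_SIZE})"
        )

validate_batch_size(63)       # accepted
try:
    validate_batch_size(1000)  # rejected, like the --batch-size 1000 run
except ValueError as err:
    print(err)
```

Because the limit is compiled into the driver, it cannot be overridden with a runtime flag; changing it would mean rebuilding the driver yourself.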

I have a Hailo-8. So what batch size was used to generate the fps figure of 1357 shown on the official site?

Can I modify the macro definition in the file?

This was measured with batch size 1. However, it used a HEF file that was compiled to a single context. You can download the HEF from the GitHub Model Zoo.

GitHub - Hailo Model Zoo - Hailo-8 Classification

You would need to compile the model with performance mode. This makes the Hailo Dataflow Compiler try harder to find the best solution, which can take much longer to compile.

If it is just for fun you can give it a try.


No more problems for me. Thank you so much for your patience and responses!