Dear all,
I am using HailoRT v4.23.0 with the C++ API.
I am running inference with a ResNet model. It works with no issues and is blazing fast on a single channel, but when I run batches of up to 16 images of 256x256x4 it slows down massively.
Based on running the model with hailortcli run, I would expect about 800 images/s, but the actual performance is closer to 128 images/s.
If I investigate with perf top, I see that most of the time is spent in the function hailort::reorder_input_stream.
My input workflow follows the async_infer_basic_example.cpp example:
- Create vdevice and parse .hef model
- model->set_batch_size(16) and configure it
- Create 16 bindings with configuredModel.create_bindings()
- Allocate 16 page aligned buffers of size input.get_frame_size() (256x256x4 bytes)
- For each binding, call set_buffer(hailort::MemoryView(…)) on its input
- memcpy an HxWxC image into each buffer
- Call configuredModel.run_async(bindings)
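For reference, the buffer-preparation part of the steps above (allocation and copy, without the HailoRT calls) looks roughly like this. This is a minimal sketch, not my exact code: the frame size and batch size are taken from the numbers above, the 4096-byte page size is an assumption (real code should query sysconf(_SC_PAGESIZE)), and the function names are mine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Sizes from the post: 16 frames of 256x256x4 uint8 (HxWxC).
// kPageSize is an assumption; query sysconf(_SC_PAGESIZE) in real code.
constexpr std::size_t kFrameSize = 256 * 256 * 4;
constexpr std::size_t kPageSize  = 4096;
constexpr std::size_t kBatch     = 16;

// std::aligned_alloc requires the size to be a multiple of the alignment,
// so round the frame size up to the next page boundary.
constexpr std::size_t page_round_up(std::size_t n) {
    return (n + kPageSize - 1) / kPageSize * kPageSize;
}

// One page-aligned buffer per frame in the batch, matching what
// input.get_frame_size() reports for this model.
inline std::vector<uint8_t*> allocate_batch_buffers() {
    std::vector<uint8_t*> buffers;
    buffers.reserve(kBatch);
    for (std::size_t i = 0; i < kBatch; ++i) {
        auto* p = static_cast<uint8_t*>(
            std::aligned_alloc(kPageSize, page_round_up(kFrameSize)));
        buffers.push_back(p);
    }
    return buffers;
}

// Copy one interleaved HxWxC image into a prepared buffer.
inline void fill_buffer(uint8_t* dst, const uint8_t* hwc_image) {
    std::memcpy(dst, hwc_image, kFrameSize);
}
```

Each of these buffers is then wrapped in a hailort::MemoryView and attached to a binding before run_async, as in the list above.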
Am I doing something wrong? Should I drop down to the lower-level API and handle the padding on the channel dimension myself? Should I upgrade to version 5.x?