I’m running a semantic segmentation model on a Hailo-8 attached to a Raspberry Pi 5, using HailoRT 4.21, and inference in my C++ code is much slower (~18.27 fps) than with hailortcli (40.50 fps).
I’m measuring the inference time only, and I set power_mode to HAILO_POWER_MODE_ULTRA_PERFORMANCE:
auto configure_params = vdevice->create_configure_params(hef).value();
for (auto &[name, params] : configure_params) {
    params.power_mode = HAILO_POWER_MODE_ULTRA_PERFORMANCE;
}
auto network_group = vdevice->configure(hef, configure_params).value().at(0);
auto start_time = std::chrono::high_resolution_clock::now();
hailo_status status = vstreams.infer(input_views, output_views, 1);
auto end_time = std::chrono::high_resolution_clock::now();
Do you know why this is happening and how I can improve the inference time?
The CLI is much faster because it runs an optimized asynchronous pipeline: host-device transfers are overlapped with inference, and buffers are pre-mapped (DMA-pinned) ahead of time. Your C++ code makes a blocking infer() call per frame, so transfer, inference, and per-call overhead are all paid serially on every frame.
I’d like to ask two follow-up questions about these optimizations:
Would DMA-pinned memory also help with synchronous inference?
If I run two inferences from two separate CPU threads, will they be parallelized on the Hailo hardware? Or do I have to use the async API for that? Or are they always queued on the device no matter which API I use?