I’m running a semantic segmentation model on a Hailo-8 with a Raspberry Pi 5, using HailoRT 4.21, and inference in my C++ code is much slower (~18.27 fps) than with hailortcli (40.50 fps).
I’m measuring only the inference time, and I set power_mode to HAILO_POWER_MODE_ULTRA_PERFORMANCE:
// Configure with ULTRA_PERFORMANCE power mode
auto configure_params = vdevice->create_configure_params(hef).value();
for (auto &[name, params] : configure_params) {
    params.power_mode = HAILO_POWER_MODE_ULTRA_PERFORMANCE;
}
auto network_group = vdevice->configure(hef, configure_params).value().at(0);

// ... vstream creation and buffer setup omitted ...

// Timed section: one blocking call, one frame
auto start_time = std::chrono::high_resolution_clock::now();
hailo_status status = vstreams.infer(input_views, output_views, 1);
auto end_time = std::chrono::high_resolution_clock::now();
Do you know why this is happening and how I can improve the inference time?
That gap is expected. hailortcli streams frames through an optimized asynchronous pipeline: host-to-device transfers, inference on the NN core, and device-to-host transfers all overlap, and buffers are mapped for DMA once up front. Your C++ code makes a single blocking infer() call with frames_count = 1, so every frame pays the full transfer → inference → transfer round trip plus per-call vstream overhead, serially.
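A quick sanity check with your existing InferVStreams code is to run many frames in a single call, which amortizes the per-call overhead and lets the vstream pipeline keep the device busy. Note that infer() then expects each MemoryView in input_views/output_views to hold FRAMES_COUNT frames (FRAMES_COUNT * frame_size bytes); FRAMES_COUNT here is just an illustrative name:

// NOTE: every buffer in input_views/output_views must now hold
// FRAMES_COUNT frames, i.e. FRAMES_COUNT * frame_size bytes.
constexpr size_t FRAMES_COUNT = 300;
auto start_time = std::chrono::high_resolution_clock::now();
hailo_status status = vstreams.infer(input_views, output_views, FRAMES_COUNT);
auto end_time = std::chrono::high_resolution_clock::now();
double fps = FRAMES_COUNT / std::chrono::duration<double>(end_time - start_time).count();

For a real application, the closer match to what the CLI does is the asynchronous InferModel API available in recent HailoRT releases, including 4.21. The sketch below is a minimal example under some assumptions: "segmentation.hef" is a placeholder path, it assumes the model has a single input and a single output, and error handling is reduced to bare .release() unwrapping, which you should replace with proper status checks:

#include "hailo/hailort.hpp"

#include <chrono>
#include <iostream>
#include <vector>

using namespace hailort;

int main()
{
    auto vdevice = VDevice::create().release();
    auto infer_model = vdevice->create_infer_model("segmentation.hef").release();
    auto configured_infer_model = infer_model->configure().release();
    auto bindings = configured_infer_model.create_bindings().release();

    // One host buffer per stream. These are reused for every frame here,
    // so outputs get overwritten; a real pipeline would rotate a pool of
    // bindings/buffers instead.
    std::vector<uint8_t> input_buffer(infer_model->input()->get_frame_size());
    std::vector<uint8_t> output_buffer(infer_model->output()->get_frame_size());
    bindings.input()->set_buffer(MemoryView(input_buffer.data(), input_buffer.size()));
    bindings.output()->set_buffer(MemoryView(output_buffer.data(), output_buffer.size()));

    constexpr size_t FRAMES_COUNT = 300;
    auto start = std::chrono::high_resolution_clock::now();

    AsyncInferJob last_job;
    for (size_t i = 0; i < FRAMES_COUNT; i++) {
        // Blocks only until the device can accept another frame, so the
        // transfer of frame i overlaps the inference of earlier frames.
        configured_infer_model.wait_for_async_ready(std::chrono::milliseconds(1000));
        auto job = configured_infer_model.run_async(bindings).release();
        if (i == FRAMES_COUNT - 1) {
            last_job = std::move(job);  // keep a handle to the final job
        } else {
            job.detach();               // don't block per frame
        }
    }
    last_job.wait(std::chrono::milliseconds(10000));

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << FRAMES_COUNT / std::chrono::duration<double>(end - start).count()
              << " fps" << std::endl;
    return 0;
}

With transfers and compute overlapping like this, the measured rate should land much closer to the hailortcli number. Either way, measure over many frames rather than a single call, since the first frames include pipeline warm-up.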