C++ Inference much slower than hailortcli run

Hi Hailo team,

I’m running a semantic segmentation model on a Hailo-8 with a Raspberry Pi 5, using HailoRT 4.21, and I’m seeing much slower inference in my C++ code (~18.27 fps) than with hailortcli run (40.50 fps).

I’m timing only the inference call itself, with power_mode set to HAILO_POWER_MODE_ULTRA_PERFORMANCE:

// Configure the network group with ultra-performance power mode
auto configure_params = vdevice->create_configure_params(hef).value();
for (auto &[name, params] : configure_params) {
    params.power_mode = HAILO_POWER_MODE_ULTRA_PERFORMANCE;
}
auto network_group = vdevice->configure(hef, configure_params).value().at(0);

// Time only the blocking inference call
auto start_time = std::chrono::high_resolution_clock::now();
hailo_status status = vstreams.infer(input_views, output_views, 1);
auto end_time = std::chrono::high_resolution_clock::now();
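
For reference, fps here is just frames divided by elapsed seconds (the sketch below matches my measurement; frames_count is 1 in the infer call above):

double elapsed_s = std::chrono::duration<double>(end_time - start_time).count();
double fps = 1.0 / elapsed_s;  // averages out to ~18.27 fps across my runs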

Do you know why this is happening and how I can improve the inference time?

Thanks a lot for your help!


Hey @Margarida,

Welcome to the community!

The CLI is much faster because it uses an optimized async pipeline: it pre-maps its buffers once and overlaps host-device transfers with inference. Your C++ code makes a blocking call per frame, so every frame pays the transfer and synchronization overhead serially.

Quick fixes:

  1. Test raw performance first:
hailortcli run2 --mode raw_async set-net your_model.hef
  2. Switch to the async API in C++. Below is a minimal sketch assuming the InferModel async API (configured is a ConfiguredInferModel obtained via vdevice->create_infer_model(...) and configure(); the buffer pointers, sizes, and num_frames are placeholders). For a fair fps comparison, time the whole loop as sketched after this list:
// Pre-map the frame buffers once so HailoRT doesn't re-map them on every transfer
vdevice->dma_map(input_raw_ptr, input_size, HAILO_DMA_BUFFER_DIRECTION_H2D);
vdevice->dma_map(output_raw_ptr, output_size, HAILO_DMA_BUFFER_DIRECTION_D2H);

// Bind the buffers once (real code would rotate per-frame output buffers)
auto bindings = configured.create_bindings().value();
bindings.input()->set_buffer(MemoryView(input_raw_ptr, input_size));
bindings.output()->set_buffer(MemoryView(output_raw_ptr, output_size));

// Queue inferences so transfers overlap compute, then join on the last job
AsyncInferJob last_job;
for (int i = 0; i < num_frames; i++) {
    configured.wait_for_async_ready(std::chrono::milliseconds(1000));
    auto job = configured.run_async(bindings).value();
    if (i == num_frames - 1) {
        last_job = std::move(job);
    } else {
        job.detach();  // don't block the loop on earlier frames
    }
}
last_job.wait(std::chrono::milliseconds(10000));
  3. Use profiling to find bottlenecks:
hailortcli run example.hef --measure-stats --elem-fps
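
To compare like with like, measure fps over the whole pipelined run rather than a single blocking call. A minimal sketch (num_frames as in the loop above):

auto t0 = std::chrono::high_resolution_clock::now();
// ... run the async loop from step 2, ending with last_job.wait(...) ...
auto t1 = std::chrono::high_resolution_clock::now();
double fps = num_frames / std::chrono::duration<double>(t1 - t0).count();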

This should close the performance gap significantly. Let me know how it works out!