I’m running a semantic segmentation model on a Hailo-8 attached to a Raspberry Pi 5, using HailoRT 4.21, and inference in my C++ code is much slower (~18.27 fps) than with hailortcli (40.50 fps).
I’m measuring the inference time only, and I set power_mode to HAILO_POWER_MODE_ULTRA_PERFORMANCE:
auto configure_params = vdevice->create_configure_params(hef).value();
for (auto &[name, params] : configure_params) {
    params.power_mode = HAILO_POWER_MODE_ULTRA_PERFORMANCE;
}
auto network_group = vdevice->configure(hef, configure_params).value().at(0);
auto start_time = std::chrono::high_resolution_clock::now();
hailo_status status = vstreams.infer(input_views, output_views, 1);
auto end_time = std::chrono::high_resolution_clock::now();
Do you know why this is happening and how I can improve the inference time?
The CLI is much faster because it runs an optimized asynchronous pipeline: host-device transfers are overlapped with inference, and buffers are pre-mapped (DMA-pinned) ahead of time. Your C++ code makes a blocking infer() call per frame, so transfer, inference, and per-call overhead are all paid serially on every frame.
I’d like to ask two follow-up questions about these optimizations:
Would DMA-pinned memory also help with synchronous inference?
If I run two inferences from two separate CPU threads, will they be parallelized on the Hailo hardware? Or do I have to use the async API for that? Or are they always queued on the device no matter which API I use?