Speed optimization for custom model (C++)

Hello,

I have a custom model that I parsed to an HEF, and I am running it on the Falcon H8C. I followed the C++ examples on GitHub and tried vstreams, raw streams, and async inference, but on all of them it runs only half as fast as with ONNX Runtime.

My application is a real-time application in which I take a variable number of pictures at regular intervals and feed them to the model.

When I give the model a lot of images at once in a different application, it is quite fast, but the real-time application is much slower. Because of this I suspect there is an issue with my code, but I don’t really know what it could be.

I also checked the PCIe slots and they seem fine.

The only other thing I know of that I can still try is increasing the compression level when parsing. Is there anything I am missing?

With kind regards

Paul

Hey @paul.baeder,

Welcome to the Hailo Community!

To help you more effectively, could you provide more details about your code and setup? The situation you’re describing could have various causes and solutions. Here are some general suggestions you can try:

  1. Increase batch size: Process small batches of images instead of single images in real-time.
  2. Check asynchronous inference: Ensure your asynchronous processing is correctly set up to maximize throughput.
  3. Optimize preprocessing: Make sure your preprocessing steps are as efficient as possible.
  4. Investigate I/O bottlenecks: Profile your PCIe data transfer to ensure it’s operating efficiently.
  5. Adjust compression level: Experiment with different compression levels during model parsing.
  6. Tune for power/performance: Verify that your Falcon H8C is in a mode optimized for real-time inference.
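
To illustrate the mini-batching suggestion above, here is a generic sketch of a frame buffer that flushes either when a target batch size is reached or when a deadline expires, so real-time latency stays bounded. `MiniBatcher`, `Frame`, and `run_batch` are illustrative names, not part of HailoRT; `run_batch` stands in for your actual inference call:

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical frame type; replace with your cv::Mat or raw buffer.
using Frame = std::vector<unsigned char>;

// Buffers incoming frames and flushes them as a mini-batch either when
// `max_batch` frames have accumulated or `max_wait` has elapsed since the
// first buffered frame, so worst-case latency stays bounded.
class MiniBatcher {
public:
    using Clock = std::chrono::steady_clock;

    MiniBatcher(size_t max_batch, std::chrono::milliseconds max_wait,
                std::function<void(const std::vector<Frame>&)> run_batch)
        : m_max_batch(max_batch), m_max_wait(max_wait),
          m_run_batch(std::move(run_batch)) {}

    void push(Frame frame) {
        if (m_pending.empty()) {
            m_first_frame_time = Clock::now();
        }
        m_pending.push_back(std::move(frame));
        maybe_flush();
    }

    // Call periodically (e.g. from your capture loop) to enforce the deadline.
    void poll() { maybe_flush(); }

private:
    void maybe_flush() {
        if (m_pending.empty()) return;
        const bool full = m_pending.size() >= m_max_batch;
        const bool expired = (Clock::now() - m_first_frame_time) >= m_max_wait;
        if (full || expired) {
            m_run_batch(m_pending);
            m_pending.clear();
        }
    }

    size_t m_max_batch;
    std::chrono::milliseconds m_max_wait;
    std::function<void(const std::vector<Frame>&)> m_run_batch;
    std::vector<Frame> m_pending;
    Clock::time_point m_first_frame_time{};
};
```

The deadline keeps this usable with a variable frame rate: a lone frame still gets processed once `max_wait` passes instead of waiting for a full batch.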

If you’ve already implemented these steps and are still experiencing slowdowns, consider profiling your entire pipeline to identify specific bottlenecks.

If you can share your code or any error messages you’re encountering, we’ll be able to provide more targeted assistance. Don’t hesitate to provide additional information - we’re here to help!

Hey @omria,

thank you for the reply. About the points you mentioned:

  1. If I give the model a bigger batch of images at once, it is faster per image, but in my application the batch size is variable. I think this is fine.

  2. My async inference is roughly the same speed as inference with vstreams, so I think this is fine, unless async is supposed to be much faster.

  3. My inputs are 256x256 CV_8UC1 grayscale images. I am not sure it makes sense to preprocess these in any way.

  4. I found another forum post about this and checked mine:

    (screenshot: Bildschirmfoto vom 2024-09-25 16-00-47)

    However, I don’t know how to profile PCIe data transfer, so I haven’t checked this in depth. Can you tell me how to do this?

  5. I have not done this yet so i will most likely try this next.

  6. Is this referring to the power mode in the network group parameters?

    (screenshot: Bildschirmfoto vom 2024-09-25 16-06-05)

    Or do I have to configure this somewhere else as well?

I will probably try 4 and 5, and if that doesn’t help I will write here again.

Hey @paul.baeder,

Thanks for the additional information. Here are some targeted suggestions to improve your inference performance:

  1. Mini-batch processing:
  • Buffer a few frames and process them in small batches
  • Balances throughput with real-time requirements
  2. Asynchronous inference optimization:
  • Verify the implementation is correct
  • Ensure I/O and computation overlap for parallel processing
  3. Streamlined preprocessing:
  • For 256x256 grayscale images, minimize preprocessing
  • Optimize any necessary resizing or normalization
  4. PCIe profiling:
  • Use HailoRT profiling:
hailortcli run --profile /path/to/model.hef
  • Consider perf for additional PCIe metrics
  • Check the BIOS for a PCIe Gen 3 x4 configuration
  5. Power settings:
  • Enable high-performance mode on the Falcon H8C:
hailo_network_group_params_t params;
params.power_mode = HAILO_POWER_MODE_HIGH_PERFORMANCE;
  • Disable system power-saving features

These adjustments should help you approach your desired real-time performance. Let me know if you need further assistance!

Best regards

Hey @omria,

I have an update:
I increased the compression level, but it did not help with the speed of the model. I ran the hailortcli latency test and the results are as follows:

(screenshot: hailortcli latency results)

So there is a hardware latency of 1.63 ms. I am not sure whether this is the number to expect. Unfortunately, I didn’t find any PCIe or power-saving settings in my BIOS.

The 2.32 ms I get here is similar to what I am getting in my code.

I also did some tests on a different computer and the speed is the same there.

For some reason I can’t use hailortcli run with the --profile option, and I can’t find hailo_network_group_params_t in my program (it seems to be nowhere in any of the headers; there is only hailo_activate_network_group_params_t and hailo_configure_network_group_params_t). I have HailoRT 4.18. Is this maybe from a newer or older version, or is there something wrong with my installation?

With kind regards

Hey @paul.baeder

Thanks for the update. Let’s address your points:

  1. Latency: Your reported hardware latency (1.63 ms) and total latency (2.32 ms) seem reasonable for your model and input size. While compression didn’t help, remember that speed can be affected by various factors like batch size and PCIe performance.

  2. BIOS: If you can’t find PCIe or power-saving settings in your BIOS, they might not be exposed. However, since performance is consistent across machines, this is likely not the main issue.

  3. HailoRT Profiling: For the hailortcli run --profile issue, ensure you’re using the latest Hailo tools version and that profiling is supported for your model. Consider reinstalling or updating HailoRT if problems persist.

  4. API structure: You’re right, HailoRT 4.18 has no hailo_network_group_params_t. The power mode lives in hailo_configure_network_group_params_t, which you normally obtain through hailo_init_configure_params rather than constructing by hand. Roughly (check hailort.h in your installation for the exact enum values):

    hailo_configure_params_t configure_params = {};
    hailo_status status = hailo_init_configure_params(hef, HAILO_STREAM_INTERFACE_PCIE, &configure_params);
    
    // Set the power mode on the first network group
    // (see hailo_power_mode_t in hailort.h for the values available in 4.18)
    configure_params.network_group_params[0].power_mode = HAILO_POWER_MODE_PERFORMANCE;
    
    // Use when configuring your network group on the device
    hailo_configured_network_group network_group = NULL;
    size_t network_group_count = 1;
    status = hailo_configure_device(device, hef, &configure_params, &network_group, &network_group_count);
    

Let me know if these suggestions help or if you need further assistance!