Speed optimization for custom model (C++)

Hello,

I have a custom model that I parsed to an HEF, and I am running it on the Falcon H8C. I followed the C++ examples on GitHub and tried vstreams, raw streams, and async inference, but on all of them it runs only half as fast as with ONNX Runtime.

My application is a real-time application in which I take a variable number of pictures at regular intervals and feed them to the model.

When I give the model a lot of images at once in a different application, it is quite fast, but the real-time application is much slower. Because of this I suspect there is an issue with my code, but I don’t really know what it could be.

I also checked the PCIe slots and they seem fine.

The only other thing I know of that I can still try is increasing the compression level when parsing. Is there anything I am missing?

With kind regards

Paul

Hey @paul.baeder,

Welcome to the Hailo Community!

To help you more effectively, could you provide more details about your code and setup? The situation you’re describing could have various causes and solutions. Here are some general suggestions you can try:

  1. Increase batch size: Process small batches of images instead of single images in real-time.
  2. Check asynchronous inference: Ensure your asynchronous processing is correctly set up to maximize throughput.
  3. Optimize preprocessing: Make sure your preprocessing steps are as efficient as possible.
  4. Investigate I/O bottlenecks: Profile your PCIe data transfer to ensure it’s operating efficiently.
  5. Adjust compression level: Experiment with different compression levels during model parsing.
  6. Tune for power/performance: Verify that your Falcon H8C is in a mode optimized for real-time inference.
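
To illustrate the mini-batching suggestion above, here is a generic sketch of a frame buffer that flushes either when a target batch size is reached or when a deadline expires, so real-time latency stays bounded. `MiniBatcher`, `Frame`, and `run_batch` are illustrative names, not part of HailoRT; `run_batch` stands in for your actual inference call:

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical frame type; replace with your cv::Mat or raw buffer.
using Frame = std::vector<unsigned char>;

// Buffers incoming frames and flushes them as a mini-batch either when
// `max_batch` frames have accumulated or `max_wait` has elapsed since the
// first buffered frame, so worst-case latency stays bounded.
class MiniBatcher {
public:
    using Clock = std::chrono::steady_clock;

    MiniBatcher(size_t max_batch, std::chrono::milliseconds max_wait,
                std::function<void(const std::vector<Frame>&)> run_batch)
        : m_max_batch(max_batch), m_max_wait(max_wait),
          m_run_batch(std::move(run_batch)) {}

    void push(Frame frame) {
        if (m_pending.empty()) {
            m_first_frame_time = Clock::now();
        }
        m_pending.push_back(std::move(frame));
        maybe_flush();
    }

    // Call periodically (e.g. from your capture loop) to enforce the deadline.
    void poll() { maybe_flush(); }

private:
    void maybe_flush() {
        if (m_pending.empty()) return;
        const bool full = m_pending.size() >= m_max_batch;
        const bool expired = (Clock::now() - m_first_frame_time) >= m_max_wait;
        if (full || expired) {
            m_run_batch(m_pending);
            m_pending.clear();
        }
    }

    size_t m_max_batch;
    std::chrono::milliseconds m_max_wait;
    std::function<void(const std::vector<Frame>&)> m_run_batch;
    std::vector<Frame> m_pending;
    Clock::time_point m_first_frame_time{};
};
```

The deadline keeps this usable with a variable frame rate: a lone frame still gets processed once `max_wait` passes instead of waiting for a full batch.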

If you’ve already implemented these steps and are still experiencing slowdowns, consider profiling your entire pipeline to identify specific bottlenecks.

If you can share your code or any error messages you’re encountering, we’ll be able to provide more targeted assistance. Don’t hesitate to provide additional information - we’re here to help!

Hey @omria,

thank you for the reply. About the points you mentioned:

  1. If I give the model a bigger batch of images at once, it is faster per image, but in my application the batch size is variable. I think this is fine.

  2. My async inference is roughly the same speed as inference with vstreams, so I think this is fine, unless async is supposed to be much faster.

  3. My inputs are 256x256 CV_8UC1 grayscale images. I am not sure it makes sense to preprocess these in any way.

  4. I found another forum post about this and checked mine:

    (screenshot: Bildschirmfoto vom 2024-09-25 16-00-47)

    However, I don’t know how to profile PCIe data transfer, so I haven’t checked this in depth. Can you tell me how to do this?

  5. I have not done this yet so i will most likely try this next.

  6. Is this referring to the power mode in the network group parameters?

    (screenshot: Bildschirmfoto vom 2024-09-25 16-06-05)

    Or do I have to configure this somewhere else as well?

I will probably try 4 and 5, and if that doesn’t help I will write here again.

Hey @paul.baeder,

Thanks for the additional information. Here are some targeted suggestions to improve your inference performance:

  1. Mini-batch processing:
  • Buffer a few frames and process them in small batches
  • Balances throughput with real-time requirements
  2. Asynchronous inference optimization:
  • Verify the implementation is correct
  • Ensure I/O and computation overlap for parallel processing
  3. Streamlined preprocessing:
  • For 256x256 grayscale images, minimize preprocessing
  • Optimize any necessary resizing or normalization
  4. PCIe profiling:
  • Use HailoRT profiling:
hailortcli run --profile /path/to/model.hef
  • Consider perf for additional PCIe metrics
  • Check the BIOS for a PCIe Gen 3 x4 configuration
  5. Power settings:
  • Enable high-performance mode on the Falcon H8C:
hailo_network_group_params_t params;
params.power_mode = HAILO_POWER_MODE_HIGH_PERFORMANCE;
  • Disable system power-saving features

These adjustments should help you approach your desired real-time performance. Let me know if you need further assistance!

Best regards

Hey @omria,

I have an update:
I increased the compression level, but it did not help with the speed of the model. I ran the hailortcli latency test and the results are as follows:

(screenshot: hailortcli latency results)

So there is a hardware latency of 1.63 ms. I am not sure whether this is the number to expect. Unfortunately, I didn’t find any PCIe or power-saving settings in my BIOS.

The 2.32 ms I get here is similar to what I am getting in my code.

I also did some tests on a different computer and the speed is the same there.

For some reason I can’t use hailortcli run with the --profile option, and I can’t find hailo_network_group_params_t in my program (it seems to be nowhere in any of the headers; there is only hailo_activate_network_group_params_t and hailo_configure_network_group_params_t). I have HailoRT 4.18. Is this maybe from a newer or older version, or is there something wrong with my installation?

With kind regards

Hey @paul.baeder

Thanks for the update. Let’s address your points:

  1. Latency: Your reported hardware latency (1.63 ms) and total latency (2.32 ms) seem reasonable for your model and input size. While compression didn’t help, remember that speed can be affected by various factors like batch size and PCIe performance.

  2. BIOS: If you can’t find PCIe or power-saving settings in your BIOS, they might not be exposed. However, since performance is consistent across machines, this is likely not the main issue.

  3. HailoRT Profiling: For the hailortcli run --profile issue, ensure you’re using the latest Hailo tools version and that profiling is supported for your model. Consider reinstalling or updating HailoRT if problems persist.

  4. API structure: You’re right, HailoRT 4.18 has no hailo_network_group_params_t. The power mode lives in hailo_configure_network_group_params_t, which you normally obtain through hailo_init_configure_params rather than constructing by hand. Roughly (check hailort.h in your installation for the exact enum values):

    hailo_configure_params_t configure_params = {};
    hailo_status status = hailo_init_configure_params(hef, HAILO_STREAM_INTERFACE_PCIE, &configure_params);
    
    // Set the power mode on the first network group
    // (see hailo_power_mode_t in hailort.h for the values available in 4.18)
    configure_params.network_group_params[0].power_mode = HAILO_POWER_MODE_PERFORMANCE;
    
    // Use when configuring your network group on the device
    hailo_configured_network_group network_group = NULL;
    size_t network_group_count = 1;
    status = hailo_configure_device(device, hef, &configure_params, &network_group, &network_group_count);
    

Let me know if these suggestions help or if you need further assistance!