I have a custom model that I parsed to an HEF, and I am using the Falcon H8C. I followed the C++ examples on GitHub and tried vstreams, raw streams, and async inference, but all of them run only half as fast as the model does with ONNX Runtime.
My application is a real-time application where I take a variable number of pictures and feed them to the model at regular intervals.
When I give the model a lot of images at once in a different application, it is quite fast, but the real-time application is a lot slower. Because of this I suspect there is an issue with my code, but I don't really know what it could be.
I also checked the PCIe slots and they seem fine.
The only other thing I know of that I can still try is increasing the compression_level when parsing. Is there anything I am missing?
To help you more effectively, could you provide more details about your code and setup? The situation you’re describing could have various causes and solutions. Here are some general suggestions you can try:
Increase batch size: Process small batches of images instead of single images in real-time.
Check asynchronous inference: Ensure your asynchronous processing is correctly set up to maximize throughput.
Optimize preprocessing: Make sure your preprocessing steps are as efficient as possible.
Investigate I/O bottlenecks: Profile your PCIe data transfer to ensure it’s operating efficiently.
Adjust compression level: Experiment with different compression levels during model parsing.
Tune for power/performance: Verify that your Falcon H8C is in a mode optimized for real-time inference.
If you’ve already implemented these steps and are still experiencing slowdowns, consider profiling your entire pipeline to identify specific bottlenecks.
If you can share your code or any error messages you’re encountering, we’ll be able to provide more targeted assistance. Don’t hesitate to provide additional information - we’re here to help!
Thank you for the reply. About the points you mentioned:
If I give the model a bigger batch of images at once, it is faster per image, but in my application the batch size is variable. Still, I think this is fine.
My async inference is roughly the same speed as inference with vstreams, so I think this is fine too, unless async is supposed to be much faster.
My inputs are 256x256 CV_8UC1 grayscale images. I am not really sure it makes sense to preprocess these in any way.
I found another forum post about this and checked mine:
I have an update:
I increased the compression level, but it did not help with the speed of the model. I ran the hailortcli latency test and the results are as follows:
So there is a hardware latency of 1.63 ms. I am not sure if this is the number to expect. Unfortunately, I didn't find any PCIe or power-saving settings in my BIOS.
The 2.32 ms I get here is similar to the numbers I am getting in my code.
I also did some tests on a different computer and the speed is the same there.
For some reason I can't use hailortcli run with the profile option, and I can't find hailo_network_group_params_t in my program (it seems it is nowhere in any of the headers; there is only hailo_activate_network_group_params_t and hailo_configure_network_group_params_t). I have HailoRT 4.18. Is this maybe from a newer or older version, or is there something wrong with my installation?
Latency: Your reported hardware latency (1.63 ms) and total latency (2.32 ms) seem reasonable for your model and input size. While compression didn’t help, remember that speed can be affected by various factors like batch size and PCIe performance.
BIOS: If you can’t find PCIe or power-saving settings in your BIOS, they might not be exposed. However, since performance is consistent across machines, this is likely not the main issue.
HailoRT Profiling: For the hailortcli run --profile issue, ensure you’re using the latest Hailo tools version and that profiling is supported for your model. Consider reinstalling or updating HailoRT if problems persist.
API Structure: hailo_network_group_params_t does not exist in HailoRT 4.18; the structure you want is hailo_configure_network_group_params_t, which is filled in as part of hailo_configure_params_t and handed to the device at configure time (there is no standalone hailo_configure_network_group call in the C API). A rough sketch, with the exact power-mode enum name to be checked against your version's hailort.h:

hailo_configure_params_t configure_params = {};
// Fills one hailo_configure_network_group_params_t entry per network group in the HEF
hailo_status status = hailo_init_configure_params(hef, HAILO_STREAM_INTERFACE_PCIE, &configure_params);
configure_params.network_group_params[0].power_mode = HAILO_POWER_MODE_PERFORMANCE; // verify enum name in hailort.h
status = hailo_configure_device(device, hef, &configure_params, network_groups, &network_groups_count);
Let me know if these suggestions help or if you need further assistance!