Regarding Benchmark FPS on Hailo8

Hello,

I am using a Hailo8 connected to my Desktop PC through PCIe.

I am currently trying to learn more about Hailo by running some Model Zoo examples and other HailoRT commands from the CLI.

While trying Model Zoo models such as YOLOv8n, I observed that after compiling the model with hailomz compile yolov8n --performance and benchmarking the resulting HEF file, I get about 1036 FPS, which is similar to what is reported on the HailoAI Model Explorer.

However, when I run the model using hailomz eval yolov8n --har yolov8n.har --hef yolov8n.hef --target hardware --batch-size 1, I get very low FPS in the range of 40-45 (inferred from the completion bar displaying img/s).

Looking at the device utilization through hailo monitor, I see that hailortcli benchmark shows 100% utilization while hailomz eval shows only 15-20%.

Could you please explain the difference between the two commands and the FPS obtained from them, and why the model doesn’t run at 100% utilization during hailomz eval?

Sorry if this has already been covered somewhere in the documentation. If so, could you please point me to where I can find the explanation?

My Desktop PC environment is as follows:

  • Ubuntu-20.04
  • Python-3.8
  • HailoRT-4.21.0
  • Hailo DFC-3.31.0
  • Hailo MZ-2.15.0

They are two different tools, each serving a distinct purpose. hailortcli run is designed to measure throughput, while hailomz eval measures model accuracy. Use hailortcli run when benchmarking performance and hailomz eval when verifying model accuracy.
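To make the distinction concrete, here is a minimal Python sketch of the two kinds of measurement. It does not use HailoRT at all: toy_model, measure_throughput and measure_accuracy are hypothetical names invented for illustration, and the sketch only assumes that a throughput run feeds arbitrary inputs and ignores the outputs, while an accuracy run feeds labeled inputs and ignores the timing.

```python
import random
import time


def toy_model(x):
    """Hypothetical stand-in for an inference call; not a HailoRT API."""
    return 1 if x >= 0.5 else 0


def measure_throughput(model, num_frames=10_000):
    """Throughput-style run: random inputs, outputs discarded, only time matters."""
    inputs = [random.random() for _ in range(num_frames)]
    start = time.perf_counter()
    for x in inputs:
        model(x)
    elapsed = time.perf_counter() - start
    return num_frames / elapsed  # frames per second


def measure_accuracy(model, labeled_samples):
    """Accuracy-style run: labeled inputs, outputs compared, time ignored."""
    correct = sum(1 for x, label in labeled_samples if model(x) == label)
    return correct / len(labeled_samples)


if __name__ == "__main__":
    samples = [(0.1, 0), (0.9, 1), (0.4, 1), (0.7, 1)]
    print(f"throughput: {measure_throughput(toy_model):.0f} frames/s")
    print(f"accuracy:   {measure_accuracy(toy_model, samples):.2f}")
```

The two functions report unrelated quantities, which is why a number from one kind of run should not be compared directly with a number from the other.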

Please correct me if I’m misunderstanding. Is it okay to assume the following?

  1. hailortcli benchmark/hailortcli run

This command will continuously run randomly generated data on “Hailo only” without considering any preprocessing/postprocessing pipeline. No data transfers to and from CPU/GPU.

It gives the maximum possible FPS on Hailo when resource utilization (APU, control, memory, etc.) is pushed to the max. It doesn’t care about accuracy, only maximum throughput.

Helps to investigate bottlenecks in model layers when resources are not a bottleneck.

  2. hailomz eval

This command will run data from a dataset on Hailo. So, it will have a data-loading pipeline, data preprocessing pipeline and post-processing pipeline running on CPU/GPU(?). There will be data transfers to and from Hailo.

Since data-loading, preprocessing, and post-processing pipelines are involved, the FPS drops. But the FPS achieved using this command is the maximum possible (?) FPS you can get during inference on your PC+Hailo combination.
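The numbers reported earlier in the thread can be turned into a rough per-image time budget. The Python sketch below is only arithmetic under two simplifying assumptions that the thread does not confirm: that the eval loop handles one image at a time with its stages running strictly one after another, and that the device needs about 1/1036 s per image as in the benchmark. The function names are made up for illustration.

```python
def fps_to_ms(fps):
    """Convert a frames-per-second figure into milliseconds per frame."""
    return 1000.0 / fps


def sequential_fps(stage_ms):
    """FPS of a loop whose stages run back to back for every frame."""
    return 1000.0 / sum(stage_ms)


def host_ms_per_image(eval_fps, device_fps):
    """Time per image left over for everything except the device (assumed model)."""
    return fps_to_ms(eval_fps) - fps_to_ms(device_fps)


if __name__ == "__main__":
    device_fps = 1036  # benchmark figure quoted in the thread
    for eval_fps in (40, 45):  # img/s range seen during eval
        total = fps_to_ms(eval_fps)
        host = host_ms_per_image(eval_fps, device_fps)
        print(f"{eval_fps} img/s: {total:.2f} ms per image, "
              f"about {host:.2f} ms of it outside the device")
```

Under these assumptions, roughly 21 to 24 ms per image would be spent outside the device, compared with just under 1 ms on it. Whether that time could be reduced, for example by overlapping the stages or using a larger batch size, is not something this arithmetic can answer.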