How to Measure the TOPS Consumed by the AI Model

surya_yelamanchili · October 31, 2025, 11:20am

I have deployed an AI model on a Raspberry Pi, and now I need to measure the TOPS consumed by the model.

As I am new to Hailo, I need help measuring the TOPS.

KlausK · October 31, 2025, 11:38am

You cannot directly measure the TOPS of a model running on a Hailo device. The architecture is similar to an FPGA, with compute resources distributed across the device, rather than a CPU with cycle counters.

However, you can generate a profiler report for your model, which provides the number of operations (OPS) per input tensor. To do this, follow the built-in tutorials in the Hailo AI Software Suite. Inside the Docker container, run the following command:

hailo tutorial

The usage of the profiler is demonstrated in the DFC 3 Compilation tutorial.

For models available in the Model Zoo, you can download the corresponding Profiler Report directly from our GitHub page by clicking the PR link. For example:

GitHub - Hailo Model Zoo - Hailo-8 Object Detection

You can measure the maximum FPS of a compiled model on your platform using the HailoRT CLI. Simply run:

hailortcli run model.hef

With this information, you can calculate the maximum TOPS for your setup: multiply the operations per frame from the profiler report by the measured FPS, then divide by 10^12. Keep in mind that the value may change if your application runs at a lower FPS or on a different host, for example, one with more PCIe lanes.
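
As a rough sketch of that calculation in Python: the function name and the example values (20 GOPS per frame, 60 FPS) are made up for illustration and do not come from a real profiler report or a real hailortcli run.

```python
def tops_from_fps(ops_per_frame: float, fps: float) -> float:
    """Return tera-operations per second for a model running at `fps`.

    ops_per_frame: operations per input frame, as listed in the profiler report.
    fps: frames per second, as measured with `hailortcli run`.
    """
    return ops_per_frame * fps / 1e12


# Hypothetical example values, not taken from a real report.
ops_per_frame = 20e9   # 20 GOPS per frame
fps = 60.0             # measured frame rate

print(f"{tops_from_fps(ops_per_frame, fps):.3f} TOPS")
```

Substitute the OPS figure from your own profiler report and the FPS reported on your own host to get the value for your setup.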

surya_yelamanchili · October 31, 2025, 11:51am

Can I get a more detailed explanation so I can understand it better?

surya_yelamanchili · October 31, 2025, 11:56am

On the terminal I got the following. Can I get some clarity on it so that I can understand it?
HW-only FPS : 61.863900
Peak TOPS (HW-only) : 1.232915655
Streaming FPS : 0.353611
Streaming TOPS : 0.007047285

KlausK · November 10, 2025, 9:52pm

hw_only mode runs the model without dequantizing the data, which removes the influence of host CPU performance.

Streaming mode runs the model including dequantization of the data, plus NMS when it is included in the HEF. Because these steps run on the host CPU, the streaming result can be lower than the hw_only result.
In your case the number is unexpectedly low. Is this from the same model?
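
One way to sanity-check the posted numbers is to divide each TOPS value by its FPS, which gives the operations per frame implied by that line. The Python sketch below uses only the four values from the terminal output above. The helper name is made up for illustration, and the sketch assumes TOPS was derived as operations per frame times FPS.

```python
# Values copied from the terminal output posted above.
hw_fps, hw_tops = 61.863900, 1.232915655
st_fps, st_tops = 0.353611, 0.007047285


def implied_gops_per_frame(tops: float, fps: float) -> float:
    """Giga-operations per frame implied by a TOPS figure at a given FPS."""
    return tops * 1e12 / fps / 1e9


hw_gops = implied_gops_per_frame(hw_tops, hw_fps)
st_gops = implied_gops_per_frame(st_tops, st_fps)

print(f"HW-only   : {hw_gops:.2f} GOPS per frame")
print(f"Streaming : {st_gops:.2f} GOPS per frame")
print(f"Streaming FPS as a fraction of HW-only FPS: {st_fps / hw_fps:.4f}")
```

Both lines work out to roughly 19.93 GOPS per frame, which suggests the same per-frame operation count was used for both, so the gap comes entirely from the FPS difference: streaming runs at well under 1% of the hw_only rate.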

If your model is compiled to multiple contexts, you can achieve a higher FPS/TOPS by using the --batch-size parameter, because it reduces the context-switching overhead at the cost of latency. You can find out whether your model is single or multiple context with the first command below; the second runs the model with a batch size of 8:

hailortcli parse-hef model.hef
hailortcli run model.hef --batch-size 8