Why is there no correlation between latency and FPS measurements?

When running ‘hailortcli benchmark yolov5m_wo_spp_60p.hef’, I get the following output, in which the latency and FPS measurements do not seem to correlate.

Starting Measurements…
Measuring FPS in hw_only mode
Network yolov5m_wo_spp_60p/yolov5m_wo_spp_60p: 100% | 3272 | FPS: 218.10 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the ovecurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network yolov5m_wo_spp_60p/yolov5m_wo_spp_60p: 100% | 3272 | FPS: 218.07 | ETA: 00:00:00
Measuring HW Latency
Network yolov5m_wo_spp_60p/yolov5m_wo_spp_60p: 100% | 699 | HW Latency: 12.84 ms | ETA: 00:00:00

=======
Summary

FPS (hw_only) = 218.106
(streaming) = 218.078
Latency (hw) = 12.8354 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 5.30504 W
(max) = 5.3327 W

The Hailo-8 works as a pipeline, meaning that it processes several frames simultaneously.
The latency mentioned in your question is the processing time of a single frame, from the moment it enters the Hailo device until its output leaves the device.
Because of this pipelining, you will usually find that the FPS is higher than 1/latency.
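To put numbers on this, here is a small Python sketch using the values from the benchmark summary in the question. The function names are made up for illustration and are not part of HailoRT. The "frames in flight" figure applies Little's law (throughput x latency) and is only a rough estimate, since the HW latency is measured per unit of data rather than per whole frame.

```python
# Values copied from the benchmark summary in the question.
fps_streaming = 218.078   # frames per second (streaming mode)
hw_latency_ms = 12.8354   # milliseconds (HW latency)


def sequential_fps(latency_ms: float) -> float:
    """FPS you would get if each frame had to finish before the next one started."""
    return 1000.0 / latency_ms


def frames_in_flight(fps: float, latency_ms: float) -> float:
    """Rough average number of frames inside the device (Little's law)."""
    return fps * latency_ms / 1000.0


print(f"1/latency        = {sequential_fps(hw_latency_ms):.1f} FPS")
print(f"measured         = {fps_streaming:.1f} FPS")
print(f"frames in flight ~ {frames_in_flight(fps_streaming, hw_latency_ms):.1f}")
```

With these numbers, 1/latency comes out at about 78 FPS, well below the measured 218 FPS, and the estimate suggests that roughly 2.8 frames are inside the device at any given time.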

To add a bit of color to Yanov’s answer: yes, latency measures the time between the first unit of input data arriving at the Hailo-8 device (for an image, the first unit is the first row of pixels) and the first unit of output data leaving the Hailo-8 device for the host. The format of the output data unit depends on the type of model being run: for an object detection model, it is a structured list of detected objects; for a semantic segmentation network, it is a row of the output image.

Therefore, latency is the time it takes for one unit of data to go through all the layers of the model.

Intuitively, FPS should be the inverse of the latency. That would be the case only if the layers were processed sequentially, one frame at a time. However, our Hailo-8 device pipelines the processing by running all the layers in parallel, with each layer processing a different row at any given time. That’s why the FPS is in fact better than the inverse of the latency.

In summary, if t0 is the time when the first frame arrives at the edge of the Hailo-8 device:

1st frame output will appear at time t0 + latency

2nd frame output will appear at time t0 + latency + 1/FPS

3rd frame output will appear at time t0 + latency + 1/FPS + 1/FPS

Etc
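The timeline above can be sketched in a few lines of Python. This is only an illustration of the formula, assuming a steady stream of input frames; the function name is hypothetical, and the latency and FPS values are taken from the benchmark output in the question.

```python
def output_times_ms(t0_ms: float, latency_ms: float, fps: float, n_frames: int) -> list[float]:
    """Output time of each frame: t0 + latency + k * (1/FPS), for k = 0, 1, 2, ..."""
    period_ms = 1000.0 / fps  # time between consecutive outputs
    return [t0_ms + latency_ms + k * period_ms for k in range(n_frames)]


# Values from the benchmark summary in the question; t0 is taken as 0.
times = output_times_ms(t0_ms=0.0, latency_ms=12.8354, fps=218.078, n_frames=4)
for i, t in enumerate(times, start=1):
    print(f"frame {i}: output at t0 + {t:.2f} ms")
```

The first output arrives after the full latency, and each following output arrives about 4.6 ms (1/FPS) after the previous one.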

Note that data I/O is usually hidden behind the inference time, which is 1/FPS.
