What is the meaning of Send/Recv rate in Hailortcli run?


Dear all members,

I have a question regarding the meaning of the Send/Recv rate when measuring performance using the hailortcli run *.hef command-line tool.

I ran the MSPN.hef model downloaded from Hailo’s official benchmark page.
However, when I executed it, the results looked like this:

Compared to the official benchmark numbers, the performance degraded significantly, dropping from 2132 FPS to 792 FPS, as shown in the image.

This may be due to our system setup: we are using a Raspberry Pi 5, whose PCIe interface is Gen2.0 with a single (x1) lane, which might be the cause of the performance drop.

But my main question is: what exactly do the Send and Recv rates represent?

At first, I thought these rates simply reflected the bandwidth usage over the PCIe interface.
However, even though the model’s performance dropped significantly, the measured Send/Recv rates didn’t come close to the theoretical bandwidth limit of PCIe 2.0 x1, which is 4000 Mbit/s (500 MB/s) of payload after 8b/10b encoding.
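For reference, the 4000 Mbit/s figure can be reproduced from the standard PCIe 2.0 parameters (a minimal sketch; the 5 GT/s line rate and 8b/10b encoding are generic PCIe 2.0 facts, not values taken from hailortcli):

```python
# Sketch: derive the PCIe 2.0 x1 payload bandwidth quoted above.
raw_transfers_per_s = 5.0e9     # PCIe 2.0 line rate: 5 GT/s per lane
encoding_efficiency = 8 / 10    # 8b/10b encoding: 8 payload bits per 10 line bits
lanes = 1

payload_bits_per_s = raw_transfers_per_s * encoding_efficiency * lanes
print(payload_bits_per_s / 1e6, "Mbit/s")    # 4000.0 Mbit/s
print(payload_bits_per_s / 8 / 1e6, "MB/s")  # 500.0 MB/s
```

In practice the usable rate is lower still, since PCIe packet and protocol overheads eat into the 4000 Mbit/s payload capacity.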

This leads me to believe that there might be other data being transmitted that isn’t fully captured in the current measurements.
If anyone has expertise in this area and can shed some light on it, I would really appreciate your input.

My current hypothesis is that bandwidth related to model weights or other metadata may not be included in the Send/Recv measurements (i.e., only the bandwidth of the inference I/O itself is measured?).
If someone could suggest a method to validate this hypothesis, that would be incredibly helpful.

Thank you for reading, and I hope you have a wonderful day.

Best regards,
Jusung Kang

===============================================

Hey @happistday,

Thank you for your detailed question about the Send/Recv rates when using the hailortcli run command. Let me clarify what these metrics represent and explain the performance difference you’re observing.


  • Send Rate: the bandwidth used to send input data (tensors) from the host to the device during inference.
  • Recv Rate: the bandwidth used to receive output data (tensors) from the device back to the host.

These rates are application-level I/O metrics that specifically capture:

  • Data movement through the HailoRT API during inference

Important Exclusions:

  • Firmware traffic
  • Model loading processes
  • Weights or metadata transfer
  • Internal control messaging

The difference in performance you observed is mainly due to hardware configuration:

  1. You’re using PCIe Gen2, which offers roughly half the effective per-lane bandwidth of Gen3
  2. You’re on a single-lane (x1) setup, while the benchmark system uses two lanes (x2), which is also the recommended setup for optimal throughput
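As a side note, you can check which link was actually negotiated from the host. Below is a sketch that shells out to `lspci` on Linux; `1e60` is assumed to be the Hailo PCI vendor ID (adjust the filter for your device), and `sudo` may be required to see the capability fields:

```python
# Sketch: read the negotiated PCIe link state of the Hailo device on Linux.
# Assumption: 1e60 is the Hailo PCI vendor ID; adjust if your device differs.
import subprocess

try:
    out = subprocess.run(
        ["lspci", "-vv", "-d", "1e60:"],
        capture_output=True, text=True, check=False,
    ).stdout
except FileNotFoundError:
    out = ""  # pciutils not installed

for line in out.splitlines():
    if "LnkSta:" in line:  # e.g. "LnkSta: Speed 5GT/s, Width x1"
        print(line.strip())
```

A downgraded speed or width reported here would directly cap the achievable Send/Recv rates.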

To illustrate the impact, here are comparison results from running yolov8m.hef on my systems:


:laptop: x86 Platform (PCIe Gen3 x2)

| Batch Size | Frames Per Second (FPS) | Send Rate | Recv Rate |
|---|---|---|---|
| 1 | 45 | ~441 Mbit/s | ~439 Mbit/s |
| 8 | 110 | ~1083 Mbit/s | ~1076 Mbit/s |

:strawberry: Raspberry Pi 5 (PCIe Gen3 x1)

| Batch Size | Frames Per Second (FPS) | Send Rate | Recv Rate |
|---|---|---|---|
| 1 | 29 | ~286 Mbit/s | ~284 Mbit/s |
| 8 | 83 | ~816 Mbit/s | ~811 Mbit/s |

:test_tube: Raspberry Pi 5 (PCIe Gen2 x1)

| Batch Size | Frames Per Second (FPS) | Send Rate | Recv Rate |
|---|---|---|---|
| 1 | 14.98 | ~147 Mbit/s | ~146 Mbit/s |
| 8 | 43.19 | ~424 Mbit/s | ~421 Mbit/s |

As you can see, the jump from Gen2 to Gen3, or from x1 to x2 lanes, can significantly affect inference performance — especially at higher batch sizes where PCIe bandwidth becomes a bottleneck.
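One way to convince yourself that these rates cover only the inference tensors is to compare them with FPS multiplied by the input tensor size. Assuming yolov8m.hef consumes a 640x640x3 uint8 input (an assumption about this particular HEF, not something stated above), the batch-1 rows are reproduced almost exactly:

```python
# Sketch: Send Rate should be close to FPS * input tensor size if the
# metric counts only inference input tensors.
# Assumption: yolov8m.hef consumes a 640x640x3 uint8 input tensor.
input_bits = 640 * 640 * 3 * 8  # bits transferred per frame

for fps, measured_mbits in [(45, 441), (29, 286), (14.98, 147)]:
    predicted_mbits = fps * input_bits / 1e6
    print(f"FPS {fps}: predicted ~{predicted_mbits:.0f} Mbit/s, "
          f"measured ~{measured_mbits} Mbit/s")
```

The predicted values land within a few Mbit/s of the measured batch-1 Send Rates, which supports the point that weight loading and control traffic are not part of these numbers.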

===============================================

Thank you for your kind comments!

That’s exactly what we want to know — what kinds of traffic are not counted.

The problem is that, to compare performance across experiments, we need to know exactly what was excluded.

Our question is: is there any way to check the traffic that was excluded?

The key point is that we want to understand the PCIe bandwidth usage during our experiments.

Again, thank you for your kind comments, and have a great day.

Best regards,
Jusung Kang