Recommended input resolution for Hailo-8

Hi.

I’m working with Hailo-8 and experimenting with semantic segmentation models.

I noticed that while some models can be compiled and run at 640 x 640 input resolution, others fail during HEF compilation due to context/mapping issues.

Currently, I’m comparing PIDNet-M and SegFormer-B0/B1 models.

What I understand so far (please correct me if wrong)

From my experiments, it seems that:

  • The limiting factor is not host RAM or GPU memory

  • But rather:

    • activation peak size

    • control graph complexity

    • inter-context routing / internal field limits

  • Especially models with multi-scale decoder + large concat operations (e.g. SegFormer) scale poorly with input resolution

On the other hand, CNN-based models like PIDNet, which use add-based fusion and smaller activation peaks, seem more Hailo-friendly at higher resolutions.

So my questions are:

1. Input resolution vs memory allocation

  • How exactly does input resolution affect:

    • on-chip memory allocation

    • control graph size

    • and context partitioning?

  • Is the growth closer to linear, quadratic, or model-structure dependent?

2. Why 640×640 works for some models but not others

  • Is there a rule of thumb for why PIDNet-M @ 640×640 can sometimes compile,
    while SegFormer-B0/B1 cannot, even though parameter counts may be similar?

  • Is activation peak size the dominant factor here?

3. Best practices for HEF compilation at higher resolutions

  • Are there recommended strategies when trying to push input resolution higher?

4. Does higher resolution affect inference performance? If so, by how much?

Hi @Seungdae_Kim,

  1. Accurate insights.
    1. Resolution affects the size of the intermediate buffers between layers, and also the size of the buffers used to move data from the host to the Hailo device. Larger tensors need more internal memory blocks, and this in turn affects the control graph.
    2. It really depends on where the bottleneck is, but if it’s sheer compute, it will be linear in the number of pixels.
  2. SegFormer has an inner mat-mul that grows quadratically with the input resolution, and might require special handling (some alls commands).
  3. This is also application dependent.
    1. Tiling - Breaking up the input into multiple tiles and feeding each one to the NN core. Works great for detecting small objects, e.g. from a drone or satellite.
    2. Identifying whether there is a layer that consumes a large amount of resources (memory/control/compute) and needs special care.
    3. If the resolution is not standard, e.g. very long or narrow, try rotating the input by 90 degrees.
  4. As I wrote above, linear in the pixel count.
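
To make two of the points above concrete, here is a rough Python sketch. It is not Hailo tooling and does not model the compiler's actual allocation. `attention_scores` estimates the size of one head's score matrix for a SegFormer-style first stage, assuming the patch stride of 4 and spatial-reduction ratio of 8 from the SegFormer paper. `tile_origins` and `split_into_tiles` are hypothetical helpers that cut a large frame into overlapping fixed-size tiles on the host; the 640-pixel tile and 64-pixel overlap are illustrative defaults only.

```python
import numpy as np


def attention_scores(height, width, stride=4, sr_ratio=8):
    """Entries in one head's Q x K^T matrix for a SegFormer-style stage.

    stride=4 and sr_ratio=8 are the stage-1 values from the SegFormer
    paper (an assumption here, not something read from a compiled model).
    """
    q_len = (height // stride) * (width // stride)   # one query per token
    k_len = q_len // (sr_ratio * sr_ratio)           # keys after spatial reduction
    return q_len * k_len


def tile_origins(length, tile, overlap):
    """Start offsets along one axis so that tiles cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]                    # a single (possibly short) tile
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)     # last tile sits flush with the edge
    return origins


def split_into_tiles(image, tile=640, overlap=64):
    """Return a list of ((y, x), crop) pairs covering the whole image.

    Crops are views into `image`. If the image is smaller than `tile`
    along an axis, the crop is smaller too and would need padding before
    being fed to a fixed-size network.
    """
    h, w = image.shape[:2]
    return [((y, x), image[y:y + tile, x:x + tile])
            for y in tile_origins(h, tile, overlap)
            for x in tile_origins(w, tile, overlap)]


if __name__ == "__main__":
    for side in (512, 640):
        print(side, side * side, attention_scores(side, side))

    frame = np.zeros((1280, 1920, 3), dtype=np.uint8)  # dummy input frame
    tiles = split_into_tiles(frame)
    print(len(tiles), tiles[0][1].shape)
```

Under these assumptions, going from 512×512 to 640×640 raises the pixel count by about 1.56x but the score matrix from 4,194,304 to 10,240,000 entries, about 2.44x, which is the quadratic-in-pixels growth mentioned for SegFormer. The tiling helper keeps every crop at the resolution the model already compiles for; for segmentation, the per-tile masks then have to be stitched back together on the host using the returned offsets, with some rule for the overlapping borders.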