Hello Hailo community,
I am working on compiling a custom LightGlue model (transformer-based image matching with self- and cross-attention blocks). I am currently targeting the Hailo-8, but I am wondering whether this model would be easier to compile and run significantly faster on the Hailo-10H.
I know there is no official recipe for LightGlue in the Model Zoo yet, but I would appreciate some technical opinions on this.
Since Hailo-10H is explicitly advertised for GenAI applications, I assume its architecture is better suited for the heavy matrix multiplications and attention mechanisms found in LightGlue.
If so, what are the concrete hardware or software compiler advantages of Hailo-10H over Hailo-8 for a model like this? Specifically:
- Hardware support for operations: Does the 10H have native/better acceleration for operations like BatchMatMul, Softmax, and LayerNorm, which are typically tricky to quantize and compile efficiently on Hailo-8?
- Memory hierarchy: Does the dedicated DDR memory on the 10H help alleviate the SRAM bottlenecks caused by $N \times N$ attention activation maps, rather than just storing larger weights?
- Compiler maturity: Is the Hailo Dataflow Compiler inherently better equipped to map Transformer graphs to the 10H without requiring as many manual graph splits or workarounds?
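For context on the memory-hierarchy point, here is a rough back-of-envelope sketch of why the $N \times N$ attention maps worry me. The keypoint count and head count below are assumptions for illustration (LightGlue's keypoint count is dynamic in practice), not values from any Hailo spec:

```python
def attention_map_bytes(n_keypoints: int, n_heads: int, bytes_per_elem: int = 4) -> int:
    """Memory for one set of attention score maps (heads x N x N), float32 by default."""
    return n_heads * n_keypoints * n_keypoints * bytes_per_elem

# Hypothetical workload: 1024 keypoints per image, 4 attention heads.
n_keypoints = 1024
n_heads = 4

mib = attention_map_bytes(n_keypoints, n_heads) / 2**20
print(f"{mib:.1f} MiB of score activations per attention layer")  # 16.0 MiB
```

Even at these modest sizes, a single layer's score maps are in the tens of MiB, and they grow quadratically with the keypoint count, which is why I am asking whether the 10H's DDR helps with activations rather than just weights.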
Thanks in advance for any insights!