Hello again Hailo Community,
I have some generic questions and some specific ones regarding compiling a LightGlue model.
- Can I specify the separation between contexts, or somehow guide that step? I find that automatically splitting the network into contexts sometimes takes a long time.
- Can we save the original ONNX names in some form of metadata in the final HEF for the associated inputs and outputs? The names in the HEF file are sometimes quite uninformative, for example “crossattn_layer0_part2_scope1/conv2”, so it is difficult to determine what I should feed into each input, or how to interpret each output.
- Will compiling a LightGlue model for Hailo-10 be substantially easier than for Hailo-8? I assume that, thanks to the larger SRAM, it can compute larger chunks in one shot, and that the dedicated DDR4 will allow much faster context switching. In general, do you recommend Hailo-10 over Hailo-8 for this kind of model?
- For the first self-attention block in LightGlue, I saw some big SNR drops in the precision conversion before the multiplication between the attention weights and the V tokens, and in the LayerNorm+GELU. Is this expected? If so, what is recommended to mitigate it? Currently I take the inputs to those computations out of the Hailo-8 in 16-bit precision and perform them on the CPU. Because of that I end up with 54 sub-models in total to run the whole LightGlue model… I think this is too much and would like to find a cleaner solution.
- In the Model Zoo I have seen some transformer models that use BatchNorm instead of the more typical LayerNorm. What is the reason for that? Is it just for speed, or is it also related to accuracy?
- Is it possible to reuse the compilation of a network to speed up the compilation of another one that has essentially the same architecture but different weights?
Thanks in advance
Hi @user714,
1. Context separation:
You might be able to influence this via context_switch_param(mode=enabled|disabled|allowed) in your model script. performance_param(compiler_optimization_level=max) and allocator_param(automatic_ddr=True, timeout=20m) could also help the compiler find better splits, though compilation time may increase significantly.
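As a sketch, a model script combining those commands might look like the following (these are the command names mentioned above; I haven't validated this exact combination against your model, so treat it as a starting point):

```
performance_param(compiler_optimization_level=max)
allocator_param(automatic_ddr=True, timeout=20m)
```

You could then experiment with `context_switch_param(mode=allowed)` on top of that and compare the resulting context count and compile time.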
2. Original ONNX names in HEF:
It seems like the HEF may already store original names internally. There are HailoRT API calls like get_original_names_from_vstream_name() and get_vstream_name_from_original_name() that might help map between vstream names and your original ONNX names. Worth trying. In model scripts, from_tf('original/name') might also let you reference layers by their original names.
3. Hailo-10 vs Hailo-8 for LightGlue:
LightGlue isn’t currently in the Hailo Model Zoo, and we don’t have a validated compilation recipe for it. That said, your intuition is reasonable: the larger on-chip memory and dedicated DDR on Hailo-10 should in principle reduce the number of contexts and make context switching cheaper, which tends to matter most for large transformer models. I can’t guarantee LightGlue compiles cleanly on either target, though.
4. SNR drops in attention/LayerNorm+GELU:
This seems fairly typical for transformer quantization. Before splitting into 54 sub-models, you might want to try quantization_param(layer, precision_mode=a16_w16) to keep just those sensitive layers in 16-bit on the Hailo itself. Increasing optimization level (optimization_level=4), enabling finetune, and activation clipping could also help. Recent DFC versions (2025-01+) apparently added improved LayerNorm quantization support - might be worth upgrading if you haven’t already.
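As an illustration, a model script for keeping the sensitive layers in 16-bit and raising the optimization effort might look like this (the layer names here are placeholders; substitute the actual names of your attention matmul and LayerNorm layers, and note the exact command set depends on your DFC version):

```
quantization_param(attn1_matmul, precision_mode=a16_w16)
quantization_param(layernorm1, precision_mode=a16_w16)
model_optimization_flavor(optimization_level=4)
post_quantization_optimization(finetune, policy=enabled)
```

This keeps everything on-device, which should let you collapse most of those 54 sub-models back into a single HEF.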
5. BatchNorm vs LayerNorm in Model Zoo:
BatchNorm generally quantizes better than LayerNorm: its statistics are fixed at inference time, so it can be folded into the adjacent convolution or linear layers essentially for free, and it tends to keep activation ranges tighter, which helps quantization accuracy. So it’s both a speed and an accuracy consideration, not just one or the other.
6. Reusing compilation for same architecture, different weights:
There’s an “automatic model script” (.auto.alls) generated after compilation that captures the compiler’s allocation decisions. You should be able to extract it (hailo har extract <HAR> --auto-model-script-path auto.alls) and apply it to another quantized HAR of the same architecture. This would presumably skip the search phase and speed things up considerably - though I haven’t verified this with models that only differ in weights.
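The flow might look something like this (the extract command is as above; the second line is my assumption about how to feed the auto script back in, so check `hailo compiler --help` on your DFC version before relying on it):

```
hailo har extract model_compiled.har --auto-model-script-path auto.alls
hailo compiler model_quantized.har --model-script auto.alls
```

Since the weights don’t affect the allocation decisions captured in `auto.alls`, this should in principle work for a same-architecture, different-weights model, but again, I haven’t verified that case.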
Thanks,