Hello Hailo community
I am trying to run a LightGlue model on Hailo-8. Although I managed to parse it by splitting it into 3 chunks, or even down to the level of individual self- and cross-attention blocks, I am now having issues with accuracy. This topic is related to this previous one.
Currently the descriptors at the final (9th) transformer layer have an SNR of approximately 7 dB. It is difficult to analyze the whole network, or even one of the 3 chunks, in the profiler due to the large number of layers, but at this level, or at the level of individual attention blocks, I identified a big drop in SNR after a precision_change layer, right after the softmax and before the matmul that combines the V values. In the alls script I specified that the matmul layers should work with a16_w16 precision, so I am not sure why this change to 8-bit precision happens. According to the Hailo profiler, around 91% of the weights are in 16-bit precision.
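For reference, this is roughly what I have in my model script. The layer names here are hypothetical (taken from the profiler output for the first block); the idea is to pin not only the matmul itself but also the layers feeding it to 16-bit, in case the compiler is inserting the 8-bit precision_change to reconcile a mixed-precision boundary:

```
# Hypothetical layer names -- adapt to the names your profiler reports.
# Force 16-bit on the matmul AND on its producers, so the compiler has
# no reason to insert an 8-bit precision_change on the softmax branch.
quantization_param(matmul6, precision_mode=a16_w16)
quantization_param(ew_mult_softmax2, precision_mode=a16_w16)
quantization_param(conv_feature_splitter1_2, precision_mode=a16_w16)
```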
Here are the two immediate input paths, with their SNRs, right before this matmul layer for the first self-attention block:
ew_mult_softmax2 (34 dB) → precision_change6 (26.63 dB) ──┐
                                                          ├─→ matmul6 (30 dB)
conv_feature_splitter1_2 (36.33 dB) ──────────────────────┘
Since there are 18 of these attention blocks chained sequentially in the whole network, the final accuracy suffers substantially. Do you recommend any solutions?
I think I will try computing the softmax and the following matmul on the host CPU in high precision, to check whether that at least recovers most of the original accuracy. What do you think? Any other ideas?
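For the host-side experiment, my plan is roughly the following sketch: dequantize the attention logits and V tensor coming off the device, then run the softmax and value matmul in float32 on the CPU. The function name, tensor shapes, and scale parameters are my own assumptions for illustration, not HailoRT API:

```python
import numpy as np

def attention_tail_on_host(scores_q, v_q, scale_s, scale_v):
    """Hypothetical host-side fallback: dequantize the attention logits,
    then apply softmax and the value matmul in float32 instead of the
    quantized precision_change -> matmul path on the device.

    scores_q: int16 attention logits, shape (heads, N, N), scale scale_s
    v_q:      int16 value tensor,     shape (heads, N, D), scale scale_v
    """
    scores = scores_q.astype(np.float32) * scale_s   # dequantize logits
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    v = v_q.astype(np.float32) * scale_v             # dequantize values
    return weights @ v                               # float32 attention output
```

The cost is extra host↔device transfers per attention block (18 round trips in my case), so even if it recovers accuracy it may only be useful as a diagnostic rather than a deployment solution.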
Thanks a lot in advance