Yes, the error is resolved when I don’t use 16-bit quantization.
However, I used 16-bit quantization to improve the SNR.
When I use the default 8-bit quantization, the output layer SNR is below 10.
After switching from 8-bit to 16-bit, the SNR looks fine.
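Why the switch helps so much: with uniform quantization, each extra bit halves the step size, which is worth about 6 dB of SNR. Going from 8 to 16 bits therefore buys roughly 48 dB on a single tensor. The sketch below assumes symmetric per-tensor quantization and defines SNR as signal power over quantization-error power in dB. The actual tool may define SNR differently. In a full network the output-layer SNR also includes error accumulated from earlier layers, so real numbers will be lower. `quantize` and `snr_db` are hypothetical helpers, not part of any SDK.

```python
import math
import random


def quantize(values, bits):
    """Symmetric per-tensor uniform quantization: snap to the grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard against an all-zero tensor
    return [round(v / scale) * scale for v in values]


def snr_db(reference, approx):
    """Signal power over quantization-error power, in dB (assumed definition)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10 * math.log10(signal / noise)


random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic data
snr8 = snr_db(activations, quantize(activations, 8))
snr16 = snr_db(activations, quantize(activations, 16))
print(f"8-bit: {snr8:.1f} dB, 16-bit: {snr16:.1f} dB")
```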
I thought 16-bit quantization was the easiest way to check the model's accuracy, so I used it to see whether the model can reach its best possible performance.
So is following the Accuracy Analysis Tool the only way to fix the SNR issues?
16-bit is definitely an important tool for improving accuracy, but it has some limitations, and we also have additional tools that may be a better fit in certain situations.
Yes, I would start by applying 16-bit only to the output layer. This works in most cases (unless NMS is applied on the output).
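A toy illustration of the mixed-precision idea. It uses a hypothetical three-layer dense model where each layer's output is quantized according to a per-layer bit-width list, and compares all 8-bit against 8-bit with a 16-bit output layer. This is plain Python with made-up random weights, not Hailo model-script syntax, and the toy model has no NMS. Raising only the output layer to 16 bits removes that layer's own rounding error but leaves the error propagated from the 8-bit layers before it. How much the output SNR improves therefore depends on how much of the total error the output layer contributes.

```python
import math
import random


def quantize(values, bits):
    """Symmetric per-tensor uniform quantization: snap to the grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard against an all-zero tensor
    return [round(v / scale) * scale for v in values]


def snr_db(reference, approx):
    """Signal power over error power, in dB (assumed definition)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10 * math.log10(signal / noise)


def forward(x, weights, precisions=None):
    """Hypothetical toy model: dense layers with ReLU between them and a linear output.
    precisions[i] is the bit width applied to layer i's output; None runs in float."""
    for i, w in enumerate(weights):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        if i < len(weights) - 1:
            x = [max(0.0, v) for v in x]
        if precisions is not None:
            x = quantize(x, precisions[i])
    return x


random.seed(0)
DIM, DEPTH = 32, 3
weights = [
    [[random.gauss(0.0, 1 / math.sqrt(DIM)) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(DEPTH)
]
inputs = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(100)]


def output_snr(precisions):
    """SNR of the final output over all inputs, with the float model as reference."""
    ref, approx = [], []
    for x in inputs:
        ref += forward(x, weights)
        approx += forward(x, weights, precisions)
    return snr_db(ref, approx)


snr_all8 = output_snr([8, 8, 8])
snr_out16 = output_snr([8, 8, 16])  # only the output layer raised to 16 bits
print(f"all 8-bit: {snr_all8:.1f} dB, 16-bit output only: {snr_out16:.1f} dB")
```

If the gain from the 16-bit output is small, most of the error is coming from upstream layers, which is where a per-layer analysis becomes useful.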
Then the Layer Analysis Tool is a good next step: use it to identify which layers introduce the error (i.e., have low SNR), and try to understand the pathology behind that error (e.g., extreme values, split values).
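A simplified stand-in for this kind of per-layer check. It quantizes each layer's float activations on their own, reports the SNR per layer, and flags layers whose largest magnitude is far above the 99.9th percentile (the "extreme values" pathology). A single outlier stretches the quantization range and coarsens the step for every other value in the tensor. The layer names, the 8-bit setting, and the outlier threshold of 4 are hypothetical. This is only a proxy and not how the Layer Analysis Tool itself works. "Split values" is not checked here.

```python
import math
import random


def quantize(values, bits=8):
    """Symmetric per-tensor uniform quantization: snap to the grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # guard against an all-zero tensor
    return [round(v / scale) * scale for v in values]


def snr_db(reference, approx):
    """Signal power over quantization-error power, in dB (assumed definition)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10 * math.log10(signal / noise)


def abs_percentile(values, q):
    """Nearest-rank percentile of |values|, q in (0, 100]."""
    s = sorted(abs(v) for v in values)
    k = max(0, math.ceil(q / 100 * len(s)) - 1)
    return s[k]


def analyze(layers, bits=8, outlier_ratio=4.0):
    """Per layer: (SNR in dB, max|v| / p99.9|v|, flagged as having extreme values)."""
    report = {}
    for name, acts in layers.items():
        snr = snr_db(acts, quantize(acts, bits))
        ratio = max(abs(v) for v in acts) / abs_percentile(acts, 99.9)
        report[name] = (snr, ratio, ratio > outlier_ratio)
    return report


random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(10_000)]
spiky = clean[:-1] + [50.0]  # same data with one injected outlier
layers = {"conv1": clean, "conv2": spiky}  # hypothetical layer names

for name, (snr, ratio, flagged) in analyze(layers).items():
    note = "  <- extreme values" if flagged else ""
    print(f"{name}: SNR {snr:.1f} dB, max/p99.9 {ratio:.1f}{note}")
```

The outlier in `conv2` lowers its SNR even though the other 9,999 values are identical to `conv1`, which is why finding the pathology matters as much as finding the layer.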