I have successfully converted my model from ONNX to HEF and deployed it on the Hailo8L. While the model runs without issues, the inference results are significantly worse compared to the ONNX outputs. The performance gap is much larger than expected.
I have already verified that the input format remains RGB throughout the process, so I’m unsure what could be causing such a drastic difference.
Could you help identify potential reasons for this discrepancy? Also, would it be possible for me to share both the ONNX and HEF models with you so you can check if there are any issues?
Hi @joy.yen
Did you evaluate the mask mAP on a validation dataset for both the float ONNX and the HEF? This will give a hint on how much actual degradation there is. While some loss due to quantization is expected, if the mAP drop is much larger than that, then pre-processing, calibration, or some compiler setting could be the issue.
Hi @shashi
I haven’t evaluated the mask mAP. The HEF evaluation method only provides FPS and latency. Does this mean that we need to implement our own evaluation method?
Hi @joy.yen
Yes, you should implement a box/mask mAP evaluation method. Since you trained the model yourself, you should already have the eval script and validation dataset.
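As a lighter-weight sanity check before wiring up full COCO-style mAP (e.g. with pycocotools), you can compare mean mask IoU on the validation set for both backends. A minimal sketch (the function name and threshold are my own choices, not from any Hailo API):

```python
import numpy as np

def mask_iou(pred_probs, gt_mask, thresh=0.5):
    """IoU between a predicted mask (float probabilities) and a binary GT mask.

    pred_probs: float array, e.g. sigmoid outputs in [0, 1]
    gt_mask:    binary array of the same shape
    thresh:     probability cutoff for binarizing the prediction
    """
    pred = pred_probs >= thresh
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Both masks empty: define IoU as 1.0 (perfect agreement)
    return inter / union if union else 1.0
```

Run the same validation images through the ONNX model and the HEF (after dequantizing the HEF output to float), compute the mean IoU for each, and the gap quantifies the degradation independently of the full mAP pipeline.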
Hi @joy.yen
Hailo devices run quantized models, so their inputs and outputs are quantized. Each output tensor carries its quantization information so that you can convert it back to floating-point values. The formula is `float_value = scale * (quant_value - zero_point)`.
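That formula in code, as a minimal sketch. The `scale` and `zero_point` values below are made up for illustration; in practice you read them from the HEF output tensor's quantization info at runtime:

```python
import numpy as np

# Example quantization parameters -- in a real pipeline these come from
# the output tensor's quantization info, not hard-coded values.
scale = 0.0039
zero_point = 0

# Raw uint8 output from the device
quant_output = np.array([0, 64, 128, 255], dtype=np.uint8)

# Dequantize: float_value = scale * (quant_value - zero_point)
float_output = scale * (quant_output.astype(np.float32) - zero_point)
```

With a sigmoid last layer, the dequantized values land back in roughly [0, 1], recovering the probabilities the original float model produced.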
hi @shashi
But my output is uint8 in the range 0-255, and I can't get any floating-point values out. Is there any way to make Hailo output the original model's output instead of uint8 pixel values?
Hi @joy.yen
In your original model, what do the output values mean? You mentioned the last layer is a sigmoid. Can you share more details on the output tensor size and its interpretation?