Inference output dtype leads to different results

I’ve been trying to bring a simple CNN example onto the Hailo-8. Optimization and compilation are targeted at UINT8 dtype inference.

When I run inference on the model using InferVStreams(), I can choose which dtype to use for both the input and output VStreams.

Flipping the input VStream between FLOAT32 and UINT8 gives me numerically the same results.

Flipping the output VStream between FLOAT32 and UINT8, however, has a massive impact on the result. The UINT8 result carries an offset/gain error within the 8-bit range.

PSNR (Float32 inference vs. Hailo-8 inference results):

Output VStream format_type==FormatType.FLOAT32: 45.089 dB
Output VStream format_type==FormatType.UINT8: 27.667 dB

How can I correct the offset/gain error of the UINT8 quantized output data?

How do you dequantize the results? Are you using the output.quant_info.qp_scale and output.quant_info.qp_zp?

That should give you the same result.

The option to get quantized results is provided in case you have a weak CPU and want to filter data before dequantization, allowing you to save a few operations. When you choose float, the dequantization is done by HailoRT on the host, so the result should be the same either way.
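
For reference, here is a minimal host-side sketch of that dequantization. It assumes infer_results is the dict returned by InferVStreams.infer() with the output VStream set to FormatType.UINT8, and output_info is the matching output VStream info (e.g. taken from hef.get_output_vstream_infos()); those names are placeholders for your own pipeline objects.

    import numpy as np

    # Quantization parameters reported by HailoRT for this output.
    qp_scale = output_info.quant_info.qp_scale
    qp_zp = output_info.quant_info.qp_zp

    raw = infer_results[output_info.name]                      # quantized uint8 output
    dequantized = (raw.astype(np.float32) - qp_zp) * qp_scale  # back to float

    # `dequantized` should now match the values HailoRT returns when the
    # output VStream is configured with FormatType.FLOAT32.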

Hi Klaus,

I wasn’t aware of this at all and hadn’t put the pieces together in my head. Now it makes perfect sense. Thank you for the quick response.

So you are saying that when the input/output tensors are dtype=uint8, the PCIe data size and the buffer size inside the Hailo-8 device are fixed, and the type of the VStreams is just a convenience that automatically applies conversions when set to float?

Any chance of moving some of these operations to the Hailo device? On the host I would do something like:

img_out = ((infer_result.astype(np.float32) - qp_zp) * qp_scale * 255).clip(0, 255).astype(np.uint8)

Can’t such normalizations be part of the accelerator? Besides, less than 50% of the 8-bit scale is actually used in my case, due to over-/undershoots that are later clipped. Can the optimizer handle this in some way?

/André

We do support input normalization. See the Hailo Dataflow Compiler User Guide.
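
As a rough sketch of what that looks like during optimization (the model script command and ClientRunner calls follow the DFC tutorials and may differ between versions; the model name, paths, and mean/std values below are placeholders):

    from hailo_sdk_client import ClientRunner

    # Placeholder model and values; adjust to your own network.
    runner = ClientRunner(hw_arch="hailo8")
    runner.translate_onnx_model("my_cnn.onnx", "my_cnn")

    # Adds an on-chip normalization layer (x - mean) / std, so the host can
    # feed raw uint8 images without normalizing them first.
    runner.load_model_script(
        "normalization1 = normalization([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])\n"
    )

    runner.optimize(calib_dataset)   # calib_dataset: your calibration images
    hef = runner.compile()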

Our architecture is based on integer computation only. Therefore, dequantizing the output has to be done on the host.

Yes.

We also support 16-bit integer precision. This will allow you to have a greater input and output data range.
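
For example, assuming the model’s output was optimized for 16-bit precision in the DFC, the host side would request it along these lines (same InferVStreams flow as before; network_group and input_data are assumed to already exist in your pipeline):

    from hailo_platform import (InferVStreams, InputVStreamParams,
                                OutputVStreamParams, FormatType)

    # Sketch only: request 16-bit quantized output from HailoRT so the data
    # keeps a wider integer range.
    input_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
    output_params = OutputVStreamParams.make(network_group, format_type=FormatType.UINT16)

    with InferVStreams(network_group, input_params, output_params) as pipeline:
        results = pipeline.infer(input_data)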

Thank you for the clarification!

However, it would be technically possible to include a final de-normalization stage in your toolchain to address this problem.

The qp_scale and qp_zp are known by your tools. You would only need an adjustable output stage specifying how to de-normalize (or de-quantize, if you will) the final data into a defined value range. Just like the pre-normalization you already do on the input, but as post-normalization on the output.

IMHO, this could be done on the Hailo chip.
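
Just to make the idea concrete: the stage I mean is a single per-tensor multiply-add that folds de-quantization and range mapping together. A rough host-side illustration (not an existing Hailo feature; the gain/offset names and target range are mine):

    import numpy as np

    # Illustration only: fold dequantization and range mapping into one
    # per-tensor gain/offset. `raw_uint8` is the quantized output,
    # qp_scale/qp_zp are the known quantization parameters.
    target_lo, target_hi = 0.0, 255.0           # desired output range (example)

    gain = qp_scale * (target_hi - target_lo)   # combined multiplier
    offset = target_lo - qp_zp * gain           # combined offset

    img_out = (raw_uint8.astype(np.float32) * gain + offset) \
                  .clip(target_lo, target_hi).astype(np.uint8)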

HailoRT will dequantize the data for you if you choose the right output data type. It is done on the host.

I am certain our chip designers investigated the pros and cons of implementing this functionality on chip vs leaving this task to the host.
It is usually not an issue for the customers I have talked to.

Is this important for your use case and why? Are you pushing your host to the limit or … ?

Klaus,

I don’t have a special use-case for now. I just see room for improvement here.

If the Hailo-8 were to upscale an image to a large output (I don’t know the limitations here), e.g. 4096x4096x3 pixels, the host would have to apply 50,331,648 multiply/add operations just to convert uint8 to uint8 in a different data range. Using uint16 here seems like a workaround.

I believe it would fit better into the Hailo-8 device as a final acceleration step. I leave it to you to decide whether that is a valid point. I won’t do any benchmarks now, but I may come back to this sometime soon.

But thank you for clarifying the details!

/André