Which Layers Are Quantized by Default by the DFC?

Dear community,

Do you know what data type the input layer (layer 1) and the output layers (anywhere between layers 42 and 69) have after quantization and compilation with the DFC? Are they automatically quantized to UINT8, or do they stay in FLOAT32 (or another floating-point type)?

I’m asking because, ideally, we would quantize every layer in the model except the input and output layers. That only works if those layers stay in floating point, though. If they are quantized to UINT8 anyway, I’d rather quantize them to UINT16 instead.

Thanks for this great community and looking forward to your input and suggestions!

Hey @nino,

When using the Dataflow Compiler (DFC), input and output layers are typically quantized to UINT8 by default. The DFC handles this quantization using two floating-point values for each layer: qp_scale and qp_zp (zero point).

For input layers, the data is quantized by dividing it by qp_scale and adding qp_zp. Output layers are de-quantized by subtracting qp_zp and then multiplying by qp_scale.
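To make the arithmetic concrete, here is a minimal NumPy sketch of the two formulas above. It is not DFC or HailoRT code: the function names and the qp_scale / qp_zp values are made up for illustration, and the rounding and clipping to the UINT8 range are assumptions I added so the result fits the integer type.

```python
import numpy as np

def quantize(x, qp_scale, qp_zp, dtype=np.uint8):
    """Input side: divide by qp_scale, add qp_zp, then round and clip.

    Rounding and clipping are assumptions of this sketch, not something
    stated about the DFC itself.
    """
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float32) / qp_scale + qp_zp)
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, qp_scale, qp_zp):
    """Output side: subtract qp_zp, then multiply by qp_scale."""
    return (np.asarray(q, dtype=np.float32) - qp_zp) * qp_scale

# Hypothetical parameters; real values come from the compiled model.
qp_scale, qp_zp = 0.5, 10.0

x = np.array([0.0, 1.0, 2.5, 100.0, 500.0], dtype=np.float32)
q = quantize(x, qp_scale, qp_zp)
x_back = dequantize(q, qp_scale, qp_zp)

print(q)       # integer codes stored in UINT8
print(x_back)  # values recovered after de-quantization
```

Note that the last value (500.0) falls outside what UINT8 can represent with these parameters, so it saturates at 255 and does not survive the round trip. That is the kind of loss a wider type such as UINT16 is meant to reduce.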

If you want to keep certain layers (like input and output) in FLOAT32, you can set this explicitly during model compilation. Use the HAILO_FORMAT_TYPE_FLOAT32 flag to prevent quantization on specific layers, maintaining floating-point precision where needed.

If your input and output layers are already in FLOAT32, there’s no need to quantize them to UINT16 unless it better suits your application. You have full control over layer quantization and can adjust based on your requirements.

Let me know if you need any clarification on configuring this!

Best regards

Thanks @omria!

Do you have example code or a command line, with the arguments, that applies the concepts you mentioned?

I would also like your opinion on this. The digital camera I use can only provide UINT10, and the GStreamer pipeline only converts RGB to UINT12. My app loses 4 bits of precision in any case. Nevertheless, I would like the model to take the best possible input type to preserve more features, e.g. UINT16 rather than UINT8, and FLOAT16 rather than UINT16 or UINT8. Is this possible?

Thank you in advance!