Where to quantize inputs

Hey @geoff

Welcome to the Hailo Community!

The mismatch you’re seeing is likely due to the input format defined in your HEF file. If it’s set to HAILO_FORMAT_TYPE_UINT8, the API expects quantized uint8 data, not float32. Here are a few ways to address this:

1. Automatic Quantization: If you prefer HailoRT to handle quantization, request FLOAT32 as the user buffer format when creating your virtual stream parameters:

auto input_params = network_group->make_input_vstream_params(
    false /* quantized=false: HailoRT converts on the host */,
    HAILO_FORMAT_TYPE_FLOAT32,
    HAILO_DEFAULT_VSTREAM_TIMEOUT_MS, HAILO_DEFAULT_VSTREAM_QUEUE_SIZE);

This allows you to pass float32 data directly, and HailoRT will quantize it using the qp_scale and qp_zp values from your HEF file.
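
If it helps, here is a minimal end-to-end sketch of that flow using VStreamsBuilder (error handling abbreviated; "model.hef" is a placeholder path for your compiled model):

#include "hailo/hailort.hpp"
#include <iostream>

int main() {
    // Create a virtual device and load the HEF.
    auto vdevice = hailort::VDevice::create();
    if (!vdevice) { return vdevice.status(); }

    auto hef = hailort::Hef::create("model.hef");
    if (!hef) { return hef.status(); }

    auto network_groups = vdevice.value()->configure(hef.value());
    if (!network_groups || network_groups.value().empty()) { return 1; }
    auto network_group = network_groups.value().at(0);

    // quantized=false + FLOAT32: HailoRT quantizes inputs and dequantizes
    // outputs on the host using the qp_scale/qp_zp stored in the HEF.
    auto vstreams = hailort::VStreamsBuilder::create_vstreams(
        *network_group, false, HAILO_FORMAT_TYPE_FLOAT32);
    if (!vstreams) { return vstreams.status(); }

    auto &input_vstreams = vstreams.value().first;   // accept float32 frames
    auto &output_vstreams = vstreams.value().second; // produce float32 results
    std::cout << "Inputs: " << input_vstreams.size()
              << ", outputs: " << output_vstreams.size() << std::endl;
    return 0;
}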
2. Manual Quantization: If you want to quantize inputs yourself, use the qp_scale and qp_zp parameters from your HEF file:

for (size_t i = 0; i < buffer_size; i++) {
    // Clamp to [0, 255] before casting so out-of-range values don't wrap around.
    float q = (input_buffer[i] / qp_scale) + qp_zp;
    q = (q < 0.0f) ? 0.0f : (q > 255.0f) ? 255.0f : q;
    quantized_buffer[i] = (uint8_t)q;
}
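
You don't need to hard-code qp_scale and qp_zp; at runtime they're carried on the vstream info. A small sketch, assuming input_vstream is a hailort::InputVStream you've already created:

// Read the quantization parameters the HEF recorded for this input.
const auto info = input_vstream.get_info();
const float qp_scale = info.quant_info.qp_scale;
const float qp_zp = info.quant_info.qp_zp;
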
3. Output Dequantization: If you need float32 results from a quantized uint8 output, invert the same mapping:
for (size_t i = 0; i < output_size; i++) {
    dequantized_output[i] = (float32_t)(quantized_output[i] - qp_zp) * qp_scale;
}
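
The same applies on the output side; assuming output_vstream is your hailort::OutputVStream:

// The output's quantization parameters live on its vstream info too.
const auto out_info = output_vstream.get_info();
const float qp_scale = out_info.quant_info.qp_scale;
const float qp_zp = out_info.quant_info.qp_zp;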

Remember, when using async inference, ensure your buffer format matches the expected type (uint8 or float32).
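
If you're on a recent HailoRT release with the InferModel async API, you can also request float32 user buffers there and let HailoRT convert internally. A sketch, assuming a VDevice named vdevice and a placeholder HEF path (please check the async inference example shipped with your HailoRT version):

auto infer_model_exp = vdevice->create_infer_model("model.hef");
if (infer_model_exp) {
    auto infer_model = infer_model_exp.release();
    // Request float32 user buffers; HailoRT handles (de)quantization internally.
    infer_model->input()->set_format_type(HAILO_FORMAT_TYPE_FLOAT32);
    infer_model->output()->set_format_type(HAILO_FORMAT_TYPE_FLOAT32);
}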

The key is to either configure your model to handle float32 inputs through HailoRT’s virtual streams, or manually handle quantization if uint8 input is expected.

If you need more details on C++ API usage or setting up virtual streams, feel free to ask. Good luck with your implementation!