How can I dequantize the result of detr_resnet?

I would like to run the detr_resnet_v1_50 model from the Hailo Model Zoo on a Hailo-8 device.
I’ve noticed that both the input and output of this model are quantized.

While I suspect the model will handle the quantization of the input RGB image, it seems I need to dequantize the output manually.

Could you please explain how to dequantize the model’s output?

Hey @PhilKyue_Shin,

Welcome to the Hailo Community!

I took a look at your HEF file and ran it through the parser to check the input/output layers:

hailortcli parse-hef detr_resnet_v1_18_bn.hef 
Architecture HEF was compiled for: HAILO8
Network group name: detr_resnet_v1_18_bn, Multi Context - Number of contexts: 7
    Network name: detr_resnet_v1_18_bn/detr_resnet_v1_18_bn
        VStream infos:
            Input  detr_resnet_v1_18_bn/input_layer1 UINT8, NHWC(800x800x3)
            Output detr_resnet_v1_18_bn/conv113 UINT16, FCR(1x100x92)
            Output detr_resnet_v1_18_bn/conv116 UINT16, FCR(1x100x4)

For handling the quantized outputs, I’d recommend going with Option A - it’s much cleaner and lets HailoRT handle all the dequantization math for you:

Option A (What I’d recommend): Let HailoRT do the heavy lifting

Just set your output VStreams to FLOAT32 format and HailoRT will automatically apply the quantization parameters from your HEF file:

import numpy as np
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)

hef = HEF("detr_resnet_v1_18_bn.hef")

with VDevice() as vdev:
    configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    net_group = vdev.configure(hef, configure_params)[0]
    net_group_params = net_group.create_params()

    in_params = InputVStreamParams.make(net_group, format_type=FormatType.UINT8)

    # This is the key part - ask HailoRT to dequantize automatically
    out_params = OutputVStreamParams.make(net_group, format_type=FormatType.FLOAT32)

    with InferVStreams(net_group, in_params, out_params) as pipeline:
        # input_u8: your preprocessed (1, 800, 800, 3) uint8 batch
        with net_group.activate(net_group_params):
            results = pipeline.infer({"detr_resnet_v1_18_bn/input_layer1": input_u8})

        # Read the outputs - they're already dequantized!
        logits = results["detr_resnet_v1_18_bn/conv113"]  # (1, 100, 92), float32
        boxes  = results["detr_resnet_v1_18_bn/conv116"]  # (1, 100, 4),  float32

Then you can do your standard DETR post-processing:

  • For logits: apply softmax across the 92 classes (the last class is typically “no-object” in our zoo models)
  • For boxes: they’re in [cx, cy, w, h] format, normalized to [0, 1]. Apply sigmoid if needed (many of our zoo exports already handle this), then scale by your image dimensions and convert to [x0, y0, x1, y1] if that’s what you need. There’s a minimal sketch of this below.
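Here’s a minimal NumPy sketch of that post-processing. The function name and the 0.5 score threshold are my own choices; it assumes the last of the 92 logits is the no-object class and leaves the sigmoid step commented out, since your export may already apply it:

import numpy as np

def postprocess_detr(logits, boxes, img_w, img_h, score_thresh=0.5):
    # logits: (1, 100, 92) float32; boxes: (1, 100, 4) float32 in [cx, cy, w, h]
    logits, boxes = logits[0], boxes[0]

    # Softmax over the 92 classes; the last index is "no-object"
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    scores  = probs[:, :-1].max(axis=-1)      # best real-class score per query
    classes = probs[:, :-1].argmax(axis=-1)

    # If your export leaves the box head pre-sigmoid, squash it first:
    # boxes = 1.0 / (1.0 + np.exp(-boxes))

    # [cx, cy, w, h] normalized -> [x0, y0, x1, y1] in pixels
    cx, cy, w, h = boxes.T
    xyxy = np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                     (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=-1)

    keep = scores > score_thresh
    return xyxy[keep], classes[keep], scores[keep]

Calling postprocess_detr(logits, boxes, 800, 800) would give you boxes in the 800x800 input frame.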

Option B: Manual dequantization (if you want more control)

If you prefer to handle the quantization yourself (maybe for bandwidth reasons), you can stick with UINT16 and do the conversion manually:

import numpy as np
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)

hef = HEF("detr_resnet_v1_18_bn.hef")

with VDevice() as vdev:
    configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    net_group = vdev.configure(hef, configure_params)[0]
    net_group_params = net_group.create_params()

    in_params  = InputVStreamParams.make(net_group, format_type=FormatType.UINT8)
    out_params = OutputVStreamParams.make(net_group, format_type=FormatType.UINT16)

    # Get the quantization parameters for each output from the HEF's vstream infos
    # (the attributes are qp_scale / qp_zp in recent HailoRT; older builds may differ)
    infos = {info.name: info for info in hef.get_output_vstream_infos()}
    q_cls = infos["detr_resnet_v1_18_bn/conv113"].quant_info
    q_box = infos["detr_resnet_v1_18_bn/conv116"].quant_info
    scale_cls, zero_cls = q_cls.qp_scale, q_cls.qp_zp
    scale_box, zero_box = q_box.qp_scale, q_box.qp_zp

    with InferVStreams(net_group, in_params, out_params) as pipeline:
        # input_u8: your preprocessed (1, 800, 800, 3) uint8 batch
        with net_group.activate(net_group_params):
            results = pipeline.infer({"detr_resnet_v1_18_bn/input_layer1": input_u8})

        # Raw quantized outputs
        logits_q = results["detr_resnet_v1_18_bn/conv113"]  # (1, 100, 92), uint16
        boxes_q  = results["detr_resnet_v1_18_bn/conv116"]  # (1, 100, 4),  uint16

        # Manual dequantization: float_value = qp_scale * (q_value - qp_zp)
        logits = scale_cls * (logits_q.astype(np.float32) - float(zero_cls))
        boxes  = scale_box * (boxes_q.astype(np.float32) - float(zero_box))
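One quick sanity check before trusting the numbers: the scale and zero point imply the representable float range of each output, so you can verify your dequantized values fall inside it. With illustrative values (not taken from this HEF):

# Illustrative scale/zero-point - not values from this HEF
scale, zero_point = 0.002, 16384
q_min, q_max = 0, 65535                # raw uint16 range
print(scale * (q_min - zero_point))    # -32.768 -> most negative representable value
print(scale * (q_max - zero_point))    # 98.302  -> most positive representable value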

Honestly, I’d go with Option A unless you have a specific reason to handle the quantization manually. It’s cleaner and less error-prone.

Hope this helps!

Thank you for your suggestions.

I tried Option A, but it did not work as expected. I am using the HailoRT C++ API; here is the code I used to set the output format to FLOAT32, yet the output type remained UINT16.

auto configure_params = hef.create_configure_params(HAILO_STREAM_INTERFACE_PCIE);
if (!configure_params) {
    return hailort::make_unexpected(configure_params.status());
}

auto network_groups = vdevice.configure(hef, configure_params.value());
if (!network_groups) {
    return hailort::make_unexpected(network_groups.status());
}

if (1 != network_groups->size()) {
    std::cerr << "Invalid amount of network groups" << std::endl;
    return hailort::make_unexpected(HAILO_INTERNAL_FAILURE);
}

auto& configured_ngs = *network_groups;

auto output_params = configured_ngs[0]->make_output_vstream_params(false, HAILO_FORMAT_TYPE_AUTO, 1000, HAILO_DEFAULT_VSTREAM_QUEUE_SIZE);

auto& output_params_map = *output_params;

for(auto it: output_params_map)
    it.second.user_buffer_format.type = HAILO_FORMAT_TYPE_FLOAT32;

return std::move(network_groups->at(0));

Additionally, I tested Option B, but it produced invalid output: the dequantized bounding-box coordinates were negative, which should not be possible.

Here is a snippet of the output:

box_scale, box_zero_point: 0.00105664, 7177
output_boxes 0, 0: 3514
output_boxes 0, 1: 5445
output_boxes 0, 2: 4188
output_boxes 0, 3: 5461
dequantized output_boxes 0, 0: -3.87047
dequantized output_boxes 0, 1: -1.8301
dequantized output_boxes 0, 2: -3.15829
dequantized output_boxes 0, 3: -1.81319
0: output_classes: 67, output_scores: 1, output_locations: [-2.29132,-0.923502,-5.44961,-2.73669]
output_boxes 1, 0: 7857
output_boxes 1, 1: 5237
output_boxes 1, 2: 5546
output_boxes 1, 3: 5260
dequantized output_boxes 1, 0: 0.718514
dequantized output_boxes 1, 1: -2.04988
dequantized output_boxes 1, 2: -1.72338
dequantized output_boxes 1, 3: -2.02558
1: output_classes: 67, output_scores: 1, output_locations: [1.5802,-1.03709,-0.143174,-3.06267]
output_boxes 2, 0: 4005
output_boxes 2, 1: 6831
output_boxes 2, 2: 4696
output_boxes 2, 3: 7736
dequantized output_boxes 2, 0: -3.35166
dequantized output_boxes 2, 1: -0.365597
dequantized output_boxes 2, 2: -2.62152
dequantized output_boxes 2, 3: 0.590661