.hef fails on unet multiclass segmentation

Hi all, I’m currently working with a custom UNet model for semantic segmentation on a Raspberry Pi 5 with an AI HAT+ (Hailo-8). The model has to segment 4 classes in an industrial image. The trained .onnx works fine, and I also managed to get the quantized .har working pretty well (with an acceptable error margin) using the Hailo Dataflow Compiler. But when I compile the .hef and run inference on my Raspberry Pi with the AI HAT, the results are totally wrong and I can’t understand why.

I’m using exactly the same preprocessing to prepare the image for the quantized .har and for the .hef. In both cases I’m doing the normalization outside the .alls. I think either the .hef has something wrong, or I’m not understanding correctly how to work with the output of the inference method.

This is the preprocessing method I’m using:


    import numpy as np
    from PIL import Image

    def preprocess(image_path, target_size):
        img_pil = Image.open(image_path).convert("RGB")
        img_original = np.array(img_pil)

        img_resized = img_pil.resize(target_size, Image.BILINEAR)
        img_resized_np = np.array(img_resized, dtype=np.float32)  # (H, W, 3), 0-255

        img_float = img_resized_np / 255.0  # (H, W, 3), [0,1]

        # ImageNet mean/std normalization (done here, outside the .alls)
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        img_normalized = (img_float - mean) / std  # (H, W, 3)

        return img_normalized, img_original

This is the postprocessing method I’m using:

    import numpy as np

    output_dict = result.results[0]

    tensor_uint8 = output_dict['data']  # shape: (1, 1048576, 4)
    scale = output_dict['quantization']['scale'][0]
    zero = output_dict['quantization']['zero'][0]

    # Dequantize: float = scale * (quantized - zero_point)
    tensor_float = scale * (tensor_uint8.astype(np.float32) - zero)
    tensor_reshaped = tensor_float.reshape(1024, 1024, 4)
    
    tensor_normalized = np.zeros_like(tensor_reshaped)
    for c in range(4):
        channel = tensor_reshaped[:, :, c]
        mean = channel.mean()
        std = channel.std()
        if std > 0:
            tensor_normalized[:, :, c] = (channel - mean) / std
        else:
            tensor_normalized[:, :, c] = channel - mean

    mask_zscore = np.argmax(tensor_normalized, axis=-1)

This is the output of the inference method:

[{'data': array([[[112,  94,  86,  83],
        [120,  97,  82,  71],
        [121,  95,  81,  65],
        ...,
        [133, 102,  82,  67],
        [130, 101,  83,  69],
        [119, 106,  90,  73]]], shape=(1, 1048576, 4), dtype=uint8), 'id': 0, 'name': 'unet/conv31', 'quantization': {'axis': -1, 'scale': [0.1700144112110138], 'zero': [99]}, 'shape': [1, 1048576, 4], 'size': 4194304, 'type': 'DG_UINT8'}]

Any help is appreciated.

Thanks in advance!

Hi @Pedro_Rosito,

Your issue is likely in the postprocessing - the per-channel z-score normalization you’re applying to the output logits could be distorting the class predictions, since it rescales each class independently and destroys the relative magnitudes the model learned. You might want to try removing that normalization step entirely and just doing np.argmax(tensor_reshaped, axis=-1) directly on the dequantized logits.

Also, it might be worth checking your input format - HEF inference often expects uint8 [0–255] input with normalization baked into the model’s quantization, so you may want to try feeding the raw resized uint8 image instead of the normalized float version, depending on how your ALLS was configured during compilation.
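For example, the direct-argmax version could look like this (the scale and zero point are the values from the output you posted; `postprocess_argmax` is just an illustrative name):

```python
import numpy as np

def postprocess_argmax(tensor_uint8, scale, zero, h=1024, w=1024):
    # Dequantize: float = scale * (quantized - zero_point)
    tensor_float = scale * (tensor_uint8.astype(np.float32) - zero)
    # Argmax directly on the dequantized logits -- no per-channel
    # z-score, which would rescale each class independently
    return np.argmax(tensor_float.reshape(h, w, 4), axis=-1)

# With the values from your output dump:
# mask = postprocess_argmax(output_dict['data'], 0.1700144112110138, 99)
```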

Thanks,

Hi @Michael, thanks for your suggestions. I’ve followed them and the inference is now almost OK. The problem I’m having now is that the segmentation masks I’m getting with the .hef on the Raspberry Pi with the AI HAT have a bigger error than those obtained with the quantized .har; in fact, one of the four classes (the one with the lowest pixel representation) is not being detected at all. Any suggestions in that regard?

Thanks again!

Hi @Pedro_Rosito,

  1. Worth trying to compile the HEF with higher precision for the output layer by using model_optimization_config(batch_size=X, compression_level=0) or marking specific layers as sensitive in your ALLS script (e.g., quantization_param([your_output_layer], precision_mode=a16_w16)).
  2. It could be worth experimenting with a larger calibration dataset and ensuring it has good representation of that underrepresented class.
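Combined, those directives might look something like this in your ALLS script (a sketch only: exact syntax depends on your DFC version, unet/conv31 is the output layer name from your inference dump, and the batch size is just an example):

    model_optimization_config(batch_size=8, compression_level=0)
    quantization_param([unet/conv31], precision_mode=a16_w16)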

Thanks,

Hi @Michael thanks again for your help.

I’ve tried your suggestions but the problem persists. The quantized .har is almost as accurate as the PyTorch model, but when I use the .hef on the Raspberry Pi the performance drops considerably.

Any other suggestions or ideas would be appreciated.

Thanks!

Hi @Pedro_Rosito,

Since the quantized HAR model works well but the HEF on the Pi doesn’t, the issue is likely in how the HEF runtime handles input/output.

  1. Verify the input format the HEF expects - you can inspect this with hailortcli parse-hef your_model.hef and look at the input layer’s format and data type; if the HEF expects uint8 input, you should feed the raw resized image (0–255) and let the HEF’s built-in normalization handle the rest, rather than feeding pre-normalized float values.
  2. Check the output tensor ordering - the HEF might reorder output channels differently than the HAR emulator, so it could help to print the output tensor shape and name and compare it against what you expect.
  3. Try running inference using the hailortcli run command or Hailo’s Python InferModel API directly to rule out any issue with your inference wrapper code.
  4. If you’re using quantization_param with normalization values in the ALLS, make sure you’re not double-normalizing - either let the model handle normalization internally (via ALLS) and feed uint8, or do it externally and ensure the ALLS has no normalization configured.
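On point 4, the “feed raw uint8” path is just resize, cast, and add a batch dimension: no /255 and no mean/std on the host. A minimal sketch (`to_hef_input` is an illustrative name, and the (1, H, W, 3) NHWC layout is an assumption you should check against hailortcli parse-hef):

```python
import numpy as np

def to_hef_input(img_resized_np):
    # img_resized_np: (H, W, 3) RGB array, values 0-255.
    # No /255 and no mean/std here -- if the ALLS configures
    # normalization, it runs on-chip; doing it in both places
    # double-normalizes the input.
    return np.expand_dims(img_resized_np.astype(np.uint8), axis=0)  # (1, H, W, 3)
```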

Thanks,