.hef fails on UNet multiclass segmentation

Hi to all, I’m currently working with a custom UNet model for semantic segmentation on a Raspberry Pi 5 with an AI HAT+ (Hailo-8). The model has to segment 4 classes in an industrial image. The trained .onnx works fine, and I also managed to get the quantized .har working pretty well (with an acceptable error margin) using the Hailo Dataflow Compiler. But when I compile the .hef file and run inference on my Raspberry Pi with the AI HAT, the results are totally wrong and I can’t understand why.

I’m using exactly the same preprocessing to prepare the image for the quantized .har and for the .hef. In both cases I’m doing the normalization outside the .alls. I think either something is wrong with the .hef, or I’m not handling the output of the inference method correctly.

This is the preprocessing method I’m using:


    import numpy as np
    from PIL import Image

    def preprocess(image_path, target_size=(1024, 1024)):
        # Keep an unresized copy for visualization later
        img_pil = Image.open(image_path).convert("RGB")
        img_original = np.array(img_pil)

        img_resized = img_pil.resize(target_size, Image.BILINEAR)
        img_resized_np = np.array(img_resized, dtype=np.float32)  # (H, W, 3), 0-255

        img_float = img_resized_np / 255.0  # (H, W, 3), [0, 1]

        # ImageNet mean/std normalization, done outside the .alls
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        img_normalized = (img_float - mean) / std  # (H, W, 3)

        return img_normalized, img_original

This is the postprocessing method I’m using:

    import numpy as np

    def postprocess(result):
        output_dict = result.results[0]

        tensor_uint8 = output_dict['data']  # Shape: (1, 1048576, 4)
        scale = output_dict['quantization']['scale'][0]
        zero = output_dict['quantization']['zero'][0]

        # Dequantize: float = scale * (uint8 - zero)
        tensor_float = scale * (tensor_uint8.astype(np.float32) - zero)
        tensor_reshaped = tensor_float.reshape(1024, 1024, 4)

        # Per-channel z-score normalization before argmax
        tensor_normalized = np.zeros_like(tensor_reshaped)
        for c in range(4):
            channel = tensor_reshaped[:, :, c]
            mean = channel.mean()
            std = channel.std()
            if std > 0:
                tensor_normalized[:, :, c] = (channel - mean) / std
            else:
                tensor_normalized[:, :, c] = channel - mean

        mask_zscore = np.argmax(tensor_normalized, axis=-1)
        return mask_zscore

This is the output of the inference method:

[{'data': array([[[112,  94,  86,  83],
        [120,  97,  82,  71],
        [121,  95,  81,  65],
        ...,
        [133, 102,  82,  67],
        [130, 101,  83,  69],
        [119, 106,  90,  73]]], shape=(1, 1048576, 4), dtype=uint8), 'id': 0, 'name': 'unet/conv31', 'quantization': {'axis': -1, 'scale': [0.1700144112110138], 'zero': [99]}, 'shape': [1, 1048576, 4], 'size': 4194304, 'type': 'DG_UINT8'}]

Any help is appreciated.

Thanks in advance!

Hi @Pedro_Rosito,

Your issue is likely in the postprocessing - the per-channel z-score normalization you’re applying to the output logits could be distorting the class predictions, since it rescales each class independently and destroys the relative magnitudes the model learned. You might want to try removing that normalization step entirely and just doing np.argmax(tensor_reshaped, axis=-1) directly on the dequantized logits.
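To make that concrete, here’s a minimal sketch of a dequantize-then-argmax postprocess (the function name and the tiny test values below are illustrative; scale/zero come from the `quantization` field of the output dict):

```python
import numpy as np

def decode_mask(tensor_uint8, scale, zero, h=1024, w=1024, n_classes=4):
    """Dequantize the raw uint8 output, then argmax over the class axis."""
    logits = scale * (tensor_uint8.astype(np.float32) - zero)  # dequantize
    logits = logits.reshape(h, w, n_classes)                   # (H, W, C)
    return np.argmax(logits, axis=-1)                          # (H, W) class ids
```

Note that because your output reports a single scale/zero pair shared across all four channels, dequantization is one affine transform with positive scale, so argmax on the raw uint8 tensor would give the same mask; the per-channel z-score, by contrast, rescales each class differently and can flip the argmax.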

Also, it might be worth checking your input format - HEF inference often expects uint8 [0–255] input with normalization baked into the model’s quantization, so you may want to try feeding the raw resized uint8 image instead of the normalized float version, depending on how your ALLS was configured during compilation.
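If the normalization really was folded into the model at compile time, the host-side preprocessing would reduce to something like this (a sketch under that assumption; `preprocess_uint8` is an illustrative name):

```python
import numpy as np
from PIL import Image

def preprocess_uint8(image_path, target_size=(1024, 1024)):
    """Resize only -- no /255 or mean/std, assuming normalization is in the HEF."""
    img = Image.open(image_path).convert("RGB")
    img_resized = img.resize(target_size, Image.BILINEAR)
    return np.array(img_resized, dtype=np.uint8)  # (H, W, 3), uint8 [0, 255]
```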

Thanks,

Hi @Michael, thanks for your suggestions. I’ve followed them and the inference is now almost OK. The problem I’m having now is that the segmentation masks I’m getting with the .hef on the Raspberry Pi with the AI HAT have a bigger error than those obtained with the quantized .har; in fact, one of the four classes (the one with the lowest pixel representation) is not being detected at all. Any suggestions in that regard?

Thanks again!

Hi @Pedro_Rosito,

  1. Worth trying to compile the HEF with higher precision for the output layer by using model_optimization_config(batch_size=X, compression_level=0) or marking specific layers as sensitive in your ALLS script (e.g., quantization_param([your_output_layer], precision_mode=a16_w16)).
  2. It could also be worth increasing the calibration dataset size and ensuring it has good representation of that underrepresented class.
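For reference, a hypothetical .alls fragment combining both suggestions; `unet/conv31` is the output layer name reported in your inference dump, and `batch_size=8` is just a placeholder value:

```
# Hypothetical model-script additions -- verify names against your own .alls
model_optimization_config(batch_size=8, compression_level=0)
quantization_param([unet/conv31], precision_mode=a16_w16)
```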

Thanks,