Problem with converting a U-Net model from ONNX to HEF format, or with inference on Hailo-8

Greetings, I recently started working with hailo8.

I have a learning task: run inference of a custom-trained U-Net model for semantic segmentation. The initial model is in .onnx format. I converted it from .onnx to .hef following the official Hailo manual (https://hailo.ai/developer-zone/documentation/dataflow-compiler-v3-33-0/?sp_referrer=tutorials/tutorials.html).

I tried to launch inference using the ready-made example ./runtime/hailo-8/cpp/semantic_segmentation/semseg.cpp from the Hailo-Application-Code-Examples repository (GitHub - hailo-ai/Hailo-Application-Code-Examples), but I found that pixels of the input image were being assigned non-existent classes:

(.venv) admin@raspberrypi:~/Hailo-Application-Code-Examples/runtime/hailo-8/cpp/semantic_segmentation $  ./segmentation_example_cpp -hef=./custom_hailo_models/last_best_model_hailo.hef -path=./image_part_001.mp4
-I- video path: ./image_part_001.mp4
-I- hef: ./custom_hailo_models/last_best_model_hailo.hef

-I---------------------------------------------------------------------
-I- Dir  Name                                                          
-I---------------------------------------------------------------------
-I- IN:  best_model_hailo/input_layer1
-I---------------------------------------------------------------------
-I- OUT: best_model_hailo/conv46
-I---------------------------------------------------------------------

-I- Started write thread ./image_part_001.mp4
-I- Started read thread ./image_part_001.mp4
Class num:  141
segmentation_example_cpp: /home/admin/Hailo-Application-Code-Examples/runtime/hailo-8/cpp/semantic_segmentation/cityscape_labels.hpp:41: cv::Vec3f CityScapeLabels::id_2_color(int): Assertion `i >= 0 && i <= 6' failed.
Aborted

My Model Parameters:
Architecture: U-net;
Input size: {1, 3, 512, 512};
Output size: {1, 6, 512, 512};
Training sample size: 63;
Batch size: 1;
Color format: RGB.

Optimization parameters:
hw_arch: hailo8;
output_height=512;
output_width=512;
resize_side=512;
alls = "normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])\n"
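(For context, that `.alls` line declares the standard per-channel transform out = (in − mean) / std with the ImageNet statistics. A minimal sketch of the arithmetic; the `normalize` function name is illustrative, not part of any Hailo API:)

```cpp
#include <cassert>
#include <cmath>

// Per-channel normalization as declared in the .alls script:
// out = (in - mean) / std. Name and signature are illustrative only.
float normalize(float value, float mean, float stddev) {
    return (value - mean) / stddev;
}
```

When this is compiled into the HEF, the device applies it on-chip, so the host should feed raw uint8 pixels.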

The thing is, I don’t understand whether I made a mistake during conversion or misunderstood how inference works. I would be very grateful if you could point out my mistake, give me advice, or at least point me to the most complete guide for correctly converting and running inference of my model.
Thank you in advance!

Hey @user370 ,

Welcome to the Hailo Community!

From what you shared, your model and HEF are probably fine. The crash is due to using a demo built for Cityscapes with a model that has a different class setup. The code is expecting class IDs in the [0–6] range, but your output gives 141, which triggers the assertion.

This is mainly a post-processing issue — not a problem with the model itself.

What to do:

  1. Replace the CityScapeLabels::id_2_color logic with your own color mapping for 6 classes (or skip coloring for now and just dump raw class indices).
  2. Run hailortcli parse-hef your_model.hef to confirm:
  • Output shape (likely 512x512x6)
  • Layout (NHWC vs NCHW)
  • Data format (UINT8 etc.)
  3. Make sure you’re not applying normalization both in .alls and again in C++. If it’s already in .alls, skip it in your code.
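For step 1, a minimal OpenCV-free sketch of the per-pixel argmax over an interleaved H×W×C uint8 buffer (the layout your HEF most likely produces, to be confirmed by step 2). The `argmax_plane` helper is illustrative, not part of the example code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Collapse an interleaved H*W*C uint8 score buffer into per-pixel
// class indices by taking the argmax over the channel dimension.
// The result is always in [0, classes), so it can be colored safely.
std::vector<uint8_t> argmax_plane(const std::vector<uint8_t>& scores,
                                  int height, int width, int classes) {
    std::vector<uint8_t> ids(static_cast<size_t>(height) * width);
    for (int p = 0; p < height * width; ++p) {
        const uint8_t* px = scores.data() + static_cast<size_t>(p) * classes;
        int best = 0;
        for (int k = 1; k < classes; ++k)
            if (px[k] > px[best]) best = k;
        ids[p] = static_cast<uint8_t>(best);
    }
    return ids;
}
```

You can write the returned plane out as a grayscale image to eyeball the segmentation before worrying about colors.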

Once you adjust the post-processing to match your model’s output, the crash should go away.

If you send the parse-hef output and the argmax/coloring part of your code, we can help confirm the fix.

Hope this helps!

Hello, thanks for the feedback! I am attaching the following:

  1. The output of the parse-hef command for my model:
admin@raspberrypi:~/Hailo-Application-Code-Examples/runtime/hailo-8/cpp/semantic_segmentation/custom_hailo_models $ hailortcli parse-hef last_best_model_hailo.hef
Architecture HEF was compiled for: HAILO8
Network group name: best_model_hailo, Single Context
Network name: best_model_hailo/best_model_hailo
VStream infos:
Input  best_model_hailo/input_layer1 UINT8, NHWC(512x512x3)
Output best_model_hailo/conv46 UINT8, NHWC(512x512x6)
  2. A fragment of the program code semseg.cpp, in which the input image is processed and written to the output video (personally, I did not find anything related to normalization here):
template <typename T> hailo_status write_all(std::vector<InputVStream> &input, std::string &video_path, 
                                            int height, int width, int channels) {
    std::cout << "-I- Started write thread " << video_path << std::endl;
    cv::VideoCapture capture(video_path);
    cv::Mat frame;
    if(!capture.isOpened())
        throw "Unable to read video file";
    for( ; ; ) {
        capture >> frame;
        if(frame.empty()) {
            break;
            }
        if (frame.channels() == 3)
            cv::cvtColor(frame, frame, cv::COLOR_BGR2RGB);
    
        if (frame.rows != height || frame.cols != width)
            cv::resize(frame, frame, cv::Size(width, height), cv::INTER_AREA);

        int factor = std::is_same<T, uint8_t>::value ? 1 : 4;  // In case we use float32_t, we have 4 bytes per component
        auto status = input[0].write(MemoryView(frame.data, height * width * channels * factor)); // Writing height * width, 3 channels of uint8
        if (HAILO_SUCCESS != status) 
            return status;      
    }
    std::cout << "-I- Finished write thread " << video_path << std::endl;
    return HAILO_SUCCESS;
}

template <typename T> cv::Mat semseg_post_process(std::vector<T>& logits, int height, int width) {
    cv::Mat output(height, width, CV_32FC3, cv::Scalar(0)); 
    cv::Mat input(height, width, CV_8UC1, logits.data());

    static CityScapeLabels obj;
    for (int r = 0; r < height; ++r) {
        for (int c = 0; c < width; ++c) {
            T *pixel = input.ptr<T>(r,c);
            output.at<cv::Vec3f>(r,c) = obj.id_2_color(*pixel);
        }
    }
    return output;
}

template <typename T> hailo_status read_all(OutputVStream &output, std::string &video_path, int height, int width, int frame_count) {
    std::vector<T> data(output.get_frame_size());
    std::vector<cv::String> file_names;
    std::cout << "-I- Started read thread " << video_path << std::endl;
    cv::VideoWriter video("./processed_video.mp4",cv::VideoWriter::fourcc('m','p','4','v'),30, cv::Size(width,height));
    for (int i = 0; i < frame_count; i++) {
        auto status = output.read(MemoryView(data.data(), data.size()));   // !!!!!!
        if (HAILO_SUCCESS != status)
            return status;

        auto seg_image = semseg_post_process<T>(data, height, width);
        cv::GaussianBlur(seg_image, seg_image, cv::Size(5, 5), 0, 0);
        seg_image.convertTo(seg_image, CV_8U, 1.6, 10);
        video.write(seg_image);
    }
    video.release();
    std::cout << "-I- Finished read thread " << video_path << std::endl;
    return HAILO_SUCCESS;
}
  3. Modified CityScapeLabels:
#include <cassert>
#include <array>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>

class CityScapeLabels
{
private:
    std::array<cv::Vec3f, 6> _colors;

public:
    CityScapeLabels()
    {
        _colors[0] = cv::Vec3f(60., 16., 152.);    // purple - road 1
        _colors[1] = cv::Vec3f(132., 41., 246.);   // orange - sidewalk 2
        _colors[2] = cv::Vec3f(110., 193., 228.);  // yellow - building 3
        _colors[3] = cv::Vec3f(254., 221., 58.);   // yellow - wall 4
        _colors[4] = cv::Vec3f(226., 169., 41.);   // yellow - fence 5
        _colors[5] = cv::Vec3f(155., 155., 155.);  // grey   - pole 6
    }

    cv::Vec3f id_2_color(int i)
    {
        std::cout << "Class num:  " << i << std::endl;
        assert(i >= 0 && i < 6);
        return _colors[i];
    }
};

I will be glad if we can solve this problem together. Thank you for your help.

Hey @user370,

Your HEF and basic pipeline look good - the issue isn’t with the conversion itself, but with how the example code is interpreting the output tensor.

Looking at your parse-hef output:

Your model outputs UINT8, NHWC(512x512x6) - meaning for each pixel, you get 6 channels (one for each class score/logit). But the example code is treating the output like it’s a single-channel image with “class ID per pixel”:

The problem is here:

cv::Mat input(height, width, CV_8UC1, logits.data());
T *pixel = input.ptr<T>(r,c);
obj.id_2_color(*pixel);

This completely ignores the channel dimension and just reads arbitrary bytes as “class IDs” - which is why you’re seeing values like 141 and hitting that assert(i >= 0 && i < 6).
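To make the failure concrete: with that CV_8UC1 view, position (r, c) reads the flat byte at offset r*width + c, which in the true H×W×6 layout belongs to a different pixel and channel entirely, so the byte is a raw class score, not an ID. A small illustrative helper (the `locate` name is mine, not from the example code):

```cpp
#include <cassert>
#include <cstddef>

// Which (pixel, channel) of an interleaved H*W*classes buffer does the
// buggy single-channel view actually read at position (r, c)?
struct Location { size_t pixel; int channel; };

Location locate(int r, int c, int width, int classes) {
    size_t offset = static_cast<size_t>(r) * width + c;  // what CV_8UC1 indexing reads
    return { offset / classes, static_cast<int>(offset % classes) };
}
```

So already at (0, 7) the code is reading a score byte for pixel 1's channel 1, and any score value above 5 (like your 141) trips the assert.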

Here’s what you need to fix:

You need to treat the output as 3D (H × W × 6 channels) and do an argmax across the 6 channels for each pixel.

Since your output is NHWC(512x512x6) in UINT8, the buffer is laid out like:

[ (y=0,x=0,c=0), (y=0,x=0,c=1), ..., (y=0,x=0,c=5),
  (y=0,x=1,c=0), ..., (y=0,x=1,c=5),
  ...
]

Replace your semseg_post_process function with something like this:

template <typename T>
cv::Mat semseg_post_process(std::vector<T> &logits, int height, int width, int num_classes = 6)
{
    cv::Mat output(height, width, CV_32FC3, cv::Scalar(0));
    static CityScapeLabels obj;
    
    for (int r = 0; r < height; ++r) {
        for (int c = 0; c < width; ++c) {
            int base = (r * width + c) * num_classes;
            T max_val = logits[base];
            int max_idx = 0;
            
            for (int k = 1; k < num_classes; ++k) {
                T v = logits[base + k];
                if (v > max_val) {
                    max_val = v;
                    max_idx = k;
                }
            }
            
            output.at<cv::Vec3f>(r, c) = obj.id_2_color(max_idx);
        }
    }
    return output;
}

Then update your call in read_all:

auto seg_image = semseg_post_process<T>(data, height, width, 6);

And get rid of that cv::Mat input(... CV_8UC1, logits.data()) line - you don’t need it anymore since you’re indexing directly into the logits vector.

That should fix it!

Great, your corrections helped! Thank you so much for your help!