YoloV7 returns poor detections

Hello Hailo !

I am trying to run a custom YoloV7 on a Hailo-8 card using the C++ API, but it clearly does not work. I will detail the process I followed:

  1. From my ONNX model, I got a .hef file with DFC 3.33 using the three commands hailo parser, hailo optimize, and hailo compile. At this step there is one issue that I can deal with: during parsing, the model is cut before the detection head (so after the last 3 sigmoids). Therefore I have 3 outputs (one per anchor set) instead of a single one. I have reimplemented the detection head in C++, but I wish it were not necessary.

  2. I use the C++ example async_infer_basic_example with a few modifications:

  • During model loading, I specify that I expect the outputs to be float (since they are sigmoid outputs) using
        for (const auto &output_name : infer_model->get_output_names()) {
            infer_model->output(output_name)->set_format_type(HAILO_FORMAT_TYPE_FLOAT32);
        }
  • I fill the input buffer with the image in NCHW format without any conversion, i.e. I have pixel values for each RGB channel (0-255) that I simply reorder
  • After the inference, I apply the detection head to my outputs to obtain the boxes, then apply the NMS manually
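For reference, the reimplemented head boils down to the standard YOLOv5/v7 cell decoding (a minimal sketch, assuming the sigmoid is already applied on-chip as described above; the grid indices, stride, and anchor sizes below are illustrative assumptions, not values from my model):

```cpp
#include <cassert>
#include <cmath>

// Decoded box in absolute pixel coordinates (center x/y, width, height).
struct Box { float x, y, w, h; };

// Decode one cell of a YOLOv5/v7 output grid. The network outputs are
// already sigmoid-activated, so tx..th are in [0, 1].
Box decodeCell(float tx, float ty, float tw, float th,
               int gridX, int gridY, float stride,
               float anchorW, float anchorH) {
    Box b;
    b.x = (2.0f * tx - 0.5f + gridX) * stride;   // center x in pixels
    b.y = (2.0f * ty - 0.5f + gridY) * stride;   // center y in pixels
    b.w = (2.0f * tw) * (2.0f * tw) * anchorW;   // width  = (2*sigmoid)^2 * anchor
    b.h = (2.0f * th) * (2.0f * th) * anchorH;   // height = (2*sigmoid)^2 * anchor
    return b;
}
```

This is applied per cell and per anchor on each of the 3 output grids (strides 8/16/32 in the standard configuration).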

In the end, I obtain boxes around the persons (so it doesn’t seem random), but the boxes are never quite right and some people are not detected at all, as you can see here.

I don’t know whether the issue lies in the onnx->hef conversion or in my interpretation of the output values.

Thank you for your guidance ! :slight_smile:

PS: I did not intend to use the Hailo Model Zoo because we have custom neural network models, which will not be available in the Zoo, that we want to run on Hailo cards. This YoloV7 test is just a first step to check whether we could use Hailo cards in the future.

Hey K_E,

Looking at that photo, it seems like you’ve got a lot of bounding boxes surrounding people, but they aren’t directly on them. That probably points to an issue with either the input data format, the output data format, or how you’re interpreting the Non-Maximum Suppression (NMS) during post-processing.

Regarding the first behavior you mentioned, that’s actually expected for YOLOv7. If you have more than one context, you need to run NMS on the host, so what you’ve done there is perfectly fine.
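As a sanity check on that host-side post-processing, the usual approach is a greedy, score-sorted NMS pass like the one below (a minimal sketch, not taken from your code; the `Det` struct and the IoU threshold value are assumptions for illustration):

```cpp
#include <algorithm>
#include <vector>

// Axis-aligned box as (x1, y1, x2, y2) corners plus a confidence score.
struct Det { float x1, y1, x2, y2, score; };

// Intersection-over-union of two boxes.
float iou(const Det &a, const Det &b) {
    float ix = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float iy = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = ix * iy;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Greedy NMS: keep the highest-scoring box, suppress overlapping ones, repeat.
std::vector<Det> nms(std::vector<Det> dets, float iouThreshold) {
    std::sort(dets.begin(), dets.end(),
              [](const Det &a, const Det &b) { return a.score > b.score; });
    std::vector<Det> kept;
    for (const Det &d : dets) {
        bool suppressed = false;
        for (const Det &k : kept)
            if (iou(d, k) > iouThreshold) { suppressed = true; break; }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

If boxes survive NMS but sit *next to* the people rather than on them, the suppression itself is usually fine and the problem is upstream in decoding or input layout.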

The part where you mentioned filling the input buffer with the image in NCHW format with pixel values 0-255 and no conversion might be where the error is coming from. A couple of things to check:

  • Hailo models generally expect input in NHWC format (Height, Width, Channels). Also, the input data needs to match the quantization and normalization that was used when the model was trained and calibrated.
  • If your HEF file is set up for UINT8 input, you should be providing properly quantized uint8 data. If you’d rather use float32, you need to configure the input stream for that, and HailoRT can usually handle the quantization process for you if it’s set up correctly.
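To illustrate the layout point: reordering a planar CHW buffer into the interleaved HWC layout Hailo inputs expect could look like this (a minimal sketch; buffer names are assumptions, and the quantization/normalization still has to match what the model was calibrated with):

```cpp
#include <cstdint>
#include <vector>

// Convert a planar CHW image (all R, then all G, then all B) into the
// interleaved HWC layout (R, G, B per pixel) that Hailo inputs expect.
std::vector<uint8_t> chwToHwc(const std::vector<uint8_t> &chw,
                              int channels, int height, int width) {
    std::vector<uint8_t> hwc(chw.size());
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                hwc[(y * width + x) * channels + c] =
                    chw[(c * height + y) * width + x];
    return hwc;
}
```

If your source image is already interleaved (as most decoded images are), you may not need any reordering at all; the mistake is usually converting *to* CHW when the device wants HWC.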

Can you run the command:

hailortcli parse-hef { model }

It will show the model’s inputs and outputs, along with what it expects and provides, so we can help you better!