How to process image data when using the C API

Hi,
I’ve just started using the Hailo-8 chip and I don’t know much about the HailoRT library APIs yet.

First, I started by analyzing vstreams_example.c.
However, the shortcut network used in vstreams_example.c doesn’t perform any meaningful image processing, and src_data, which serves as the input image, is just random data, not a real image.

So, I decided to use yolov7-tiny.hef instead of the shortcut network and use meaningful images instead of random data.

By running hailortcli parse-hef yolov7_tiny.hef, I was able to obtain the following output:


VStream infos:
Input yolov7_tiny/input_layer1 UINT8, NHWC(640x640x3)
Output yolov7_tiny/conv58 UINT16, FCR(20x20x255)
Output yolov7_tiny/conv51 UINT16, FCR(40x40x255)
Output yolov7_tiny/conv43 UINT8, FCR(80x80x255)

However, I don’t fully understand the meaning of these formats.

  1. For yolov7_tiny/input_layer1 above, if src_data is defined as a 3D array, does src_data[99][49][0] represent the pixel at row 100, column 50 of the first channel?

  2. What does FCR above mean?

  3. How can I visualize the dst_data from yolov7_tiny.hef?
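
For reference, here is roughly how I’m planning to fill src_data from a real image, given the NHWC(640x640x3) input above. This is just a sketch using stb_image (not part of HailoRT), and it assumes the image file is already 640x640; the helper name is mine:

```c
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INPUT_H 640
#define INPUT_W 640
#define INPUT_C 3

/* Fill the flat input buffer (INPUT_H * INPUT_W * INPUT_C bytes) from a file.
 * stbi_load returns interleaved RGB rows, i.e. HWC order, which matches the
 * NHWC layout reported by parse-hef for the input vstream. */
static int fill_input_buffer(const char *path, uint8_t *src_data)
{
    int w = 0, h = 0, n = 0;
    unsigned char *pixels = stbi_load(path, &w, &h, &n, INPUT_C);
    if (pixels == NULL || w != INPUT_W || h != INPUT_H) {
        fprintf(stderr, "expected a %dx%d image\n", INPUT_W, INPUT_H);
        stbi_image_free(pixels); /* free(NULL) is a no-op, so this is safe */
        return -1;
    }
    memcpy(src_data, pixels, (size_t)INPUT_H * INPUT_W * INPUT_C);
    stbi_image_free(pixels);
    return 0;
}
```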

Hi @candy24910,

To answer your questions:

  1. Yes, correct (see the indexing sketch at the end of this reply).
  2. FCR means the first channels (features) are sent to the HW.
  3. If by visualize you mean seeing the data of a specific pixel, then you would use the dst_data buffer (here: hailort/hailort/libhailort/examples/c/vstreams_example/vstreams_example.c at master · hailo-ai/hailort · GitHub). If by visualize you mean seeing the bboxes drawn on an image, this is either something you need to implement yourself, or you can use the detection C++ example we have in our Hailo Application repository:
    Hailo-Application-Code-Examples/runtime/cpp/object_detection/general_detection_inference at main · hailo-ai/Hailo-Application-Code-Examples · GitHub
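
To make (1) concrete, here is a minimal indexing sketch. In vstreams_example.c the buffer is flat, so the 3D index has to be computed by hand; the helper name and macros are illustrative, not part of HailoRT:

```c
#include <stdint.h>
#include <stddef.h>

#define HEIGHT   640
#define WIDTH    640
#define CHANNELS 3

/* NHWC layout: the channel index varies fastest, then width, then height */
static inline uint8_t get_pixel(const uint8_t *src_data,
                                size_t row, size_t col, size_t channel)
{
    return src_data[(row * WIDTH + col) * CHANNELS + channel];
}

/* Equivalent to src_data[99][49][0] on a uint8_t[640][640][3] array:
 * the pixel at row 100, column 50, in the first channel, e.g.
 * uint8_t v = get_pixel(src_data, 99, 49, 0); */
```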

Regards,

As far as I know, the final output depth of a YOLO network is B x (C+5), where B is the number of bounding boxes predicted by each grid cell and C is the number of classes.

The outputs of yolov7_tiny.hef have a depth of 255.

  1. For yolov7_tiny.hef, I guess B = 3 and C = 80, which results in B x (C+5) = 3 x 85 = 255. Is this right?

  2. Parsing yolov7_tiny.hef provides 3 output vstreams that have different heights and widths (20, 40, 80) but the same depth (255). Why does yolov7_tiny.hef have multiple outputs, and which one do I use to check the result of the detection? (And do the width and height correspond to the number of grid cells?)

  3. In your reply, you mentioned that FCR means the first channels are sent to the HW. I still don’t fully understand this.

If the result from a (FCR) 20x20x255 output vstream is sent to the dst_data array from Hailo, does FCR mean that the values are stored in the order
1x1x1, 1x2x1, 1x3x1, ..., 20x20x1,
1x1x2, 1x2x2, 1x3x2, ..., 20x20x2,
and so on in the dst_data array?
Could you perhaps explain with an example?

Hi @candy24910,
To answer your questions:

  1. Yes, you are correct.
  2. In yolov7_tiny, when using the Hailo SW, it depends: if you use the Hailo NMS, then there will be a single output with shape (C, B), where C is the number of classes the model was trained on and B is the maximum number of bboxes (defined in the optimization step), and it will include the bbox decoding & extraction and the NMS.
    If you don’t use the Hailo NMS, for yolov7 you’ll have 3 outputs whose shapes correspond to the input resolution of the model and the number of classes. The heights and widths differ because each output is used for a different object size: the larger grid (smaller stride) handles the smaller anchors and objects, while the smaller grid (larger stride) handles the larger ones. The number of channels doesn’t change because the calculation you described in section 1 is the same for all of them (see the first sketch below).
    There are 3 outputs because from that point on, the model contains postprocessing ops that the Hailo SW doesn’t support. You need to implement these ops yourself on the CPU, which is why we recommend using the Hailo NMS option when possible.
  3. The order in which the data enters the device matters, as the device expects a specific order (NHWC), so if the model’s format order is different, some pre-infer processing is done by the Hailo SW.
    For FCR, it’s described in the HailoRT user guide:
    [screenshot of the FCR format-order description from the HailoRT user guide]
    The output is aligned as a regular multi-dimensional array, so it depends on how you iterate over it (see the second sketch below).
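
To make (2) concrete, here is a small sketch of how the three output shapes relate to the input resolution. The head-to-grid mapping comes from the parse-hef output above; the stride arithmetic is standard YOLO, not something HailoRT reports:

```c
#include <stdio.h>

int main(void)
{
    const int input_size = 640;
    /* From parse-hef: conv43 -> 80x80, conv51 -> 40x40, conv58 -> 20x20 */
    const int grids[3] = {80, 40, 20};
    const int channels = 255; /* 3 anchors x (80 classes + 5) */

    for (int i = 0; i < 3; ++i) {
        int stride = input_size / grids[i]; /* 8, 16, 32 */
        long values = (long)grids[i] * grids[i] * channels;
        printf("grid %2dx%-2d -> stride %2d, %ld values per frame\n",
               grids[i], grids[i], stride, values);
    }
    return 0;
}
```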
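And for (3), a sketch of reading values out of dst_data for the 20x20x255 (UINT16) output. It assumes the user-side buffer ends up as a plain row-major (height, width, channels) array, i.e. channels varying fastest; the actual user-side order depends on your vstream params, so verify against your configuration:

```c
#include <stdint.h>
#include <stdio.h>

#define OUT_H 20
#define OUT_W 20
#define OUT_C 255

/* Hypothetical helper: print all 255 channel values of one grid cell,
 * assuming (row, col, channel) row-major order. If the buffer were
 * channel-planar instead (the order asked about above), the index
 * would be (c * OUT_H + row) * OUT_W + col. */
static void dump_cell(const uint16_t *dst_data, int row, int col)
{
    for (int c = 0; c < OUT_C; ++c) {
        uint16_t v = dst_data[((size_t)row * OUT_W + col) * OUT_C + c];
        printf("cell(%d,%d) channel %d = %u\n", row, col, c, (unsigned)v);
    }
}
```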

Regards,