.hef generation from an existing ONNX model without training

Hi. I have a PyTorch/ONNX model that I have already trained extensively. The model works very well, and now we wanted to try it out on the Raspberry Pi. Inference works, but one frame takes 130 ms. That's why we bought the Raspberry Pi AI Kit. The demo runs really fast.

Now that the DFC is out, I took a look at it. I managed to create a .har file. But now I apparently need to "train" again with images? Why is that necessary? My model is already trained; I just want it converted. Can I somehow quantize (which the compiler requires) without needing images again?

Thanks in advance


Hi @kamper,
Thanks for using the AI Kit on the Raspberry Pi 5.
If you think about it, naively compressing numbers fourfold (32-bit float → 8-bit int) almost without degradation is amazing. Part of achieving this is using a small portion of the data (typically up to 1000 images, no ground truth needed) as a calibration set. This is a standard technique called PTQ (post-training quantization) and is very common. It helps 'tune' the quantized numbers correctly. With that, we typically see less than 2% degradation.
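For illustration, a calibration set is just an array of preprocessed frames with no labels attached. Below is a minimal NumPy sketch (the 640x640x3 input shape and the `build_calib_set` helper are assumptions for this example); with the DFC you would then feed the resulting array into the optimize step, e.g. `ClientRunner.optimize()`, per the DFC user guide:

```python
# Sketch: build a calibration set for PTQ from preprocessed frames.
# No ground truth is needed; only representative inputs.
import numpy as np

def build_calib_set(images, height=640, width=640):
    """Stack preprocessed images into the (N, H, W, C) float array the optimizer consumes."""
    batch = np.stack([np.asarray(img, dtype=np.float32) for img in images])
    assert batch.shape[1:] == (height, width, 3), "resize images to the model input size first"
    return batch

# In practice: load a few hundred real frames; random pixels stand in here.
frames = np.random.randint(0, 256, size=(64, 640, 640, 3))
calib_set = build_calib_set(frames)
print(calib_set.shape)  # (64, 640, 640, 3)
```

The frames should go through the same preprocessing (resize, channel order) you use at inference time, so the calibration statistics match what the deployed model will see.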


Welcome to the Hailo Community!

The Hailo Dataflow Compiler aims to approximate a large equation involving floating-point numbers using smaller integer numbers of 4, 8 (default), or 16 bits. With just the model, the DFC only knows half the numbers in the equation. The other half comes from the images. Providing the Hailo Dataflow Compiler with a representative set of images that reflects the data the model will encounter during inference enables it to optimize the model more effectively by selecting the appropriate quantization for each layer.
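To make the "other half of the equation" concrete: the calibration images determine the value range each layer actually sees, and that range fixes the quantization scale. Here is a toy per-tensor uint8 illustration in plain NumPy (a simplified sketch, not the DFC's actual algorithm):

```python
import numpy as np

def quantize_uint8(x, lo, hi):
    """Map floats in [lo, hi] to uint8 using an affine scale/zero-point."""
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Calibration data tells us the range the activations actually cover.
calib_activations = np.random.uniform(-1.0, 3.0, size=10000).astype(np.float32)
lo, hi = calib_activations.min(), calib_activations.max()

q, scale, zp = quantize_uint8(calib_activations, lo, hi)
err = np.abs(dequantize(q, scale, zp) - calib_activations).max()
print(err <= scale)  # True: worst-case error bounded by one quantization step
```

If the calibration range is too narrow (unrepresentative images), real activations get clipped; if it is too wide, the scale grows and precision is wasted. That is why a representative image set matters.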


Thank you for your answers. I tried optimizing, and after some debugging it seemed to work. I now have my .hef file.

Then I tried to run an example with my own hef file:

python basic_pipelines/detection.py --input resources/detection0.mp4 --hef test/fish.hef

which is then promptly followed by these error messages:

NMS score threshold is set, but there is no NMS output in this model.
CHECK_SUCCESS failed with status=6

The full output in the console was:

(venv_hailo_rpi5_examples) pi@raspberrypi1:~/Ai_Kit/hailo-rpi5-examples $ DISPLAY=:0 python basic_pipelines/detection.py --input resources/detection0.mp4 --hef test/fish.hef
hailomuxer name=hmux filesrc location=resources/detection0.mp4 name=src_0 ! queue name=queue_dec264 max-size-buffers=3 max-size-bytes=0 max-size-time=0 !  qtdemux ! h264parse ! avdec_h264 max-threads=2 !  video/x-raw, format=I420 ! queue name=queue_scale max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! videoscale n-threads=2 ! queue name=queue_src_convert max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! videoconvert n-threads=3 name=src_convert qos=false ! video/x-raw, format=RGB, width=640, height=640, pixel-aspect-ratio=1/1 ! tee name=t ! queue name=bypass_queue max-size-buffers=20 max-size-bytes=0 max-size-time=0 ! hmux.sink_0 t. ! queue name=queue_hailonet max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! videoconvert n-threads=3 ! hailonet hef-path=test/fish.hef batch-size=2 nms-score-threshold=0.3 nms-iou-threshold=0.45 output-format-type=HAILO_FORMAT_TYPE_FLOAT32 force-writable=true ! queue name=queue_hailofilter max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! hailofilter so-path=/home/pi/Ai_Kit/hailo-rpi5-examples/basic_pipelines/../resources/libyolo_hailortpp_post.so  qos=false ! queue name=queue_hmuc max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! hmux.sink_1 hmux. ! queue name=queue_hailo_python max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! queue name=queue_user_callback max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! identity name=identity_callback ! queue name=queue_hailooverlay max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! hailooverlay ! queue name=queue_videoconvert max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! videoconvert n-threads=3 qos=false ! queue name=queue_hailo_display max-size-buffers=3 max-size-bytes=0 max-size-time=0 ! fpsdisplaysink video-sink=xvimagesink name=hailo_display sync=true text-overlay=False signal-fps-measurements=true
NMS score threshold is set, but there is no NMS output in this model.
CHECK_SUCCESS failed with status=6
(the two lines above repeat for each subsequent frame)

Did I miss something in the optimization step?

When I just run the model directly, it seems to work:

(venv_hailo_rpi5_examples) pi@raspberrypi1:~/Ai_Kit/hailo-rpi5-examples $ hailortcli run test/fish.hef
Running streaming inference (test/fish.hef):
  Transform data: true
    Type:      auto
    Quantized: true
Network yolov8/yolov8: 100% | 633 | FPS: 126.45 | ETA: 00:00:00
> Inference result:
 Network group: yolov8
    Frames count: 633
    FPS: 126.45
    Send Rate: 1243.08 Mbit/s
    Recv Rate: 420.83 Mbit/s

The HailoRT CLI can run models with any input and output format. That is why it works there.

For the YOLO models we recommend adding the NMS post-processing to the HEF file. NMS will then be run by HailoRT on the host, and you no longer need to run it as part of your application.

Your application expects the post-processing to be done by HailoRT.

To include the post-processing in the compiled HEF, you will need to add the nms_postprocess command to your model script. Have a look at the ALLS script for the yolov8m (example) model here:

GitHub - Hailo Model Zoo - yolov8m.alls

You will need to adapt the yolov8m_nms_config.json file to your model. The file is part of the yolov8m model download from our Model Zoo.

GitHub - Hailo Model Zoo - Hailo-8 Object Detection
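For orientation, the relevant command in the model zoo's yolov8m.alls looks roughly like the line below (the JSON path is relative to the model zoo layout; treat the exact arguments as an illustration and copy them from the linked file):

```
nms_postprocess("../../postprocess_config/yolov8m_nms_config.json", meta_arch=yolov8, engine=cpu)
```

In the JSON file, at minimum the class count has to match your model; depending on how your model differs from the stock yolov8m, other fields (e.g. the output layer names) may also need adjusting.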

The behavior is the same for the other YOLO model variants.

To find out whether your HEF includes NMS, try the following command:

hailortcli parse-hef model.hef

With NMS the output will look like this:

Architecture HEF was compiled for: HAILO8
Network group name: yolov8m, Multi Context - Number of contexts: 4
    Network name: yolov8m/yolov8m
        VStream infos:
            Input  yolov8m/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov8m/yolov8_nms_postprocess FLOAT32, HAILO NMS(number of classes: 80, maximum bounding boxes per class: 100, maximum frame size: 160320)
            Operation:
                Op YOLOV8
                Name: YOLOV8-Post-Process
                Score threshold: 0.200
                IoU threshold: 0.70
                Classes: 80
                Cross classes: false
                Max bboxes per class: 100
                Image height: 640
                Image width: 640