How to run inference with a custom-trained YOLOv8m instance segmentation model on Ubuntu 22.04

Hello, I’m trying to run inference with a yolov8m-seg model trained on my own data. Steps I have already done:

  1. Export from “.pt” to “.onnx” using the Ultralytics API. When I exported, I forgot to set imgsz, so the default imgsz of 640x640 was used (if I understand correctly). A sketch of this step is shown after the config below.

  2. Export from “.onnx” to “.hef” using hailomz compile. The path to the yaml file I set as suggested in the Hailo Model Zoo documentation (hailo_model_zoo/cfg/networks/yolov8s-seg.yaml). I changed nothing in this config (I’m not sure if I need to). Full command: hailomz compile --ckpt my-yolov8m-seg.onnx --calib-path /path/to/calibration/imgs/dir/ --yaml path/to/yolov8m-seg.yaml.

  3. Trying to run inference. I pass a batch of shape [1, 640, 640, 3] and receive some output from the model.
    Config:

base:
- base/yolov8_seg.yaml
network:
  network_name: yolov8m_seg
paths:
  alls_script: yolov8m_seg.alls
  network_path:
  - models_files/InstanceSegmentation/coco/yolov8/yolov8m/pretrained/2023-03-06/yolov8m-seg.onnx
  url: https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/InstanceSegmentation/coco/yolov8/yolov8m/pretrained/2023-03-06/yolov8m-seg.zip
parser:
  nodes:
  - null
  - - /model.22/cv2.2/cv2.2.2/Conv
    - /model.22/cv3.2/cv3.2.2/Conv
    - /model.22/cv4.2/cv4.2.2/Conv
    - /model.22/cv2.1/cv2.1.2/Conv
    - /model.22/cv3.1/cv3.1.2/Conv
    - /model.22/cv4.1/cv4.1.2/Conv
    - /model.22/cv2.0/cv2.0.2/Conv
    - /model.22/cv3.0/cv3.0.2/Conv
    - /model.22/cv4.0/cv4.0.2/Conv
    - /model.22/proto/cv3/act/Mul
info:
  task: instance segmentation
  input_shape: 640x640x3
  output_shape: 20x20x64, 20x20x80, 20x20x32, 40x40x64, 40x40x80, 40x40x32, 80x80x64,
    80x80x80, 80x80x32, 160x160x32
  operations: 110.2G
  parameters: 27.3M
  framework: pytorch
  training_data: coco instances train2017
  validation_data: coco instances val2017
  eval_metric: mAP
  full_precision_result: 40.6
  source: https://github.com/ultralytics/ultralytics
  license_url: https://github.com/ultralytics/ultralytics/blob/main/LICENSE
  license_name: GPL-3.0
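
For reference, here is a minimal sketch of step 1 with imgsz set explicitly (the checkpoint filename is just a placeholder for my trained weights):

from ultralytics import YOLO

# Load the trained segmentation checkpoint and export it to ONNX.
# Setting imgsz explicitly avoids relying on the 640x640 default.
model = YOLO("my-yolov8m-seg.pt")
onnx_path = model.export(format="onnx", imgsz=640)
print(onnx_path)  # path to the exported .onnx file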

I receive 6 outputs with the shapes mentioned in the yaml config.

Question: what should I do with these 6 outputs? Should I pass them somewhere else? Or do I need to change these outputs to other layer names (check my onnx in Netron and change the outputs)?

Feel free to ask questions.
Thanks for the help!

Hi @max,
I'm glad to see that you were able to advance so much on your own!

Those 6 outputs need to be connected to the network's post-processing to produce the actual output that you expect.

If your inference system is based on a Raspberry Pi 5, you can take a look here at the full instance-segmentation pipeline and integrate your specific HEF:
hailo-rpi5-examples/basic_pipelines/instance_segmentation.py at main · hailo-ai/hailo-rpi5-examples (github.com)

If your system is based on an x86 platform, you can take a look here:
Hailo-Application-Code-Examples/runtime/python/instance_segmentation at main · hailo-ai/Hailo-Application-Code-Examples (github.com)
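
If it helps to see the idea, here is a rough numpy sketch of what that post-processing does, assuming the raw per-scale head outputs listed in the yaml above (HxWx64 DFL box maps, HxWx80 class maps, HxWx32 mask-coefficient maps for strides 8/16/32, plus the 160x160x32 prototypes). Names and thresholds are illustrative, NMS is left out, and the full tested decode is in the examples linked above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_scale(box_dfl, cls_logits, stride, conf_thr=0.25):
    # box_dfl: (H, W, 64) = 4 sides x 16 DFL bins, cls_logits: (H, W, num_classes)
    h, w, _ = box_dfl.shape
    dist = box_dfl.reshape(h, w, 4, 16)
    dist = np.exp(dist - dist.max(-1, keepdims=True))
    # softmax over the 16 bins, then expected left/top/right/bottom distance in grid cells
    dist = (dist / dist.sum(-1, keepdims=True)) @ np.arange(16, dtype=np.float32)
    cx, cy = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # cell centers
    boxes = np.stack([(cx - dist[..., 0]) * stride,
                      (cy - dist[..., 1]) * stride,
                      (cx + dist[..., 2]) * stride,
                      (cy + dist[..., 3]) * stride], axis=-1)
    scores = sigmoid(cls_logits)
    score, class_id = scores.max(-1), scores.argmax(-1)
    keep = score > conf_thr  # boolean (H, W) mask; use it to pick the mask coefficients too
    return boxes[keep], score[keep], class_id[keep], keep

def masks_from_protos(coeffs, protos):
    # coeffs: (N, 32) for the kept detections, protos: (160, 160, 32) -> (N, 160, 160) soft masks
    masks = sigmoid(coeffs @ protos.reshape(-1, protos.shape[-1]).T)
    return masks.reshape(-1, protos.shape[0], protos.shape[1])

# Per scale (stride 8/16/32): call decode_scale, gather the 32 mask coefficients of the kept
# cells, run NMS over all scales, then combine the surviving coefficients with the prototypes,
# crop each mask to its box and upsample from 160x160 to the input resolution.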

Good luck:)


Hi @Nadav

Thanks for the quick response. I tested the second solution and it worked for me. I've marked your answer as the solution. Could you also help me with my second problem? I've already created a topic for it. I just want to know whether it's possible to do it right now or not.

Topic: Dataflow Compiler: BackendAllocatorException: Compilation failed: No successful assignment for: concat1

Thanks,
Max

Hi @Nadav ,

One more question about YOLOv8 segmentation. I'm trying to increase performance. I use the “int8” option of the Ultralytics model.export method (path_to_yolo_onnx = model.export(format=“onnx”, int8=True)).
In the logs during export I see that the int8 argument takes part in the export process.

Logs:

Ultralytics YOLOv8.2.77 🚀 Python-3.8.10 torch-2.4.0+cu121 CPU (Intel Core(TM) i5-8265U 1.60GHz)
WARNING ⚠️ INT8 export requires a missing 'data' arg for calibration. Using default 'data=coco8-seg.yaml'.
YOLOv8m-seg summary (fused): 245 layers, 27,224,700 parameters, 0 gradients, 110.0 GFLOPs

PyTorch: starting from '/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (52.3 MB)

ONNX: starting export with onnx 1.14.0 opset 17...
ONNX: export success ✅ 2.7s, saved as '/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx' (104.1 MB)

Export complete (6.0s)
Results saved to /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod
Predict:         yolo predict task=segment model=/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx imgsz=640 int8 
Validate:        yolo val task=segment model=/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx imgsz=640 data=/mnt/azureml/cr/j/b6290a543a804919871446a8f607ff4a/cap/data-capability/wd/yaml_config_path/data_config.yaml int8 
Visualize:       https://netron.app
path_to_yolo_onnx: /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx
Loading /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx for ONNX Runtime inference...

image 1/1 /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/data/test/data-1005.jpg: 640x640 1 toilet seat, 647.7ms
Speed: 2.6ms preprocess, 647.7ms inference, 30.7ms postprocess per image at shape (1, 3, 640, 640)

Inference time on the ONNX model is smaller than on the default model (~1000 ms vs 647 ms). That is a good sign that int8 works for me.

Then I compiled my onnx with hailomz compile. No problems there.

In the code I made 2 changes:
Before:

input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)

--- some code ---

input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.float32)}

After (I know that “quantized” is unused):

input_vstreams_params = InputVStreamParams.make(network_group, quantized=True, format_type=FormatType.UINT8)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=True, format_type=FormatType.UINT8)

--- some code ---

input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.uint8)}

Performance before (with float32): ~0.07 seconds per inference, batch size = 1
Performance after (with uint8): ~0.2 seconds per inference, batch size = 1

What am I doing wrong?

Sorry if I ask you too often.

Sincerely,
Max

Hi Max,
That’s a nice try, but our quantizer requires the input as 32-bit. So, while you were successful, I’m not sure the results are any good.

Have you tried using the --performance flag of the mz compile command? It takes longer, but the compiler tries much harder to get better results.
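
With the same arguments as in your original command, that would look roughly like this (paths as in your post):

hailomz compile --ckpt my-yolov8m-seg.onnx --calib-path /path/to/calibration/imgs/dir/ --yaml path/to/yolov8m-seg.yaml --performance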

Another thing that really boosts performance is choosing the right model for the task. Many (including me :slight_smile: ) go automatically for the V8, but in many cases it's overkill. Using a slightly smaller net, with simpler activations and simpler post-processing (e.g. yolov5s), gives similar accuracy with much better overall performance.

Hi @Nadav ,
Thanks for the quick response. No, I have not tried the --performance option yet. About V8, I totally agree with you. I'm just experimenting with an existing trained production V8, trying to get the best performance from your Hailo-8 :)

Sincerely,
Max