How to run inference with a custom-trained YOLOv8m instance segmentation model on Ubuntu 22.04

Hello, I’m trying to run inference with a yolov8m-seg model trained on my own data. Steps I have already done:

  1. Export from “.pt” to “.onnx” using the Ultralytics API. When I exported, I forgot to set imgsz, so the default imgsz of 640x640 was used (if I understand correctly). A sketch of this step is shown after the config below.

  2. Export from “.onnx” to “.hef” using hailomz compile. The path to the yaml file I set as suggested in the Hailo Model Zoo documentation (hailo_model_zoo/cfg/networks/yolov8s-seg.yaml). I changed nothing in this config (I’m not sure if I need to). Full command: hailomz compile --ckpt my-yolov8m-seg.onnx --calib-path /path/to/calibration/imgs/dir/ --yaml path/to/yolov8m-seg.yaml.

  3. Trying to run inference. I pass a batch of shape [1, 640, 640, 3] and receive some output from the model.
    Config:

base:
- base/yolov8_seg.yaml
network:
  network_name: yolov8m_seg
paths:
  alls_script: yolov8m_seg.alls
  network_path:
  - models_files/InstanceSegmentation/coco/yolov8/yolov8m/pretrained/2023-03-06/yolov8m-seg.onnx
  url: https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/InstanceSegmentation/coco/yolov8/yolov8m/pretrained/2023-03-06/yolov8m-seg.zip
parser:
  nodes:
  - null
  - - /model.22/cv2.2/cv2.2.2/Conv
    - /model.22/cv3.2/cv3.2.2/Conv
    - /model.22/cv4.2/cv4.2.2/Conv
    - /model.22/cv2.1/cv2.1.2/Conv
    - /model.22/cv3.1/cv3.1.2/Conv
    - /model.22/cv4.1/cv4.1.2/Conv
    - /model.22/cv2.0/cv2.0.2/Conv
    - /model.22/cv3.0/cv3.0.2/Conv
    - /model.22/cv4.0/cv4.0.2/Conv
    - /model.22/proto/cv3/act/Mul
info:
  task: instance segmentation
  input_shape: 640x640x3
  output_shape: 20x20x64, 20x20x80, 20x20x32, 40x40x64, 40x40x80, 40x40x32, 80x80x64,
    80x80x80, 80x80x32, 160x160x32
  operations: 110.2G
  parameters: 27.3M
  framework: pytorch
  training_data: coco instances train2017
  validation_data: coco instances val2017
  eval_metric: mAP
  full_precision_result: 40.6
  source: https://github.com/ultralytics/ultralytics
  license_url: https://github.com/ultralytics/ultralytics/blob/main/LICENSE
  license_name: GPL-3.0
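
For reference, here is a minimal sketch of step 1 with imgsz set explicitly (the checkpoint filename is just a placeholder for my trained weights):

from ultralytics import YOLO

# Load the trained segmentation checkpoint and export it to ONNX.
# Setting imgsz explicitly avoids relying on the 640x640 default.
model = YOLO("my-yolov8m-seg.pt")
onnx_path = model.export(format="onnx", imgsz=640)
print(onnx_path)  # path to the exported .onnx file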

I receive 6 outputs with the shapes mentioned in the yaml config.

Question: what should I do with these 6 outputs? Should I pass them somewhere else? Or do I need to change these outputs to other layer names (check my onnx in Netron and change the outputs)?

Feel free to ask questions.
Thanks for the help!

Hi @max,
I'm glad to see that you were able to advance so much on your own!

Those 6 outputs need to be connected to the network's post-processing to produce the actual output that you expect.

If your inference system is based on a Raspberry Pi 5, you can take a look here at the full instance-segmentation pipeline and integrate your specific HEF:
hailo-rpi5-examples/basic_pipelines/instance_segmentation.py at main · hailo-ai/hailo-rpi5-examples (github.com)

If your system is based on an x86 platform, you can take a look here:
Hailo-Application-Code-Examples/runtime/python/instance_segmentation at main · hailo-ai/Hailo-Application-Code-Examples (github.com)
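
If it helps to see the idea, here is a rough numpy sketch of what that post-processing does, assuming the raw per-scale head outputs listed in the yaml above (HxWx64 DFL box maps, HxWx80 class maps, HxWx32 mask-coefficient maps for strides 8/16/32, plus the 160x160x32 prototypes). Names and thresholds are illustrative, NMS is left out, and the full tested decode is in the examples linked above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_scale(box_dfl, cls_logits, stride, conf_thr=0.25):
    # box_dfl: (H, W, 64) = 4 sides x 16 DFL bins, cls_logits: (H, W, num_classes)
    h, w, _ = box_dfl.shape
    dist = box_dfl.reshape(h, w, 4, 16)
    dist = np.exp(dist - dist.max(-1, keepdims=True))
    # softmax over the 16 bins, then expected left/top/right/bottom distance in grid cells
    dist = (dist / dist.sum(-1, keepdims=True)) @ np.arange(16, dtype=np.float32)
    cx, cy = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # cell centers
    boxes = np.stack([(cx - dist[..., 0]) * stride,
                      (cy - dist[..., 1]) * stride,
                      (cx + dist[..., 2]) * stride,
                      (cy + dist[..., 3]) * stride], axis=-1)
    scores = sigmoid(cls_logits)
    score, class_id = scores.max(-1), scores.argmax(-1)
    keep = score > conf_thr  # boolean (H, W) mask; use it to pick the mask coefficients too
    return boxes[keep], score[keep], class_id[keep], keep

def masks_from_protos(coeffs, protos):
    # coeffs: (N, 32) for the kept detections, protos: (160, 160, 32) -> (N, 160, 160) soft masks
    masks = sigmoid(coeffs @ protos.reshape(-1, protos.shape[-1]).T)
    return masks.reshape(-1, protos.shape[0], protos.shape[1])

# Per scale (stride 8/16/32): call decode_scale, gather the 32 mask coefficients of the kept
# cells, run NMS over all scales, then combine the surviving coefficients with the prototypes,
# crop each mask to its box and upsample from 160x160 to the input resolution.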

Good luck:)


Hi @Nadav

Thanks for the quick response. I tested the second solution and it worked for me. I've marked your answer as the solution. Could you also help me with my second problem? I've already created a topic for it. I just want to know whether it's possible to do it right now or not.

Topic: Dataflow Compiler: BackendAllocatorException: Compilation failed: No successful assignment for: concat1

Thanks,
Max

Hi @Nadav ,

One more question about YOLOv8 segmentation. I'm trying to increase performance. I use the “int8” option of the Ultralytics model.export method (path_to_yolo_onnx = model.export(format=“onnx”, int8=True)).
In the logs during export I see that the int8 argument takes part in the export process.

Logs:

Ultralytics YOLOv8.2.77 🚀 Python-3.8.10 torch-2.4.0+cu121 CPU (Intel Core(TM) i5-8265U 1.60GHz)
WARNING ⚠️ INT8 export requires a missing 'data' arg for calibration. Using default 'data=coco8-seg.yaml'.
YOLOv8m-seg summary (fused): 245 layers, 27,224,700 parameters, 0 gradients, 110.0 GFLOPs

PyTorch: starting from '/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (52.3 MB)

ONNX: starting export with onnx 1.14.0 opset 17...
ONNX: export success ✅ 2.7s, saved as '/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx' (104.1 MB)

Export complete (6.0s)
Results saved to /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod
Predict:         yolo predict task=segment model=/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx imgsz=640 int8 
Validate:        yolo val task=segment model=/local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx imgsz=640 data=/mnt/azureml/cr/j/b6290a543a804919871446a8f607ff4a/cap/data-capability/wd/yaml_config_path/data_config.yaml int8 
Visualize:       https://netron.app
path_to_yolo_onnx: /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx
Loading /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/version-33-06-05-2024-10-00-12.onnx for ONNX Runtime inference...

image 1/1 /local/shared_with_docker/tpu_investigation/tests/test_yolov8_seg_prod/data/test/data-1005.jpg: 640x640 1 toilet seat, 647.7ms
Speed: 2.6ms preprocess, 647.7ms inference, 30.7ms postprocess per image at shape (1, 3, 640, 640)

Inference time on the ONNX model is smaller than on the default model (~1000 ms vs 647 ms). That is a good sign that int8 works for me.

Then I compiled my onnx with hailomz compile. No problems there.

In the code I made 2 changes:
Before:

input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.FLOAT32)

--- some code ---

input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.float32)}

After (I know that “quantized” is unused):

input_vstreams_params = InputVStreamParams.make(network_group, quantized=True, format_type=FormatType.UINT8)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=True, format_type=FormatType.UINT8)

--- some code ---

input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.uint8)}

Performance before (with float32): ~0.07 seconds per inference, batch size = 1
Performance after (with uint8): ~0.2 seconds per inference, batch size = 1

What am I doing wrong?

Sorry if I ask you too often.

Sincerely,
Max

Hi Max,
That’s a nice try, but our quantizer requires the input as 32-bit. So, while you were successful, I’m not sure the results are any good.

Have you tried using the --performance flag of the mz compile command? It takes longer, but the compiler tries much harder to get better results.
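
With the same arguments as in your original command, that would look roughly like this (paths as in your post):

hailomz compile --ckpt my-yolov8m-seg.onnx --calib-path /path/to/calibration/imgs/dir/ --yaml path/to/yolov8m-seg.yaml --performance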

Another thing that really boosts performance is choosing the right model for the task. Many (including me :slight_smile: ) go automatically for the V8, but in many cases it's overkill. Using a slightly smaller net, with simpler activations and simpler post-processing (e.g. yolov5s), gives similar accuracy with much better overall performance.

Hi @Nadav ,
Thanks for the quick response. No, I have not tried the --performance option yet. About V8, I totally agree with you. I'm just experimenting with an existing trained production V8, trying to get the best performance from your Hailo-8 :)

Sincerely,
Max