Hello, my board is a Raspberry Pi 5 with a Hailo-8L. I want to run the Python “Hailo-Application-Code-Examples” from GitHub, but I don’t know how to install the Hailo Model Zoo.
Welcome to the Hailo Community!
You do not need to install the Hailo Model Zoo. You can download precompiled models (HEFs) from the Model Zoo GitHub page, for instance the object detection models for Hailo-8L:
GitHub - Hailo Model Zoo - Hailo-8L - Object detection
To convert your own models you will need to install the Hailo AI Software Suite Docker on an x86 Ubuntu machine. The suite comes with built-in tutorials that show you the conversion process step by step. Just call the following command inside the Docker container:
hailo tutorial
This will start a Jupyter Notebook server with notebooks for each step.
The suite also includes the hailomz command-line tool that lets you convert models from the Model Zoo.
After going through the tutorials, call hailomz --help and you will notice the same steps (parse, optimize, compile) as in the general workflow. This lets you convert the models from the Model Zoo while specifying fewer parameters yourself. The hailo_model_zoo directory contains the YAML and ALLS files that tell hailomz what parameters to use to convert each model.
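For example, converting one of the Model Zoo networks yourself could look like the command below (the model name is only an illustration; check hailomz --help and the Model Zoo documentation for the supported model names and target-specific flags):
hailomz compile yolov8s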
Hi
I want to run this code on an RPi 5 with a Hailo-8L (for fast_sam), but it requires the Hailo Model Zoo to be installed… how should I proceed?
https://github.com/hailo-ai/Hailo-Application-Code-Examples/blob/main/runtime/python/instance_segmentation/yoloseg_inference.py
To get started, please visit our Hailo Model Zoo repository on GitHub:
You’ll find a Quick Start guide there that should walk you through the process. If you run into any issues or have questions, feel free to post them here in the community.
hi Klausk
thanks for your help,
When I run Hailo-Application-Code-Examples/runtime/python/streaming at main · hailo-ai/Hailo-Application-Code-Examples · GitHub,
The script imports the Hailo Model Zoo, but the Raspberry Pi 5 can’t install the DFC, so it can’t install the Hailo Model Zoo either. Have you successfully run this on a Raspberry Pi 5?
Best regards
Sam
Hi Klausk sir,
Thanks for your help. I have followed your instructions to download the model from GitHub, but when I run “/Hailo-Application-Code-Examples/runtime/python/streaming/yolox_stream_inference.py”, I get the same error as in the photo.
I am facing the same issue… any solution?
Hey
In this part of the example, we’re utilizing the Hailo Model Zoo for post-processing. Here’s the relevant code:
# Create a dictionary that maps tensor shapes to layer names (needed for post-processing)
layer_from_shape: dict = {infer_results[key].shape: key for key in infer_results.keys()}

from hailo_model_zoo.core.postprocessing.detection import yolo

# Post-processing configuration, as recommended in hailo_model_zoo/cfg/base/yolox.yaml
anchors = {"strides": [32, 16, 8], "sizes": [[1, 1], [1, 1], [1, 1]]}
yolox_post_proc = yolo.YoloPostProc(
    img_dims=(INPUT_RES_H, INPUT_RES_W),
    nms_iou_thresh=0.65,
    score_threshold=0.01,
    anchors=anchors,
    output_scheme=None,
    classes=80,
    labels_offset=1,
    meta_arch="yolox",
    device_pre_post_layers=[]
)

# The order here is crucial since the reorganized tensor must be in (BS,H,W,85) shape
endnodes = [
    infer_results[layer_from_shape[1, 80, 80, 4]],   # stride 8
    infer_results[layer_from_shape[1, 80, 80, 1]],   # stride 8
    infer_results[layer_from_shape[1, 80, 80, 80]],  # stride 8
    infer_results[layer_from_shape[1, 40, 40, 4]],   # stride 16
    infer_results[layer_from_shape[1, 40, 40, 1]],   # stride 16
    infer_results[layer_from_shape[1, 40, 40, 80]],  # stride 16
    infer_results[layer_from_shape[1, 20, 20, 4]],   # stride 32
    infer_results[layer_from_shape[1, 20, 20, 1]],   # stride 32
    infer_results[layer_from_shape[1, 20, 20, 80]]   # stride 32
]

# Process the outputs
hailo_preds = yolox_post_proc.yolo_postprocessing(endnodes)
num_detections = int(hailo_preds['num_detections'])
scores = hailo_preds["detection_scores"][0].numpy()
classes = hailo_preds["detection_classes"][0].numpy()
boxes = hailo_preds["detection_boxes"][0].numpy()

# Set the number of detections to 0 if the first score is 0
if scores[0] == 0:
    num_detections = 0

# Create a dictionary to hold prediction results
preds_dict = {
    'scores': scores,
    'classes': classes,
    'boxes': boxes,
    'num_detections': num_detections
}

# Report and visualize detections
frame = report.report_detections(
    preds_dict,
    frame,
    scale_factor_x=orig_w,
    scale_factor_y=orig_h
)
cv2.imshow('frame', frame)
If you’d like, you can either replace the operations performed by the Hailo Model Zoo with your own post-processing steps, or try running other examples. However, please note that these examples have primarily been tested on x86 architectures and are not optimized for the Raspberry Pi (RPi) at this time. We’re working on versions specifically optimized for the RPi, similar to what we did with TAPPAS, which now runs on the RPi as RPi examples.
Let me know if you encounter any issues or if you would like further assistance with this setup.
Best regards,
Omri
Hi
We cannot install hailo_model_zoo on the RPi… so how do we test the examples from Hailo-Application-Code-Examples on the RPi?
Specifically I want to test fast sam
Thank you
Hey @dario.ravarro,
If you take a look at this inference example: Hailo YOLOSeg Inference Example, you’ll notice that the post-processing function imports from the Hailo model zoo.
To use this example on the Raspberry Pi (RPI), you’ll need to write your own post-processing function and a function to visualize the results. That’s why we didn’t recommend using these examples directly until they are optimized. The best way to proceed would be to review the post-processing functions in the Hailo model zoo and integrate them into your example instead of importing them.
Here are the functions you’ll need:
def visualize_yolov5_seg_results(
    detections, img, class_names=None, alpha=0.5, score_thres=0.25, mask_thresh=0.5, max_boxes_to_draw=20, **kwargs
):
    img_idx = 0
    img_out = img[img_idx].copy()
    boxes = detections["detection_boxes"]
    # Scale the boxes to match the input image dimensions
    boxes[:, 0::2] *= img_out.shape[1]
    boxes[:, 1::2] *= img_out.shape[0]
    masks = detections["mask"] > mask_thresh
    scores = detections["detection_scores"]
    classes = detections["detection_classes"]
    skip_boxes = kwargs.get("meta_arch", "") == "yolov8_seg_postprocess" and kwargs.get("classes", "") == 1
    keep = scores > score_thres
    boxes, masks, scores, classes = boxes[keep], masks[keep], scores[keep], classes[keep]
    max_boxes = min(max_boxes_to_draw, len(keep))
    boxes, masks, scores, classes = boxes[:max_boxes], masks[:max_boxes], scores[:max_boxes], classes[:max_boxes]
    for idx, mask in enumerate(masks):
        xmin, ymin, xmax, ymax = boxes[idx].astype(np.int32)
        color = np.random.randint(0, 255, size=3, dtype=np.uint8)
        if not skip_boxes:
            img_out = cv2.rectangle(img_out, (xmin, ymin), (xmax, ymax), [int(c) for c in color], 3)
        if np.sum(mask) > 0:
            polygons, _ = mask_to_polygons(mask)
            mask_overlay = np.repeat(mask[:, :, np.newaxis], 3, axis=2) * color
            img_out = cv2.addWeighted(mask_overlay, alpha, img_out, 1, 0)
            for polygon in polygons:
                img_out = cv2.polylines(
                    img_out, [polygon.reshape((-1, 1, 2)).astype(np.int32)], isClosed=True, color=color, thickness=1
                )
        if not skip_boxes:
            label = f"{class_names[int(classes[idx])]}"
            score = f"{int(100 * scores[idx])}%"
            text = f"{label}: {score}"
            (w, h), _ = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.5, thickness=2)
            org = (xmin, ymin)
            img_out = cv2.putText(img_out, text, org, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.5, color=[255, 255, 255], thickness=2)
    return img_out
def yolov8_seg_postprocess(endnodes, device_pre_post_layers=None, **kwargs):
    num_classes = kwargs["classes"]
    strides = kwargs["anchors"]["strides"][::-1]
    image_dims = tuple(kwargs["img_dims"])
    reg_max = kwargs["anchors"]["regression_length"]
    raw_boxes = endnodes[:7:3]
    scores = [np.reshape(s, (-1, s.shape[1] * s.shape[2], num_classes)) for s in endnodes[1:8:3]]
    scores = np.concatenate(scores, axis=1)
    decoded_boxes = _yolov8_decoding(raw_boxes, strides, image_dims, reg_max)
    score_thres = kwargs["score_threshold"]
    iou_thres = kwargs["nms_iou_thresh"]
    proto_data = endnodes[9]
    batch_size, _, _, n_masks = proto_data.shape
    fake_objectness = np.ones((scores.shape[0], scores.shape[1], 1))
    scores_obj = np.concatenate([fake_objectness, scores], axis=-1)
    coeffs = [np.reshape(c, (-1, c.shape[1] * c.shape[2], n_masks)) for c in endnodes[2:9:3]]
    coeffs = np.concatenate(coeffs, axis=1)
    predictions = np.concatenate([decoded_boxes, scores_obj, coeffs], axis=2)
    nms_res = non_max_suppression(predictions, conf_thres=score_thres, iou_thres=iou_thres, multi_label=True)
    outputs = []
    for b in range(batch_size):
        protos = proto_data[b]
        masks = process_mask(protos, nms_res[b]["mask"], nms_res[b]["detection_boxes"], image_dims, upsample=True)
        output = {
            "detection_boxes": np.array(nms_res[b]["detection_boxes"]),
            "mask": np.transpose(masks, (0, 1, 2)) if masks is not None else None,
            "detection_scores": np.array(nms_res[b]["detection_scores"]),
            "detection_classes": np.array(nms_res[b]["detection_classes"]).astype(int)
        }
        outputs.append(output)
    return outputs
Let me know if you have any questions!
Regards
I copied most of the content of instance_segmentation_postprocessing.py into yoloseg_inference.py and also installed Cython to be able to import cython_nms.pyx.
Where can I upload the yoloseg_inference.py?
Running
python yoloseg_inference.py fast_sam_s.hef dog_bicycle.jpg fast
generates an output image with segmentation in a new folder.
However, the segmentation is missing some parts… How can I fine-tune it?
How can I run segmentation on a live stream from the RPi camera?
Great job!
It seems the missing segmentation might be related to a post-processing issue. The object is being detected, but the coverage isn’t complete across the entire object. I recommend making some adjustments to the post-processing and testing it again. Since the app was initially developed for x86 (PC) or VMS with Hailo, it might require some modifications in post-processing to work properly in this setup.
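As a concrete illustration (not a definitive fix), the thresholds in the post-processing shown earlier are the first knobs to try when masks only partially cover an object. A minimal sketch, reusing the names from the inference script (results, processed_image, num_of_classes, func_dict, meta_arch, raw_detections, kwargs); the values are hypothetical and need to be tuned experimentally:
# Hypothetical tuning -- adjust, re-run the post-processing, and compare.
kwargs['score_threshold'] = 0.20                     # fast_sam default in the example is 0.25
results = func_dict[meta_arch](raw_detections)[0]    # re-run post-processing with the new threshold

img_out = visualize_yolov5_seg_results(
    results,
    np.expand_dims(np.array(processed_image), axis=0),
    score_thres=0.20,                                # draw lower-confidence detections as well
    mask_thresh=0.40,                                # lower cut-off -> masks cover more of each object (default 0.5)
    class_names=num_of_classes,
    **kwargs,
)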
As for live streaming, you can refer to our implementation here: Hailo Live Stream Example.
The live stream example doesn’t work with the RPi camera.
I am trying to use Picamera2 in the segmentation code, but it’s unclear how to integrate it properly into yoloseg_inference.py.
How do I modify yoloseg_inference.py to get images from the RPi camera?
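For reference, one way to feed the RPi camera into the same pipeline is to replace the image loading with picamera2 capture. This is only a rough sketch, assuming the picamera2 package is installed and reusing the names from yoloseg_inference.py (preproc, height, width, input_vstream_info, network_group, network_group_params, infer_pipeline, func_dict, meta_arch):
from picamera2 import Picamera2
import numpy as np
from PIL import Image

picam2 = Picamera2()
# Request RGB frames; preproc() letterboxes them to the model input size anyway.
picam2.configure(picam2.create_video_configuration(main={"size": (1280, 720), "format": "RGB888"}))
picam2.start()

while True:
    frame = picam2.capture_array()            # H x W x 3 numpy array
    # Note: picamera2's RGB888/BGR888 naming is counter-intuitive; verify the channel order.
    image = Image.fromarray(frame)
    processed_image = preproc(image, height=height, width=width)
    input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.float32)}
    with network_group.activate(network_group_params):
        raw_detections = infer_pipeline.infer(input_data)
    results = func_dict[meta_arch](raw_detections)[0]
    # ...run the segmentation visualization on `results` as in the single-image path...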
@omria I’m running a YOLOv8 instance segmentation model and a depth model in parallel on a Hailo-8 device. I manually adapted the post-processing functions from hailo_model_zoo for the segmentation model, but the post-processing takes longer than the model inference itself. I see a potential solution in the libyolo_post.so functions available in TAPPAS; using GStreamer can significantly reduce post-processing time, but I plan to use multiprocessing instead of GStreamer. Are there any suggestions for using the libyolo_post functions without GStreamer?
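One generic pattern for the multiprocessing route (not Hailo-specific, and not using libyolo_post) is to move the NumPy post-processing into a worker process so the next inference can start while the previous frame is still being decoded. A minimal sketch, assuming yolov8_seg_postprocess and kwargs are defined as in the code posted earlier in this thread:
import multiprocessing as mp

def _postproc_worker(in_q, out_q, pp_kwargs):
    # Runs in a separate process: pull raw output tensors, push decoded detections.
    while True:
        item = in_q.get()
        if item is None:          # sentinel -> shut the worker down
            break
        frame_id, endnodes = item
        out_q.put((frame_id, yolov8_seg_postprocess(endnodes, **pp_kwargs)[0]))

in_q, out_q = mp.Queue(maxsize=4), mp.Queue()
worker = mp.Process(target=_postproc_worker, args=(in_q, out_q, kwargs), daemon=True)
worker.start()

# In the inference loop, hand the raw endnodes to the worker instead of decoding inline:
#   in_q.put((frame_id, endnodes))
# and collect finished frames with out_q.get() whenever they are ready.
# Shutdown: in_q.put(None); worker.join()
# Caveat: the tensors are pickled across the queue, so measure whether the copy
# overhead is actually smaller than the post-processing time you are trying to hide.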
Hi, can you tell me where I can find this function:
_yolov8_decoding
I got an error here: _yolov8_decoding is not defined.
Found it here: the Hailo Model Zoo GitHub.
Also sharing the code which worked on the Pi with the Hailo-8L, image size 640x640 (used my custom model):
import numpy as np
import cv2
from hailo_platform import (HEF, Device, VDevice, HailoStreamInterface, InferVStreams, ConfigureParams,
InputVStreamParams, OutputVStreamParams, FormatType)
from zenlog import log
from PIL import Image
import os
import argparse
from cython_utils.cython_nms import nms as cnms
parser = argparse.ArgumentParser(description='Running a Hailo inference with actual images using Hailo API and OpenCV')
parser.add_argument('hef', help="HEF file path")
parser.add_argument('images', help="Images path to perform inference on. Could be either a single image or a folder containing the images")
parser.add_argument('arch', help="The architecture type of the model: v5, v8 or fast")
parser.add_argument('--class-num', help="The number of classes the model is trained on. Defaults to 80 for v5 and v8, and 1 for fast_sam.", default=80)
parser.add_argument('--output_dir', help="The path to the output directory where the images will be saved. Defaults to the output_images folder in the current directory.")
args = parser.parse_args()
kwargs = {}
kwargs['device_pre_post_layers'] = None
## strides - Constant scalar per bounding box for scaling. One for each anchor by size of the anchor values
## sizes - The actual anchors for the bounding boxes
## regression_length - The regression length required for the distance estimation
arch_dict = {
    'v5': {
        'anchors': {
            'strides': [32, 16, 8],
            'sizes': np.array([[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]])
        }
    },
    'v8': {
        'anchors': {
            'strides': [32, 16, 8],
            'regression_length': 15
        }
    },
    'fast': {
        'anchors': {
            'strides': [32, 16, 8],
            'regression_length': 15
        }
    }
}
# --------------------------------------------------------- #
# ---------------- Architecture functions ----------------- #
def _softmax(x):
    return np.exp(x) / np.expand_dims(np.sum(np.exp(x), axis=-1), axis=-1)
def mask_to_polygons(mask):
    mask = np.ascontiguousarray(mask)
    res = cv2.findContours(mask.astype("uint8"), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
    hierarchy = res[-1]
    if hierarchy is None:  # empty mask
        return [], False
    has_holes = (hierarchy.reshape(-1, 4)[:, 3] >= 0).sum() > 0
    res = res[-2]
    res = [x.flatten() for x in res]
    res = [x + 0.5 for x in res if len(x) >= 6]
    return res, has_holes
def crop_mask(masks, boxes):
    """
    Zeroing out mask region outside of the predicted bbox.
    Args:
        masks: numpy array of masks with shape [n, h, w]
        boxes: numpy array of bbox coords with shape [n, 4]
    """
    n_masks, _, _ = masks.shape
    integer_boxes = np.ceil(boxes).astype(int)
    x1, y1, x2, y2 = np.array_split(np.where(integer_boxes > 0, integer_boxes, 0), 4, axis=1)
    for k in range(n_masks):
        masks[k, : y1[k, 0], :] = 0
        masks[k, y2[k, 0] :, :] = 0
        masks[k, :, : x1[k, 0]] = 0
        masks[k, :, x2[k, 0] :] = 0
    return masks
def process_mask(protos, masks_in, bboxes, shape, upsample=True, downsample=False):
    mh, mw, c = protos.shape
    ih, iw = shape
    masks = _sigmoid(masks_in @ protos.reshape((-1, c)).transpose((1, 0))).reshape((-1, mh, mw))
    downsampled_bboxes = bboxes.copy()
    if downsample:
        downsampled_bboxes[:, 0] *= mw / iw
        downsampled_bboxes[:, 2] *= mw / iw
        downsampled_bboxes[:, 3] *= mh / ih
        downsampled_bboxes[:, 1] *= mh / ih
        masks = crop_mask(masks, downsampled_bboxes)
    if upsample:
        if not masks.shape[0]:
            return None
        masks = cv2.resize(np.transpose(masks, axes=(1, 2, 0)), shape, interpolation=cv2.INTER_LINEAR)
        if len(masks.shape) == 2:
            masks = masks[..., np.newaxis]
        masks = np.transpose(masks, axes=(2, 0, 1))  # CHW
    if not downsample:
        masks = crop_mask(masks, downsampled_bboxes)  # CHW
    return masks
def _sigmoid(x):
    return 1 / (1 + np.exp(-x))
def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, max_det=300, nm=32, multi_label=True):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections
    Args:
        prediction: numpy.ndarray with shape (batch_size, num_proposals, 351)
        conf_thres: confidence threshold for NMS
        iou_thres: IoU threshold for NMS
        max_det: Maximal number of detections to keep after NMS
        nm: Number of masks
        multi_label: Consider only best class per proposal or all conf_thresh passing proposals
    Returns:
        A list of per image detections, where each is a dictionary with the following structure:
        {
            'detection_boxes': numpy.ndarray with shape (num_detections, 4),
            'mask': numpy.ndarray with shape (num_detections, 32),
            'detection_classes': numpy.ndarray with shape (num_detections, 80),
            'detection_scores': numpy.ndarray with shape (num_detections, 80)
        }
    """
    assert 0 <= conf_thres <= 1, f"Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0"
    assert 0 <= iou_thres <= 1, f"Invalid IoU threshold {iou_thres}, valid values are between 0.0 and 1.0"
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates
    max_wh = 7680  # (pixels) maximum box width and height
    mi = 5 + nc  # mask start index
    output = []
    for xi, x in enumerate(prediction):  # image index, image inference
        x = x[xc[xi]]  # confidence
        # If none remain process next image
        if not x.shape[0]:
            output.append(
                {
                    "detection_boxes": np.zeros((0, 4)),
                    "mask": np.zeros((0, 32)),
                    "detection_classes": np.zeros((0, 80)),
                    "detection_scores": np.zeros((0, 80)),
                }
            )
            continue
        # Confidence = Objectness X Class Score
        x[:, 5:] *= x[:, 4:5]
        # (center_x, center_y, width, height) to (x1, y1, x2, y2)
        boxes = xywh2xyxy(x[:, :4])
        mask = x[:, mi:]
        multi_label &= nc > 1
        if not multi_label:
            conf = np.expand_dims(x[:, 5:mi].max(1), 1)
            j = np.expand_dims(x[:, 5:mi].argmax(1), 1).astype(np.float32)
            keep = np.squeeze(conf, 1) > conf_thres
            x = np.concatenate((boxes, conf, j, mask), 1)[keep]
        else:
            i, j = (x[:, 5:mi] > conf_thres).nonzero()
            x = np.concatenate((boxes[i], x[i, 5 + j, None], j[:, None].astype(np.float32), mask[i]), 1)
        # sort by confidence
        x = x[x[:, 4].argsort()[::-1]]
        # per-class NMS
        cls_shift = x[:, 5:6] * max_wh
        boxes = x[:, :4] + cls_shift
        conf = x[:, 4:5]
        preds = np.hstack([boxes.astype(np.float32), conf.astype(np.float32)])
        keep = cnms(preds, iou_thres)
        if keep.shape[0] > max_det:
            keep = keep[:max_det]
        out = x[keep]
        scores = out[:, 4]
        classes = out[:, 5]
        boxes = out[:, :4]
        masks = out[:, 6:]
        out = {"detection_boxes": boxes, "mask": masks, "detection_classes": classes, "detection_scores": scores}
        output.append(out)
    return output
def xywh2xyxy(x):
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2
    y[:, 1] = x[:, 1] - x[:, 3] / 2
    y[:, 2] = x[:, 0] + x[:, 2] / 2
    y[:, 3] = x[:, 1] + x[:, 3] / 2
    return y
def _yolov8_decoding(raw_boxes, strides, image_dims, reg_max):
    boxes = None
    for box_distribute, stride in zip(raw_boxes, strides):
        # create grid
        shape = [int(x / stride) for x in image_dims]
        grid_x = np.arange(shape[1]) + 0.5
        grid_y = np.arange(shape[0]) + 0.5
        grid_x, grid_y = np.meshgrid(grid_x, grid_y)
        ct_row = grid_y.flatten() * stride
        ct_col = grid_x.flatten() * stride
        center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1)
        # box distribution to distance
        reg_range = np.arange(reg_max + 1)
        box_distribute = np.reshape(
            box_distribute, (-1, box_distribute.shape[1] * box_distribute.shape[2], 4, reg_max + 1)
        )
        box_distance = _softmax(box_distribute)
        box_distance = box_distance * np.reshape(reg_range, (1, 1, 1, -1))
        box_distance = np.sum(box_distance, axis=-1)
        box_distance = box_distance * stride
        # decode box
        box_distance = np.concatenate([box_distance[:, :, :2] * (-1), box_distance[:, :, 2:]], axis=-1)
        decode_box = np.expand_dims(center, axis=0) + box_distance
        xmin = decode_box[:, :, 0]
        ymin = decode_box[:, :, 1]
        xmax = decode_box[:, :, 2]
        ymax = decode_box[:, :, 3]
        decode_box = np.transpose([xmin, ymin, xmax, ymax], [1, 2, 0])
        xywh_box = np.transpose([(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin], [1, 2, 0])
        boxes = xywh_box if boxes is None else np.concatenate([boxes, xywh_box], axis=1)
    return boxes  # tf.expand_dims(boxes, axis=2)
def visualize_yolov5_seg_results(
    detections, img, class_names=None, alpha=0.5, score_thres=0.25, mask_thresh=0.5, max_boxes_to_draw=20, **kwargs
):
    img_idx = 0
    img_out = img[img_idx].copy()
    boxes = detections["detection_boxes"]
    # Scale the boxes to match the input image dimensions
    boxes[:, 0::2] *= img_out.shape[1]
    boxes[:, 1::2] *= img_out.shape[0]
    masks = detections["mask"] > mask_thresh
    scores = detections["detection_scores"]
    classes = detections["detection_classes"]
    #print("Classes:",classes)
    skip_boxes = kwargs.get("meta_arch", "") == "yolov8_seg_postprocess" and kwargs.get("classes", "") == 1
    keep = scores > score_thres
    boxes, masks, scores, classes = boxes[keep], masks[keep], scores[keep], classes[keep]
    max_boxes = min(max_boxes_to_draw, len(keep))
    boxes, masks, scores, classes = boxes[:max_boxes], masks[:max_boxes], scores[:max_boxes], classes[:max_boxes]
    for idx, mask in enumerate(masks):
        xmin, ymin, xmax, ymax = boxes[idx].astype(np.int32)
        color = np.random.randint(0, 255, size=3, dtype=np.uint8)
        color_1 = np.array(np.random.randint(0, 255, size=3, dtype=np.uint8)).tolist()
        #print("Colour:",color)
        if not skip_boxes:
            img_out = cv2.rectangle(img_out, (int(xmin/640), int(ymin/640)), (int(xmax/640), int(ymax/640)), [int(c) for c in color], 1)
        if np.sum(mask) > 0:
            polygons, _ = mask_to_polygons(mask)
            mask_overlay = np.repeat(mask[:, :, np.newaxis], 3, axis=2) * color
            img_out = cv2.addWeighted(mask_overlay, alpha, img_out, 1, 0)
            for polygon in polygons:
                img_out = cv2.polylines(
                    img_out, [polygon.reshape((-1, 1, 2)).astype(np.int32)], isClosed=True, color=color_1, thickness=1
                )
        #print("Class_index:",[int(classes[idx])])
        #print("Score_index:",int(100 * scores[idx]))
        #print("Class names:",class_names)
        if not skip_boxes:
            label = f"Class{int(classes[idx])}"
            score = f"{int(100 * scores[idx])}%"
            text = f"{label}: {score}"
            print(text)
            (w, h), _ = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.5, thickness=2)
            org = (int(xmin/640), int(ymin/640))
            print(org)
            img_out = cv2.putText(img_out, text, org, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.5, color=color_1, thickness=1)
    return img_out
def yolov8_seg_postprocess(endnodes, device_pre_post_layers=None, **kwargs):
    num_classes = kwargs["classes"]
    strides = kwargs["anchors"]["strides"][::-1]
    image_dims = tuple(kwargs["img_dims"])
    reg_max = kwargs["anchors"]["regression_length"]
    raw_boxes = endnodes[:7:3]
    scores = [np.reshape(s, (-1, s.shape[1] * s.shape[2], num_classes)) for s in endnodes[1:8:3]]
    scores = np.concatenate(scores, axis=1)
    decoded_boxes = _yolov8_decoding(raw_boxes, strides, image_dims, reg_max)
    score_thres = kwargs["score_threshold"]
    iou_thres = kwargs["nms_iou_thresh"]
    proto_data = endnodes[9]
    batch_size, _, _, n_masks = proto_data.shape
    fake_objectness = np.ones((scores.shape[0], scores.shape[1], 1))
    scores_obj = np.concatenate([fake_objectness, scores], axis=-1)
    coeffs = [np.reshape(c, (-1, c.shape[1] * c.shape[2], n_masks)) for c in endnodes[2:9:3]]
    coeffs = np.concatenate(coeffs, axis=1)
    predictions = np.concatenate([decoded_boxes, scores_obj, coeffs], axis=2)
    nms_res = non_max_suppression(predictions, conf_thres=score_thres, iou_thres=iou_thres, multi_label=True)
    outputs = []
    for b in range(batch_size):
        protos = proto_data[b]
        masks = process_mask(protos, nms_res[b]["mask"], nms_res[b]["detection_boxes"], image_dims, upsample=True)
        output = {
            "detection_boxes": np.array(nms_res[b]["detection_boxes"]),
            "mask": np.transpose(masks, (0, 1, 2)) if masks is not None else None,
            "detection_scores": np.array(nms_res[b]["detection_scores"]),
            "detection_classes": np.array(nms_res[b]["detection_classes"]).astype(int)
        }
        outputs.append(output)
    return outputs
def postproc_yolov8seg(raw_detections):
    raw_detections_keys = list(raw_detections.keys())
    layer_from_shape: dict = {raw_detections[key].shape: key for key in raw_detections_keys}
    mask_channels = 32
    detection_output_channels = (kwargs['anchors']['regression_length'] + 1) * 4  # (regression length + 1) * num_coordinates
    endnodes = [raw_detections[layer_from_shape[1, 80, 80, detection_output_channels]],
                raw_detections[layer_from_shape[1, 80, 80, kwargs['classes']]],
                raw_detections[layer_from_shape[1, 80, 80, mask_channels]],
                raw_detections[layer_from_shape[1, 40, 40, detection_output_channels]],
                raw_detections[layer_from_shape[1, 40, 40, kwargs['classes']]],
                raw_detections[layer_from_shape[1, 40, 40, mask_channels]],
                raw_detections[layer_from_shape[1, 20, 20, detection_output_channels]],
                raw_detections[layer_from_shape[1, 20, 20, kwargs['classes']]],
                raw_detections[layer_from_shape[1, 20, 20, mask_channels]],
                raw_detections[layer_from_shape[1, 160, 160, mask_channels]]]
    predictions_dict = yolov8_seg_postprocess(endnodes, **kwargs)
    return predictions_dict
def postproc_yolov5seg(raw_detections):
    raw_detections_keys = list(raw_detections.keys())
    layer_from_shape: dict = {raw_detections[key].shape: key for key in raw_detections_keys}
    mask_channels = 32
    detection_channels = (kwargs['classes'] + 4 + 1 + mask_channels) * len(kwargs['anchors']['strides'])  # (num_classes + num_coordinates + objectness + mask) * strides_list_len
    endnodes = [raw_detections[layer_from_shape[1, 160, 160, mask_channels]],
                raw_detections[layer_from_shape[1, 80, 80, detection_channels]],
                raw_detections[layer_from_shape[1, 40, 40, detection_channels]],
                raw_detections[layer_from_shape[1, 20, 20, detection_channels]]]
    predictions_dict = yolov5_seg_postprocess(endnodes, **kwargs)
    return predictions_dict
# ---------------- Pre-processing functions ----------------- #
def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    img_w, img_h = image.size
    model_input_w, model_input_h = size
    scale = min(model_input_w / img_w, model_input_h / img_h)
    scaled_w = int(img_w * scale)
    scaled_h = int(img_h * scale)
    image = image.resize((scaled_w, scaled_h), Image.Resampling.BICUBIC)
    new_image = Image.new('RGB', size, (114,114,114))
    new_image.paste(image, ((model_input_w - scaled_w) // 2, (model_input_h - scaled_h) // 2))
    return new_image
def preproc(image, width=640, height=640, normalized=True):
    image = letterbox_image(image, (width, height))
    if normalized == False:
        ## normalized_image = (base - mean) / std, given mean=0.0, std=255.0
        image = np.array(image)
        image[:,:, 0] = image[:,:, 0] / 255.0
        image[:,:, 1] = image[:,:, 1] / 255.0
        image[:,:, 2] = image[:,:, 2] / 255.0
    return image
def load_input_images(images_path, images):
    # if running inference on a single image:
    if (images_path.endswith('.jpg') or images_path.endswith('.png') or images_path.endswith('.bmp') or images_path.endswith('.jpeg')):
        images.append(Image.open(images_path))
    # if running inference on an images directory:
    if (os.path.isdir(images_path)):
        for img in os.listdir(images_path):
            if (img.endswith(".jpg") or img.endswith(".png") or img.endswith('.bmp') or img.endswith('.jpeg')):
                images.append(Image.open(os.path.join(images_path, img)))
# ---------------- Start of the example --------------------- #
func_dict = {
    'yolov5_seg': postproc_yolov5seg,
    'yolov8_seg': postproc_yolov8seg,
    'fast_sam': postproc_yolov8seg,
}
images_path = args.images
images = []
load_input_images(images_path, images)
anchors = {}
meta_arch = ''
arch = args.arch
arch_list = arch_dict.keys()
num_of_classes = args.class_num
if arch in arch_list:
    anchors = arch_dict[arch]
    kwargs['anchors'] = arch_dict[arch]['anchors']
    if arch == 'v5':
        meta_arch = 'yolov5_seg'
        kwargs['score_threshold'] = 0.001
        kwargs['nms_iou_thresh'] = 0.6
    if arch == 'v8':
        meta_arch = 'yolov8_seg'
        kwargs['score_threshold'] = 0.001
        kwargs['nms_iou_thresh'] = 0.7
        kwargs['meta_arch'] = 'yolov8_seg_postprocess'
    if arch == 'fast':
        meta_arch = 'fast_sam'
        kwargs['score_threshold'] = 0.25
        kwargs['nms_iou_thresh'] = 0.7
        kwargs['meta_arch'] = 'yolov8_seg_postprocess'
        num_of_classes = '1'
        kwargs['classes'] = 1
else:
    error = 'Not a valid architecture. Please choose an architecture from this list: v5, v8, fast'
    raise ValueError(error)
kwargs['classes'] = int(num_of_classes)
output_dir = args.output_dir
if not output_dir:
    output_dir = 'output_images'
devices = Device.scan()
hef = HEF(args.hef)
inputs = hef.get_input_vstream_infos()
outputs = hef.get_output_vstream_infos()
with VDevice(device_ids=devices) as target:
    configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    [log.info('Input layer: {} {}'.format(layer_info.name, layer_info.shape)) for layer_info in inputs]
    [log.info('Output layer: {} {}'.format(layer_info.name, layer_info.shape)) for layer_info in outputs]

    height, width, _ = hef.get_input_vstream_infos()[0].shape
    kwargs['img_dims'] = (height, width)

    input_vstream_info = hef.get_input_vstream_infos()[0]
    input_vstreams_params = InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)
    output_vstreams_params = OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)

    with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
        for i, image in enumerate(images):
            processed_image = preproc(image, height=height, width=width)
            input_data = {input_vstream_info.name: np.expand_dims(processed_image, axis=0).astype(np.float32)}
            with network_group.activate(network_group_params):
                raw_detections = infer_pipeline.infer(input_data)
            results = func_dict[meta_arch](raw_detections)[0]
            output_path = os.path.join(os.path.realpath('.'), 'output_images')
            if not os.path.isdir(output_path):
                os.mkdir(output_path)
            processed_img = Image.fromarray(visualize_yolov5_seg_results(results, np.expand_dims(np.array(processed_image), axis=0), score_thres=0.3, class_names=num_of_classes, **kwargs))
            processed_img.save(f'{output_dir}/output_image{i}.jpg', 'JPEG')
I had to copy the cython_utils folder into the same folder as my code, since the script imports:
from cython_utils.cython_nms import nms as cnms
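If there is no prebuilt cython_nms extension for your platform, one generic way to build it yourself is with Cython. This is only a sketch (a hypothetical helper script, not the repo's official build step), assuming cython_nms.pyx sits in the cython_utils folder:
# build_cython_nms.py -- run with: python build_cython_nms.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("cython_utils/cython_nms.pyx"),
    include_dirs=[np.get_include()],  # the NMS code uses the NumPy C headers
)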
Thanks to omria and dario.ravarro