I’m following the Hailo tutorial notebook DFC_2_Model_Optimization_Tutorial. It ran fine out of the box, but I’ve tried to adapt it for YOLOv11n. The output has shape [80, 5, 100]; based on the documentation, this should be 80 classes, 5 data points per detection, and up to 100 detections per class (the limit set in the NMS config).
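To make that layout concrete, here is a minimal sketch of flattening such a tensor into a list of detections. It assumes one image's output has shape (80, 5, 100), that unused slots are all zeros, and that the score is the last of the 5 values; `nonzero_detections` is a hypothetical helper and the demo tensor is synthetic, not real model output.

```python
import numpy as np


def nonzero_detections(nms_out, score_index=4):
    """Collect (class_id, row) pairs from a (num_classes, 5, max_dets) array.

    Assumption: padded slots are all zeros, so a positive value at
    score_index marks a real detection.
    """
    dets = []
    for class_id, per_class in enumerate(nms_out):
        rows = per_class.T                  # (max_dets, 5)
        keep = rows[:, score_index] > 0     # drop zero-padded slots
        for row in rows[keep]:
            dets.append((class_id, row))
    return dets


# Synthetic stand-in for one image's output: two detections for class 62.
demo = np.zeros((80, 5, 100), dtype=np.float32)
demo[62, :, 0] = [0.39, 0.01, 0.62, 0.24, 0.91]
demo[62, :, 1] = [0.49, 0.87, 0.68, 1.00, 0.55]

dets = nonzero_detections(demo)
print(len(dets), [class_id for class_id, _ in dets])
```

Keeping the class id next to each row avoids indexing class by class when comparing against another detector's output.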
The important bits of code are included below. The full notebook is uploaded to Colab, but much of it won’t run there as is, since I’m running it locally in a Docker container; Colab was just an easy way to share it.
If I look into the output for a specific class, and ignore the many rows of zeros, I’ll get something like this for that one class:
array([[0.39044172, 0.01051593, 0.61553806, 0.24089272, 0.91469216],
[0.48778102, 0.8696294 , 0.6793776 , 0.9998992 , 0.5478364 ]],
I’m unsure whether I’m just getting garbage predictions or misunderstanding the output format. Is this xywhn + confidence, or something else? For comparison, running model.predict() in Ultralytics (with yolo11n.pt) yields these results for the same image:
from ultralytics import YOLO

model = YOLO('yolo11n.pt')
results = model.predict(sample_dataset[0,:,:,:], imgsz=640, conf=0.2)
# Process results list
for result in results:
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
The boxes object for this image looks like this:
ultralytics.engine.results.Boxes object with attributes:
cls: tensor([62., 62.], device='cuda:0')
conf: tensor([0.9115, 0.2777], device='cuda:0')
data: tensor([[6.1889e+00, 2.4986e+02, 1.5438e+02, 3.9446e+02, 9.1155e-01, 6.2000e+01],
[5.5910e+02, 3.1260e+02, 6.4000e+02, 4.2976e+02, 2.7771e-01, 6.2000e+01]], device='cuda:0')
id: None
is_track: False
orig_shape: (640, 640)
shape: torch.Size([2, 6])
xywh: tensor([[ 80.2820, 322.1584, 148.1863, 144.6057],
[599.5480, 371.1809, 80.9040, 117.1644]], device='cuda:0')
xywhn: tensor([[0.1254, 0.5034, 0.2315, 0.2259],
[0.9368, 0.5800, 0.1264, 0.1831]], device='cuda:0')
xyxy: tensor([[ 6.1889, 249.8556, 154.3751, 394.4613],
[559.0960, 312.5987, 640.0000, 429.7631]], device='cuda:0')
xyxyn: tensor([[0.0097, 0.3904, 0.2412, 0.6163],
[0.8736, 0.4884, 1.0000, 0.6715]], device='cuda:0')
The code I’ve used is as follows:
import os

import numpy as np
import torch
import torchvision as tv
from PIL import Image


def preproc(image, output_height=640, output_width=640):
    # Resize only; normalization is added to the model by the model script
    preprocess = tv.transforms.Compose([
        tv.transforms.Resize((output_height, output_width)),
    ])
    data = np.array(preprocess(image))  # PIL image -> (H, W, 3) array
    return data


data_batch_size = 1500
images_path = "../data/coco/images/val2017"
images_list = [img_name for img_name in os.listdir(images_path) if os.path.splitext(img_name)[1] == ".jpg"]

calib_dataset = np.zeros((data_batch_size, 640, 640, 3))
for idx, img_name in enumerate(sorted(images_list)):
    if idx == data_batch_size:
        break
    img = Image.open(os.path.join(images_path, img_name)).convert('RGB')
    img_preproc = preproc(img)
    calib_dataset[idx, :, :, :] = img_preproc

np.save("calib_set.npy", calib_dataset)
The above just sets up the calibration dataset. I used COCO 2017, downloaded manually (via the Ultralytics dataset YAML), and only the val2017 split.
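As a side check on the calibration array, the sketch below estimates its memory footprint without allocating it. It relies only on the fact that `np.zeros` defaults to float64; `array_nbytes` is a hypothetical helper, and whether a narrower dtype is acceptable to the optimizer is not confirmed here.

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed for an array of the given shape and dtype (no allocation)."""
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize


calib_shape = (1500, 640, 640, 3)
size_f64 = array_nbytes(calib_shape, np.float64)  # default dtype of np.zeros
size_f32 = array_nbytes(calib_shape, np.float32)

print(f"float64: {size_f64 / 1e9:.1f} GB, float32: {size_f32 / 1e9:.1f} GB")
```

At 1500 images of 640x640x3, the float64 array is roughly 14.7 GB, which is worth knowing when running inside a memory-limited Docker container.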
from hailo_sdk_client import ClientRunner, InferenceContext

# Second, we will load our parsed HAR from the Parsing Tutorial
model_name = "yolo11n"
hailo_model_har_name = f"{model_name}_hailo_model.har"
assert os.path.isfile(hailo_model_har_name), "Please provide valid path for HAR file"
runner = ClientRunner(har=hailo_model_har_name)
# By default it uses the hw_arch that is saved on the HAR. For overriding, use the hw_arch flag.

# Now we will create a model script, that tells the compiler to add a normalization on the beginning
# of the model (that is why we didn't normalize the calibration set;
# Otherwise we would have to normalize it before using it)
# this was taken from the hailo github alls script for yolov11n
alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv54, sigmoid)
change_output_activation(conv65, sigmoid)
change_output_activation(conv80, sigmoid)
nms_postprocess("./yolo11n_nms_config.json", meta_arch=yolov8, engine=cpu)
"""

# Load the model script to ClientRunner so it will be considered on optimization
runner.load_model_script(alls)

# Call Optimize to perform the optimization process
runner.optimize(calib_dataset)

# Save the result state to a Quantized HAR file
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)
I then take this and run a basic inference to check the output:
sample_dataset = np.zeros((2, 640, 640, 3))
SAMPLE_IMAGE_PATH = "../data/coco/images/val2017/000000000139.jpg"
img = Image.open(SAMPLE_IMAGE_PATH).convert('RGB')
img_preproc = preproc(img)
# preproc already returns a (640, 640, 3) NumPy array, so no reshape is needed
sample_dataset[0, :, :, :] = img_preproc

# Notice that we use the original images, because normalization is IN the model
with runner.infer_context(InferenceContext.SDK_FP_OPTIMIZED) as ctx:
    modified_res = runner.infer(ctx, sample_dataset[:1, :, :, :])
If I take a closer look at class_id == 62, the television class, there are two detections:
tv_dets = modified_res[:, 62, :, :].reshape(5,100) # this is classid=62, which is a television
tv_dets.transpose()[:2, :]
This outputs:
array([[0.39044172, 0.01051593, 0.61553806, 0.24089272, 0.91469216],
[0.48778102, 0.8696294 , 0.6793776 , 0.9998992 , 0.5478364 ]],
Compare this to the ground truth labels for this class (62) in the same image (class, x_center, y_center, width, height, all normalized):
62 0.127641 0.505153 0.233312 0.2227
62 0.934195 0.583462 0.127109 0.184812
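One way to probe the layout question is to convert the labels to corner form and test every ordering of the four Hailo coordinates against them. This is only a sketch: the helper names are made up for illustration, it assumes Hailo row i corresponds to label i, and agreement on a single image does not establish the documented format.

```python
import itertools

import numpy as np

# Values copied from the post: Hailo rows for class 62 and the YOLO-format labels.
hailo_rows = np.array([
    [0.39044172, 0.01051593, 0.61553806, 0.24089272, 0.91469216],
    [0.48778102, 0.8696294, 0.6793776, 0.9998992, 0.5478364],
])
labels_xywhn = np.array([
    [0.127641, 0.505153, 0.233312, 0.2227],
    [0.934195, 0.583462, 0.127109, 0.184812],
])


def xywhn_to_xyxyn(boxes):
    """Convert (x_center, y_center, w, h) rows to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = boxes.T
    return np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=1)


def best_order(rows, ref_xyxy):
    """Find the ordering of rows[:, :4] closest to ref_xyxy (max abs error)."""
    coords = rows[:, :4]

    def max_err(perm):
        return float(np.abs(coords[:, list(perm)] - ref_xyxy).max())

    best = min(itertools.permutations(range(4)), key=max_err)
    return best, max_err(best)


ref = xywhn_to_xyxyn(labels_xywhn)
order, err = best_order(hailo_rows, ref)
print(order, err)
```

For these numbers the closest ordering is (1, 0, 3, 2): x_min comes from column 1, y_min from column 0, x_max from column 3 and y_max from column 2. That would be consistent with rows laid out as [y_min, x_min, y_max, x_max, score] rather than xywhn, with the first score (0.9147) close to the Ultralytics confidence of 0.9115.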
I can’t seem to find any info about the output format in the postprocessing docs.
Any suggestions or comments on something I’ve missed?