Compile CLIP with the DFC

Hello, I am trying to compile the vision embedding from CLIP with the DFC. I have seen that Hailo provides a demo for CLIP, but for my application I have to be able to fine-tune the model.

Currently I am getting an IndexError:

in get_spatial_unflatten_reshape_info
    spatial_reshape_sizes = [output_shape[1], output_shape[2]]
IndexError: list index out of range

I have read in a different post that Hailo is working on support for compiling Transformers.

> Transformer compilation

However, the demo for CLIP exists, so there has to be a way to compile a vision transformer.

This is the code I use to get the ONNX file of the ViT:

import torch
import clip
from PIL import Image

vision_arch = "ViT-B32"
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

print(clip.available_models())
vit = model.visual
textTransformer = model.token_embedding

image = preprocess(Image.open("CLIP/pics/constructionsite.png")).unsqueeze(0).to(device)  # Picture input

# ==================== #
# Export Model as ONNX #
# ==================== #

torch.onnx.export(vit,         # model being run 
         image,       # model input (or a tuple for multiple inputs) 
         f"{vision_arch}.onnx",       # where to save the model  
         export_params=True,  # store the trained parameter weights inside the model file 
         opset_version=15,    # the ONNX version to export the model to
         verbose=False,       # print network to console
         do_constant_folding=True,  # whether to execute constant folding for optimization 
         input_names = ['modelInput'],   # the model's input names 
         output_names = ['modelOutput'], # the model's output names 
         dynamic_axes={'modelInput' : {0 : 'batch_size'},    # variable length axes 
                        'modelOutput' : {0 : 'batch_size'}})
print(f"model saved as {vision_arch}.onnx")

Further, this is the code I use to try to compile the ViT:

import onnx

# General imports used throughout the tutorial
import tensorflow as tf
from IPython.display import SVG

# import the ClientRunner class from the hailo_sdk_client package
from hailo_sdk_client import ClientRunner


chosen_hw_arch = "hailo8"

onnx_model_name = "ViT-B32"
onnx_path = f"models/{onnx_model_name}.onnx"

runner = ClientRunner(hw_arch=chosen_hw_arch)
_ = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
    start_node_names=["modelInput"],
    end_node_names=["modelOutput"],
    net_input_shapes={"modelInput": [1, 3, 224, 224]}
)

hailo_model_har_name = f"{onnx_model_name}_hailo_model.har"
runner.save_har(hailo_model_har_name)

Any help is welcome :wink:.

Hi @lukasOST,

The version of CLIP we offer in our ModelZoo is the ResNet-50 one. Support for the ViT version is planned for the near future.

Hey @nina-vilela, thanks for the quick response.

I tried the same procedure with the ResNet-50x4. To get the model I again used:

import torch
import clip
from PIL import Image

vision_arch = "RN50x4"
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

print(clip.available_models())
vit = model.visual
textTransformer = model.token_embedding

image = preprocess(Image.open("pics/constructionsite.png")).unsqueeze(0).to(device)  # Picture input

# ==================== #
# Export Model as ONNX #
# ==================== #
vit.eval()
torch.onnx.export(vit,          # model being run 
         image,                 # model input (or a tuple for multiple inputs) 
         f"models/{vision_arch}.onnx",       # where to save the model  
         export_params=True,    # store the trained parameter weights inside the model file 
         opset_version=17,      # the ONNX version to export the model to
         verbose=False,         # print network to console
         do_constant_folding=True,          # whether to execute constant folding for optimization 
         input_names = ['modelInput'],      # the model's input names 
         output_names = ['modelOutput'],    # the model's output names 
         dynamic_axes={'modelInput' : {0 : 'batch_size'},    # variable length axes 
                        'modelOutput' : {0 : 'batch_size'}})
print(f"model saved as {vision_arch}.onnx")

After that I tried to compile it with the DFC.

import onnx

# General imports used throughout the tutorial
import tensorflow as tf
from IPython.display import SVG

# import the ClientRunner class from the hailo_sdk_client package
from hailo_sdk_client import ClientRunner


chosen_hw_arch = "hailo8"

onnx_model_name = "RN50x4"
onnx_path = f"models/{onnx_model_name}.onnx"

runner = ClientRunner(hw_arch=chosen_hw_arch)
_ = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
    start_node_names=["modelInput"],
    end_node_names=["modelOutput"],
    disable_rt_metadata_extraction=True,
    disable_shape_inference=True,
    net_input_shapes={"modelInput": [1, 3, 228, 228]}
)

hailo_model_har_name = f"{onnx_model_name}_hailo_model.har"
runner.save_har(hailo_model_har_name)

But it stops at the line which executes translate_onnx_model with this error:

hailo_sdk_client.model_translator.exceptions.MisspellNodeError: Unable to find end node name: ['/attnpool/If'], please verify and try again.

The whole console output looks like this:

2024-10-16 14:35:45.951520: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-16 14:35:45.953671: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-10-16 14:35:45.978138: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-10-16 14:35:45.978712: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-16 14:35:46.550715: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[info] Translation started on ONNX model RN50x4
[info] Restored ONNX model RN50x4 (completion time: 00:00:00.91)
[info] Simplified ONNX model for a parsing retry attempt (completion time: 00:00:06.10)

I suspect that the error comes from an If statement at the end of the ONNX graph. Did I forget to install something, or is it not possible to compile an If statement?

If it's not possible, how did Hailo manage to compile their CLIP demo?
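
For context, the tail of the graph can be listed with a few lines of onnx to check which node names sit right before the If, e.g.:

import onnx

model = onnx.load("models/RN50x4.onnx")
# print the last few nodes so that a valid end node before the trailing If can be picked
for node in model.graph.node[-10:]:
    print(node.op_type, node.name, node.output)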

@lukasOST

For your reference, here is the last layer that is parsed by the ModelZoo:

If there’s anything after that, it should be performed as a postprocess on the host.
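
For example, something along these lines could run the leftover head on the host with the original PyTorch weights (a rough sketch only, not the exact ModelZoo postprocess; host_postprocess, the attnpool cut point, and the NHWC layout are assumptions):

import numpy as np
import torch

def host_postprocess(hef_output: np.ndarray, clip_visual) -> torch.Tensor:
    # hef_output: feature map returned by HailoRT inference, assumed to be NHWC
    feats = torch.from_numpy(hef_output).permute(0, 3, 1, 2).float()  # back to NCHW
    with torch.no_grad():
        emb = clip_visual.attnpool(feats)  # run whatever head was not parsed, e.g. the attention pooling
    return emb / emb.norm(dim=-1, keepdim=True)  # normalized CLIP image embedding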

@nina-vilela

Thank you for the reference; sadly mine looks different.


I also can't find any part of my graph that looks similar to the reference.
Do you know the source for the network from the demo, or whether the graph was optimized beforehand?

I found my error. It is the line

dynamic_axes={'modelInput' : {0 : 'batch_size'},    # variable length axes 
              'modelOutput' : {0 : 'batch_size'}}

in the torch.onnx.export function. It modified my ONNX graph in a weird way.
If I run the export without this line, the graph looks similar to the one from the demo.
However, if I try to compile the ONNX file with the ClientRunner, I still get the following error: IndexError: list index out of range.
The console output is the same as the one I showed at the beginning of this post. Is this because there is a limitation on the size of the graph?
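
For reference, the export that now gives me the cleaner graph is just the previous call without the dynamic_axes argument:

torch.onnx.export(vit,
         image,
         f"models/{vision_arch}.onnx",
         export_params=True,
         opset_version=17,
         do_constant_folding=True,
         input_names=['modelInput'],
         output_names=['modelOutput'])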

This error can happen when there are issues with the arguments passed for the parser. The whole model is supposed to be supported, so please try again without passing anything in the start and end node names.

I tried to compile it without specifying the node names or the input shape.

# General imports used throughout the tutorial
import tensorflow as tf
from IPython.display import SVG

# import the ClientRunner class from the hailo_sdk_client package
from hailo_sdk_client import ClientRunner

chosen_hw_arch = "hailo8"

onnx_model_name = "RN50"
onnx_path = f"{onnx_model_name}.onnx"

runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
)

But I get the same error as before. I tried the function with disable_rt_metadata_extraction=True and disable_shape_inference=True, but the error still occurs. I also tried with start_node_names=[] and end_node_names=[], but nothing changes.

I think I found an error in the class ONNXGraphNode(NNGraphNode), in which the function get_spatial_unflatten_reshape_info is defined.
Old function:

def get_spatial_unflatten_reshape_info(self):
        if self.op != "Reshape":
            consumed_vertices = [look_for_node(self._graph, self, [FwdChainNode(op="Reshape")])]
        else:
            transpose = look_for_node(self._graph, self, [FwdChainNode(op="Transpose")])
            consumed_vertices = [self, transpose] if transpose else [self]

        if len(consumed_vertices) < 1:
            raise UnexpectedNodeError(f"Failed to find reshape node in format conversion layer near {self.name}.")

        print(f"Failed near {self.name}.")
        output_shape = consumed_vertices[-1].get_output_shapes()[0]
        print(output_shape)
        spatial_reshape_sizes = [output_shape[1], output_shape[2]]
        return consumed_vertices, [output_shape], spatial_reshape_sizes

Changed function:

def get_spatial_unflatten_reshape_info(self):
        if self.op == "Reshape":
            consumed_vertices = [look_for_node(self._graph, self, [FwdChainNode(op="Reshape")])]
        else:
            transpose = look_for_node(self._graph, self, [FwdChainNode(op="Transpose")])
            consumed_vertices = [self, transpose] if transpose else [self]

        if len(consumed_vertices) < 1:
            raise UnexpectedNodeError(f"Failed to find reshape node in format conversion layer near {self.name}.")

        print(f"Failed near {self.name}.")
        output_shape = consumed_vertices[-1].get_output_shapes()[0]
        print(output_shape)
        spatial_reshape_sizes = [output_shape[1], output_shape[2]]
        return consumed_vertices, [output_shape], spatial_reshape_sizes

I changed the if self.op != "Reshape": to if self.op == "Reshape":.
Now it compiles. Maybe I am using an old version of ONNX.
Please tell me if I am completely wrong about this.

I am currently working on the optimization part of the compilation with the following code.

# General imports used throughout the tutorial
# file operations
import json
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
import tensorflow as tf
from IPython.display import SVG
from matplotlib import patches
from matplotlib import pyplot as plt
from PIL import Image
from tensorflow.python.eager.context import eager_mode
from pathlib import Path

# import the hailo sdk client relevant classes
from hailo_sdk_client import ClientRunner, InferenceContext

# preprocessing
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, PILToTensor
import torch

try:
    from torchvision.transforms import InterpolationMode
    BICUBIC = InterpolationMode.BICUBIC
except ImportError:
    BICUBIC = Image.BICUBIC

harPath = Path("hailoDFC/Harfiles")
datafolder = Path("../Data")
input_folder = datafolder / 'data'
calibFolder = datafolder / "calibData"

model_name = "RN50"

def _convert_image_to_rgb(image):
    return image.convert("RGB")

def transform(n_px):
    """
    n_px: input resolution of the network
    """
    return Compose([
        Resize(n_px, interpolation=BICUBIC),
        CenterCrop(n_px),
        _convert_image_to_rgb,
        ToTensor(),
        # Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
    ])

# First, we will prepare the calibration set. Resize the images to the correct size and crop them.
def preproc(image, output_height=224, output_width=224, resize_side=256):
    """imagenet-standard: aspect-preserving resize to 256px smaller-side, then central-crop to 224px"""
    with eager_mode():
        h, w = image.shape[0], image.shape[1]
        scale = tf.cond(tf.less(h, w), lambda: resize_side / h, lambda: resize_side / w)
        resized_image = tf.compat.v1.image.resize_bilinear(tf.expand_dims(image, 0), [int(h * scale), int(w * scale)])
        cropped_image = tf.compat.v1.image.resize_with_crop_or_pad(resized_image, output_height, output_width)

        return tf.squeeze(cropped_image)

preprocess = transform(224)
images_list = [img_name for img_name in os.listdir(input_folder) if os.path.splitext(img_name)[1] == ".jpg"]

calib_dataset = np.zeros((len(images_list), 224, 224, 3))
for idx, img_name in enumerate(sorted(images_list)):
    img = Image.open(os.path.join(input_folder, img_name))
    # img = PILToTensor(img)
    img_preproc = preprocess(img)
    img_transposed = np.transpose(img_preproc.numpy(),(1,2,0))
    calib_dataset[idx, :, :, :] = img_transposed

np.save(calibFolder / f"calib_set_{model_name}.npy", calib_dataset)

# Second, we will load our parsed HAR from the Parsing Tutorial

hailo_model_har_name = f"{model_name}_hailo_model.har"
hailo_model_har_path = harPath / hailo_model_har_name
assert os.path.isfile(hailo_model_har_path), "Please provide valid path for HAR file"
runner = ClientRunner(har=str(hailo_model_har_path),hw_arch="hailo8")
# By default it uses the hw_arch that is saved on the HAR. For overriding, use the hw_arch flag.

# Now we will create a model script, that tells the compiler to add a normalization on the beginning
# of the model (that is why we didn't normalize the calibration set;
# Otherwise we would have to normalize it before using it)

# Batch size is 8 by default
alls = "normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])\n" # From tutorial

# Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)), # From Lia
# alls = "normalization1 = normalization([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711])\n"

# Load the model script to ClientRunner so it will be considered on optimization
runner.load_model_script(alls)

# Call Optimize to perform the optimization process
runner.optimize(calib_dataset)

# Save the result state to a Quantized HAR file
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)

The execution stops at the line where runner.optimize(calib_dataset) gets called.
The error is:

in _minimize_slsqp
    slsqp(m, meq, x, xl, xu, fx, c, g, a, acc, majiter, mode, w, jw,
ValueError: failed to initialize intent(inout) array -- expected elsize=8 but got 4

I suspect it is because of the MatMul operation in the ONNX graph.
I saw someone addressing the same problem in this post:
Problems optimizing (quantization) dinov2 - ONNX to HEF
Is there a fix for this, or is there a bug in my code?

Hi @lukasOST,

We wouldn’t recommend changing our parser source code - being able to parse the model doesn’t mean that it was correctly parsed.

You can compare your version of the onnx to ours:
https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/Classification/clip_resnet_50x4/pretrained/2023-03-09/clip_resnet_50x4.zip

Hey @nina-vilela
you're right, the error only occurs on the model that I translated to ONNX. My ONNX model again looks different from the one you provided. For example, my model doesn't have any padding blocks in it.

Do you know which tool you used to get your CLIP model and translate it to an ONNX model? Is it different from the way I am getting and translating my models?

Hi, @lukasOST,
Could you please try to run this code for exporting?

import onnx  
import torch
import clip
from PIL import Image  

device = "cuda" if torch.cuda.is_available() else "cpu" 
model, preprocess = clip.load("RN50x4", device=device)
input_var = torch.rand(1, 3, 288, 288)
vit = model.visual 
textTransformer = model.token_embedding
image = preprocess(Image.open("CLIP/CLIP.png")).unsqueeze(0).to(device)

vit.eval()
torch.onnx.export(vit,
         image,
         "clip_resnet_50x4.onnx",    
         export_params=True,    
         opset_version=17,   
         verbose=False, 
         do_constant_folding=False,)

Hi @lihis,

I tried your code. Here is the resulting ONNX graph. I had to crop the picture, otherwise I couldn't upload it. But this shouldn't be a problem since the top part just repeats.


I also tried to compile the ONNX to a HAR file, but the same error as before occurred.

    spatial_reshape_sizes = [output_shape[1], output_shape[2]]
IndexError: list index out of range
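
To narrow down which Reshape the parser trips over, the output shapes can be checked after ONNX shape inference (just my guess at what the parser looks at, not something from the Hailo docs):

import onnx
from onnx import shape_inference

model = shape_inference.infer_shapes(onnx.load("clip_resnet_50x4.onnx"))
value_infos = {vi.name: vi for vi in list(model.graph.value_info) + list(model.graph.output)}
for node in model.graph.node:
    if node.op_type == "Reshape":
        vi = value_infos.get(node.output[0])
        if vi is not None:
            dims = [d.dim_value for d in vi.type.tensor_type.shape.dim]
            print(node.name, dims)  # a Reshape with fewer than 3 output dims would explain the IndexError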

Hi @lukasOST,

Please try simplifying the .onnx model before converting it to .har by running these commands:

pip install onnxsim
onnxsim <original.onnx> <simplified.onnx> 
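
If it's more convenient to keep everything in Python, onnxsim also exposes the same functionality as a function (a small sketch, assuming onnxsim is installed):

import onnx
from onnxsim import simplify

model = onnx.load("clip_resnet_50x4.onnx")
model_simplified, ok = simplify(model)  # returns the simplified model and a validation flag
assert ok, "simplified ONNX model could not be validated"
onnx.save(model_simplified, "clip_resnet_50x4_sim.onnx")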

Hey @lihis ,
I simplified the graph. The result is this:

I again tried to compile it to a HAR, but the error is still present.

Hey @lihis, do you have an idea where my error lies or do you know how you guys got the model for the CLIP example?

Hey @lukasOST,

Thanks for your patience while we were checking this on our side.

Could you please give it a try with opset version 12? This is the opset version that worked for us.

import torch
import clip
from torch.autograd import Variable

vision_arch = "RN50x4"
device = "cpu"
model, preprocess = clip.load("RN50x4", device=device)

vit = model.visual

dummy_input = Variable(torch.randn(1, 3, 288, 288, device="cpu"))

# ==================== #
# Export Model as ONNX #
# ==================== #
vit.eval()
torch.onnx.export(vit,          # model being run 
         dummy_input,                 # model input (or a tuple for multiple inputs) 
         "clip.onnx",       # where to save the model  
         opset_version=12,      # the ONNX version to export the model to
         do_constant_folding=True,          # whether to execute constant folding for optimization 
         )

Hey @nina-vilela,

I tried your code with opset_version=12, but I get this error:
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 12 is not supported. Support for this operator was added in version 14, try exporting with this version.

But the good news is that the compilation is currently working for me.
Strangely, my fix was to execute the code in a Jupyter notebook.
Before, I was just running the files in VS Code.

I hope I didn't miss a note somewhere saying that you have to run the program in a Jupyter notebook.

Did you use the same virtual environment in VS Code and the Jupyter notebook? If not, this could be due to a package version disparity.

In any case, it's great to hear that the compilation is now successful for you.