Large difference in output between compiled CLIP model and OpenAI model

I am running CLIP from the Model Zoo on my Hailo-8L and am seeing a relatively large difference in the output values. I set the output type of the HEF model to float32, then fed in a properly resized uint8 image. When I compare the resulting image embedding with the one from the same-sized OpenAI model, the difference is large relative to the scale of the output, sometimes off by 25x.
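For reference, this is roughly how I compare the two embeddings (a numpy sketch; `hef_emb` and `openai_emb` are placeholder names for the vectors coming out of the two pipelines):

```python
import numpy as np

def compare_embeddings(hef_emb: np.ndarray, openai_emb: np.ndarray):
    """Compare two embedding vectors by raw error and by cosine similarity."""
    hef = hef_emb.astype(np.float32).ravel()
    ref = openai_emb.astype(np.float32).ravel()
    abs_err = np.abs(hef - ref)
    cos = float(np.dot(hef, ref) / (np.linalg.norm(hef) * np.linalg.norm(ref)))
    return abs_err.max(), abs_err.mean(), cos

# Dummy example with a random 512-dim vector compared against itself:
rng = np.random.default_rng(0)
a = rng.normal(size=512).astype(np.float32)
max_err, mean_err, cos = compare_embeddings(a, a)
print(max_err, cos)  # identical vectors: zero error, cosine ~ 1.0
```

Cosine similarity matters more than raw error for my use case, since CLIP matching normalizes the vectors anyway, but both show a big gap.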

I used the examples from https://github.com/hailo-ai/Hailo-Application-Code-Examples to run the inference. My downstream task is to store the output image vectors for later text matching, and errors this large won't work with OpenAI's text embeddings. I also see that in the Hailo CLIP example repo the text embedding is computed with OpenAI's model rather than a compiled Hailo model. Any ideas about what could be wrong?

Hi @tarmily.wen,
Do you see a large difference between the image embeddings produced by Hailo's HEF and by the OpenAI PyTorch model?
Can you please add more details, including which networks you ran? Please also send a link to the OpenAI model you are using.
As you said, the best reference is our hailo-CLIP repo.
I would be happy to hear what you are planning to do with this model :slight_smile:

@tarmily.wen, adding to Gilad's questions: did you use the Model Zoo's model script?

Note that there are two models, clip_resnet50 and clip_resnet50x4.
The CLIP demo uses clip_resnet50x4.

I am using “RN50” from OpenAI and “clip_resnet_50.hef” downloaded from the Model Zoo. Yes, I am seeing a large difference in the image embeddings.

I am not using the model script. Regarding the preprocessing steps for an image: am I reading this correctly that all of the preprocessing is already baked into the model, so there is no need to divide by 255, subtract the mean, and divide by the std?

Also, for the text embeddings, is it fine to just use OpenAI's text encoder for the matching?
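For context, the matching step I have in mind is the standard CLIP recipe: L2-normalize both embeddings, take dot products, scale by the logit scale (around 100 in OpenAI's released models), and softmax. A numpy sketch with made-up vectors (`image_emb` and `text_embs` are placeholders for the real model outputs):

```python
import numpy as np

def clip_match(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """CLIP-style matching: cosine similarity -> scaled logits -> softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    logits = 100.0 * (txt @ img)  # ~exp(logit_scale) in released CLIP models
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Dummy example: 3 candidate text embeddings of dimension 4.
rng = np.random.default_rng(1)
text_embs = rng.normal(size=(3, 4))
image_emb = text_embs[2].copy()  # image embedding aligned with the 3rd text
probs = clip_match(image_emb, text_embs)
print(probs.argmax())  # -> 2
```

Because the similarities are scaled by ~100 before the softmax, even small embedding errors get amplified, which is why the HEF/OpenAI gap matters so much here.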

Normalization is done on the chip.
The network works with uint8 input. If you use float input, you may need to configure HailoRT to do the conversion.
Which Hailo API are you using?
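To make that concrete, a minimal preprocessing sketch under this assumption: keep the image as uint8 HWC and skip the /255, mean, and std steps, since normalization runs on-chip. The resize here is nearest-neighbor indexing purely to stay dependency-free; in practice you would use OpenCV or PIL bicubic resize plus center crop, as the original CLIP preprocessing does.

```python
import numpy as np

def preprocess_for_hef(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize to the network input size and keep uint8 HWC.
    No /255 and no mean/std subtraction: normalization is done on-chip."""
    h, w = img.shape[:2]
    # Nearest-neighbor resize, just to keep this sketch self-contained.
    ys = (np.arange(size) * h // size).clip(0, h - 1)
    xs = (np.arange(size) * w // size).clip(0, w - 1)
    out = img[ys][:, xs]
    return np.ascontiguousarray(out, dtype=np.uint8)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
inp = preprocess_for_hef(frame)
print(inp.shape, inp.dtype)  # (224, 224, 3) uint8
```

Note that a mismatch in the resize/crop method alone can cause some embedding drift, though usually not on the scale reported above.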

I am using pyhailort, with a ConfiguredInferModel to perform the inference.

I have tried setting the input type to both uint8 and float32, and I get large value differences either way.