Large difference in output between compiled CLIP model and OpenAI model

I am running CLIP from the Model Zoo on my Hailo-8L and am seeing a relatively large difference in the output values. I set the output type of the HEF model to float32, then fed in a properly resized uint8 image. When I compare the resulting image embedding with the one from the same-sized OpenAI model, the difference is large relative to the scale of the output, sometimes off by 25x.
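For reference, this is roughly how I compare the two embeddings (a numpy sketch; `hef_emb` and `openai_emb` are placeholder names for the vectors coming out of the two pipelines):

```python
import numpy as np

def compare_embeddings(hef_emb: np.ndarray, openai_emb: np.ndarray):
    """Compare two embedding vectors by raw error and by cosine similarity."""
    hef = hef_emb.astype(np.float32).ravel()
    ref = openai_emb.astype(np.float32).ravel()
    abs_err = np.abs(hef - ref)
    cos = float(np.dot(hef, ref) / (np.linalg.norm(hef) * np.linalg.norm(ref)))
    return abs_err.max(), abs_err.mean(), cos

# Dummy example with a random 512-dim vector compared against itself:
rng = np.random.default_rng(0)
a = rng.normal(size=512).astype(np.float32)
max_err, mean_err, cos = compare_embeddings(a, a)
print(max_err, cos)  # identical vectors: zero error, cosine ~ 1.0
```

Cosine similarity matters more than raw error for my use case, since CLIP matching normalizes the vectors anyway, but both show a big gap.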

I used the examples from https://github.com/hailo-ai/Hailo-Application-Code-Examples to run the inference. My downstream task is to store the output image vectors for later text matching, and errors this large won't work with OpenAI's text embeddings. I also see that in the Hailo CLIP example repo the text embedding is computed with OpenAI's model rather than a compiled Hailo model. Any ideas about what could be wrong?

Hi @tarmily.wen,
Do you see a large difference between the image embeddings produced by Hailo's HEF and by the OpenAI PyTorch model?
Can you please add more details, including which networks you ran? Please also send a link to the OpenAI model you are using.
As you said, the best reference is our hailo-CLIP repo.
I would be happy to hear what you are planning to do with this model :slight_smile:

@tarmily.wen, adding to Gilad's questions: did you use the Model Zoo's model script?

Note that there are two models, clip_resnet50 and clip_resnet50x4.
The CLIP demo uses clip_resnet50x4.

I am using “RN50” from OpenAI and “clip_resnet_50.hef” downloaded from the Model Zoo. Yes, I am seeing a large difference in the image embeddings.

I am not using the model script. Regarding the preprocessing steps for an image: am I reading this correctly that all of the preprocessing is already baked into the model, so there is no need to divide by 255, subtract the mean, and divide by the std?

Also, for the text embeddings, is it fine to just use OpenAI's text encoder for the matching?
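For context, the matching step I have in mind is the standard CLIP recipe: L2-normalize both embeddings, take dot products, scale by the logit scale (around 100 in OpenAI's released models), and softmax. A numpy sketch with made-up vectors (`image_emb` and `text_embs` are placeholders for the real model outputs):

```python
import numpy as np

def clip_match(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """CLIP-style matching: cosine similarity -> scaled logits -> softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    logits = 100.0 * (txt @ img)  # ~exp(logit_scale) in released CLIP models
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Dummy example: 3 candidate text embeddings of dimension 4.
rng = np.random.default_rng(1)
text_embs = rng.normal(size=(3, 4))
image_emb = text_embs[2].copy()  # image embedding aligned with the 3rd text
probs = clip_match(image_emb, text_embs)
print(probs.argmax())  # -> 2
```

Because the similarities are scaled by ~100 before the softmax, even small embedding errors get amplified, which is why the HEF/OpenAI gap matters so much here.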

Normalization is done on the chip.
The network works with uint8 input. If you use float input, you may need to configure HailoRT to do the conversion.
Which Hailo API are you using?
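To make that concrete, a minimal preprocessing sketch under this assumption: keep the image as uint8 HWC and skip the /255, mean, and std steps, since normalization runs on-chip. The resize here is nearest-neighbor indexing purely to stay dependency-free; in practice you would use OpenCV or PIL bicubic resize plus center crop, as the original CLIP preprocessing does.

```python
import numpy as np

def preprocess_for_hef(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize to the network input size and keep uint8 HWC.
    No /255 and no mean/std subtraction: normalization is done on-chip."""
    h, w = img.shape[:2]
    # Nearest-neighbor resize, just to keep this sketch self-contained.
    ys = (np.arange(size) * h // size).clip(0, h - 1)
    xs = (np.arange(size) * w // size).clip(0, w - 1)
    out = img[ys][:, xs]
    return np.ascontiguousarray(out, dtype=np.uint8)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
inp = preprocess_for_hef(frame)
print(inp.shape, inp.dtype)  # (224, 224, 3) uint8
```

Note that a mismatch in the resize/crop method alone can cause some embedding drift, though usually not on the scale reported above.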

I am using pyhailort, with a ConfiguredInferModel to perform the inference.

I have tried setting the input type to both uint8 and float32, and I get large value differences either way.