How to use clip_text_encoder_resnet50x4.hef?

tarmily.wen · October 25, 2024, 4:27pm

I see that the input shape for the model is (1, 77, 640). However, when I run clip.tokenize(text), it only produces (1, 77). What does the 640 dimension of the model input represent?

Nadav · October 27, 2024, 12:00pm

Hi @tarmily.wen,
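The (1, 77) output you get from clip.tokenize is the tokenized text: token IDs only, before embedding. The model expects the embedded tokens instead. For the resnet50x4 (RN50x4) model those have shape (1, 77, 640), where 77 is the context length and 640 is the embedding width of the text encoder.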
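A minimal sketch of the embedding step, assuming the HEF expects the same tensor that the OpenAI `clip` package builds at the start of `encode_text`: a lookup into the token-embedding table plus the learned positional embedding. Whether the positional embedding must be added on the host, and what dtype or quantization the HEF input needs, are assumptions to verify against the Hailo model documentation. `embed_tokens` is a hypothetical helper name, and the random weights are placeholders for the real RN50x4 weights.

```python
import numpy as np

# Dimensions of the OpenAI CLIP RN50x4 text encoder.
VOCAB_SIZE = 49408    # BPE vocabulary size used by clip.tokenize
CONTEXT_LENGTH = 77   # sequence length produced by clip.tokenize
WIDTH = 640           # text transformer width for RN50x4


def embed_tokens(token_ids, token_embedding, positional_embedding):
    """Hypothetical helper: map token IDs (batch, 77) to embeddings (batch, 77, 640).

    Assumption: mirrors the first two steps of CLIP's encode_text, i.e. a row
    lookup in the token-embedding table followed by adding the positional embedding.
    """
    token_ids = np.asarray(token_ids, dtype=np.int64)
    x = token_embedding[token_ids]            # (batch, seq, WIDTH) via row lookup
    x = x + positional_embedding[None, :, :]  # broadcast (seq, WIDTH) over the batch
    return x.astype(np.float32)


if __name__ == "__main__":
    # Placeholder weights. With the real model they could be taken from PyTorch:
    #   model, _ = clip.load("RN50x4", device="cpu")
    #   token_embedding = model.token_embedding.weight.detach().cpu().numpy()
    #   positional_embedding = model.positional_embedding.detach().cpu().numpy()
    rng = np.random.default_rng(0)
    token_embedding = rng.standard_normal((VOCAB_SIZE, WIDTH), dtype=np.float32)
    positional_embedding = rng.standard_normal((CONTEXT_LENGTH, WIDTH), dtype=np.float32)

    # Stand-in for clip.tokenize(text): start-of-text (49406), two arbitrary IDs,
    # end-of-text (49407), then zero padding up to 77 tokens.
    tokens = np.zeros((1, CONTEXT_LENGTH), dtype=np.int64)
    tokens[0, :4] = [49406, 100, 200, 49407]

    embedded = embed_tokens(tokens, token_embedding, positional_embedding)
    print(embedded.shape)  # (1, 77, 640)
```

In PyTorch with the real model, the equivalent is `model.token_embedding(tokens) + model.positional_embedding`. Converting to NumPy just keeps the host-side preprocessing independent of PyTorch at inference time.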
