Compile Zero-Shot Classification

jinsookim · November 12, 2024, 8:04am

Hello, we’re trying to compile two kinds of zero-shot classification models with the DFC. We have seen that Hailo provides ViT and CLIP, but with our own ONNX models we get the errors below.

CLIP-ViT-B/32 (model weight is FP16)
We converted the OpenAI model to ONNX format.
We tried to compile it with the DFC, but got an error when parsing:

ONNX shape inference failed: [ONNXRuntimeError] : 1 : FAIL : Type Error : Type parameter (T) of Optype (Conv) bound to different types (tensor(float) and tensor(float16) in node (/conv1/Conv)).

CLIP4Clip (model weight is FP32)
We tried to compile it with the DFC, but got an error when parsing:

UnsupportedShuffleLayerError in op /…/…/Transpose_5 : Failed to determine type of layer to create in node /…/…/Transpose_5
UnsupportedShuffleLayerError in op /…/…/Reshape_9: Failed to determine type of layer to create in node /…/…/Reshape_9

Any guidance on what is going wrong or what we could do to fix it would be helpful! (When we compiled CNN based models there were no big errors.)

omria · November 12, 2024, 11:19am

Hey @jinsookim ,

The issue with CLIP-ViT-B/32 arises because the DFC expects consistent data types across the model, but here the model mixes float and float16 (FP32 and FP16) tensors. In your error, the Conv node /conv1/Conv receives one tensor(float) operand and one tensor(float16) operand. This usually happens when the model was trained or optimized for FP16 but still contains residual FP32 layers or constants. The DFC cannot handle mixed types within a single model.
There are two ways to resolve this:
1- Retry compiling with DFC: Use the clip_vit_b32_fp32.onnx model in DFC to avoid FP16 type conflicts.
2- Convert the model to FP32 entirely: Make sure every layer and constant is FP32 so that no type mismatch remains. You can do this with the onnx Python library or an ONNX conversion tool.

For the CLIP4Clip model with FP32 weights:

The UnsupportedShuffleLayerError is often triggered by complex Reshape or Transpose layers that the DFC does not support.
Simplify the Transpose and Reshape operations:
- Try using the onnx-simplifier library to streamline these operations within the model.
- If simplification doesn’t resolve the issue, you may need to provide specific shape hints for the layers causing the error.

Best Regards,
Omri

jinsookim · November 14, 2024, 7:38am

Thanks for your response. We have tested two CLIP-ViT-B/32 models.

First, we tried our model but got this error:

ValueError: cannot reshape array of size 589824 into shape (768,768,50,1)

Second, we used ViT-B-32-laion2b_e16/visual.onnx but got some other errors:

hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Unsupported dimensions: clip_vit_b_32_fp32_visual/format_conversion1 - [-1, 1, 49, 768], clip_vit_b_32_fp32_visual/const_input1 - [-1, 1, 50, 768] at ew add layer clip_vit_b_32_fp32_visual/ew_add1 (translated from Add_28)

Are we doing something wrong, or does the DFC not yet handle this kind of model?
(We’re using the latest version of DFC.)

jinsookim · November 14, 2024, 7:57am

We also tested the OpenAI ViT-B-32/visual.onnx model and got the error below.

hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Invalid kernel shape for base conv layer base_conv50 (translated from MatMul_1535).
Either the input shape doesn't match the kernel shape, or the calculated groups number doesn't match the expected ratio between kernel shape and input shape.
Kernel features: 768 Input features: 1 Groups: 0

We successfully compiled CLIP-ResNet50 and verified that the resulting .hef file runs on Hailo-8.

jinsookim · November 20, 2024, 8:04am

Hey @omria,
Is there a solution to the error we’ve mentioned above?
