Hello, we’re trying to compile two Zero-Shot Classification models with the DFC. We have seen that Hailo provides ViT and CLIP, but for our own ONNX models we get the errors below.
CLIP-ViT-B/32 (model weights are FP16)
We converted the OpenAI model to ONNX format.
We tried to compile it with the DFC, but got an error when parsing:
ONNX shape inference failed: [ONNXRuntimeError] : 1 : FAIL : Type Error : Type parameter (T) of Optype (Conv) bound to different types (tensor(float) and tensor(float16) in node (/conv1/Conv)).
UnsupportedShuffleLayerError in op /…/…/Transpose_5 : Failed to determine type of layer to create in node /…/…/Transpose_5
UnsupportedShuffleLayerError in op /…/…/Reshape_9: Failed to determine type of layer to create in node /…/…/Reshape_9
Any guidance on what is going wrong or how we could fix it would be helpful! (When we compiled CNN-based models, there were no major errors.)
The issue with CLIP-ViT-B/32 arises because the model mixes float (FP32) and float16 (FP16) tensors. The error shows that the inputs of the node /conv1/Conv are bound to both tensor(float) and tensor(float16), so ONNX shape inference fails during parsing. This usually happens when a model was trained or optimized for FP16 but still contains FP32 layers or constants. The DFC cannot handle mixed types within a single model.
You have two options:
1- Retry with an FP32 model: compile the clip_vit_b32_fp32.onnx model with the DFC to avoid the FP16 type conflict.
2- Convert the model to FP32 entirely: make sure every layer and constant is FP32 so that no type mismatch remains. You can use the onnx Python library or an ONNX conversion tool for this.
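Before converting, it can help to confirm that the exported file really mixes the two precisions. The following is a minimal sketch, not an official Hailo or ONNX utility: the helper names are made up for illustration, and it only inspects the graph's initializers and declared inputs, so intermediate tensors and Cast nodes are not covered.

```python
from collections import Counter

# Element-type codes from the ONNX TensorProto.DataType enum.
FLOAT = 1     # tensor(float), i.e. FP32
FLOAT16 = 10  # tensor(float16), i.e. FP16


def dtype_census(graph):
    """Count FP32 and FP16 tensors among a graph's initializers and inputs."""
    counts = Counter()
    # Weights and constants stored in the model file.
    for init in graph.initializer:
        counts[init.data_type] += 1
    # Declared graph inputs (older exports may list initializers here too).
    for value in graph.input:
        counts[value.type.tensor_type.elem_type] += 1
    return {"fp32": counts[FLOAT], "fp16": counts[FLOAT16]}


def is_mixed_precision(graph):
    """Return True when both FP32 and FP16 tensors are present."""
    census = dtype_census(graph)
    return census["fp32"] > 0 and census["fp16"] > 0


def census_from_file(path):
    """Load an ONNX file and return its census (requires the onnx package)."""
    import onnx  # imported lazily so the helpers above need no third-party code

    return dtype_census(onnx.load(path).graph)
```

For example, calling census_from_file on your exported CLIP file (the path is whatever you saved it as) and getting non-zero counts for both "fp32" and "fp16" would be consistent with the Conv error above, where an FP32 input meets FP16 weights.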
For the CLIP4Clip model with FP32 weights:
The UnsupportedShuffleLayerError is usually triggered by complex Reshape or Transpose layers that the DFC does not support.
Simplify Transpose and Reshape operations:
Try using the onnx-simplifier library to streamline these operations within the model.
If simplification doesn’t resolve the issue, you may need to provide specific shape hints for the layers causing the error.
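As a rough sketch of the simplification step, the code below runs onnx-simplifier through its Python entry point (onnxsim.simplify) and counts Transpose and Reshape nodes before and after, so you can see whether anything changed. The function names and file paths are placeholders chosen for illustration, and a lower count does not by itself guarantee that the DFC will accept the remaining layers.

```python
from collections import Counter

# Op types that the parser reported as unsupported shuffle layers.
SHUFFLE_OPS = ("Transpose", "Reshape")


def count_shuffle_ops(graph, op_types=SHUFFLE_OPS):
    """Count how many nodes of each given op type the graph contains."""
    counts = Counter(node.op_type for node in graph.node)
    return {op: counts[op] for op in op_types}


def simplify_and_report(src_path, dst_path):
    """Simplify an ONNX file and return (before, after) shuffle-op counts.

    Requires the onnx and onnxsim (onnx-simplifier) packages.
    """
    import onnx
    from onnxsim import simplify

    model = onnx.load(src_path)
    before = count_shuffle_ops(model.graph)

    # simplify() returns the new model and a flag from its own validation run.
    simplified, ok = simplify(model)
    if not ok:
        raise RuntimeError("onnx-simplifier could not validate the simplified model")

    onnx.save(simplified, dst_path)
    return before, count_shuffle_ops(simplified.graph)
```

You would call simplify_and_report with your own input and output paths, then pass the simplified file to the DFC parser instead of the original export.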
hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Invalid kernel shape for base conv layer base_conv50 (translated from MatMul_1535).
Either the input shape doesn't match the kernel shape, or the calculated groups number doesn't match the expected ratio between kernel shape and input shape.
Kernel features: 768 Input features: 1 Groups: 0
We successfully compiled CLIP-ResNet50 and verified that the resulting .hef file runs on Hailo-8.