CLIP ResNet-50 / ResNet-50x4 Training Pipeline and Deployment on Hailo-8L

Hello Hailo Community,

I’m considering purchasing a Hailo-8L for accelerating CLIP-based workloads and have a few questions about training and deployment:

Background & Goals:

  1. I’d like to fine-tune a CLIP model (either clip_resnet_50 or clip_resnet_50x4) on custom image-text data. The ultimate goal is to perform zero-shot classification on new images, but also obtain bounding box coordinates for detected objects.

Questions:

  1. Training Pipeline Documentation:

Is there any official or community-supported documentation, example repo, or recommended workflow for fine-tuning [clip_resnet_50 or clip_resnet_50x4](https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/public_models/HAILO8L/HAILO8L_zero_shot_classification.rst) on custom datasets targeting Hailo deployment?

Are there Hailo-provided tools, Docker environments, or scripts that streamline the training→compile→deploy process for CLIP variants?

  2. Deployment on Hailo-8L:

After fine-tuning a custom clip_resnet_50, can I deploy it directly into the Hailo CLIP-based classification & detection application?

  3. Bounding Box / Region-Level Predictions:

Does the existing Hailo CLIP-based detection application support returning bounding boxes from a fine-tuned CLIP model?

If not, is there guidance on using RegionCLIP (or an equivalent approach) for region-level inference on Hailo-8L? For example, are there any reference implementations or tips on compiling such models for Hailo?

Hey @Milind_Rampure,

Welcome to the Hailo Community!

Let me address each of your questions:

1. Training Pipeline Documentation:
We do have resources for this! Check out our Hailo-CLIP repository on GitHub (hailo-ai/hailo-CLIP, a real-time zero-shot classifier app), which uses the clip_resnet_50x4 architecture you mentioned.

For training configurations and resources, here are the key files in our model zoo:

These configuration files contain the training parameters and setup details you’ll need for the train→compile→deploy workflow.

2. Deployment on Hailo-8L:
Yes, you can deploy your fine-tuned clip_resnet_50 model using the repository I mentioned above. The Hailo-CLIP implementation supports custom trained models.

3. Bounding Box / Region-Level Predictions:
The current implementation doesn’t provide bounding boxes out of the box, but you can customize this by modifying the post-processing pipeline. This gives you flexibility to implement region-level predictions or adapt the output format to include coordinate information.
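To make the post-processing idea more concrete, here is a minimal sketch of one possible approach, not the hailo-CLIP implementation itself. It assumes a separate detector has already proposed boxes, that each box was cropped and passed through the CLIP image encoder, and that the text prompts were embedded ahead of time. The function name `classify_regions`, the result format, and the `logit_scale` default are all hypothetical; use the temperature from your own fine-tuned model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_regions(region_embeddings, text_embeddings, labels, boxes,
                     logit_scale=100.0):
    """Attach a zero-shot label to each detector box (hypothetical helper).

    region_embeddings: one image embedding per cropped box
    text_embeddings:   one text embedding per label prompt
    boxes:             (x_min, y_min, x_max, y_max) per region, from any detector
    logit_scale:       assumed temperature; replace with your model's value
    """
    if len(region_embeddings) != len(boxes):
        raise ValueError("need exactly one embedding per box")
    if len(text_embeddings) != len(labels):
        raise ValueError("need exactly one text embedding per label")

    text_unit = [l2_normalize(t) for t in text_embeddings]
    results = []
    for box, emb in zip(boxes, region_embeddings):
        img_unit = l2_normalize(emb)
        # Cosine similarity between this region and every text prompt.
        sims = [sum(a * b for a, b in zip(img_unit, t)) for t in text_unit]
        probs = softmax([logit_scale * s for s in sims])
        best = max(range(len(labels)), key=probs.__getitem__)
        results.append({"box": box, "label": labels[best], "score": probs[best]})
    return results


# Toy 3-dimensional embeddings, purely for illustration.
labels = ["cat", "dog"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
boxes = [(10, 20, 110, 220), (300, 40, 420, 200)]
region_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

detections = classify_regions(region_embeddings, text_embeddings, labels, boxes)
for det in detections:
    print(det)
```

In this layout the box coordinates come entirely from the detector, and CLIP only supplies the label and score for each crop, so the CLIP model itself does not need to change to produce region-level output.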

Hope this helps get you started! Let me know if you need more specific guidance on any of these aspects.