Hello Hailo Community,
I’m considering purchasing a Hailo-8L for accelerating CLIP-based workloads and have a few questions about training and deployment:
Background & Goals:
- I’d like to fine-tune a CLIP model (either `clip_resnet_50` or `clip_resnet_50x4`) on custom image-text data. The ultimate goal is zero-shot classification on new images, but I’d also like to obtain bounding box coordinates for detected objects.
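For context, the fine-tuning objective I have in mind is the standard symmetric CLIP contrastive loss. A minimal PyTorch sketch of what I’m planning (the encoder towers are elided; shapes, names, and the temperature value here are my own assumptions, not anything Hailo-specific):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))           # matching pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i + loss_t) / 2

# Dummy embeddings standing in for the ResNet-50 image tower / text tower outputs.
img = torch.randn(8, 1024)
txt = torch.randn(8, 1024)
loss = clip_contrastive_loss(img, txt)
```

My assumption is that as long as the image tower stays architecturally identical to the Model Zoo variant, only the weights change, which is why I’m hoping the compile step carries over unchanged.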
Questions:
- Training Pipeline Documentation:
Is there any official or community-supported documentation, example repo, or recommended workflow for fine-tuning [clip_resnet_50 or clip_resnet_50x4](https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/public_models/HAILO8L/HAILO8L_zero_shot_classification.rst) on custom datasets targeting Hailo deployment?
Are there Hailo-provided tools, Docker environments, or scripts that streamline the training→compile→deploy process for CLIP variants?
- Deployment on Hailo-8L:
After fine-tuning a custom `clip_resnet_50`, can I deploy it directly in the existing Hailo CLIP-based classification & detection application?
- Bounding Box / Region-Level Predictions:
Does the existing Hailo CLIP-based detection application support returning bounding boxes from a fine-tuned CLIP model?
If not, is there guidance on using RegionCLIP (or an equivalent approach) for region-level inference on the Hailo-8L? For example, any reference implementations or tips on compiling such models for Hailo?
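In case it helps clarify the question: the fallback I’m considering is a simple two-stage approach, where an external detector proposes boxes and CLIP zero-shot classifies each crop. A rough PyTorch sketch of that flow (the `image_encoder` is a placeholder for the CLIP image tower and everything here is my own assumption, not a Hailo API):

```python
import torch
import torch.nn.functional as F

def classify_regions(image, boxes, image_encoder, text_emb):
    """Zero-shot classify each box by embedding its crop against text embeddings.

    image: (C, H, W) tensor; boxes: list of (x1, y1, x2, y2) pixel coords;
    text_emb: (K, D), pre-normalized prompt embeddings.
    """
    labels = []
    for x1, y1, x2, y2 in boxes:
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)
        # Resize the crop to the encoder's expected input (224 for clip_resnet_50).
        crop = F.interpolate(crop, size=(224, 224), mode="bilinear", align_corners=False)
        emb = F.normalize(image_encoder(crop), dim=-1)   # (1, D)
        sims = emb @ text_emb.t()                        # (1, K) cosine similarities
        labels.append(int(sims.argmax()))
    return labels

# Toy stand-ins: a random "encoder" and random text embeddings, just to show the flow.
torch.manual_seed(0)
proj = torch.randn(3, 64)
dummy_encoder = lambda x: x.mean(dim=(2, 3)) @ proj
text_emb = F.normalize(torch.randn(5, 64), dim=-1)
image = torch.rand(3, 256, 256)
boxes = [(10, 10, 100, 100), (50, 60, 200, 220)]
preds = classify_regions(image, boxes, dummy_encoder, text_emb)
```

Would this per-crop approach be the recommended path on Hailo-8L, or does the existing CLIP application already handle the region step more efficiently?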