Running CLIP Classification for Both Scene and Person on 10H

Has anyone had success running both “--detector none” and “--detector person” at the same time on the 10H? I’m working on giving a client the ability to classify both per person crop and on the overall scene.

It’d be nice if both could be done by a single CLIP instance, but I’m not seeing a straightforward way to do this without running two models in parallel, which would be a substantial performance hit.

Hi,
The simplest solution is to use the person detector and add an extra “full frame” crop to the person cropper.
That way you get one more CLIP inference, on the entire frame.
The person detector pipeline runs the detection network before CLIP and uses the person bboxes as crop regions for CLIP.
So adding an artificial detection bbox that covers the whole frame before the hailocropper forces CLIP to run on the full frame as well.
You can add it with a simple user callback before the hailocropper, or even with a simple pad probe that adds the bbox to the ROI.
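
A rough sketch of the pad probe approach, in Python. It assumes the Hailo Python bindings (the `hailo` module, with `get_roi_from_buffer`, `HailoBBox` and `HailoDetection`) and normalized bbox coordinates, so (0, 0, 1, 1) covers the whole frame. The "full_frame" label and the helper names are made up for illustration. If your cropper only crops detections with a specific label or a tracking ID, you would need to adjust the label or the cropper accordingly. The `hailo` module and the probe return value are passed in as arguments only so the logic can be tried without the hardware.

```python
FULL_FRAME_LABEL = "full_frame"  # hypothetical label, not defined by the Hailo pipeline


def add_full_frame_detection(roi, hailo_api, label=FULL_FRAME_LABEL):
    """Attach an artificial detection covering the entire frame to the ROI."""
    # Assumes normalized coordinates: xmin, ymin, width, height in [0, 1].
    bbox = hailo_api.HailoBBox(0.0, 0.0, 1.0, 1.0)
    detection = hailo_api.HailoDetection(bbox, label, 1.0)
    roi.add_object(detection)
    return detection


def make_full_frame_probe(hailo_api, probe_ok):
    """Build a buffer probe that adds the full-frame bbox to every frame."""

    def probe(pad, info):
        buffer = info.get_buffer()
        if buffer is not None:
            roi = hailo_api.get_roi_from_buffer(buffer)
            add_full_frame_detection(roi, hailo_api)
        # Always let the buffer continue downstream.
        return probe_ok

    return probe


# Wiring it up in a real pipeline (element name is a placeholder):
#   import hailo
#   from gi.repository import Gst
#   pad = pipeline.get_by_name("<element before hailocropper>").get_static_pad("src")
#   pad.add_probe(Gst.PadProbeType.BUFFER,
#                 make_full_frame_probe(hailo, Gst.PadProbeReturn.OK))
```

The probe has to sit upstream of the hailocropper and downstream of the detection network, so that the extra bbox is in the ROI alongside the person bboxes when the cropper runs. Downstream, the label can be used to tell the scene-level CLIP result apart from the per-person ones.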