PySDK long delay on first inference, subsequent inferences fast

I’ve pretty easily implemented the Facial Recognition system documented here: Facial Recognition with PySDK, and it’s working great. My one concern is that the very first time I use the facial detection model, it takes about 1.5 seconds. The same goes for the first time I use the facial vectoring model (again, about 1.5 seconds). After that first use, inference drops to about 0.02 seconds. Huge difference. Is this a known behavior where we should be doing some sort of initialization before using the models in real-time frame analysis? Should we just run a blank inference on start-up to “prime” them? The images show the times for the first and second inference.


Code:

face_rec_model_name = "arcface_mobilefacenet--112x112_quant_hailort_hailo8l_1"
face_det_model_name = "scrfd_500m--640x640_quant_hailort_hailo8l_1"

# Face Model
FaceModel = dg.load_model(
    model_name=face_det_model_name,
    inference_host_address=your_host_address,
    zoo_url=your_model_zoo,
    token=your_token,
    device_type=device_type,
    overlay_color=[(255,255,0),(0,255,0)]   
)

# Face Recognition Model
FaceVectorModel = dg.load_model(
    model_name=face_rec_model_name,
    inference_host_address=your_host_address,
    zoo_url=your_model_zoo,
    token=your_token,
    device_type=device_type,
)

# INSIDE SOME FUNCTION...

        print(f"{time.monotonic()}: Before Face Detection")
        faceDetectResult = FaceModel(imageForFaceRec)  
        print(f"{time.monotonic()}: After Facial Detection")
        if len(faceDetectResult.results) > 0:
            face = faceDetectResult.results[0]
            if "label" in face and face["label"] == "face":
                if "score" in face and face["score"] > min_confidence:
                    # Extract bounding box (assumed in [x1, y1, x2, y2] format)
                    x1, y1, x2, y2 = map(int, face["bbox"])  # Convert bbox coordinates to integers
                    cropped_face, _ = self.crop_with_context(imageForFaceRec, [x1, y1, x2, y2])

                    # Display cropped faces for testing
                    if DEBUG:
                        cv2.imshow('Face', cropped_face)

                    # align and crop face.
                    landmarks = [landmark["landmark"] for landmark in face["landmarks"]]
                    print(f"{time.monotonic()}: Before align and crop")
                    aligned_face, _ = self.align_and_crop(imageForFaceRec, landmarks)
                    print(f"{time.monotonic()}: After align and crop")
                    # Display the aligned face
                    if DEBUG:
                        cv2.imshow('Aligned Face', aligned_face)

                    # Get the embeddings
                    print(f"{time.monotonic()}: Before Facial Vecotring")
                    face_embedding = FaceVectorModel(aligned_face).results[0]["data"][0]
                    print(f"{time.monotonic()}: After Facial Vecotring")
                    self.faceVector = face_embedding

Thank you.

Hi @user116
Yes, the first inference for a model includes model-loading overhead. As you suggested, you can run a warm-up inference on all models before the main application code.
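
For example, a minimal warm-up sketch (assuming the FaceModel and FaceVectorModel variables from your code, and that each dummy input matches the input resolution in the model name, 640x640 and 112x112):

import numpy as np

# Run one throwaway inference per model so that the loading overhead
# happens at start-up instead of on the first real frame. Results are discarded.
for model, shape in [(FaceModel, (640, 640, 3)), (FaceVectorModel, (112, 112, 3))]:
    model(np.zeros(shape, dtype=np.uint8))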

How many models can be loaded at one time? I have pose detection as my main model, then face detection and facial analysis (vector creation) if certain conditions are met. Would I need to warm up between each switch of inference model?

Hi @user116, you do not need to warm up between every switch of inference models. You can load all the models at once and run a quick warm-up before starting the inference loop. Please see the code below for reference:

import cv2, numpy as np, degirum as dg

# ------------------------------------------------------------------
# 1. SETUP
# ------------------------------------------------------------------
host = "@local"
zoo = "degirum/hailo"
device_type = "HAILORT/HAILO8L"
token = ""
pose_model_name = "yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8l_1"
face_model_name = "scrfd_500m--640x640_quant_hailort_hailo8l_1"
face_vec_model_name = "arcface_mobilefacenet--112x112_quant_hailort_hailo8l_1"

print("Loading models...")
PoseModel = dg.load_model(model_name=pose_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)
FaceModel = dg.load_model(model_name=face_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)
FaceVectorModel = dg.load_model(model_name=face_vec_model_name, inference_host_address=host, zoo_url=zoo, token=token, device_type=device_type)


# Dummy image for warm-up
dummy_pose_img = np.zeros((640,640,3), dtype=np.uint8)
dummy_face_img = np.zeros((640,640,3), dtype=np.uint8)
dummy_face_crop = np.zeros((112,112,3), dtype=np.uint8)

print("Warming up models...")
PoseModel(dummy_pose_img)
FaceModel(dummy_face_img)
FaceVectorModel(dummy_face_crop)

print("Warm-up complete. Models are ready for real-time inference.")


cap = cv2.VideoCapture(0) # change to a file path or RTSP if needed

while True:
  ret, frame = cap.read()
  if not ret: break

  # --- Pose inference ---
  pose_result = PoseModel(frame)
  print("\n========== POSE MODEL ==========")
  # print(f"Raw Output: {pose_result} \n ")
  print(f"Number of people detected: {len(pose_result.results)}")
  overlay = pose_result.image_overlay.copy()

  # Dummy condition: run face inference only if >1 person detected
  if len(pose_result.results) > 1:
    print(f"***** Condition met: {len(pose_result.results)} persons found")

    # --- Run Face Detection ---
    face_result = FaceModel(frame)
    print("\n========== FACE DETECTION MODEL ==========")
    # print(f"Raw Output: {face_result}")
    print(f"Number of faces detected: {len(face_result.results)}")
    for face in face_result.results:
      x1,y1,x2,y2 = map(int, face["bbox"])
      face_crop = frame[y1:y2, x1:x2]

      # Resize to 112x112 for FaceVec
      if face_crop.shape[0] > 0 and face_crop.shape[1] > 0:
        face_resized = cv2.resize(face_crop, (112,112))
        vec_result = FaceVectorModel(face_resized)
        embedding = np.asarray(vec_result.results[0]["data"]).flatten()

        # Optional: print embedding length or preview ID
        print("\n========== FACE VECTOR MODEL ==========")
        # print(f"Raw Output: {vec_result}")
        print(f"Embedding Length: {len(embedding)}")
        print(f"Embedding Vector (first 5): {embedding[:5]}")
        print(f"Embedding Norm: {np.linalg.norm(embedding)}")
        cv2.putText(overlay, f"VecID: {embedding[0]:.2f}", (x1, y1 - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 0), 1)

  # Show overlay
  cv2.imshow("Pose + Optional Face Pipeline", overlay)
  if cv2.waitKey(1) & 0xFF in (ord('q'), ord('x')):
    break

cap.release()
cv2.destroyAllWindows()
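
Once you have the embeddings, the usual next step is comparing them. As a minimal sketch (assuming cosine similarity, the common metric for ArcFace-style embeddings; the 0.5 threshold is a placeholder you would tune on your own data):

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: match a live embedding against an enrolled one.
# if cosine_similarity(embedding, enrolled_embedding) > 0.5:
#     print("Match")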

You can also try the sample code below to measure the latency difference between running the model with and without warm-up:

import time, numpy as np, degirum as dg

# Load a Hailo model
model = dg.load_model(
    model_name="yolov8n_relu6_coco_pose--640x640_quant_hailort_hailo8l_1",
    inference_host_address="@local",
    token="dg_8PNGrkCskAPQooMPxoBRT8qBPSzac2cKoF2Qo",
    zoo_url="degirum/hailo",
    device_type="HAILORT/HAILO8L"
)

dummy_input = np.zeros((640,640,3), dtype=np.uint8)

# --- Inference WITHOUT warm-up ---
start = time.time()
_ = model(dummy_input)
t1 = time.time() - start
print(f"First inference (no warm-up): {t1*1000:.1f} ms")

# --- Inference WITH warm-up ---
_ = model(dummy_input)  # warm-up step

start = time.time()
_ = model(dummy_input)
t2 = time.time() - start
print(f"Subsequent inference (warmed up): {t2*1000:.1f} ms")

Please let us know if you have further questions.