High RAM usage with RetinaFace MobileNet model

I’m trying to run my first model on a Raspberry Pi 5 with a Hailo-8 AI accelerator. I converted the RetinaFace MobileNet model from the model zoo to the HEF format and am able to load it successfully. However, when I try to run inference on a single image using the HailoAsyncInference class from the examples repo, RAM usage grows continuously until the device runs out of memory and crashes.
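
The growth can be watched by sampling the process’s resident set size, e.g. with a small watcher thread along these lines (a psutil-based sketch, my own addition, not part of the examples; the one-second interval is arbitrary):

import os
import threading
import time

import psutil


def watch_rss(interval_s: float = 1.0) -> None:
    # Print this process's resident set size once per interval.
    proc = psutil.Process(os.getpid())
    while True:
        print(f"RSS: {proc.memory_info().rss / 2**20:.1f} MiB", flush=True)
        time.sleep(interval_s)


# Daemon thread, so it exits together with the script.
threading.Thread(target=watch_rss, daemon=True).start()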

The code I’ve used is approximately the following:

import queue
import numpy as np
from cognicam.inference import HailoAsyncInference
from functools import partial


def inference_callback(
        completion_info,
        bindings_list: list,
        input_batch: list,
        output_queue: queue.Queue
) -> None:
    output_queue.put(completion_info)


input_queue = queue.Queue()
output_queue = queue.Queue()

inference_callback_fn = partial(inference_callback, output_queue=output_queue)

net_path = 'retinaface_mobilenet_v1.hef'

hailo_inference = HailoAsyncInference(
    net_path,
    input_queue,
    inference_callback_fn,
    1,
    input_type="UINT8",
    output_type="FLOAT32",
    send_original_frame=False
)

height, width, channels = hailo_inference.get_input_shape()

# Dummy HWC frame; get_input_shape() above returns (height, width, channels),
# and randint's upper bound is exclusive, so 256 covers the full uint8 range
img = np.random.randint(0, 256, (height, width, channels), dtype=np.uint8)
input_queue.put(img.flatten())

# RAM begins to increase continuously after this call
hailo_inference.run()
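
One thing I’m not sure about: nothing in this script ever drains output_queue. If results are supposed to be consumed while run() is executing, I would add something like the sketch below (my own guess, not taken from the examples repo). Is that required for the completed buffers to be released?

import queue
import threading


def consume_results(results: queue.Queue) -> None:
    # Drain completed inferences so they don't pile up in the queue.
    while True:
        item = results.get()
        if item is None:  # sentinel I would put() to stop the consumer
            break
        # ... post-processing of the detections would go here ...


threading.Thread(target=consume_results, args=(output_queue,), daemon=True).start()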