High RAM usage with RetinaFace MobileNet model

I’m trying to run my first model on a Raspberry Pi 5 with a Hailo-8 AI accelerator. I converted the RetinaFace MobileNet model from the model zoo to the HEF format and am able to load it successfully. However, when I try to run inference on a single image using the HailoAsyncInference class from the examples repo, RAM usage grows continuously until the device runs out of memory and crashes.
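
The growth can be watched by sampling the process’s resident set size, e.g. with a small watcher thread along these lines (a psutil-based sketch, my own addition, not part of the examples; the one-second interval is arbitrary):

import os
import threading
import time

import psutil


def watch_rss(interval_s: float = 1.0) -> None:
    # Print this process's resident set size once per interval.
    proc = psutil.Process(os.getpid())
    while True:
        print(f"RSS: {proc.memory_info().rss / 2**20:.1f} MiB", flush=True)
        time.sleep(interval_s)


# Daemon thread, so it exits together with the script.
threading.Thread(target=watch_rss, daemon=True).start()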

The code I’ve used is approximately the following:

import queue
import numpy as np
from cognicam.inference import HailoAsyncInference
from functools import partial


def inference_callback(
        completion_info,
        bindings_list: list,
        input_batch: list,
        output_queue: queue.Queue
) -> None:
    output_queue.put(completion_info)


input_queue = queue.Queue()
output_queue = queue.Queue()

inference_callback_fn = partial(inference_callback, output_queue=output_queue)

net_path = 'retinaface_mobilenet_v1.hef'

hailo_inference = HailoAsyncInference(
    net_path,
    input_queue,
    inference_callback_fn,
    1,
    input_type="UINT8",
    output_type="FLOAT32",
    send_original_frame=False
)

height, width, channels = hailo_inference.get_input_shape()

# Dummy HWC frame; get_input_shape() above returns (height, width, channels),
# and randint's upper bound is exclusive, so 256 covers the full uint8 range
img = np.random.randint(0, 256, (height, width, channels), dtype=np.uint8)
input_queue.put(img.flatten())

# RAM begins to increase continuously after this call
hailo_inference.run()
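
One thing I’m not sure about: nothing in this script ever drains output_queue. If results are supposed to be consumed while run() is executing, I would add something like the sketch below (my own guess, not taken from the examples repo). Is that required for the completed buffers to be released?

import queue
import threading


def consume_results(results: queue.Queue) -> None:
    # Drain completed inferences so they don't pile up in the queue.
    while True:
        item = results.get()
        if item is None:  # sentinel I would put() to stop the consumer
            break
        # ... post-processing of the detections would go here ...


threading.Thread(target=consume_results, args=(output_queue,), daemon=True).start()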