I’m trying to run my first model on a Raspberry Pi 5 with a Hailo-8 AI accelerator. I converted the RetinaFace MobileNet model from the model zoo to the HEF format and am able to load the model successfully. However, when I try to run inference on a single image using the HailoAsyncInference class from the examples repo, RAM usage increases continuously until the device runs out of memory and crashes.
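For context, I compiled the HEF with the Model Zoo CLI, roughly as follows (quoting from memory, so the exact flags may differ):

hailomz compile retinaface_mobilenet_v1 --hw-arch hailo8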
The code I’ve used is approximately the following:
import queue
from functools import partial

import numpy as np

from cognicam.inference import HailoAsyncInference


def inference_callback(
    completion_info,
    bindings_list: list,
    input_batch: list,
    output_queue: queue.Queue,
) -> None:
    # Push the completion info onto the output queue so results can be collected.
    output_queue.put(completion_info)


input_queue = queue.Queue()
output_queue = queue.Queue()
inference_callback_fn = partial(inference_callback, output_queue=output_queue)

net_path = 'retinaface_mobilenet_v1.hef'
hailo_inference = HailoAsyncInference(
    net_path,
    input_queue,
    inference_callback_fn,
    1,  # batch size
    input_type="UINT8",
    output_type="FLOAT32",
    send_original_frame=False,
)

# get_input_shape() returns (height, width, channels), so the dummy image
# is built in the same order.
height, width, channels = hailo_inference.get_input_shape()
img = np.random.randint(0, 255, (height, width, channels), dtype=np.uint8)
input_queue.put(img.flatten())

# RAM begins to increase continuously after this call
hailo_inference.run()
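In case it helps, this is roughly how I observed the growth: a daemon thread printing the process RSS while run() blocks. This is just a sketch of my measurement setup (it assumes psutil is installed; the watcher function and interval are mine, not part of the example code):

import threading
import time

import psutil

def watch_memory(interval_s: float = 1.0) -> None:
    # Print the resident set size of this process once per interval.
    proc = psutil.Process()
    while True:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"RSS: {rss_mb:.1f} MiB")
        time.sleep(interval_s)

# Started before the blocking hailo_inference.run() call.
threading.Thread(target=watch_memory, daemon=True).start()

The printed RSS keeps climbing until the process is killed. Has anyone seen this behavior, or can someone spot what I’m doing wrong?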