Questions about runtime warnings and multi-process inference

Hi everyone, I need some help with my personal project using a Raspberry Pi 5 with a Hailo-8L. For context, I’m trying to implement an anomaly detection architecture called EfficientAD on a real-time video stream. Since it uses a teacher-student architecture, my first goal is to run both models (custom compiled). I compiled the models from ONNX following this guide Creating Custom Hef using DFC/Model Zoo, but with optimization level 0 for quick testing. In my scenario, both the teacher and the student models (both with input shape (256, 256, 3)) need to give a prediction for each object, and those predictions are then aggregated to detect anomalies, so I chose to follow the tutorials on using InferModel for multi-model and multi-process inference, which can be found here.
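To make the aggregation step concrete, this is roughly what I have in mind on the host side. It is only a sketch that assumes both models return feature maps of the same shape and that a per-pixel squared difference is a reasonable anomaly score; the function names and the threshold are placeholders, not my final implementation:

import numpy as np

def anomaly_map(teacher_out: np.ndarray, student_out: np.ndarray) -> np.ndarray:
    # Per-pixel squared difference between teacher and student predictions,
    # averaged over the channel axis, giving one anomaly score per pixel.
    return np.mean((teacher_out - student_out) ** 2, axis=-1)

def is_anomalous(teacher_out: np.ndarray, student_out: np.ndarray, threshold: float = 0.5) -> bool:
    # Placeholder decision rule: flag the object if the peak anomaly score exceeds a threshold.
    return float(anomaly_map(teacher_out, student_out).max()) > threshold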

I used an exact copy of the multi-process tutorial code with some tweaks. The code is presented below, and every modification is marked:

import numpy as np
from multiprocessing import Process
from functools import partial
from hailo_platform import VDevice, HailoSchedulingAlgorithm, FormatType

number_of_frames = 4
timeout_ms = 10000

def example_callback(completion_info, bindings):
    if completion_info.exception:
        # handle exception
        pass

    output = bindings.output().get_buffer()

def infer(should_use_multi_process_service):
    # Create a VDevice
    params = VDevice.create_params()
    params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
    params.group_id = "SHARED"
    if should_use_multi_process_service:
        params.multi_process_service = should_use_multi_process_service

    with VDevice(params) as vdevice:

        # MODIFIED: change path
        infer_model = vdevice.create_infer_model('../data/student.hef')

        infer_model.set_batch_size(2)

        # MODIFIED: no input_layer_1 because this is a single input layer model
        infer_model.input().set_format_type(FormatType.FLOAT32)
        infer_model.output().set_format_type(FormatType.FLOAT32)

        with infer_model.configure() as configured_infer_model:
            for _ in range(number_of_frames):
                bindings = configured_infer_model.create_bindings()
                bindings.input().set_buffer(np.empty(infer_model.input().shape).astype(np.float32))
                bindings.output().set_buffer(np.empty(infer_model.output().shape).astype(np.float32))

                configured_infer_model.wait_for_async_ready(timeout_ms=10000)

                job = configured_infer_model.run_async([bindings], partial(example_callback, bindings=bindings))

            job.wait(timeout_ms)

if __name__ == "__main__":

    pool = [
        Process(target=infer, args=(True,)),
        Process(target=infer, args=(True,))
    ]

    print('Starting async inference on multiple models using processes')

    for job in pool:
        job.start()
    for job in pool:
        job.join()
    print('Done inference')

When I run this code, it prints RuntimeWarning: overflow encountered in cast and RuntimeWarning: invalid value encountered in cast at both bindings.input().set_buffer(np.empty(infer_model.input().shape).astype(np.float32)) and bindings.output().set_buffer(np.empty(infer_model.output().shape).astype(np.float32)). Initially, I thought I had messed up the input and that the float32 input was causing the overflow, so I tried changing the buffer format type to FormatType.UINT8 and converting the bindings' input/output buffers to np.uint8. That did resolve the overflow warnings, but the invalid value warnings remained. I then suspected the problem was caused by the multi-process service, so I used the single-model inference tutorial code for InferModel to verify this.

import numpy as np
from hailo_platform import VDevice, HailoSchedulingAlgorithm

timeout_ms = 1000

params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as vdevice:

    # MODIFIED: hef path
    infer_model = vdevice.create_infer_model('../data/student.hef')

    with infer_model.configure() as configured_infer_model:
        bindings = configured_infer_model.create_bindings()

        buffer = np.empty(infer_model.input().shape).astype(np.uint8)
        bindings.input().set_buffer(buffer)

        buffer = np.empty(infer_model.output().shape).astype(np.uint8)
        bindings.output().set_buffer(buffer)

        # Run synchronous inference and access the output buffers
        configured_infer_model.run([bindings], timeout_ms)
        buffer = bindings.output().get_buffer()

        # Run asynchronous inference
        job = configured_infer_model.run_async([bindings])
        job.wait(timeout_ms)

This script runs perfectly fine with no warnings, even when I configure it to use float32 for both inputs and outputs. Moreover, when I run the multi-process code, the student model outputs are all zeros, but the same model in this single-process code returns arrays filled with 86 (I’m not sure an array where every element has the same value is expected, but I will inspect this later since it could be caused by the low optimization level of the quantization). My questions are:

  1. What does the warning “invalid value encountered in cast” mean, and what could be causing it?
  2. Does compiling with optimization level 0 cause these warnings?
  3. In my experiments, I changed both inputs and outputs from float32 to uint8 in both processes to address the warning “overflow encountered in cast”. Does it make sense to keep both input and output as UINT8 and then dequantize the results on the host for aggregation (I haven’t researched this yet; see the rough sketch after this list)? I think I need float outputs for accurate aggregation, so would it make more sense to keep the input as uint8 but set the output to float32?
  4. This one is about the tutorials: what are the differences between using VStreams and using InferModel? I’ve noticed that people tend to use the VStream approach.
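Regarding question 3, this is the kind of host-side dequantization I have in mind. It is only a sketch that assumes the usual affine quantization scheme; qp_scale and qp_zp are placeholder values standing in for whatever scale and zero point the compiled HEF reports for the output layer (I still need to check where to read those from):

import numpy as np

# Placeholder quantization parameters -- in practice these should come from the
# output layer's quantization info in the compiled HEF, not be hard-coded.
qp_scale = 0.1
qp_zp = 128.0

def dequantize(raw_output: np.ndarray) -> np.ndarray:
    # Standard affine dequantization: float_value = (quantized_value - zero_point) * scale
    return (raw_output.astype(np.float32) - qp_zp) * qp_scale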

Additional question: does the model script add its layers to the model directly, so that the final HEF file contains the layers defined in the model script? My model script only has a normalization layer, and I’m not sure whether that layer is baked into the compiled model, in which case I wouldn’t need to normalize the inputs manually.
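For reference, my model script is essentially just a normalization command along these lines (the mean/std values here are illustrative placeholders, not my actual numbers):

normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])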

Thanks for reading all of this. Any answer or suggestion is very much appreciated.

Update: I solved the RuntimeWarning: invalid value encountered in cast by allocating the buffers with an explicit dtype, e.g. np.empty(infer_model.output().shape, dtype=np.uint8), instead of allocating a default float64 array and casting it with .astype(). The uninitialized float64 values returned by np.empty can be out of range (or NaN) for the target dtype, which is what triggered both cast warnings.
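For completeness, the binding setup that no longer produces the warnings is the same as above, just with the dtype passed directly to np.empty so no uninitialized values get cast:

bindings.input().set_buffer(np.empty(infer_model.input().shape, dtype=np.uint8))
bindings.output().set_buffer(np.empty(infer_model.output().shape, dtype=np.uint8))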