Segmentation fault / std::bad_function_call while idling

Hello!

I’ve been using this script to test inference on the Hailo-8L:

import numpy as np
from hailo_platform import VDevice, HailoSchedulingAlgorithm, FormatType
import time

class PerformanceCounter:
    def __init__(self, name: str = ''):
        self.start_time = None
        self.name = name
    
    def __enter__(self):
        self.start_time = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        elapsed_time = time.time() - self.start_time
        print(f"Finished '{self.name}' in {elapsed_time:.2f}s.")
        return False

class HailoInferenceTestClass:
    def __init__(self, hef: str, batch_size: int = 63, timeout_ms: int = 1000, classes: tuple[str, ...] = ('ADDED', 'IDLE', 'REMOVED')):
        self.hef = hef
        self.batch_size = batch_size
        self.timeout_ms = timeout_ms
        self.classes = classes
        
        # connect to hailo device and create infer model
        params = VDevice.create_params()
        params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
        self.vdevice = VDevice(params)
        self.infer_model = self.vdevice.create_infer_model(self.hef)
        self.infer_model.set_batch_size(self.batch_size)
        self.output_buffers = []
        
        # quantize/dequantize inputs/outputs
        for input in self.infer_model.inputs:
            input.set_format_type(FormatType.FLOAT32)
        for output in self.infer_model.outputs:
            output.set_format_type(FormatType.FLOAT32)
            self.output_buffers.append([])
        
        # create configured infer model
        self.configured_infer_model = self.infer_model.configure()
        # pre-allocate bindings and buffers
        self.bindings = self._allocate_bindings(self.batch_size)

    # delete resources in order because it segfaults at the end
    def __del__(self):
        del self.configured_infer_model
        del self.infer_model
        del self.vdevice

    def _allocate_bindings(self, size: int):
        binding_list = []
        for _ in range(size):
            bindings = self.configured_infer_model.create_bindings()
            
            # Set input and output buffers
            input_shape = self.infer_model.input().shape
            buffer = np.empty(input_shape).astype(np.float32)
            bindings.input().set_buffer(buffer)
            
            for idx, o in enumerate(self.infer_model.outputs):
                buffer = np.empty(o.shape).astype(np.float32)
                self.output_buffers[idx].append(buffer)
                bindings.output(o.name).set_buffer(buffer)
            
            binding_list.append(bindings)
        
        return binding_list

    def predict(self, X):
        with PerformanceCounter('Input transformation'):
            X = np.transpose(X, (0, 2, 1))
            X = np.expand_dims(X, axis=1)
            x_len = len(X)
            b_len = len(self.bindings)
            # create additional buffers if necessary
            if x_len > b_len:
                self.bindings += self._allocate_bindings(x_len - b_len)

        with PerformanceCounter('Set input buffers'):
            for idx, input in enumerate(X):
                self.bindings[idx].input().set_buffer(input)
        
        with PerformanceCounter('Run inference'):
            self.configured_infer_model.run(self.bindings[:x_len], self.timeout_ms)

        with PerformanceCounter('Result interpretation'):
            y_pred = np.mean(np.array(self.output_buffers), axis=0)[:x_len]
            y_pred = [
                self.classes[int(np.random.choice(np.flatnonzero(prob == prob.max())))]
                for prob in y_pred
            ]

        return y_pred


if __name__ == '__main__':
    model = HailoInferenceTestClass('./combined_2d_quant.hef')
    X = np.array([[[-4.59000000e+02, -8.70056152e-02],
                   [-4.47000000e+02, -1.74003601e-01],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-4.60000000e+02, -4.34997559e-01],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-4.21000000e+02,  2.60986328e-01],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-5.30000000e+02, -1.73995972e-01],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-5.06000000e+02, -7.48200226e+00],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-4.23000000e+02, -8.69979858e-02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02],
                   [-8.00000000e+02, -8.00000000e+02]]] * 64).astype(np.float32)

    y_pred = model.predict(X)
    print(y_pred)
    input('Done\n')

While the process waits at the input() prompt, after about a minute it consistently gets terminated, and the output is one of the two below:

Finished 'Input transformation' in 0.00s.
Finished 'Set input buffers' in 0.00s.
Finished 'Run inference' in 0.16s.
Finished 'Result interpretation' in 0.03s.
....
Done
terminate called after throwing an instance of 'std::bad_function_call'
  what():  bad_function_call
Aborted

or

Finished 'Input transformation' in 0.00s.
Finished 'Set input buffers' in 0.00s.
Finished 'Run inference' in 0.16s.
Finished 'Result interpretation' in 0.03s.
....
Done
Segmentation fault

There must be something running in the background that gets corrupted.

Hey @engarlanded_boa,

It seems you’re facing memory management issues that result in either a bad function call or a segmentation fault after inference. Let’s break down the potential problems and their solutions:

  1. Resource Cleanup. Problem: The destructor might not be properly releasing all resources during cleanup. Solution: Implement a dedicated cleanup function and call it before the program exits. This ensures all resources are properly deallocated.
  2. Threading Issues. Problem: Background threads may not be correctly joined or terminated before program exit. Solution: Implement a context manager for your HailoInferenceTestClass. This approach ensures proper resource management and thread cleanup.
  3. Memory Leaks. Problem: Possible memory leaks leading to resource exhaustion over time. Solution:
    a) Use a memory profiler to identify and address potential memory leaks.
    b) Implement batch processing to manage memory more efficiently:
def predict(self, X):
    results = []
    for i in range(0, len(X), self.batch_size):
        batch = X[i:i+self.batch_size]
        results.extend(self._predict_batch(batch))
    return results
  4. Buffer Management. Problem: The dynamic allocation in _allocate_bindings might be causing issues. Solution: Consider using a fixed-size buffer pool instead of dynamic allocation.
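The context-manager idea from point 2 could be sketched as follows. Note this is a hypothetical illustration, not real Hailo API: ManagedInference and its attributes are placeholders standing in for the three handles created in HailoInferenceTestClass.__init__, and close() mirrors the explicit deletion order in the __del__ above.

```python
class ManagedInference:
    """Hypothetical wrapper showing deterministic, explicit teardown
    instead of relying on __del__ at interpreter shutdown."""

    def __init__(self):
        self.teardown_log = []
        # placeholders for the real Hailo handles
        self.configured_infer_model = object()
        self.infer_model = object()
        self.vdevice = object()

    def close(self):
        # release in reverse creation order, as the __del__ above does;
        # safe to call more than once
        for name in ("configured_infer_model", "infer_model", "vdevice"):
            if getattr(self, name, None) is not None:
                setattr(self, name, None)
                self.teardown_log.append(name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()  # cleanup happens even if inference raised
        return False


with ManagedInference() as model:
    pass  # run inference here
print(model.teardown_log)  # → ['configured_infer_model', 'infer_model', 'vdevice']
```

The point of the pattern is that teardown happens at a well-defined moment (leaving the with block) rather than whenever the garbage collector gets around to it.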

Additional Recommendations:

  • Ensure you’re using the latest version of the Hailo SDK.
  • Implement more robust error handling and logging to pinpoint issues.
  • Replace the input() wait with a timed approach using time.sleep() to avoid potential input-related issues.
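For the last recommendation, a minimal stand-in for the input() call might look like this (wait_with_heartbeat is a hypothetical helper, not part of any SDK):

```python
import time

def wait_with_heartbeat(seconds: float, interval: float = 5.0):
    """Sleep in short slices instead of blocking on input(), so the
    wait ends on its own and stays responsive to Ctrl+C."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        time.sleep(min(interval, deadline - time.monotonic()))

# instead of input('Done\n') in the script above, e.g.:
wait_with_heartbeat(0.3, interval=0.1)  # short values for demonstration
```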

If you implement these changes and still face problems, please share your updated code along with any new error messages or logs. Also, providing information about your Hailo SDK version and system specifications would be helpful for further troubleshooting.

None of the above apply.

Just to clarify, when running model.predict in a loop with a 5 second delay between calls for a total of 500 seconds, there’s no Segmentation fault or std::bad_function_call.

Strangely, I timed the error multiple times, and it occurs exactly 1 minute after the last inference.

The problem is in the C++ bindings for Python.

void ConfiguredInferModelWrapper::execute_callbacks()
{
    while (true)
    {
        std::unique_lock<std::mutex> lock(m_queue_mutex);
        m_cv.wait_for(lock, std::chrono::minutes(1), [this](){ return !m_callbacks_queue->empty() || !m_is_alive.load(); });
     
        ...........................

        // FIX
        // std::condition_variable::wait_for will return on timeout even if the predicate is false
        // front/pop on an empty queue -> undefined behavior
        if (m_callbacks_queue->empty()) {
            continue;
        }
        // END FIX

        auto cb_status_pair = m_callbacks_queue->front();

        m_callbacks_queue->pop();
        lock.unlock(); // release the lock before calling the callback, allowing other threads to push to the queue

        auto &cb = cb_status_pair.first;
        auto status = cb_status_pair.second;
        cb(status.status);
    }
}

So after one minute of inactivity the condition variable times out, the code proceeds to call front()/pop() on an empty queue, and the result is undefined behavior.
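For anyone who wants to see this failure mode without the Hailo stack: Python's threading.Condition.wait_for behaves the same way as std::condition_variable::wait_for, returning False when the timeout expires with the predicate still false, so the same missing-check bug can be reproduced (safely) in pure Python:

```python
import threading
from collections import deque

queue = deque()
cv = threading.Condition()

def drain_once(timeout: float):
    with cv:
        # Like std::condition_variable::wait_for with a predicate,
        # Condition.wait_for returns False when the timeout expires
        # and the predicate is still false.
        signalled = cv.wait_for(lambda: len(queue) > 0, timeout=timeout)
        if not signalled:
            # Without this check, queue.popleft() on the empty deque
            # would raise IndexError -- Python's (safe) analogue of the
            # front()/pop() undefined behavior in the C++ binding.
            return None
        return queue.popleft()

print(drain_once(0.1))  # → None: timed out with the queue still empty
queue.append("result")
print(drain_once(0.1))  # → result
```

The fix is the same in both languages: after a timed wait, re-check the predicate before touching the queue.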