I’m working with a setup that includes two or three Hailo-8 M.2 AI Acceleration modules, and I have a couple of questions regarding their capabilities:
Is it possible to run different neural network models on separate Hailo accelerators? For example, having one model run on one Hailo-8 module and another model on a different module simultaneously?
Can I distribute the workload of a single model across multiple Hailo accelerators? Specifically, is there a way to offload part of the neural network (e.g., some layers or weights) to one accelerator and the rest to another to achieve sequential acceleration?
I would appreciate any insights or examples on how to configure and optimize the use of multiple Hailo-8 modules for these purposes.
Running Multiple Models on Hailo-8 M.2 AI Acceleration Modules
Hailo-8 M.2 AI acceleration modules support both scenarios: running different neural network models concurrently on separate modules, and splitting a single model's workload across multiple modules. Here's how:
1. Running Different Models on Separate Hailo-8 Modules
Hailo’s architecture allows execution of different neural network models on separate accelerators concurrently.
Key Components:
HailoRT framework
Multi-Process Service
Model Scheduler
Implementation:
Open each module by its device ID and load one model (HEF) per module
Use the Multi-Process Service when several processes need to share the same device
Enable the Model Scheduler to let HailoRT manage switching between models automatically
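The steps above can be sketched with HailoRT's Python API (`hailo_platform`). This is an illustrative sketch, not a drop-in implementation: the HEF file names and PCIe device IDs are placeholders, and the `device_ids` field on the VDevice parameters is assumed (field names can vary between HailoRT versions).

```python
# Sketch: run a different model on each Hailo-8 module, one process per module.
# Assumes HailoRT's Python package (hailo_platform); HEF paths and device IDs
# below are placeholders -- take real IDs from `hailortcli scan`.
from multiprocessing import Process

from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams)

def run_model(hef_path, device_id, frames):
    params = VDevice.create_params()
    params.device_ids = [device_id]  # pin this process to one module (assumed field name)
    with VDevice(params) as target:
        hef = HEF(hef_path)
        cfg = ConfigureParams.create_from_hef(hef=hef, interface=HailoStreamInterface.PCIe)
        network_group = target.configure(hef, cfg)[0]
        in_params = InputVStreamParams.make(network_group)
        out_params = OutputVStreamParams.make(network_group)
        with network_group.activate():
            with InferVStreams(network_group, in_params, out_params) as pipeline:
                for frame in frames:
                    results = pipeline.infer(frame)
                    # ... post-process results ...

# Each process owns one module and one model, so both run truly in parallel:
p1 = Process(target=run_model, args=("detector.hef", "0000:01:00.0", frames_a))
p2 = Process(target=run_model, args=("classifier.hef", "0000:02:00.0", frames_b))
p1.start(); p2.start()
p1.join(); p2.join()
```

Using separate processes (rather than threads) keeps each module's driver state isolated; if several processes need the *same* module, that is where the Multi-Process Service comes in.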
2. Distributing a Single Model Across Multiple Accelerators
A single network can be split into parts at compile time, and HailoRT can then run those parts on different accelerators as a cascade, giving a form of model parallelism.
Key Features:
Sequential acceleration
Model Scheduler
Cascaded Networks Structure (TAPPAS framework)
Implementation:
Split the model at compile time so that each resulting HEF targets one device
Chain the parts with TAPPAS cascaded-network pipelines for flexible workload distribution
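As a sketch of the cascaded approach, assuming the network has already been split into two HEFs at compile time (e.g. with the Hailo Dataflow Compiler): the file names, device IDs, input-layer name, and the `device_ids` parameter field are all placeholders/assumptions, and in practice you may need to remap part A's output names onto part B's input names.

```python
# Sketch: cascade two compiled halves of one network across two Hailo-8 modules.
# model_part_a.hef / model_part_b.hef are placeholder names for a model split
# at compile time; device IDs come from `hailortcli scan`.
import numpy as np

from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams)

def make_stage(target, hef_path):
    # Configure one compiled model part on the given device.
    hef = HEF(hef_path)
    cfg = ConfigureParams.create_from_hef(hef=hef, interface=HailoStreamInterface.PCIe)
    ng = target.configure(hef, cfg)[0]
    return ng, InferVStreams(ng, InputVStreamParams.make(ng), OutputVStreamParams.make(ng))

params_a = VDevice.create_params()
params_a.device_ids = ["0000:01:00.0"]  # module for the first half (assumed field name)
params_b = VDevice.create_params()
params_b.device_ids = ["0000:02:00.0"]  # module for the second half

with VDevice(params_a) as dev_a, VDevice(params_b) as dev_b:
    ng_a, pipe_a = make_stage(dev_a, "model_part_a.hef")
    ng_b, pipe_b = make_stage(dev_b, "model_part_b.hef")
    with ng_a.activate(), ng_b.activate(), pipe_a as stage_a, pipe_b as stage_b:
        frame = {"input_layer1": np.zeros((1, 224, 224, 3), dtype=np.float32)}
        mid = stage_a.infer(frame)  # first half on module A
        # Assumes part B's input names match part A's output names;
        # otherwise rename the keys of `mid` before this call.
        out = stage_b.infer(mid)    # second half on module B
```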
3. Sequential Acceleration
This approach runs consecutive portions of the network on different devices as a pipeline: each module processes its part of the model and passes the intermediate results to the next.
Key Components:
Stream Multiplexer
Model Scheduler Optimizations
Benefits:
Higher aggregate throughput, since every module works on a different frame at the same time
Better utilization of each device
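The throughput benefit of such a pipeline can be illustrated in plain Python, with each threaded stage standing in for one Hailo-8 module running its portion of the model (the `work` callables are stand-ins for per-device inference):

```python
# Illustration only: a two-stage pipeline where each stage stands in for one
# Hailo-8 module. While stage 2 processes frame N, stage 1 already works on
# frame N+1, which is where the throughput gain comes from.
import queue
import threading

def stage(in_q, out_q, work):
    # Consume items until the None sentinel, forward results downstream.
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(work(item))

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=stage, args=(q_in, q_mid, lambda x: x + 1))   # "first half"
t2 = threading.Thread(target=stage, args=(q_mid, q_out, lambda x: x * 2))  # "second half"
t1.start(); t2.start()

for i in range(4):
    q_in.put(i)
q_in.put(None)  # sentinel: no more frames

results = []
while (item := q_out.get()) is not None:
    results.append(item)
t1.join(); t2.join()
# results == [2, 4, 6, 8]  -- (i + 1) * 2 for i in 0..3, order preserved by the queues
```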
Configuration
To achieve these setups:
Use the HailoRT VDevice API to group modules and assign models to devices
Enable HailoRT features such as the Model Scheduler, Multi-Process Service, and Stream Multiplexer where they help
Verify the setup with hailortcli before wiring up the application
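A few hailortcli commands are useful when bringing such a setup online. The `--device-id` flag on `run` is an assumption here; confirm the exact flag with `hailortcli run --help` on your installed version.

```shell
# List the Hailo devices HailoRT can see (one entry per M.2 module)
hailortcli scan

# Query the firmware/identity of the connected module(s)
hailortcli fw-control identify

# Quick sanity benchmark of a compiled model on one specific module
# (device ID taken from the scan output; flag name assumed)
hailortcli run detector.hef --device-id 0000:01:00.0
```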
This approach allows for efficient utilization of multiple Hailo accelerators, whether running different models simultaneously or distributing a single model’s workload.