Using Multiple Hailo-8 M.2 Accelerators: Running Different Models and Sequential Acceleration

Hi,

I’m working with a setup that includes two or three Hailo-8 M.2 AI Acceleration modules, and I have a couple of questions regarding their capabilities:

  1. Is it possible to run different neural network models on separate Hailo accelerators? For example, having one model run on one Hailo-8 module and another model on a different module simultaneously?

  2. Can I distribute the workload of a single model across multiple Hailo accelerators? Specifically, is there a way to offload part of the neural network (e.g., some layers or weights) to one accelerator and the rest to another to achieve sequential acceleration?

I would appreciate any insights or examples on how to configure and optimize the use of multiple Hailo-8 modules for these purposes.

Thanks!

Hey @ivanstepanovftw

Running Multiple Models on Hailo-8 M.2 AI Acceleration Modules

Hailo-8 M.2 AI acceleration modules support both: different models can run simultaneously on separate modules (or time-share a single one), and a single model’s inference can be spread across several modules. Here’s how:

1. Running Different Models on Separate Hailo-8 Modules

Hailo’s architecture allows execution of different neural network models on separate accelerators concurrently.

  • Key Components:

    • HailoRT framework (the runtime that loads and runs compiled HEF models)
    • Multi-Process Service (lets several processes share the same devices)
    • Model Scheduler (time-shares several models on one device or device group)
  • Implementation:

    • Pin each model to a specific module by its device ID, or configure all models on one shared virtual device and let the Model Scheduler switch between them
    • Enable the Multi-Process Service when the models run in different processes
    • Let the Model Scheduler arbitrate whenever two models share a module
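As a concrete starting point, here is a minimal sketch of the first option: pinning each model to its own module. The `assign_models_to_devices` helper is hypothetical, and the calls inside `run_on_device` assume the `hailo_platform` Python package from HailoRT; exact parameter names (e.g. `device_ids`) vary between HailoRT versions, so check them against your installed API reference.

```python
from itertools import cycle

def assign_models_to_devices(hef_paths, device_ids):
    """Hypothetical helper: round-robin compiled models (HEFs) onto modules."""
    devs = cycle(device_ids)
    return {hef: next(devs) for hef in hef_paths}

def run_on_device(hef_path, device_id):
    """Configure and run one HEF on one specific Hailo-8 module.

    Assumes the `hailo_platform` package; parameter names vary between
    HailoRT versions, so treat this as a sketch, not a reference.
    """
    from hailo_platform import (VDevice, HEF, ConfigureParams,
                                HailoStreamInterface)

    params = VDevice.create_params()
    params.device_ids = [device_id]   # pin this virtual device to one module
    with VDevice(params) as vdevice:
        hef = HEF(hef_path)
        cfg = ConfigureParams.create_from_hef(
            hef, interface=HailoStreamInterface.PCIe)
        network_group = vdevice.configure(hef, cfg)[0]
        # ... create input/output vstreams on network_group and infer ...

# Device IDs come from `hailortcli scan`; the HEF paths are placeholders.
mapping = assign_models_to_devices(
    ["yolov5m.hef", "resnet50.hef"],
    ["0000:01:00.0", "0000:02:00.0"],
)
# mapping == {'yolov5m.hef': '0000:01:00.0', 'resnet50.hef': '0000:02:00.0'}
```

Running each `run_on_device` call in its own process (with the Multi-Process Service enabled) keeps the two models fully independent.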

2. Distributing a Single Model Across Multiple Accelerators

HailoRT can distribute a single model’s inference across multiple accelerators. Note that the distribution happens at the frame level (a virtual device that groups several modules dispatches whole frames across them) or at the pipeline level (cascaded networks, where each stage of a multi-model pipeline gets its own module); splitting individual layers of one compiled HEF across devices is not an exposed feature.

  • Key Features:

    • Multi-device virtual device (frame-level distribution)
    • Model Scheduler
    • Cascaded Networks Structure (TAPPAS framework)
  • Implementation:

    • Create a virtual device that groups several modules; HailoRT then dispatches frames of the one model across them
    • For multi-stage pipelines, compile each stage to its own HEF and run it on its own module (Cascaded Networks Structure)
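A sketch of both options, again assuming the `hailo_platform` package (API names vary by HailoRT version): a virtual device grouping several modules for frame-level distribution, plus a hypothetical `split_stages` helper that lays consecutive pipeline stages onto modules for a cascaded setup.

```python
def run_on_module_group(hef_path, device_count=2):
    """Run one compiled model across several Hailo-8 modules at once.

    HailoRT distributes work at the frame level when a virtual device
    spans multiple physical modules. Assumes the `hailo_platform`
    package; parameter names vary by HailoRT version.
    """
    from hailo_platform import (VDevice, HEF, ConfigureParams,
                                HailoStreamInterface)

    params = VDevice.create_params()
    params.device_count = device_count   # group N physical modules
    with VDevice(params) as vdevice:
        hef = HEF(hef_path)
        cfg = ConfigureParams.create_from_hef(
            hef, interface=HailoStreamInterface.PCIe)
        network_group = vdevice.configure(hef, cfg)[0]
        # ... inference proceeds exactly as on a single device; HailoRT
        # dispatches frames across the grouped modules transparently ...

def split_stages(stages, n_devices):
    """Hypothetical helper for a cascaded layout: assign consecutive
    pipeline stages to modules as evenly as possible."""
    base, extra = divmod(len(stages), n_devices)
    out, i = [], 0
    for d in range(n_devices):
        take = base + (1 if d < extra else 0)
        out.append(stages[i:i + take])
        i += take
    return out

layout = split_stages(["detector", "classifier"], 2)
# layout == [['detector'], ['classifier']]
```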

3. Sequential Acceleration

This approach chains the stages of a processing pipeline across devices: each module runs one stage, and frames flow from one module to the next.

  • Key Components:

    • Stream Multiplexer
    • Model Scheduler optimizations
  • Benefits:

    • Higher aggregate throughput, because the stages execute concurrently on different modules
    • Note that per-frame latency is still roughly the sum of the stage latencies, so the gain favors streaming workloads over single-shot inference
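The dataflow can be illustrated without hardware: the sketch below models each Hailo-8 module as one pipeline stage fed by a queue, with plain functions standing in for the per-stage inference calls. Because the stage threads run concurrently, consecutive frames overlap in time, which is exactly where the throughput gain of sequential acceleration comes from.

```python
import queue
import threading

def pipeline(frames, stage_fns):
    """Push frames through N sequential stages, one worker thread per
    stage (each stage would own one Hailo-8 module in a real setup)."""
    qs = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    SENTINEL = object()

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:        # propagate shutdown downstream
                q_out.put(SENTINEL)
                return
            q_out.put(fn(item))         # "inference" for this stage

    threads = [threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stage_fns)]
    for t in threads:
        t.start()
    for f in frames:                    # feed the first stage
        qs[0].put(f)
    qs[0].put(SENTINEL)
    out = []
    while True:                         # drain the last stage, in order
        item = qs[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out

# e.g. stage 1 = detector on module A, stage 2 = classifier on module B
result = pipeline([1, 2, 3], [lambda x: x * 10, lambda x: x + 1])
# result == [11, 21, 31]
```

FIFO queues between single-worker stages keep frame order intact, so the downstream consumer needs no reordering logic.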

Configuration

To achieve these setups:

  1. List the installed modules and their device IDs with `hailortcli scan`
  2. Decide per model: a dedicated module (pinned by device ID) or a shared virtual device managed by the Model Scheduler
  3. Enable the Multi-Process Service if several processes need access to the same modules
  4. Measure with `hailortcli run` (or `hailortcli benchmark`) to confirm the chosen distribution actually helps
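For step 4, a small hypothetical helper can turn a model-to-device assignment into per-module `hailortcli run` invocations; the `--device-id` flag is assumed from recent HailoRT releases, so confirm it against `hailortcli run --help` for your version.

```python
def benchmark_commands(assignments):
    """Hypothetical helper: build one `hailortcli run` command per
    (HEF, device) pair so each module can be benchmarked in isolation.
    The `--device-id` flag is assumed; check `hailortcli run --help`."""
    return [f"hailortcli run {hef} --device-id {dev}"
            for hef, dev in assignments.items()]

cmds = benchmark_commands({"modelA.hef": "0000:01:00.0"})
# cmds == ['hailortcli run modelA.hef --device-id 0000:01:00.0']
```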

This approach allows for efficient utilization of multiple Hailo accelerators, whether running different models simultaneously or distributing a single model’s workload.