Hailo farm of 16 for Ollama

Hello everyone. I have a few questions for you about Hailo modules.
I would like to buy several Hailo units (Hailo-8, Hailo-10, or future ones), but first I would like to know whether they are capable of doing what I plan to do with them.

1. Is it possible to run Hailo units in a normal PC (CPU + motherboard + Windows) in a PCIe slot? (With an M.2-to-PCIe adapter, obviously.)

2. Given that PCIe-to-4×-M.2 adapter cards exist, would a computer then be able to use all 4 modules together for a single LLM application?

3. Given that a motherboard has several PCIe slots, and that multi-GPU setups are sometimes supported with up to 4 graphics cards, is it possible to have 4 adapter cards of 4 Hailo modules each working in the same PC?

There is also the subject of PCIe lane multipliers, but that topic is too advanced for my modest level as an AI user.

The final idea is to have a super-PC capable of:
16 × 26 = 416 TOPS (Hailo-8),
at only 16 × 2.5 = 40 watts,
for only 16 × ~$200 = ~$3,200,
all while using not VRAM but CPU RAM for the LLMs (which can reach ~200-300 GB at the moment), so as to be able to run any of the biggest LLMs.
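To spell out the arithmetic (the per-module figures are my own assumptions from datasheets and rough street prices, not validated benchmarks):

```python
# Back-of-the-envelope totals for a hypothetical 16x Hailo-8 build.
# Per-module figures are assumptions: 26 INT8 TOPS and ~2.5 W typical
# per the Hailo-8 datasheet, ~$200 as an approximate street price.
NUM_MODULES = 16
TOPS_PER_MODULE = 26
WATTS_PER_MODULE = 2.5
USD_PER_MODULE = 200

total_tops = NUM_MODULES * TOPS_PER_MODULE    # aggregate compute
total_watts = NUM_MODULES * WATTS_PER_MODULE  # aggregate typical power
total_usd = NUM_MODULES * USD_PER_MODULE      # aggregate cost

print(f"{total_tops} TOPS, {total_watts} W, ~${total_usd}")
# → 416 TOPS, 40.0 W, ~$3200
```

Note that aggregate TOPS only add up like this if the workload actually parallelizes across all 16 modules, which is exactly what my questions above are about.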

So I could eventually have an imitation of an H100 for much less money once my computer is fully built out (by 2030).

4. How far can such an idea work?

((Will the Hailo-10 be compatible in such a setup with future versions?))

I want to use: Ollama, ComfyUI, OpenHands, Home Assistant.

If yes to all these questions, then a bonus question: is it still possible to invest in you?

Hey @Sabium3,

Welcome to the Hailo Community! Your planned PC project sounds fantastic, and we’re here to help address your questions:


  1. Compatibility:

    • Hailo accelerators are compatible with x86/ARM architectures and support both Windows and Linux operating systems.
    • Using an M.2 to PCIe adapter is a valid and supported method for integrating the modules into your PC.
  2 & 3. Multi-Card Configurations:

    • Yes, you can run multiple Hailo cards together. Many of our clients use configurations with up to 64 cards for various applications.
  4. LLM (Large Language Model) Considerations:

    • Using 4 Hailo-8 cards with CPU RAM for LLMs is an innovative idea, but this setup hasn’t been tested or validated for such use cases.
    • For LLM workloads, we recommend the Hailo-10H, which is specifically optimized for these tasks. A single Hailo-10H can handle certain LLMs effectively, and we’ve tested models like LLaMA internally with promising results (though not officially released yet).
    • I will check with our R&D team to explore the feasibility of running LLMs with 4 Hailo-8 cards and provide updates.

Additional Notes:

  • Running applications like ComfyUI, OpenHands, or Home Assistant should work seamlessly. However, for LLMs, the Hailo-10H offers the best performance.
  • For multi-card setups (Hailo-8 or Hailo-10H), ensure your PC has adequate PCIe lanes, sufficient power supply, and proper cooling solutions.
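As a first sanity check on a multi-card setup, you can count the Hailo devices enumerated on the PCIe bus. A minimal sketch, assuming a Linux host where the devices appear with "Hailo" in their `lspci` description (naming may vary with your pci.ids version; on Windows, Device Manager serves the same purpose):

```shell
# Count Hailo accelerators visible on the PCIe bus (Linux).
# Prints 0 on a machine with no cards installed.
count=$(lspci 2>/dev/null | grep -ci hailo || true)
echo "Hailo devices detected: ${count}"
```

If the count is lower than the number of installed modules, check the adapter seating and whether the slot's PCIe lane allocation supports all M.2 positions on the carrier card.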

Let me know if you have more questions, and I’ll follow up with insights from R&D regarding your specific use case.

Note: For information about investments, please contact our team through the website.

Best regards,
Omri
Application Engineer & Community Manager
Hailo