Hailo farm of 16 for Ollama

Hello everyone. I have a few questions for you about Hailo modules.
I would like to buy several Hailo units (Hailo-8, Hailo-10, or future ones), but first I would like to know whether they are capable of doing what I plan to do with them.

1. Is it possible to run Hailo units in a normal PC (CPU + motherboard + Windows) in a PCIe slot? (With an M.2-to-PCIe adapter, obviously.)

2. Given that PCIe-to-4×-M.2 adapter cards exist, would a computer then be able to use all 4 modules together for a single LLM application?

3. Given that a motherboard has several PCIe slots, and that multi-GPU setups are sometimes supported with up to 4 graphics cards, is it possible to have 4 adapter cards of 4 Hailo modules each working in the same PC?

There is also the subject of PCIe lane multipliers, but that topic is too advanced for my modest level as an AI user.

The final idea is to have a super-PC capable of:
16 × 26 = 416 TOPS (Hailo-8),
at only 16 × 2.5 = 40 watts,
for only 16 × ~$200 = ~$3,200,
all while using not VRAM but CPU RAM for the LLMs (which can reach ~200-300 GB at the moment), so as to be able to run any of the biggest LLMs.
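To spell out the arithmetic (the per-module figures are my own assumptions from datasheets and rough street prices, not validated benchmarks):

```python
# Back-of-the-envelope totals for a hypothetical 16x Hailo-8 build.
# Per-module figures are assumptions: 26 INT8 TOPS and ~2.5 W typical
# per the Hailo-8 datasheet, ~$200 as an approximate street price.
NUM_MODULES = 16
TOPS_PER_MODULE = 26
WATTS_PER_MODULE = 2.5
USD_PER_MODULE = 200

total_tops = NUM_MODULES * TOPS_PER_MODULE    # aggregate compute
total_watts = NUM_MODULES * WATTS_PER_MODULE  # aggregate typical power
total_usd = NUM_MODULES * USD_PER_MODULE      # aggregate cost

print(f"{total_tops} TOPS, {total_watts} W, ~${total_usd}")
# → 416 TOPS, 40.0 W, ~$3200
```

Note that aggregate TOPS only add up like this if the workload actually parallelizes across all 16 modules, which is exactly what my questions above are about.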

So I could eventually have an imitation of an H100 for much less money once my computer is fully built out (by 2030).

4. How far can such an idea work?

((Will the Hailo-10 be compatible in such a setup with future versions?))

I want to use: Ollama, ComfyUI, OpenHands, Home Assistant.

If yes to all these questions, then a bonus question: is it still possible to invest in you?

Hey @Sabium3,

Welcome to the Hailo Community! Your planned PC project sounds fantastic, and we’re here to help address your questions:


  1. Compatibility:

    • Hailo accelerators are compatible with x86/ARM architectures and support both Windows and Linux operating systems.
    • Using an M.2 to PCIe adapter is a valid and supported method for integrating the modules into your PC.
  2 & 3. Multi-Card Configurations:

    • Yes, you can run multiple Hailo cards together. Many of our clients use configurations with up to 64 cards for various applications.
  4. LLM (Large Language Model) Considerations:

    • Using 4 Hailo-8 cards with CPU RAM for LLMs is an innovative idea, but this setup hasn’t been tested or validated for such use cases.
    • For LLM workloads, we recommend the Hailo-10H, which is specifically optimized for these tasks. A single Hailo-10H can handle certain LLMs effectively, and we’ve tested models like LLaMA internally with promising results (though not officially released yet).
    • I will check with our R&D team to explore the feasibility of running LLMs with 4 Hailo-8 cards and provide updates.

Additional Notes:

  • Running applications like ComfyUI, OpenHands, or Home Assistant should work seamlessly. However, for LLMs, the Hailo-10H offers the best performance.
  • For multi-card setups (Hailo-8 or Hailo-10H), ensure your PC has adequate PCIe lanes, sufficient power supply, and proper cooling solutions.
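As a first sanity check on a multi-card setup, you can count the Hailo devices enumerated on the PCIe bus. A minimal sketch, assuming a Linux host where the devices appear with "Hailo" in their `lspci` description (naming may vary with your pci.ids version; on Windows, Device Manager serves the same purpose):

```shell
# Count Hailo accelerators visible on the PCIe bus (Linux).
# Prints 0 on a machine with no cards installed.
count=$(lspci 2>/dev/null | grep -ci hailo || true)
echo "Hailo devices detected: ${count}"
```

If the count is lower than the number of installed modules, check the adapter seating and whether the slot's PCIe lane allocation supports all M.2 positions on the carrier card.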

Let me know if you have more questions, and I’ll follow up with insights from R&D regarding your specific use case.

Note: For information about investments, please contact our team through the website.

Best regards,
Omri
Application Engineer & Community Manager
Hailo