Can the Raspberry Pi AI Kit support running large language models (LLMs)?

I purchased the Raspberry Pi AI Kit and want to speed up LLM inference. How can the Hailo-8 support running LLMs on a Raspberry Pi 5 (e.g., via Ollama or PyTorch)?
Are there any SDKs or tutorials available?

Thank you.


Hey @emmet.yu ,

Hailo’s hardware requires models to be converted with their Dataflow Compiler, which only accepts the original model in floating-point format. The compiler quantizes the network during its optimization step, so models that have already been quantized in another framework can’t be converted: mapping one quantization scheme onto another would severely degrade accuracy.
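
To make that flow concrete, here is a minimal sketch of a conversion using the Dataflow Compiler’s Python API (`hailo_sdk_client`). The file paths, model name, and calibration data are placeholders, and real calibration should use representative inputs rather than random ones; check the DFC documentation for the exact arguments in your version.

```python
import numpy as np
from hailo_sdk_client import ClientRunner  # Hailo Dataflow Compiler Python API

# 1. Parse the ORIGINAL floating-point model (ONNX here) into Hailo's internal format.
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("model_fp32.onnx", "my_model")  # placeholder path and name

# 2. Quantize during optimization, driven by a calibration set of sample inputs.
#    This is the step that cannot work if the model was already quantized elsewhere.
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)  # placeholder; use real data
runner.optimize(calib_data)

# 3. Compile to a HEF binary that HailoRT can load onto the device.
hef = runner.compile()
with open("my_model.hef", "wb") as f:
    f.write(hef)
```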

LLMs have far more parameters than typical CNNs. Hailo’s Dataflow Compiler can handle models that are too big for a single device, but converting an LLM would mean dealing with over 100 context switches, which puts a heavy load on the host system in both work and RAM. To address this, Hailo developed a new product, the Hailo-10H. It has a DDR interface and local DDR memory right on the module, so the host doesn’t have to manage all those context switches.

Hailo-10H M.2 Generative AI Acceleration Module

Note: As of June 2024 the Hailo-10H is not yet available to a wider audience.

Well, I have some odd info to add… I just installed Ollama running deepseek-r1:1.5b on four machines: two Jetson Nanos with 4 GB RAM, one Pi 5 with the Hailo-8L and 4 GB RAM, and one Pi 5 with 8 GB RAM. Given the same question, the Pi 5 with the Hailo-8L smoked all of them, and all gave exactly the same answer. The Jetson Nanos took 22 seconds, the Pi 5 8 GB took 6 seconds, and the Pi 5 4 GB with the Hailo-8L took 4.5 seconds, typing faster than I could read. So, “something” was helping it.
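
For anyone who wants to repeat the comparison, here is a rough sketch using the `ollama` Python client (assuming Ollama is installed and the model has been pulled on each machine); the prompt is a placeholder. The response metadata reports token counts and durations in nanoseconds, which is more reliable than a stopwatch:

```python
import ollama  # pip install ollama; assumes the Ollama server is running locally

# Placeholder prompt; ask the same question on every machine for a fair comparison.
resp = ollama.generate(model="deepseek-r1:1.5b",
                       prompt="Explain overfitting in one sentence.")

# Ollama reports durations in nanoseconds alongside token counts.
total_s = resp["total_duration"] / 1e9
eval_s = resp["eval_duration"] / 1e9
tokens = resp["eval_count"]
print(f"total: {total_s:.1f}s, generation: {tokens} tokens in {eval_s:.1f}s "
      f"({tokens / eval_s:.1f} tok/s)")
```

Running `ollama run deepseek-r1:1.5b --verbose` from the shell prints similar timing statistics after each response, if you’d rather not write any code.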


Hi! I am working on running LLMs on the Raspberry Pi too. Can you add some details on the framework you used for inference on the Pi 5 with the Hailo-8L?