Can the Raspberry Pi AI Kit support running large language models (LLMs)?

I purchased the Raspberry Pi AI Kit and want to accelerate the execution speed of LLMs. I would like to ask how the Hailo-8 can support running LLMs on the Raspberry Pi 5 (e.g., Ollama, PyTorch).
Are there any SDKs or tutorials available?

Thank you.

Hey @emmet.yu ,

Hailo’s toolchain requires models to be converted with their Dataflow Compiler, which only accepts the original model in floating-point format. The compiler quantizes the network itself during optimization, so models that have already been quantized in another framework can’t be converted: re-quantizing from one scheme to another would significantly degrade accuracy.
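To see why re-quantizing hurts, here’s a toy sketch (my own illustration, not Hailo’s toolchain — the weights and step sizes are made up): quantizing weights that were already snapped to a different grid can only add rounding error on top of what a single quantization would produce.

```python
# Toy illustration (not Hailo's actual pipeline): quantizing a model that was
# already quantized under a different scheme stacks rounding errors, which is
# why the Dataflow Compiler wants the original floating-point weights.

def quantize(x, scale):
    # Snap x to the nearest representable level for this step size.
    return round(x / scale) * scale

# Hypothetical float weights and two incompatible quantization step sizes.
weights = [i / 1000 for i in range(-500, 501)]
scale_a = 0.01      # "framework A" already quantized with this step
scale_b = 1 / 31    # the accelerator's own step

# Quantize the true float weights directly (the supported path)...
err_once = sum(abs(quantize(w, scale_b) - w) for w in weights) / len(weights)
# ...versus quantizing weights that were already quantized with scheme A.
err_twice = sum(abs(quantize(quantize(w, scale_a), scale_b) - w)
                for w in weights) / len(weights)

print(f"float -> B:      mean error {err_once:.5f}")
print(f"float -> A -> B: mean error {err_twice:.5f}")
```

Because the second pass can push a weight across one of scheme B’s rounding boundaries, the double-quantized error is never smaller than quantizing the float weights directly.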

LLMs have far more parameters than typical CNNs. Hailo’s Dataflow Compiler can split models that are too large for a single device, but converting an LLM would mean dealing with over 100 context switches, which puts a heavy processing and RAM burden on the host system. To address this, Hailo developed a new product, the Hailo-10H: it has a DDR interface and local DDR memory right on the module, so the host no longer has to manage all those context switches.
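Some back-of-envelope arithmetic makes the size gap concrete (the parameter counts below are illustrative assumptions, not Hailo specs):

```python
# Rough weight-memory comparison between a typical CNN and a small LLM.
# Parameter counts are illustrative assumptions, not vendor figures.

def weight_size_gb(params, bytes_per_weight):
    # Total weight storage in gigabytes (decimal GB).
    return params * bytes_per_weight / 1e9

cnn_gb = weight_size_gb(25e6, 1)  # ~25 M-parameter CNN (ResNet-50 class), int8
llm_gb = weight_size_gb(7e9, 1)   # 7 B-parameter LLM, int8

print(f"CNN weights: {cnn_gb:.3f} GB")
print(f"LLM weights: {llm_gb:.1f} GB ({llm_gb / cnn_gb:.0f}x larger)")
```

Even at int8, a 7 B-parameter model needs on the order of 7 GB just for weights — hundreds of times more than a typical CNN — so the weights must be streamed in pieces, which is exactly the context-switching overhead described above.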

Hailo-10H M.2 Generative AI Acceleration Module

Note: As of June 2024 the Hailo-10H is not yet available to a wider audience.