Hailo 10H LLM parameter count

From the product brief:

> High performance processing of generative AI models including LLMs with minimal CPU / GPU load

A Phi-3-mini-4k Q5 model has 3.8B parameters and a 4K-token context length; I run it on my APU at around 10 to 15 tokens/s.

I run a Llama 3.1 8B Q4 model with a 10K-token context length at 4 tokens/s.
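
For reference, a tokens/s figure like the ones above can be measured roughly like this. This is a minimal sketch assuming the llama-cpp-python bindings are installed; the model path and prompt are placeholders, not from the original post:

```python
# Rough tokens/s measurement sketch (assumes: pip install llama-cpp-python).
# The GGUF path and prompt below are placeholders.
import time
from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-4k-q5.gguf", n_ctx=4096)

prompt = "Explain what an NPU is in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion response reports how many tokens were generated.
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tokens/s")
```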

What kind of LLM are you targeting with the 10H? The key metrics are parameter count, tokens per second, and context length.

Hi @orso.eric,

We’re thrilled to see your interest in our next-gen AI accelerator! :blush:

Currently, the Hailo-10H is available exclusively to selected customers. We look forward to the day it becomes generally available.