From the product brief:
"High performance processing of generative AI models, including LLMs, with minimal CPU/GPU load."
For comparison: a Phi-3-mini-4k model at Q5 has 3.8B parameters and a 4K-token context length, and I run it on my APU at around 10 to 15 tokens/s.
I run a Llama 3.1 8B Q4 model with a 10K-token context at 4 tokens/s.
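For context, here's a back-of-the-envelope sketch (my own numbers, not from the brief): decode on an APU is typically memory-bandwidth bound, so tokens/s is roughly memory bandwidth divided by model size. The bits-per-weight figures below are assumptions for typical Q5/Q4 quants.

```python
# Rough decode-throughput sanity check. Assumption: token generation is
# memory-bandwidth bound, i.e. every weight is read once per generated token,
# so implied bandwidth ~ model size * tokens/s.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory model size in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def implied_bandwidth_gbps(params_b: float, bits_per_weight: float,
                           tokens_per_s: float) -> float:
    """Effective memory bandwidth implied by a measured decode rate."""
    return model_size_gb(params_b, bits_per_weight) * tokens_per_s

# Phi-3-mini (3.8B) at Q5 (~5.5 bits/weight assumed), 12.5 tok/s midpoint
print(f"Phi-3-mini:   ~{implied_bandwidth_gbps(3.8, 5.5, 12.5):.0f} GB/s")
# Llama 3.1 8B at Q4 (~4.5 bits/weight assumed), 4 tok/s
print(f"Llama 3.1 8B: ~{implied_bandwidth_gbps(8.0, 4.5, 4.0):.0f} GB/s")
```

That works out to roughly 33 GB/s and 18 GB/s respectively, which is plausible for a DDR5 APU, and it's the kind of number I'd want to compare against the 10H.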
What kind of LLM are you targeting with the 10H? The key metrics are parameter count, tokens per second, and context length.