From the product brief:
"High performance processing of generative AI models, including LLMs, with minimal CPU/GPU load."
For comparison: a Phi-3-mini-4k model at Q5 has 3.8B parameters and a 4K-token context length, and I run it on my APU at around 10 to 15 tokens/s.
I run a Llama 3.1 8B Q4 model with a 10K-token context at 4 tokens/s.
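For context, here's a back-of-the-envelope sketch (my own numbers, not from the brief): decode on an APU is typically memory-bandwidth bound, so tokens/s is roughly memory bandwidth divided by model size. The bits-per-weight figures below are assumptions for typical Q5/Q4 quants.

```python
# Rough decode-throughput sanity check. Assumption: token generation is
# memory-bandwidth bound, i.e. every weight is read once per generated token,
# so implied bandwidth ~ model size * tokens/s.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory model size in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def implied_bandwidth_gbps(params_b: float, bits_per_weight: float,
                           tokens_per_s: float) -> float:
    """Effective memory bandwidth implied by a measured decode rate."""
    return model_size_gb(params_b, bits_per_weight) * tokens_per_s

# Phi-3-mini (3.8B) at Q5 (~5.5 bits/weight assumed), 12.5 tok/s midpoint
print(f"Phi-3-mini:   ~{implied_bandwidth_gbps(3.8, 5.5, 12.5):.0f} GB/s")
# Llama 3.1 8B at Q4 (~4.5 bits/weight assumed), 4 tok/s
print(f"Llama 3.1 8B: ~{implied_bandwidth_gbps(8.0, 4.5, 4.0):.0f} GB/s")
```

That works out to roughly 33 GB/s and 18 GB/s respectively, which is plausible for a DDR5 APU, and it's the kind of number I'd want to compare against the 10H.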
What kind of LLM are you targeting with the 10H? The key metrics are parameter count, tokens per second, and context length.