Inquiry: Updates on LLM Deployment and Whisper Integration on Hailo's NPU

Are there any updates regarding the deployment of large language models (LLMs) on Hailo’s NPU?
Currently, the Dataflow Compiler (DFC) fails to compile, and the set of supported models for the NPU remains limited and repetitive. This constraint significantly diminishes the utility of a device that claims a performance of 13 or 26 TOPS.

Additionally, I came across a forum post mentioning ongoing work to enable Whisper on Hailo’s platform. Could you please share the timeline or expected release date for this feature?

Welcome to the Hailo Community!

To run LLMs efficiently, we have designed a new accelerator called Hailo-10H. It has a DDR interface that lets it store large models locally, freeing the host from managing context switching.

It is not yet generally available. Our R&D team is working on all the supporting software needed.

You can build any model based on the supported layers in the Hailo Dataflow Compiler (see the User Guide for details) and run it on the Hailo hardware. The Hailo Model Zoo is just a collection of open-source models for popular AI tasks that many of our customers are interested in. If you have other AI tasks and models in mind, please let us know.

I do not have a timeline. If you want to integrate this into a product, please send me a PM with some details about your company, product, and timeline, and I will check with my colleagues what information we can share.

Hello klausk! Thank you for your response.

It's good that you have been building a new accelerator intended for LLMs, but it is not as helpful as intended if we need a separate accelerator for each AI stack (for example, independent hardware for vision, Gen-AI, LLMs, and reinforcement learning); that also drives up the cost. Think about applications such as assistance systems or navigation systems for the visually impaired, which require multiple models working together.

The targeted application of each product should also have been specified in the product description (maybe I just haven't found it).

Yes, the Model Zoo works fine. But despite your claim about supported layers, even YOLOv11 fails to compile with the DFC (it doesn't even parse), even after converting to ONNX, removing dynamic shapes, trying different ONNX IR versions, converting shapes, and so on.

If the input and output layers are left unspecified, the DFC fails to identify the right ones by itself, and if the output layer is specified in the arguments, it throws an error on that layer. If we somehow move on (accepting the layer it picked as the output layer, which is not the real one), then shape format and size errors arise.

The Parser CLI will try to identify the right layers, but this is not guaranteed to work in every case. Identifying the right start and end nodes is one of the required tasks, and it takes some experience and experimentation.

We do have a new GUI tool that will help you with that task. It is in preview, and you can try it by running the following command inside the AI Software Suite Docker container.

hailo dfc-studio

Did you identify the right start- and end-nodes?
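If it helps with that experimentation, one way to shortlist candidate start and end nodes is to inspect the ONNX graph directly before handing names to the parser. The following is only a rough sketch and not part of the Hailo tooling: the names `Node`, `candidate_boundaries`, and `nodes_from_onnx` are hypothetical helpers made up for illustration. It treats nodes that consume a real graph input as start candidates and nodes whose outputs nobody consumes as end candidates. For detection models the end nodes the parser needs may sit earlier in the graph than the final ones (for example, before post-processing), so treat the result as a starting point only.

```python
from collections import namedtuple

# Hypothetical stand-in for an ONNX node: name, operator type,
# and the tensor names it reads and writes.
Node = namedtuple("Node", ["name", "op_type", "inputs", "outputs"])


def candidate_boundaries(nodes, graph_inputs):
    """Return (start_candidates, end_candidates) as lists of node names.

    Start candidates read at least one real graph input.
    End candidates produce no tensor that another node consumes.
    """
    graph_inputs = set(graph_inputs)
    consumed = {tensor for node in nodes for tensor in node.inputs}
    starts = [n.name for n in nodes if graph_inputs & set(n.inputs)]
    ends = [n.name for n in nodes
            if not any(t in consumed for t in n.outputs)]
    return starts, ends


def nodes_from_onnx(path):
    """Load an ONNX file into (nodes, graph_inputs).

    Requires the `onnx` package; imported lazily so the rest of the
    sketch runs without it.
    """
    import onnx

    graph = onnx.load(path).graph
    # Older IR versions list weights among graph inputs; skip those.
    initializers = {init.name for init in graph.initializer}
    graph_inputs = [i.name for i in graph.input
                    if i.name not in initializers]
    nodes = [Node(n.name, n.op_type, list(n.input), list(n.output))
             for n in graph.node]
    return nodes, graph_inputs


# Toy graph (made up): one stem followed by two output branches.
toy_nodes = [
    Node("conv1", "Conv", ["images", "w1"], ["t1"]),
    Node("relu1", "Relu", ["t1"], ["t2"]),
    Node("conv_a", "Conv", ["t2", "w2"], ["out_a"]),
    Node("conv_b", "Conv", ["t2", "w3"], ["out_b"]),
]
toy_starts, toy_ends = candidate_boundaries(toy_nodes, ["images"])
print("start candidates:", toy_starts)
print("end candidates:", toy_ends)
```

For a real model, `candidate_boundaries(*nodes_from_onnx("model.onnx"))` would give the same two lists, where `model.onnx` is a placeholder path. Comparing those names with what the parser picked automatically can show whether it chose the wrong output layer.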

You will be able to run CNNs on the Hailo-10 as well.

Having different products allows you to find the best fit for your application.