We are excited to share that we’ve been working on an automatic speech recognition pipeline based on the Whisper-tiny model, running it on a Raspberry Pi with our Hailo-8/8L AI accelerator.
Given how much interest this topic has generated in the community, we wanted to provide a quick status update.
Our focus so far has been on optimizing performance to make the pipeline faster and more efficient for real-time applications. The attached video showcases some of our initial results, and we’re actively refining the system to improve latency, resource usage, and overall efficiency.
This is not an official release yet, but we plan to share a working application with the Community soon. In addition, we are evaluating larger Whisper variants to explore the trade-offs between accuracy and performance.
If you have any thoughts, suggestions, or experience with similar optimizations, we’d love to hear from you! Stay tuned for more updates as we continue improving the pipeline!
Very cool - I got this working.
Are there some instructions on how to get/train/convert whisper tiny models for other languages?
Is there a repository that shows how the en hef model was created?
I am really excited for this one. Could you please provide the YAML recipe and any required assets (ONNX, checkpoints, HEF files) for the whisper_tiny_encoder_10s_15db and decoder models used in your speech_recognition demo? I feel like I may be missing something obvious.
The Whisper model is multi-language, but for this specific project we used English audio for calibration, and only English is considered in the decoding phase.
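To make the "only English is considered in the decoding phase" part concrete, here is a minimal sketch of how Whisper decoding is typically pinned to one language: the decoder is primed with a fixed prompt of special tokens, so the language is never inferred from the audio. The token IDs below are the ones commonly reported for the multilingual Whisper tokenizer; treat them as assumptions and verify them against your own tokenizer before relying on them.

```python
# Sketch: forcing English-only decoding in Whisper by fixing the
# decoder's start-of-sequence prompt. Token IDs are assumed values
# for the multilingual tokenizer -- verify against your tokenizer.

SOT = 50258            # <|startoftranscript|>
LANG_TOKENS = {        # language tokens follow SOT; subset shown
    "en": 50259,
    "de": 50261,
    "fr": 50265,
}
TRANSCRIBE = 50359     # <|transcribe|> task token
NO_TIMESTAMPS = 50363  # <|notimestamps|>

def forced_decoder_prompt(language: str = "en", timestamps: bool = False):
    """Build the decoder prompt that pins the language and task,
    instead of letting the model detect the language from audio."""
    prompt = [SOT, LANG_TOKENS[language], TRANSCRIBE]
    if not timestamps:
        prompt.append(NO_TIMESTAMPS)
    return prompt

print(forced_decoder_prompt("en"))  # [50258, 50259, 50359, 50363]
```

Feeding this prompt to the decoder at the start of generation is what restricts the output to English; swapping the language token is the decoding-side half of supporting another language.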
We are discussing internally whether to release the conversion flow. If we do, we will update this thread with an ETA.
Hello, do you plan to provide instructions in the future on how to create models for other languages, so that everyone can configure other languages themselves?
Yes, we will provide the conversion script in the coming months. Since we are already using the multilingual version of Whisper (see here), you will simply need to calibrate the model with audio from a specific language, or from multiple languages. In our example, calibration was done only on English audio.
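As a rough illustration of the calibration step, here is a hypothetical sketch of assembling a calibration set from language-specific audio for a compiler that expects a fixed input shape. It assumes you already have a feature extractor producing Whisper log-mel spectrograms (80 mel bins, ~100 frames per second), and that the encoder takes a fixed 10 s window (matching the `whisper_tiny_encoder_10s` name); the file name and helper functions are made up for the example, not part of the official flow.

```python
# Hypothetical sketch: building a fixed-shape calibration array from
# per-utterance Whisper log-mel features. Random arrays stand in for
# real features extracted from, e.g., German audio.
import numpy as np

N_MELS = 80        # Whisper uses 80 mel bins
FRAMES_10S = 1000  # 10 s at ~100 frames/s (hop 160 at 16 kHz)

def pad_or_trim(mel: np.ndarray, frames: int = FRAMES_10S) -> np.ndarray:
    """Zero-pad (silence) or trim features to a fixed frame count."""
    if mel.shape[1] >= frames:
        return mel[:, :frames]
    pad = np.zeros((mel.shape[0], frames - mel.shape[1]), dtype=mel.dtype)
    return np.concatenate([mel, pad], axis=1)

def build_calib_set(mels, out_path="calib_set.npy"):
    """Stack per-utterance features into one float32 array and save it
    in a form a quantization/calibration tool could consume."""
    batch = np.stack([pad_or_trim(m) for m in mels]).astype(np.float32)
    np.save(out_path, batch)
    return batch.shape

# Stand-in features for four utterances of varying length:
fake_mels = [np.random.randn(N_MELS, n).astype(np.float32)
             for n in (800, 1000, 1200, 600)]
print(build_calib_set(fake_mels))  # (4, 80, 1000)
```

The point is only that language support at calibration time is a data question: swap the utterances for audio in your target language(s) and the rest of the flow is unchanged.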
@Miah_Way the installation instructions on the GitHub page are also valid for the Raspberry Pi 5. The SDK is not needed, since the app downloads a precompiled version of the model.
Also, the SDK (i.e. the DataFlow Compiler) can run only on an x86 machine, not on the Pi.