Speech-to-text does not work well/upgrading to HailoRT 5.2.0/using Whisper-Small

Hi! I am trying to implement a simple speech-to-text solution. I am using a Raspberry Pi 5 (8 GB) and the AI HAT+ 2 (Hailo-10H).

I have followed all steps to install hailo-apps and to configure the system. In general, it works.

But…

The speech-to-text transcription is awful. I used the voice_assistant example to test (eventually reduced to just the STT part, without the LLM), and on average the transcriptions match at most 40% of what was said. Sometimes it hallucinates completely. I have tried different USB microphones and speaking much louder. Even when the words are recognized, characters may be missing (e.g. saying “characters” gets you “characte” as text).

After some digging I found that there is a Whisper-Small model which - I guess - might work better. Unfortunately, hailo-download-resources was not able to download the model, as the official Pi installation instructions install HailoRT 5.1.1 and Whisper-Small is only available for 5.2.0.

I have found hints on how to upgrade the system/HailoRT to 5.2.0 (“Raspberry Pi 5 and AI Hat +2 5.2 Driver Issues 5.1.1 - #4 by user491”, which points to “How to Run Local LLMs on Raspberry Pi 5 with AI HAT+ 2 (Hailo-10H)” and Ron's amazing Hailo Raspberry Pi 5 build tutorial).

But can I just upgrade HailoRT to 5.2.0 and expect the Hailo Apps to adapt? Or do I have to change configuration files manually? Or is there an official “How to upgrade to 5.2.0” guide? Or is there a much better model/solution (with the AI HAT+ 2) for speech-to-text transcription?

Any help is appreciated!

Regards,

HerrB92

Hi @HerrB92,

Whisper-Small requires HailoRT 5.2.0 - you’re correct. It’s ~3x larger and should be significantly more accurate. hailo-apps supports both 5.1.1 and 5.2.0 for Hailo-10H, so once you upgrade the runtime, the app code should work without manual config changes. You’d need to grab the 5.2.0 packages (PCIe driver .deb, hailort .deb, PyHailoRT .whl, and GenAI Model Zoo) from the Hailo Developer Zone and install them manually, since the RPi apt repo serves 5.1.1. Reboot after installing.
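
After the reboot, one quick sanity check is to confirm which PyHailoRT version Python actually sees. The following is a minimal sketch, not an official procedure: it assumes the PyHailoRT wheel registers itself under the distribution name `hailort` (adjust `dist_name` if `pip list` shows something else), and the helper names are made up for illustration.

```python
import re
from importlib import metadata

REQUIRED = "5.2.0"  # minimum HailoRT version for Whisper-Small, per the reply above


def parse_version(text):
    """Turn '5.2.0' (or '5.2.0.post1') into a comparable tuple such as (5, 2, 0)."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    # Pad short versions like '5.2' so tuples always have three fields.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, required=REQUIRED):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_version(dist_name="hailort"):
    """Return the installed version string, or None if the package is missing.

    ASSUMPTION: 'hailort' is the pip distribution name of the PyHailoRT wheel.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("PyHailoRT not found in this Python environment")
    elif meets_minimum(found):
        print(f"PyHailoRT {found} is new enough for Whisper-Small")
    else:
        print(f"PyHailoRT {found} is older than {REQUIRED}")
```

Run it inside the same virtual environment that hailo-apps uses, otherwise it may report a different (or missing) installation. Note that this only checks the Python binding, not the driver or the hailort .deb.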

Before upgrading, worth checking:

  • Make sure you’re setting language="en" explicitly, otherwise the model wastes capacity on language detection
  • Verify audio is captured at 16 kHz, mono, float32 normalized to [-1.0, 1.0) - wrong format/sample rate can cause garbled output
  • Test with a known good WAV file using the simple_whisper_chat example to rule out mic issues
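
To make the audio format check concrete, here is a rough sketch that inspects a WAV file and converts it to the format described above (16 kHz, mono, float32 in [-1.0, 1.0)). It uses only the Python standard library and does not touch any Hailo API; the function names and the 0.05 "too quiet" threshold are arbitrary choices for illustration, and it only handles 16-bit PCM files.

```python
import array
import sys
import wave

TARGET_RATE = 16_000  # Hz, as named in the checklist above


def load_wav_as_float32(path):
    """Read a 16-bit PCM WAV file.

    Returns (samples, sample_rate, channels), where samples is an
    array('f') of 32-bit floats, downmixed to mono and scaled to [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        raw = wf.readframes(wf.getnframes())
    if width != 2:
        raise ValueError(f"expected 16-bit PCM, got {8 * width}-bit samples")

    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    # Average the interleaved channels of each frame, then divide by 32768
    # so that int16 values map onto [-1.0, 1.0).
    mono = array.array(
        "f",
        (
            sum(pcm[i:i + channels]) / (channels * 32768.0)
            for i in range(0, len(pcm), channels)
        ),
    )
    return mono, rate, channels


def check_format(path):
    """Return a list of human-readable problems; an empty list means OK."""
    samples, rate, channels = load_wav_as_float32(path)
    problems = []
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < 0.05:  # arbitrary threshold for a suspiciously quiet recording
        problems.append(f"peak amplitude only {peak:.3f}, recording is very quiet")
    return problems
```

Calling `check_format("test.wav")` on a recording made with the same microphone settings the app uses should show whether the capture format itself is the issue. If the sample rate is wrong, the file still needs resampling to 16 kHz before it is fed to the model; this sketch only reports the mismatch.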

Hope this helps. Can you please try the WAV file test first to narrow things down?

Thanks,

Complete step-by-step guide for the HailoRT 5.2.0 update + Whisper Small: Upgrading to HailoRT 5.2.0 - step by step (Raspberry PI & Hailo Apps)