Hi! I am trying to implement a simple speech-to-text solution. I am using a Raspberry Pi 5 (8 GB) with the AI HAT+ 2 (Hailo-10H).
I have followed all steps to install hailo-apps and to configure the system. In general, it works.
But…
The speech-to-text transcription is awful. I tested with the voice_assistant example solution (in the end reduced to the STT part only, without the LLM), and on average the transcriptions match at most 40% of what was said. Sometimes it hallucinates completely. I have tried different USB microphones and speaking much louder. Even when the words are recognized, characters may be missing (e.g. saying “characters” gets you “characte” as text).
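To put a number on the mismatch, something like the following word error rate (WER) check could be used. This is only a rough sketch: the function name `word_error_rate` and the example sentences are my own and not part of hailo-apps, and it only lowercases and splits on whitespace (no punctuation handling).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for comparing what was said (reference)
    with what the STT model returned (hypothesis).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substituted word
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "the characters are missing"
    heard = "the characte are missing"
    print(f"WER: {word_error_rate(said, heard):.2f}")
```

A WER of 0.0 means a perfect transcript. A truncated word such as “characte” counts as one substituted word, so the value can exceed what a character-level comparison would suggest.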
After some digging, I found that there is a Whisper-Small model which, I guess, might work better. Unfortunately, hailo-download-resources was not able to download the model, because the official Pi installation instructions install HailoRT 5.1.1 and Whisper-Small is only available for 5.2.0.
I have found hints on how to upgrade the system/HailoRT to 5.2.0 (“Raspberry Pi 5 and AI Hat +2 5.2 Driver Issues 5.1.1 - #4 by user491”, which leads to “How to Run Local LLMs on Raspberry Pi 5 with AI HAT+ 2 (Hailo-10H)” and Ron's amazing Hailo Raspberry Pi 5 build tutorial).
But can I just upgrade HailoRT to 5.2.0, and will the Hailo Apps adapt automatically? Or do I have to change configuration files manually? Or is there an official “How to upgrade to 5.2.0” guide? Or is there a much better model/solution (with the Hailo AI HAT+ 2) for speech-to-text transcription?
Any help is appreciated!
Regards,
HerrB92