It was easier than I remembered (and @user540 has already posted some tests above as well):
- Installed Hailo Apps (the version before the current one)
- Updated to HailoRT 5.2.0 and Model Zoo 5.2.0 (as described here)
- Downloaded Whisper-Small.hef and changed the model in the config file (as described here)
- Used a slightly modified version of hailo-apps/hailo_apps/python/gen_ai_apps/voice_assistant/voice_assistant.py (without LLM)
The changed part in voice_assistant.py (LLM not actually used):
def on_audio_ready(self, audio):
    self.abort_event.clear()
    # 1. Transcribe
    user_text = self.s2t.transcribe(audio, language="de")
    if not user_text:
        print("No speech detected.")
        return
    print(f"\nYou: {user_text}")
    # 2. Output directly via TTS (no LLM)
    if self.tts:
        self.tts.clear_interruption()
        self.tts.queue_text(user_text.strip())
    # 3. Handshake: wait until TTS is finished, then listen again
    if self.interaction:
        try:
            self.interaction.restart_after_tts()
        except Exception:
            pass
These are the test results (menu output such as “Press SPACE to start/stop recording.” omitted):
python -m hailo_apps.python.gen_ai_apps.voice_assistant.voice_assistant
2026-04-23 13:21:07.779126749 [W:onnxruntime:Default, device_discovery.cc:325 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:92 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
Initializing AI components... (This might take a moment)
INFO | common.core | Using default model: Whisper-Small
INFO | common.core | Found HEF in resources: /usr/local/hailo/resources/models/hailo10h/Whisper-Small.hef
INFO | common.core | Found HEF in resources: /usr/local/hailo/resources/models/hailo10h/Qwen2.5-1.5B-Instruct.hef
✅ AI components ready!
==================================================
Voice Assistant
==================================================
...
INFO | voice_processing.audio_player | Audio output stream started.
🔴 Recording started. Press SPACE to stop.
Processing... Please wait.
You: Mal sehen, was jetzt passiert. Ich bin sehr gespannt. This is a test. Dies ist ein Test Understood.
Press SPACE to start recording.
...
Processing... Please wait.
You: Mal sehen, was jetzt passiert. I am very curious. This is a test
In both cases, I spoke the same German text:
“Mal sehen, was jetzt passiert. Ich bin sehr gespannt. Dies ist ein Test.”
(Translation: “Let’s see what happens. I am very curious. This is a test.”)
- “Dies ist ein Test” came out as “This is a test” in both runs. In the first run it was even detected twice, once in English and once in German, followed by an added “Understood” (which was not spoken).
- “Ich bin sehr gespannt” was transcribed correctly in the first run and translated into English (“I am very curious”) in the second.
The result was similar for other texts in previous tests as well: a mixture of languages and sometimes also word fragments (for example, “curiou” instead of “curious”).
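To make runs like these easier to compare, here is a minimal sketch that diffs each transcription against the spoken text word by word, using only Python's standard `difflib`. The strings are copied from the output above; the helper names (`words`, `word_diff`, `similarity`) are my own and not part of Hailo Apps, and the similarity ratio is just a rough indicator, not a proper word error rate.

```python
import difflib

# Text that was actually spoken (German), taken from the post above.
REFERENCE = "Mal sehen, was jetzt passiert. Ich bin sehr gespannt. Dies ist ein Test."

# The two transcriptions printed by the voice assistant.
TRANSCRIPTS = [
    "Mal sehen, was jetzt passiert. Ich bin sehr gespannt. "
    "This is a test. Dies ist ein Test Understood.",
    "Mal sehen, was jetzt passiert. I am very curious. This is a test",
]


def words(text):
    """Split into lowercase words with surrounding punctuation removed."""
    stripped = (w.strip(".,!?") for w in text.split())
    return [w.lower() for w in stripped if w]


def word_diff(reference, transcript):
    """Return (tag, reference_words, transcript_words) for every differing span."""
    ref, hyp = words(reference), words(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def similarity(reference, transcript):
    """Rough word-level similarity between 0.0 and 1.0 (not a real WER)."""
    matcher = difflib.SequenceMatcher(
        a=words(reference), b=words(transcript), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    for number, transcript in enumerate(TRANSCRIPTS, start=1):
        print(f"Run {number}: similarity {similarity(REFERENCE, transcript):.2f}")
        for tag, expected, got in word_diff(REFERENCE, transcript):
            print(f"  {tag}: expected {expected}, got {got}")
```

An `insert` entry marks words that were never spoken (such as the extra “Understood”), while a `replace` entry marks a span where German was swapped for something else (such as the English translation).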
The microphone is fine: I tested seven other STT solutions with the same microphone.