Real-time ASR on Raspberry Pi + Hailo8L with Whisper

Hey everyone,

I’ve built on top of Hailo’s release of their Whisper model conversion and found that a hybrid inference mode, with the encoder running on the Hailo 8L and the decoder on the CPU (in my case a Raspberry Pi 5), gives great real-time ASR performance. It is fast enough for live captioning with a refresh time of ~250 ms.
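For anyone who wants a picture of what such a live-captioning loop could look like, here is a rough sketch. It is not the code from my release: `encode` and `decode` are hypothetical placeholders for the Hailo-side encoder and the CPU-side decoder, and the 10-second window is an assumed value for illustration only. The only figures taken from above are the ~250 ms refresh and the 16 kHz sample rate that Whisper models expect.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000      # Whisper models expect 16 kHz mono audio
REFRESH_SECONDS = 0.25    # ~250 ms refresh, as in the post
WINDOW_SECONDS = 10       # assumed rolling context length (hypothetical)


def live_captions(
    chunks: Iterable[List[float]],
    encode: Callable[[List[float]], object],   # hypothetical: encoder on the accelerator
    decode: Callable[[object], str],           # hypothetical: decoder on the CPU
) -> Iterator[str]:
    """Yield a refreshed caption each time a new audio chunk arrives."""
    # Rolling buffer: the oldest samples fall out once the window is full.
    window: deque = deque(maxlen=SAMPLE_RATE * WINDOW_SECONDS)
    for chunk in chunks:
        window.extend(chunk)
        features = encode(list(window))  # one encoder pass per refresh
        yield decode(features)           # token-by-token decoding happens in here


# Usage with stand-in functions instead of real models:
chunk_size = int(SAMPLE_RATE * REFRESH_SECONDS)  # 4000 samples per refresh
fake_audio = [[0.0] * chunk_size for _ in range(3)]
for caption in live_captions(fake_audio, encode=len, decode=lambda n: f"{n} samples"):
    print(caption)
```

The point of the split is that the encoder is a single fixed-size pass per refresh, while the decoder is the part that loops token by token.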

The hybrid mode works much better than running all of Whisper inference on the Hailo 8L, which is really slow at autoregressive decoding without a KV cache.
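To give a feel for why the missing KV cache hurts so much, here is a toy cost model. It only counts how many token positions the decoder has to process. It is a simplification under my own assumptions and not a measurement of the Hailo 8L or of any real decoder.

```python
def decoder_token_passes(num_tokens: int, use_kv_cache: bool) -> int:
    """Count token positions the decoder processes to emit num_tokens tokens."""
    if use_kv_cache:
        # With a cache, each step only processes the newest token.
        return num_tokens
    # Without a cache, step t re-processes all t tokens generated so far.
    return sum(range(1, num_tokens + 1))


for n in (10, 50):
    cached = decoder_token_passes(n, use_kv_cache=True)
    uncached = decoder_token_passes(n, use_kv_cache=False)
    print(f"{n} tokens: {cached} positions with cache, {uncached} without")
```

For 50 output tokens this comes to 50 positions with a cache versus 1275 without, so the work grows quadratically with caption length instead of linearly.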

I’ve released the code here:

Overall, compared to FasterWhisper, a good baseline for running Whisper on CPU, this gives me an 8.4x speedup.


@Katrin_Tomanek thanks a lot for sharing with the community :grinning_face_with_smiling_eyes: