Hello everyone,
I’m developing a research-driven radio project that combines large language models, real-time dialogue generation, and audio pattern recognition.
The system runs on a TUXEDO Nano (Ryzen AI 7 + Hailo-8) and aims to create an interactive, multilingual radio environment — one that responds to people in a friendly, human-like, and culturally engaging way.
My focus is on achieving low-latency inference in continuous audio interaction loops. Specifically, I’m exploring how to schedule short, overlapping audio frames and mixed workloads (speech recognition plus language-model response generation) efficiently within the Hailo SDK; a rough sketch of the frame handling I have in mind follows below.
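For concreteness, here is a minimal sketch of the overlapping-frame buffering I mean. It is plain NumPy with no Hailo-specific calls, and the 32 ms frame / 16 ms hop values are just placeholders I am experimenting with, not settings taken from the SDK:

```python
import numpy as np

SAMPLE_RATE = 16_000                      # assumed mono 16 kHz input
FRAME_LEN = int(0.032 * SAMPLE_RATE)      # 32 ms analysis frame (placeholder)
HOP_LEN = int(0.016 * SAMPLE_RATE)        # 16 ms hop -> 50% overlap (placeholder)


class OverlapBuffer:
    """Accumulates a raw audio stream and yields fixed-size overlapping frames."""

    def __init__(self, frame_len: int = FRAME_LEN, hop_len: int = HOP_LEN):
        self.frame_len = frame_len
        self.hop_len = hop_len
        self._buf = np.empty(0, dtype=np.float32)

    def push(self, chunk: np.ndarray):
        """Append new samples and yield every complete overlapping frame."""
        self._buf = np.concatenate([self._buf, chunk.astype(np.float32)])
        while self._buf.size >= self.frame_len:
            yield self._buf[: self.frame_len].copy()
            # keep the overlap region, drop only one hop of samples
            self._buf = self._buf[self.hop_len:]


if __name__ == "__main__":
    # usage: each yielded frame would go to the ASR front-end, while the
    # language-model response runs in a separate worker so the audio loop never blocks
    buf = OverlapBuffer()
    fake_stream = np.random.randn(5 * HOP_LEN).astype(np.float32)
    for frame in buf.push(fake_stream):
        print("frame ready:", frame.shape)  # -> (512,) per frame at 16 kHz
```

The open question for me is how to hand these frames to the Hailo-8 without stalling the capture loop, i.e. how to size and reuse the inference buffers and interleave the two workloads.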
Has anyone here experimented with such real-time pipelines or found best practices for managing inference buffers or scheduling on Hailo-8 in similar conditions?
Any insights, references, or even cautionary notes are warmly appreciated.
Best regards,
D. T.