Exploring real-time neural inference on Hailo-8 for a multilingual cultural radio project

Hello everyone,

I’m developing a research-driven radio project that combines large language models, real-time dialogue generation, and audio pattern recognition.

The system runs on a TUXEDO Nano (Ryzen AI 7 + Hailo-8) and aims to create an interactive, multilingual radio environment that responds to listeners in a friendly, human-like, and culturally engaging way.

My focus is on achieving low-latency inference in a continuous audio interaction loop. Specifically, I’m exploring how to process short overlapping audio frames efficiently and how to schedule mixed workloads (speech recognition plus language-response generation) within the Hailo SDK.
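
To make the question concrete, here is a minimal sketch of the frame loop I have in mind (plain NumPy, nothing Hailo-specific; the window and hop sizes are just illustrative):

```python
import numpy as np

SAMPLE_RATE = 16_000
FRAME_LEN = int(0.5 * SAMPLE_RATE)  # 500 ms analysis window
HOP_LEN = int(0.1 * SAMPLE_RATE)    # 100 ms of new audio per step (80% overlap)

def frame_stream(chunks, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Yield overlapping fixed-length frames from a stream of audio chunks."""
    buf = np.zeros(0, dtype=np.float32)
    for chunk in chunks:
        buf = np.concatenate([buf, chunk.astype(np.float32)])
        # Emit one frame for every hop_len of new samples collected.
        while len(buf) >= frame_len:
            yield buf[:frame_len].copy()
            buf = buf[hop_len:]
```

With a loop like this, every 100 ms hop triggers one inference call on a 500 ms window, which is exactly where buffer management on the Hailo side starts to matter.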

Has anyone here experimented with real-time pipelines like this, or found best practices for managing inference buffers and scheduling on the Hailo-8 under similar conditions?
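
For reference, the direction I’m currently looking at is HailoRT’s model scheduler, which time-slices one device between several configured networks. The sketch below reflects my reading of the HailoRT Python API, so please treat it as an assumption rather than verified code: the HEF file names are placeholders, and the exact class and parameter names (VDevice.create_params, HailoSchedulingAlgorithm) should be checked against the installed HailoRT version:

```python
from hailo_platform import (HEF, VDevice, ConfigureParams,
                            HailoSchedulingAlgorithm, HailoStreamInterface)

# Placeholder HEFs: one streaming ASR network, one language-response network.
HEF_PATHS = ["asr_streaming.hef", "response_head.hef"]

params = VDevice.create_params()
# Let the HailoRT model scheduler switch between networks automatically,
# instead of activating/deactivating network groups by hand.
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as vdevice:
    network_groups = []
    for path in HEF_PATHS:
        hef = HEF(path)
        cfg = ConfigureParams.create_from_hef(
            hef, interface=HailoStreamInterface.PCIe)
        network_groups.append(vdevice.configure(hef, cfg)[0])
    # From here, each workload would create its own vstreams and submit
    # frames; the scheduler arbitrates access to the single Hailo-8.
```

If this is the wrong approach for overlapping-frame workloads, I’d be glad to hear what has worked better in practice.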

Any insights, references, or even cautionary notes are warmly appreciated.

Best regards,
D. T.