Hi, thanks for the notifications.
Are there any plans to support Hailo conversion for models larger than Base (e.g., Small, Medium, Large)?
@hyungjun_Byun support for larger Whisper models is planned for the Hailo-10H architecture, which provides the higher throughput needed to run those models.
For Hailo-8, larger models may be supported, but the FPS will of course be limited, making them less suitable for real-time applications.
You can take a look at how we converted the tiny and base models in the hailo-whisper repo. For example, you can adapt the different scripts (export, convert_whisper_encoder, …) to add support for Whisper-Small.
Do you know if this example is already supposed to work for Hailo-10H, or not yet? Trying it now with 10H yields:
[HailoRT] [error] CHECK failed - Failed to create vdevice. there are not enough free devices. requested: 1, found: 0
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
The check_installed_packaged.sh script shows "OK" for everything, so I'm not sure what else I can do to get this working (the driver and HailoRT were built and installed from source, as I don't think binaries for the 10H are available yet).
Hi @Kieran_Coulter,
The example works with Hailo-10H as well, but the required version for that platform is 5.x, not 4.x (which is for Hailo-8/8L). For example, you can install version 5.1.0 to test the application. Apologies for the inconvenience, we will update the documentation accordingly.
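Before re-running the example, it can help to confirm which HailoRT Python bindings are actually installed. A minimal sketch using only the standard library; the distribution name `hailort` is an assumption here, so check `pip list` for the exact name on your system:

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# "hailort" is an assumed distribution name; verify with `pip list`.
hailo_ver = installed_version("hailort")
print(hailo_ver)  # expect something like "5.1.0" for Hailo-10H, "4.x" for Hailo-8
```

If this prints a 4.x version (or None), the 5.x package for Hailo-10H is not the one being picked up.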
Anyway, since you have a Hailo-10H, we strongly recommend moving to the Speech-To-Text API, which is integrated into HailoRT and allows you to use optimized models from the Model Zoo.
Thanks Pierre. I have updated to the latest version but still get the same error.
I’m happy to move to the STT API, but I can’t find a speech recognition model in HEF format in the model zoo. Should I make my own instead? I imagine I will just need to convert a Whisper model to HEF format.
I see now, there is a hailo-whisper repository a few replies above with the instructions to convert the models ourselves.
One last question. I noticed this comment in the Conversion README:
"Since the embedding operators have been removed from the model at conversion time, they must run on the host CPU."
Is this really the case, i.e. that Whisper (as an ONNX or a HEF) cannot run on the NPU? This is a potential showstopper; if it's true I won't bother continuing with the conversion. I'm curious why this would be the case.
Let me clarify the options:
The speech_recognition application was created for Hailo-8 and then made compatible with Hailo-10H. The models used in this application were created with the hailo-whisper repo. The decoder embeddings were removed at conversion time for compatibility with the Hailo-8 architecture, and must be run on the host CPU. All the other operators run on the accelerator.
Running the embeddings on the host CPU does not add much overhead.
For Hailo-10H only, we released other models in the GenAI Model Zoo. These models can be used only with the Speech-To-Text API (and the models from the application example will not run with this API).
When using the Speech-To-Text API, operations like tokenization and embeddings will be offloaded to the Hailo-10H completely.
The Whisper-Small-genai.hef from the GenAI Model Zoo v5.2.0 (hailo_model_zoo_genai/docs/MODELS.rst at v5.2.0 · hailo-ai/hailo_model_zoo_genai · GitHub) does not reliably respect the language parameter in the Speech2Text API. Even with language="de" and task=Speech2TextTask.TRANSCRIBE, the model frequently outputs English instead of German. The Whisper-Base-genai.hef does NOT have this issue.
Environment
- Hardware: Raspberry Pi 5, 8GB + AI HAT+ 2 (Hailo-10H)
- HailoRT: 5.2.0
- Python: 3.13
- HEF files: Downloaded from hailo.ai/developer-zone (GenAI Model Zoo 5.2.0)
- Whisper-Base-genai.hef (131 MB) — works correctly
- Whisper-Small-genai.hef (388 MB) — language bug
- OS: Debian 13 (Trixie), aarch64
Reproduction
```python
import numpy as np
from hailo_platform import VDevice
from hailo_platform.genai import Speech2Text, Speech2TextTask

vdevice = VDevice()
s2t = Speech2Text(vdevice, "/path/to/Whisper-Small-genai.hef")

# audio_16k: German speech as float32, 16kHz, mono
text = s2t.generate_all_text(
    audio_data=audio_16k,
    task=Speech2TextTask.TRANSCRIBE,
    language="de",
    timeout_ms=30000,
)
```
This often outputs English instead of German.
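For completeness, here is how the `audio_16k` buffer above can be prepared. A minimal sketch using only NumPy linear interpolation; this is my own helper, not part of the Hailo API, and a proper polyphase resampler (e.g. `scipy.signal.resample_poly`) would give better quality:

```python
import numpy as np

def to_16k_mono_f32(audio, sr):
    """Convert audio at sample rate `sr` to 16 kHz float32 mono.

    Stereo input of shape (samples, channels) is averaged to mono;
    resampling is simple linear interpolation via np.interp.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:                      # (samples, channels) -> mono
        audio = audio.mean(axis=1)
    n_out = int(round(len(audio) * 16000 / sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio).astype(np.float32)

one_sec_8k = np.zeros(8000, dtype=np.float32)   # 1 s of 8 kHz silence
audio_16k = to_16k_mono_f32(one_sec_8k, 8000)    # 16000 samples, float32
```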
Test Results
Test audio: German TTS (Piper de_DE-thorsten-medium), resampled to 16kHz float32 mono.
Whisper-Small GenAI (language="de", task=TRANSCRIBE):
Input: “Ich moechte gerne einen Termin vereinbaren.”
Output: “I would like to have a Termin Vereinbarer.” — English/mixed, WRONG
Input: “Wann haben Sie geoeffnet?”
Output: “When did they open?” — English, WRONG
Input: “Koennen Sie mich bitte zurueckrufen?”
Output: “koenne Sie mich bitte zurueckruven?” — German but with errors, PARTIALLY OK
With task=TRANSLATE all outputs are English (expected).
Whisper-Base GenAI (same API, same parameters) — works correctly:
Input: “Ich moechte gerne einen Termin vereinbaren.”
Output: “Ich moechte gerne einen Termin vereinbaren.” — CORRECT
Input: “Wann haben Sie geoeffnet?”
Output: “Wann haben Sie geoeffnet?” — CORRECT
Whisper-Base correctly respects language="de" every time.
Analysis
According to MODELS.rst ( hailo_model_zoo_genai/docs/MODELS.rst at v5.2.0 · hailo-ai/hailo_model_zoo_genai · GitHub ), both HEFs are compiled from the multilingual HuggingFace models (openai/whisper-base and openai/whisper-small), not the .en variants.
Since the Speech2Text API offloads “tokenization and embeddings to the Hailo-10H completely” (as noted in the forum), the language token is processed on the NPU. The Whisper-Small quantization/compilation appears to have broken the language token handling, causing the model to default to English regardless of the language parameter.
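In the upstream OpenAI Whisper decoder, the language is selected by forcing a language token into the decoder prompt (e.g. `<|startoftranscript|><|de|><|transcribe|>`), so the model never gets to "choose" English at those positions. Whether the Hailo runtime implements forcing the same way is an assumption, but a toy sketch of the mechanism (with made-up token ids, not real Whisper ids) shows why a correct implementation cannot default to English:

```python
import numpy as np

# Toy token ids for illustration only; real Whisper ids differ.
SOT, LANG_DE, LANG_EN, TRANSCRIBE = 0, 1, 2, 3

def decode_step(logits, step, forced_prefix):
    """Greedy decoding with a forced prompt prefix, in the spirit of
    Whisper's forced decoder ids: while inside the prefix, the model's
    logits are ignored entirely and the forced token is emitted."""
    if step < len(forced_prefix):
        return forced_prefix[step]
    return int(np.argmax(logits))

# Even if the raw logits strongly prefer <|en|>, the forced prefix wins:
logits = np.array([0.1, 0.2, 5.0, 0.3])  # argmax alone would pick LANG_EN
prefix = [SOT, LANG_DE, TRANSCRIBE]
print(decode_step(logits, 1, prefix))  # LANG_DE (1), forced
print(decode_step(logits, 4, prefix))  # LANG_EN (2), free-running argmax
```

If the Small HEF drifts to English despite this, the language token is either not being forced at all for that model, or the quantized decoder is effectively ignoring it downstream.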
Expected Behavior
language="de" with task=Speech2TextTask.TRANSCRIBE should consistently produce German transcription, as it does with Whisper-Base GenAI.
Current Imperfect Workaround
Using Whisper-Base-genai.hef instead. On 8kHz telephone audio we measure:
- WER: 15.7% (German)
- Latency: 424ms average
- language="de" works reliably
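For anyone reproducing the WER figure above: WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the reference word count. A small self-contained sketch, independent of any Hailo API:

```python
def word_error_rate(ref, hyp):
    """Word error rate via dynamic-programming edit distance over words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(r)][len(h)] / max(len(r), 1)

print(word_error_rate("wann haben sie geoeffnet", "wann haben sie geoeffnet"))  # 0.0
print(word_error_rate("ich moechte einen termin", "ich mochte einen termin"))   # 0.25
```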