Hi, thanks for the notifications.
Are there any plans to support Hailo conversion for models larger than Base (e.g., Small, Medium, Large)?
@hyungjun_Byun support for larger Whisper models is planned for the Hailo-10H architecture, which offers higher throughput when running those models.
For Hailo-8, larger models may be supported, but the FPS will be limited, making them less suitable for real-time applications.
You can take a look at how we converted the tiny and base models in the hailo-whisper repo. For example, you can adapt the different scripts (export, convert_whisper_encoder, …) to add support for Whisper-small.
Do you know if this example is already supposed to work for Hailo-10H, or not yet? Trying it now with 10H yields:
[HailoRT] [error] CHECK failed - Failed to create vdevice. there are not enough free devices. requested: 1, found: 0
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_OUT_OF_PHYSICAL_DEVICES(74)
The check_installed_packaged.sh script shows “OK” for everything, so I’m not sure what else I can do to get this working (the driver and HailoRT were built and installed from source, as I don’t think binaries for 10H are available yet).
Hi @Kieran_Coulter,
The example works with Hailo-10H as well, but the required version for that platform is 5.x, not 4.x (which is for Hailo-8/8L). For example, you can install version 5.1.0 to test the application. Apologies for the inconvenience; we will update the documentation accordingly.
Anyway, since you have a Hailo-10H, we strongly recommend moving to the Speech-To-Text API, which is integrated into HailoRT and lets you use optimized models from the Model Zoo.
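For anyone who wants to guard against the version mismatch described in the reply above, here is a minimal Python sketch of such a check. It only encodes the split stated in this thread (4.x for Hailo-8/8L, 5.x for Hailo-10H). The table `REQUIRED_MAJOR` and the function `is_compatible` are hypothetical names made up for illustration; they are not part of HailoRT or any Hailo tool, and the version strings are just examples.

```python
# Hypothetical helper, not part of any Hailo package.
# Major version required per architecture, as stated in this thread.
REQUIRED_MAJOR = {
    "hailo8": 4,
    "hailo8l": 4,
    "hailo10h": 5,
}


def is_compatible(arch: str, version: str) -> bool:
    """Return True if `version` (e.g. "5.1.0") is on the release line required for `arch`."""
    # Normalize "Hailo-10H" -> "hailo10h" so both spellings are accepted.
    key = arch.lower().replace("-", "")
    if key not in REQUIRED_MAJOR:
        raise ValueError(f"unknown architecture: {arch!r}")
    # Only the major component decides which release line a version belongs to.
    major = int(version.split(".")[0])
    return major == REQUIRED_MAJOR[key]


print(is_compatible("Hailo-10H", "5.1.0"))  # True
print(is_compatible("Hailo-10H", "4.0.0"))  # False: 4.x is the Hailo-8/8L line
```

A check like this could run at application start so that a mismatch is reported clearly instead of surfacing later as a device error.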
Thanks Pierre. I have updated to the latest version but still get the same error.
I’m happy to move to the STT API, but I can’t find a speech recognition model in HEF format in the model zoo. Should I make my own instead? I imagine I will just need to convert a Whisper model to HEF format.
I see now, there is a hailo-whisper repository a few replies above with the instructions to convert the models ourselves.
One last question. I noticed this comment in the Conversion README:
“Since the embedding operators have been removed from the model at conversion time, they must run on the host CPU.”
Is this really the case - Whisper (as an ONNX or a HEF) cannot run on the NPU? This is a potential showstopper. If it’s true I won’t bother continuing with the conversion. I am curious why this would be the case.
Let me clarify the options:
The speech_recognition application was created for Hailo-8 and then made compatible with Hailo-10H. The models used in this application were created with the hailo-whisper repo. The decoder embeddings were removed at conversion time for compatibility with the Hailo-8 architecture, and must be run on the host CPU. All the other operators run on the accelerator.
Running the embeddings on the host CPU does not add much overhead.
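To illustrate why the host-side step is cheap, here is a minimal NumPy sketch of a Whisper-style decoder embedding: a table lookup per token plus a learned positional embedding. This is an illustration under assumptions, not the code used in the speech_recognition application. The function name `embed_decoder_tokens`, the toy table sizes, and the random weights are all hypothetical; in a real setup the tables would be loaded from the weights exported at conversion time.

```python
import numpy as np


def embed_decoder_tokens(token_ids, token_table, pos_table):
    """Look up token embeddings and add positional embeddings on the host CPU.

    token_table: (vocab_size, dim) array, pos_table: (max_len, dim) array.
    Returns a (seq_len, dim) array that would be fed to the decoder.
    """
    token_ids = np.asarray(token_ids, dtype=np.int64)
    seq_len = token_ids.shape[0]
    if seq_len > pos_table.shape[0]:
        raise ValueError("sequence is longer than the positional table")
    # One row gather plus one elementwise add: no matrix multiplication involved.
    return token_table[token_ids] + pos_table[:seq_len]


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
vocab_size, max_len, dim = 100, 16, 8
token_table = rng.standard_normal((vocab_size, dim)).astype(np.float32)
pos_table = rng.standard_normal((max_len, dim)).astype(np.float32)

embedded = embed_decoder_tokens([3, 17, 42], token_table, pos_table)
print(embedded.shape)  # (3, 8)
```

The work per decoding step is a gather and an add over a few rows, which is small compared with the attention and feed-forward layers that stay on the accelerator.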
For Hailo-10H only, we released other models in the Model Zoo Gen AI. These models can be used only with the Speech-To-Text API (and the models from the application example will not run with this API).
When using the Speech-To-Text API, operations like tokenization and embeddings will be offloaded to the Hailo-10H completely.