Using Raspberry Pi 5 with Hailo-8 AI HAT+ (26 TOPS) for local voice transcription

Hello Hailo Community,

I am seeking assistance with a persistent issue encountered while attempting to compile the distil-whisper/distil-small.en model for a Hailo-8L accelerator. The goal is to deploy a low-latency speech-to-text model for an embedded voice assistant project.

Methodology: The model was sourced from Hugging Face and exported to ONNX format using the recommended optimum-cli tool. The compilation is being attempted with the Hailo AI Software Suite (DFC Version 3.31.0) running within its official Docker container on a WSL2 host.
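For readers who want to reproduce the export step, here is a minimal sketch of the command line involved. The exact invocation used in this thread was not posted, so the output directory name `whisper_onnx` (taken from the shell prompt in the log below) and the helper `build_export_command` are assumptions for illustration only.

```python
import shlex


def build_export_command(model_id: str, output_dir: str) -> list[str]:
    """Return the argv for exporting a Hugging Face model to ONNX with optimum-cli."""
    # `optimum-cli export onnx --model <id> <output_dir>` is the basic form;
    # any extra flags used by the original poster are unknown.
    return ["optimum-cli", "export", "onnx", "--model", model_id, output_dir]


cmd = build_export_command("distil-whisper/distil-small.en", "whisper_onnx")
print(shlex.join(cmd))

# To actually run the export (requires optimum with ONNX export support):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log the exact command that produced a given ONNX file.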

The Core Issue: The process fails during the parsing stage when running hailo parser onnx. Despite numerous attempts with different command-line flags, the process consistently terminates with the following error:

hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Invalid kernel shape for base conv layer base_conv1 (translated from /conv1/Conv).

This strongly suggests the Dataflow Compiler is struggling to interpret the initial 1D Convolutional layer in the Whisper encoder’s architecture, which is fundamental to how it processes audio spectrograms.
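The full log at the end of this post reports "Kernel features: 80 Input features: 3000 Groups: 37". One possible reading, sketched below, is that the 3000-frame time axis of the [1, 80, 3000] input is being taken as the feature (channel) axis. The function `check_conv_features` is a hypothetical reconstruction of such a consistency check, not Hailo's actual code; it only shows that the reported numbers are what an integer ratio of 3000 to 80 would give.

```python
def check_conv_features(input_features: int, kernel_features: int) -> tuple[int, bool]:
    """Return (groups, consistent) for a grouped-convolution feature check.

    Hypothetical reconstruction: groups is the integer ratio of input
    features to kernel features, and the layer is consistent only if
    that ratio is exact.
    """
    groups = input_features // kernel_features
    consistent = groups * kernel_features == input_features
    return groups, consistent


# Values from the error message: 3000 (the time axis) read as input features.
print(check_conv_features(3000, 80))  # (37, False)

# Reading the 80 mel bins as the input features instead gives a plain convolution.
print(check_conv_features(80, 80))    # (1, True)
```

If this reading is right, the problem is the axis layout seen by the parser rather than the kernel itself, which would point toward reshaping the input or the first layers of the graph.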

Troubleshooting Steps Performed: To resolve this, I have exhaustively tried to guide the parser by:

  1. Explicitly defining the input tensor name and shape using the --tensor-shapes flag.
  2. Explicitly defining the graph boundaries using --start-node-names and --end-node-names flags.

Unfortunately, neither step resolved the underlying UnsupportedModelError. Furthermore, the Hailo Model Zoo included with this SDK version contains no models for Speech-to-Text or ASR tasks.

My Questions for the Community:

  1. Has anyone successfully compiled a Whisper-family model, or any audio model with initial 1D convolutions, for the Hailo-8L?
  2. Is there a known workaround, a required ONNX graph modification (“model surgery”), or a specific compilation recipe (YAML configuration) that can resolve this incompatibility with 1D Conv layers?
  3. Does a newer version of the Hailo SDK offer improved support for these audio-centric architectures or include a pre-configured ASR model?

Any guidance, insights, or successful workflows from the community would be immensely appreciated.

Error:

(hailo_virtualenv) hailo@docker-desktop:/local/shared_with_docker/whisper_onnx$ hailo parser onnx encoder_model.onnx --tensor-shapes 'input_features=[1,80,3000]' --start-node-names input_features --end-node-names last_hidden_state
[info] Current Time: 07:06:01, 07/09/25
[info] CPU: Architecture: x86_64, Model: 13th Gen Intel(R) Core™ i7-13650HX, Number Of Cores: 20, Utilization: 0.1%
[info] Memory: Total: 17GB, Available: 16GB
[info] System info: OS: Linux, Kernel: 6.6.87.2-microsoft-standard-WSL2
[info] Hailo DFC Version: 3.31.0
[info] HailoRT Version: 4.21.0
[info] PCIe: No Hailo PCIe device was found
[info] Running hailo parser onnx encoder_model.onnx --tensor-shapes input_features=[1,80,3000] --start-node-names input_features --end-node-names last_hidden_state
[info] Translation started on ONNX model encoder_model
[info] Restored ONNX model encoder_model (completion time: 00:00:06.08)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:11.51)
[info] Simplified ONNX model for a parsing retry attempt (completion time: 00:00:44.24)
Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 239, in translate_onnx_model
    parsing_results = self._parse_onnx_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 320, in _parse_onnx_model_to_hn
    return self.parse_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 371, in parse_model_to_hn
    fuser = HailoNNFuser(converter.convert_model(), net_name, converter.end_node_names)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/translator.py", line 92, in convert_model
    self._calculate_shapes(validate_shapes=False)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_translator.py", line 207, in _calculate_shapes
    self._layers_graph.calculate_shapes(meta_edges_graph=self._meta_graph, validate_shapes=validate_shapes)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hailo_nn.py", line 761, in calculate_shapes
    self.update_input_shapes_from_predecessors(layer)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hailo_nn.py", line 826, in update_input_shapes_from_predecessors
    layer.input_shapes = input_shapes
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hn_layers/layer.py", line 541, in input_shapes
    self.set_input_shapes(input_shapes)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hn_layers/conv2d.py", line 565, in set_input_shapes
    raise UnsupportedModelError(
hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Invalid kernel shape for base conv layer base_conv1 (translated from /conv1/Conv).
Either the input shape doesn't match the kernel shape, or the calculated groups number doesn't match the expected ratio between kernel shape and input shape.
Kernel features: 80 Input features: 3000 Groups: 37

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/parser_cli.py", line 213, in run
    self._parse(net_name, args, tensor_shapes)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/parser_cli.py", line 297, in _parse
    self.runner.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1187, in translate_onnx_model
    parser.translate_onnx_model(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 280, in translate_onnx_model
    parsing_results = self._parse_onnx_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 320, in _parse_onnx_model_to_hn
    return self.parse_model_to_hn(
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 371, in parse_model_to_hn
    fuser = HailoNNFuser(converter.convert_model(), net_name, converter.end_node_names)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/translator.py", line 92, in convert_model
    self._calculate_shapes(validate_shapes=False)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/model_translator/onnx_translator/onnx_translator.py", line 207, in _calculate_shapes
    self._layers_graph.calculate_shapes(meta_edges_graph=self._meta_graph, validate_shapes=validate_shapes)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hailo_nn.py", line 761, in calculate_shapes
    self.update_input_shapes_from_predecessors(layer)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hailo_nn.py", line 826, in update_input_shapes_from_predecessors
    layer.input_shapes = input_shapes
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hn_layers/layer.py", line 541, in input_shapes
    self.set_input_shapes(input_shapes)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_common/hailo_nn/hn_layers/conv2d.py", line 565, in set_input_shapes
    raise UnsupportedModelError(
hailo_sdk_common.hailo_nn.exceptions.UnsupportedModelError: Invalid kernel shape for base conv layer base_conv1 (translated from /conv1/Conv).
Either the input shape doesn't match the kernel shape, or the calculated groups number doesn't match the expected ratio between kernel shape and input shape.
Kernel features: 80 Input features: 3000 Groups: 37

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/bin/hailo", line 8, in <module>
    sys.exit(main())
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/cmd_utils/main.py", line 111, in main
    ret_val = client_command_runner.run()
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 64, in run
    return self._run(argv)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_platform/tools/hailocli/main.py", line 104, in _run
    return args.func(args)
  File "/local/workspace/hailo_virtualenv/lib/python3.10/site-packages/hailo_sdk_client/tools/parser_cli.py", line 237, in run
    raise ParserCLIException(str(err).replace("net_input_format", "input_format")) from err
hailo_sdk_client.tools.parser_cli.ParserCLIException: Invalid kernel shape for base conv layer base_conv1 (translated from /conv1/Conv).
Either the input shape doesn't match the kernel shape, or the calculated groups number doesn't match the expected ratio between kernel shape and input shape.
Kernel features: 80 Input features: 3000 Groups: 37

Hey @Arvind_R,

Welcome to the Hailo Community!

Just wanted to let you know that we've put together a Whisper model that works with both Hailo-8 and Hailo-8L. You can find the app we published here:

I'd definitely recommend using the models we've included there for Whisper. Compiling Whisper yourself can be pretty tricky, so this should save you some headaches!

Hope this helps!

Thanks to the support of the community, we have resolved the initial compilation problems we faced while adapting the Whisper model for deployment. Specifically, we got past the early-stage errors on the unsupported 1D convolution layers by restructuring the ONNX model, an approach we refer to as "model surgery."
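The thread does not say exactly how the ONNX model was restructured. One common form of this kind of surgery, sketched here as an assumption, is to express each 1D convolution as a 2D convolution with a height of 1. The pure-Python functions `conv1d` and `conv2d` below are illustrative only (they are not ONNX or Hailo APIs) and simply check that the two formulations give identical outputs.

```python
def conv1d(x, w, stride=1, padding=0):
    """x: [C_in][T], w: [C_out][C_in][K] -> [C_out][T_out] (cross-correlation)."""
    c_in, k = len(x), len(w[0][0])
    padded = [[0.0] * padding + list(row) + [0.0] * padding for row in x]
    t_out = (len(padded[0]) - k) // stride + 1
    return [
        [
            sum(w_o[c][j] * padded[c][i * stride + j]
                for c in range(c_in) for j in range(k))
            for i in range(t_out)
        ]
        for w_o in w
    ]


def conv2d(x, w, stride_w=1, pad_w=0):
    """x: [C_in][H][W], w: [C_out][C_in][KH][KW]; no padding or stride along H."""
    c_in, h = len(x), len(x[0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    padded = [[[0.0] * pad_w + list(row) + [0.0] * pad_w for row in ch] for ch in x]
    w_out = (len(padded[0][0]) - kw) // stride_w + 1
    return [
        [
            [
                sum(w_o[c][dy][dx] * padded[c][y + dy][xw * stride_w + dx]
                    for c in range(c_in) for dy in range(kh) for dx in range(kw))
                for xw in range(w_out)
            ]
            for y in range(h - kh + 1)
        ]
        for w_o in w
    ]


# Toy data: 2 input channels, 4 frames, 1 output channel, kernel size 3.
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
w = [[[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]]

y1 = conv1d(x, w, padding=1)

# "Surgery": add a height axis of size 1 to both the input and the kernel.
x2 = [[row] for row in x]                         # [C_in][1][T]
w2 = [[[taps] for taps in w_o] for w_o in w]      # [C_out][C_in][1][K]
y2 = conv2d(x2, w2, pad_w=1)

print(y1)                 # [[-2.25, -1.25, -1.5, 4.0]]
print(y2[0][0] == y1[0])  # True
```

In an actual ONNX graph the same idea would mean reshaping the input and the Conv weights to 4D and adjusting the kernel, stride, and padding attributes, with matching reshapes where the rest of the network expects 3D tensors.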

However, we have now hit a further compatibility issue during parsing. The compiler fails on a Transpose operation inside the self-attention of the first transformer block and reports it as an unsupported operation. This looks like a deeper limitation in how the Dataflow Compiler handles certain transformer internals, particularly in the attention flow, rather than something we can fix on the modeling side.
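For context, a Transpose usually appears in exported multi-head attention where the projected tensor is split into heads: a reshape from [T, H*D] to [T, H, D] followed by a swap of the first two axes. Whether this is the specific Transpose the parser rejects is not stated in the thread, so the sketch below is an assumption about where to look; `split_heads` is a hypothetical helper written in plain Python.

```python
def split_heads(x, num_heads):
    """[T][H*D] -> [H][T][D]: reshape to [T][H][D], then transpose axes 0 and 1."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("model width must be divisible by the number of heads")
    d = d_model // num_heads
    # Reshape: cut each frame's feature vector into per-head chunks -> [T][H][D].
    reshaped = [[row[h * d:(h + 1) * d] for h in range(num_heads)] for row in x]
    # Transpose: bring the head axis to the front -> [H][T][D].
    return [[reshaped[t][h] for t in range(len(x))] for h in range(num_heads)]


# 3 frames, model width 4, 2 heads of size 2.
x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(split_heads(x, 2))
# [[[1, 2], [5, 6], [9, 10]], [[3, 4], [7, 8], [11, 12]]]
```

Knowing which axes the failing node permutes (its `perm` attribute in the ONNX graph) helps tell this head-split pattern apart from the key transpose used in the score matrix product.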

We understand that your team has successfully compiled variants of the Whisper model, such as Whisper Tiny. If you could share how you addressed this transformer-related compatibility issue, especially the handling of Transpose layers in attention, we would be extremely grateful. Even high-level guidance would help us advance our efforts to run real-time ASR workloads efficiently on your hardware.

I am fine-tuning Whisper to work for non-standard speech. To run those fine-tuned models with Hailo-8L acceleration, I would need to repeat the conversion you have been doing. @Hailo team, would you mind making your conversion script accessible?

Hey @Arvind_R, @Katrin_Tomanek,

I will check with our ML team and update you on this matter!
