Comparing processing speed with and without NPU capabilities

Hi

I am trying to compare the processing speed of an ML pipeline with and without the NPU. I was told in a different community post that this can be done depending on how the code is written: I can specifically write code that either uses or does not use the NPU capabilities.

Can you please suggest how this can be achieved? If I look at the basic pipeline code examples that have been provided, do they already contain something specific that makes them utilize the NPU? Or is that something I would need to configure elsewhere?

Some of the information I found also suggested that the Hailo SDK handles all of this when converting an ML model to be deployable on a Raspberry Pi with the Hailo AI HAT.

However, I could not find much information about that. I am new to this, so I lack experience.

Can you please point me in the right direction?

Hi, to run networks on Hailo, you first need to compile them using our SDK into a format executable on the device, called HEF.
Our comprehensive SDK guides you through the entire process—from a trained model in PyTorch, TensorFlow, ONNX, etc., to a final HEF file. This involves steps such as conversion, quantization, and compilation.
You can find our SDK documentation on the Hailo.ai website.
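As a rough illustration of those steps, the flow with the Dataflow Compiler's Python API looks something like the sketch below. The model path, network name, and calibration data are placeholders; please follow the Dataflow Compiler user guide for the exact API of your SDK version:

```python
import numpy as np
from hailo_sdk_client import ClientRunner

# Target the Hailo-8L (as on the Raspberry Pi AI Kit); use "hailo8" for a Hailo-8.
runner = ClientRunner(hw_arch="hailo8l")

# 1. Conversion: translate the trained model (ONNX here) into Hailo's representation.
runner.translate_onnx_model("my_model.onnx", "my_model")  # placeholder path and name

# 2. Quantization: calibrate with representative inputs (random data as a stand-in).
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)
runner.optimize(calib_data)

# 3. Compilation: produce the HEF binary that the device executes.
hef_binary = runner.compile()
with open("my_model.hef", "wb") as f:
    f.write(hef_binary)
```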
Once the model is compiled, you can run it directly on the Hailo accelerator.
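On the Raspberry Pi, a minimal inference sketch with the HailoRT Python API looks roughly like this (the HEF name is a placeholder, and details may differ between HailoRT versions; see the HailoRT documentation for the full API):

```python
import numpy as np
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InputVStreamParams, OutputVStreamParams, InferVStreams,
                            FormatType)

hef = HEF("my_model.hef")  # placeholder: the HEF produced by the SDK

with VDevice() as target:
    # Configure the device for this network and create the input/output streams.
    configure_params = ConfigureParams.create_from_hef(
        hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    in_params = InputVStreamParams.make(network_group, quantized=False,
                                        format_type=FormatType.FLOAT32)
    out_params = OutputVStreamParams.make(network_group, quantized=False,
                                          format_type=FormatType.FLOAT32)

    input_info = hef.get_input_vstream_infos()[0]
    frame = np.zeros((1, *input_info.shape), dtype=np.float32)  # stand-in input

    # Activate the network and run a single inference on the accelerator.
    with network_group.activate(network_group.create_params()):
        with InferVStreams(network_group, in_params, out_params) as pipeline:
            results = pipeline.infer({input_info.name: frame})
```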

Thank you for the response.

That would run it utilising the NPU capabilities of Hailo, I assume.

Is there a way to run the same model on the Raspberry Pi without utilising the AI Kit? This is just to compare the processing speed.

I do not want to physically detach the AI Kit from the Pi. Is there a way to run a command on the command line to disable the AI Kit, or to run the ML model with a specific attribute so that it does not utilise the AI Kit?
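For the CPU-only side, my current idea is to run the original ONNX model (the one I would feed into the Hailo SDK) with onnxruntime on the Pi's CPU and time it, roughly like this (the model path, input shape, and run count are just placeholders):

```python
import time
import numpy as np
import onnxruntime as ort

# CPU-only baseline: load the original ONNX model with onnxruntime, pinned to the
# CPU execution provider, so the AI Kit is never opened at all.
session = ort.InferenceSession("my_model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)  # stand-in input shape

session.run(None, {input_name: frame})  # warm-up run

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, {input_name: frame})
elapsed = time.perf_counter() - start
print(f"CPU: {1000 * elapsed / n_runs:.2f} ms per inference")
```

If I then wrap the HEF inference call in the same timing loop, that should give me the with/without-NPU comparison I am after, if I understand correctly.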