Combining Hailo and ONNX Runtime

Hi everyone!

I was looking through the Application Code Examples and stumbled upon the Hailo ONNX Runtime Example. My idea is to use this approach to port models from outside the Hailo Model Zoo to the chip: run the first half, with all compatible steps, on the Hailo 8 chip, then feed its output directly into the rest of the ONNX model and run that part locally. It seems to me this would make porting models to the Hailo 8 fairly easy while still offering a (hopefully) significant performance boost. Does this approach make sense, or are there any downsides I might be missing?

While ONNX Runtime can simplify the execution of a model’s pre- and post-processing steps, it has the drawback of using a blocking API. On the Hailo device, layers are executed concurrently: once the first layer finishes processing all rows of an image, it can immediately begin processing the next image. However, with a blocking API, you must wait for the current image’s result before sending the next one, so the device sits partly idle and throughput drops. Depending on your requirements, this may still be sufficient.
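To get a feel for how much a blocking call can cost, here is a rough back-of-the-envelope sketch in plain Python. It does not use HailoRT or ONNX Runtime at all: it only models a chain of stages with fixed latencies and compares "wait for each result before sending the next frame" against "keep every stage busy". The function names and the stage latencies are made up for illustration, and the model works per frame, whereas the device actually overlaps work at a finer (row-level) granularity, so treat the numbers as indicative only.

```python
from typing import List


def blocking_total(n_frames: int, stage_ms: List[float]) -> float:
    """Total time when a frame must leave the last stage before the next enters."""
    return n_frames * sum(stage_ms)


def pipelined_total(n_frames: int, stage_ms: List[float]) -> float:
    """Total time when each stage starts the next frame as soon as it is free."""
    stage_free_at = [0.0] * len(stage_ms)  # when each stage finished its last frame
    for _ in range(n_frames):
        ready = 0.0  # when the current frame is available to the next stage
        for s, cost in enumerate(stage_ms):
            start = max(ready, stage_free_at[s])  # wait for both input and stage
            stage_free_at[s] = start + cost
            ready = stage_free_at[s]
    return stage_free_at[-1]


if __name__ == "__main__":
    # Hypothetical per-stage latencies in milliseconds (not measured values).
    stage_ms = [4.0, 6.0, 5.0]
    n = 100
    print("blocking :", blocking_total(n, stage_ms), "ms")
    print("pipelined:", pipelined_total(n, stage_ms), "ms")
```

In this toy model the pipelined total approaches the number of frames times the slowest stage, while the blocking total is the number of frames times the sum of all stages. If the ONNX part on the host is the slowest stage, the gap shrinks, which is one way to judge whether the blocking approach is good enough for your use case.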