Frustrating experience of converting a model from Torch to Hailo

The whole process of using the Hailo-8 with a Raspberry Pi 5 has been pretty frustrating so far. I hope somebody can help me figure out ways to work around these problems.

  1. The documentation is cryptic and hard to find.

  2. Parts of the system work only on the RPi, other parts work only on the x86 host, and they need to be downloaded from different places.

  3. I had to spend a couple of hours finding the best parameters for the model to fix “shift delta” errors.

  4. I had to convert the model from Torch to ONNX, and then from ONNX to HEF.

  5. The final compilation step takes multiple hours, or is just plain stuck. I have a beefy 16-core CPU with 128 GB of RAM, and it’s using just two cores.

  6. All the command-line scripts show cryptic error messages, and it’s pretty hard to figure out why, e.g., the quantization step isn’t using my GPU.

Overall, quantizing a model for ONNX was easy and fun; the process for Hailo is just frustrating.

Hi @heratsi

If you are looking to compile a YOLO checkpoint, you can check out our cloud compiler: Early Access to DeGirum Cloud Compiler

I need to run my custom segmentation model, based on deeplabv3+resnet50 or lraspp+mobilenet.

I really don’t understand how I’m supposed to interpret messages like:

Iteration failed on: Automri finished with too many resources on context_0
Single context flow failed: Recoverable single context error

Please advise!

And my personal favorite:

Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered

Sounds like the compiler is forced to a single context for compilation.

Add this line to your alls file:

context_switch_param(mode=enabled)

I did; it still uses just two cores.

(hailo) $ cat model-script.alls
context_switch_param(mode=enabled)
(hailo) $ hailo compiler --hw-arch hailo8 --model-script model-script.alls deeplabv3_resnet50_10_256_0_005_0_0001_export_optimized.har

Now it ran for about 15 minutes and failed with the following assert:

compiler: ../src/allocator/network_graph_appender.cpp:402: Status<network_graph::NetworkNode*> allocator::NetworkGraphAppender::AddShortcut(std::string, network_graph::NetworkNode&, std::vector<network_graph::NetworkNode*>): Assertion `src_node.output_format() == (*first_succ)->input_format()' failed.

The alls line has nothing to do with compilation speed. It simply allows one to split the model into multiple contexts so that the Hailo chip can execute each context sequentially in chunks. The error message that you posted earlier suggested that the compiler was trying to fit everything into one context.

If there is no finetuning/AdaRound (those two optimizations should use a GPU; nothing else needs one), the compilation time is dominated by LP/MIP problems. The Hailo compiler looks like it is using COIN-OR, which needs to be specifically built to support multithreading. Even then, multithreading is not guaranteed to accelerate MIP solving; it depends on the problem itself.
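As a hedged illustration of that point (this is PuLP driving a CBC binary, not Hailo's internals): you can request threads from a COIN-OR solver, but whether it honors them depends on how CBC was built, and even then the speedup is problem-dependent.

```python
import pulp

# Toy MIP: maximize x + 2y subject to x + y <= 12, x and y integer in [0, 10].
prob = pulp.LpProblem("toy", pulp.LpMaximize)
x = pulp.LpVariable("x", 0, 10, cat="Integer")
y = pulp.LpVariable("y", 0, 10, cat="Integer")
prob += x + 2 * y          # objective
prob += x + y <= 12        # constraint

# threads=4 is only a request; a single-threaded CBC build ignores it.
prob.solve(pulp.PULP_CBC_CMD(threads=4, msg=False))
```

So even if the Hailo compiler exposed a thread count, it would not necessarily cut the multi-hour compile down proportionally.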

Anyway, you need to post more info on what you’re doing, or the ONNX itself; there’s not enough information to reproduce the issue or help.

See Network Graph Compilation Failure - #3 by omria for the same exception you just posted.