I am just curious if anyone has successfully compiled Qwen 3 4B to .hef format. I have been working on it, but since it is not a supported model I am running into a lot of issues with the DFC converting it. It seems there are still a lot of assumptions in onnx_graph.py that models are vision-centric rather than GenAI models. I picked Qwen 3 4B mostly because I wanted to train agentic capabilities into it for a robot. If there is a better model for this, I am also interested in suggestions. My main goal is to let the LLM respond to sensor input with agentic commands.
Hi @Allan ,
Please take a look here - maybe it would be helpful: hailo-apps/hailo_apps/python/gen_ai_apps/agent_tools_example at main · hailo-ai/hailo-apps · GitHub
Thanks, I have that fully working with the Qwen 2.5 model, which is great. I was just working to customize a Qwen 3 4B model, which I am aware is not yet supported, mostly because it is a better model for agentic capabilities. Since I was adding my own custom skills, I used LoRA to train a Qwen 3 4B model. You then have to quantize it down to a HEF model for the Hailo 10H, and it appears the Dataflow Compiler SDK has not yet been updated to handle the newer model. I am in the final stages of getting the HAR file from the 17 GB ONNX file, so we will see if I can get there. I was just curious if anyone else was trying something like that.
I have been pushing for this too. The 1B and 1.5B supported models currently available are simply not very accurate. A Raspberry Pi 5 can easily handle 3B, 5B, and even 7B models, depending on your RAM.
I understand the need to test and validate, but it is taking a long time. I am not sure why only small models are supported by HailoRT when the 10H chip should be able to handle bigger ones.
Going through, applying monkey patches, and updating the SDK locally, I can tell that the newer Qwen 3 models (and probably others) require a lot of changes to the SDK. So I can understand that it takes time, especially since the DFC SDK was initially focused primarily on vision-centric models; the GenAI models require updated math functions. I was just curious whether others had tinkered beyond what is supported. The Hailo 10H on a Raspberry Pi 5 16GB is really responsive, and as you said, it has the headroom to handle bigger models. Even with both a Qwen 2.5 and a Whisper model loaded on the Hailo, it works extremely well. It would be nice if a roadmap and timeline were published, though, showing which new models are being targeted.
I think I have come to the conclusion that until hailomz supports Qwen 3, or custom training of GenAI models in general, it isn't realistic. None of the tools currently in place, from the SDK to the GenAI Zoo, support custom training of GenAI models. No amount of tinkering with the DFC SDK is going to work until Hailo releases updated tools. Someone can correct me if I am wrong, but I currently don't see a way around it with the toolsets provided. You can custom-train vision models, though.
I have the Qwen 2.5 2B VLM working great.
I was really wondering which models the Pi 5 supports with the Hailo 10 HAT, and about how many pipelines, because I have 2 cameras running YOLOv8m and YOLOv6n and I want to run Qwen 2.5 2B as well.
Our R&D is working on new support and models for our Hailo accelerators. We cannot share anything publicly yet.
We appreciate your patience and enthusiasm, and we will share more information as soon as we are ready.
Stay tuned!
From what I have found in the Hailo GenAI Zoo and the Hailo Model Explorer, it is currently the Qwen 2.5 1.5B models that are supported, plus DeepSeek R1 1.5B. You can download the HEF files from the Model Explorer. I haven't run YOLO, GenAI, and Whisper together. I think it can be done, depending on how big the models are and whether you are running the pipelines concurrently. For me, Whisper is only needed during the asking phase, while Qwen is not actively processing tokens, and vice versa, so there isn't a lot of competition for resources.
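To illustrate the time-multiplexing idea above (Whisper active only while listening, the LLM active only while generating, so the two HEFs never compete for compute at the same moment), here is a minimal sketch of that handoff as a state machine. The `transcribe` and `generate` methods are hypothetical placeholders standing in for the actual HailoRT calls:

```python
from enum import Enum, auto


class Phase(Enum):
    LISTENING = auto()   # Whisper owns the accelerator
    GENERATING = auto()  # Qwen owns the accelerator


class VoiceAgent:
    """Toy scheduler: only one model is 'active' at a time, so the
    Whisper and Qwen pipelines never process concurrently."""

    def __init__(self):
        self.phase = Phase.LISTENING

    def handle_audio(self, audio: str) -> str:
        assert self.phase is Phase.LISTENING
        text = self.transcribe(audio)   # stand-in for Whisper inference
        self.phase = Phase.GENERATING
        reply = self.generate(text)     # stand-in for Qwen token generation
        self.phase = Phase.LISTENING    # hand the accelerator back
        return reply

    def transcribe(self, audio: str) -> str:
        return f"transcript({audio})"   # placeholder, not a real model call

    def generate(self, text: str) -> str:
        return f"reply({text})"         # placeholder, not a real model call
```

The point is just that the phases alternate, so peak load is one model at a time rather than the sum of both.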
I run the Qwen VLM 2B on the Hailo 10; it works great.
I'm not sure you are correct; what about the time it takes to answer? I think the 2B is best.
I am running Qwen 2.5 VLM 2B and I see that there are some limitations when running scripts from Thonny; in Bash it works great. Also, does anyone have an idea of how I can get YOLOv8m streaming on the web?
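One common way to get detections visible in a browser is an MJPEG stream over HTTP. Below is a minimal stdlib-only sketch; `get_annotated_frame()` is a hypothetical placeholder that you would replace with JPEG-encoded, YOLO-annotated frames from your pipeline (e.g. via `cv2.imencode(".jpg", frame)`):

```python
# Minimal MJPEG-over-HTTP streamer sketch (Python stdlib only).
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BOUNDARY = b"frame"


def get_annotated_frame() -> bytes:
    # Placeholder: swap in your YOLOv8m pipeline's JPEG-encoded output.
    return b"\xff\xd8\xff\xd9"  # minimal JPEG stub


def mjpeg_part(jpeg: bytes) -> bytes:
    """Wrap one JPEG frame as a multipart/x-mixed-replace part."""
    return (b"--" + BOUNDARY + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(jpeg)).encode() + b"\r\n\r\n"
            + jpeg + b"\r\n")


class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The browser keeps the connection open and replaces the image
        # each time a new multipart chunk arrives.
        self.send_response(200)
        self.send_header("Content-Type",
                         "multipart/x-mixed-replace; boundary=frame")
        self.end_headers()
        try:
            while True:
                self.wfile.write(mjpeg_part(get_annotated_frame()))
        except (BrokenPipeError, ConnectionResetError):
            pass  # client disconnected


def main(port: int = 8080) -> None:
    ThreadingHTTPServer(("", port), StreamHandler).serve_forever()
```

Call `main()` on the Pi and open `http://<pi-address>:8080` in a browser. This is only a sketch; for multiple cameras or viewers you would typically move frame capture to its own thread and share the latest frame between handlers.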