Hi all,
I downloaded Qwen3-VL-2B-Instruct.hef from the Hailo Model Explorer after seeing it announced in the 2026-04 suite release, but I cannot get inference to work on my setup. Qwen2-VL-2B-Instruct works perfectly on the same hardware and runtime, so I am fairly confident this is specific to Qwen3-VL.
Setup:
- Raspberry Pi 5 with Hailo AI HAT+ 2 (Hailo-10H)
- HailoRT 5.3.0, hailort-pcie-driver 5.3.0, hailo-tappas-core 5.3.0
- hailo-gen-ai-model-zoo 5.3.0, hailo-apps 26.3.0
- PyHailoRT installed via hailort-5.3.0-cp313-cp313-linux_aarch64.whl
- Python 3.13, Debian 13 (Trixie), kernel 6.12.75+rpt-rpi-2712 aarch64
The error: the model loads fine (the VLM object is created and the HEF chunks are sent to the server), but every call to generate_all() fails immediately:

```
[vlm.cpp:129] [create_unique] Sending 6 HEF chunks to server
[vlm.cpp:684] [generate_impl] CHECK_SUCCESS failed with status=HAILO_INVALID_OPERATION(6) - Failed to generate. Make sure the input data matches what the model expects and there is no other generation in progress
```
What I tested:
Minimal prompt with no system role:

```python
prompt = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"}
    ]}
]
response = vlm.generate_all(prompt=prompt, frames=[image], temperature=0.1, seed=42, max_generated_tokens=100)
```
With system role:

```python
prompt = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"}
    ]}
]
```
Both fail with the same error. Image preprocessing is 336x336 RGB uint8, identical to the pipeline that works for Qwen2-VL.
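For reference, this is roughly what my frame preparation does before the frame is passed to generate_all(). This is an illustrative sketch only (the nearest-neighbour resize here stands in for my actual resize step); the point is the shape/dtype contract:

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 336) -> np.ndarray:
    """Resize to size x size, keep RGB uint8 (nearest-neighbour, for illustration)."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    out = frame[rows][:, cols]
    return np.ascontiguousarray(out, dtype=np.uint8)

# sanity check on a dummy camera frame
image = preprocess(np.zeros((480, 640, 3), dtype=np.uint8))
assert image.shape == (336, 336, 3) and image.dtype == np.uint8
```

The exact same checks pass for the frames I feed Qwen2-VL, which is why I don't think the input tensor itself is the problem.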
What I ruled out:
- File is not corrupt: 3.0 GB file, sizes match between the download and the models folder, and the HEF loads successfully
- Not a prompt format issue: even the most minimal prompt fails
- Not an image size/format issue: same preprocessing that works for Qwen2-VL
- Not a state/context issue: fresh VDevice and VLM instance per run
- Not a path issue: absolute path confirmed correct
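To rule out corruption more strictly than a size comparison, I can also hash both copies. A stdlib-only sketch (the paths below are placeholders, not my actual locations):

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a 3 GB HEF never has to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# compare the downloaded file against the installed copy (example paths):
# assert sha256sum(Path("~/Downloads/Qwen3-VL-2B-Instruct.hef").expanduser()) == \
#        sha256sum(Path("/path/to/models/Qwen3-VL-2B-Instruct.hef"))
```

Happy to post the digests if a known-good checksum for the Model Explorer artifact is available somewhere.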
Observations:
- Qwen3-VL is not referenced anywhere in the hailo-apps 26.3.0 codebase
- resources_config.yaml has no entry for Qwen3-VL
- Qwen2-VL (2.2 GB) works perfectly; Qwen3-VL (3.0 GB) fails every time on the same runtime
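The resources_config.yaml observation came from a crude text search rather than parsing the YAML schema, so take it as such. Sketch of what I ran (the config path is an example; adjust to wherever hailo-apps installs it):

```python
from pathlib import Path

def model_registered(config_path: Path, needle: str) -> bool:
    """Case-insensitive substring search; enough to see if a model name is known."""
    text = config_path.read_text(encoding="utf-8", errors="replace").lower()
    return needle.lower() in text

# example path, not necessarily where your install keeps it:
# cfg = Path("/usr/local/hailo/resources/resources_config.yaml")
# print(model_registered(cfg, "qwen2"))  # found on my system
# print(model_registered(cfg, "qwen3"))  # not found on my system
```

If someone can confirm where Qwen3-VL is supposed to be registered, I will gladly re-check.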
Question: Does Qwen3-VL require a different API call, input format, tokenizer configuration, or preprocessing compared to Qwen2-VL? Is there something not yet exposed in the public Python GenAI API in 5.3.0 that Qwen3-VL depends on? Or is this model simply not ready for public use yet despite being available on the Model Explorer?
Any guidance appreciated. Happy to provide additional logs if needed.