Qwen3-VL-2B-Instruct.hef fails with HAILO_INVALID_OPERATION(6) on HailoRT 5.3.0 — generate_impl fails despite successful model load

Hi all,

I downloaded Qwen3-VL-2B-Instruct.hef from the Hailo Model Explorer after seeing it announced in the 2026-04 suite release, but I cannot get inference to work on my setup. Qwen2-VL-2B-Instruct works perfectly on the same hardware and runtime, so I am fairly confident this is specific to Qwen3-VL.

Setup:

  • Raspberry Pi 5 with Hailo AI HAT+ 2 (Hailo-10H)

  • HailoRT 5.3.0, hailort-pcie-driver 5.3.0, hailo-tappas-core 5.3.0

  • hailo-gen-ai-model-zoo 5.3.0, hailo-apps 26.3.0

  • PyHailoRT installed via hailort-5.3.0-cp313-cp313-linux_aarch64.whl

  • Python 3.13, Debian 13 Trixie, kernel 6.12.75+rpt-rpi-2712 aarch64

The error: the model loads fine (the VLM object is created and the HEF chunks are sent to the server), but every call to generate_all() fails immediately:

```
[vlm.cpp:129] [create_unique] Sending 6 HEF chunks to server
[vlm.cpp:684] [generate_impl] CHECK_SUCCESS failed with status=HAILO_INVALID_OPERATION(6) - Failed to generate. Make sure the input data matches what the model expects and there is no other generation in progress
```

What I tested:

Minimal prompt with no system role:

```python
prompt = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"}
    ]}
]
response = vlm.generate_all(prompt=prompt, frames=[image], temperature=0.1, seed=42, max_generated_tokens=100)
```

With system role:

```python
prompt = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this image?"}
    ]}
]
```

Both fail with the same error. Image preprocessing produces 336x336 RGB uint8, identical to the pipeline that works for Qwen2-VL.
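For reference, the preprocessing step is essentially the following (a minimal sketch of what I run; the real pipeline also handles frame capture, but the output shape and dtype are exactly this):

```python
from PIL import Image
import numpy as np

def preprocess(img: Image.Image, size: int = 336) -> np.ndarray:
    """Resize to the model's expected input: size x size, RGB, uint8 (HWC)."""
    rgb = img.convert("RGB").resize((size, size), Image.BICUBIC)
    return np.asarray(rgb, dtype=np.uint8)

# e.g. with a synthetic frame:
frame = preprocess(Image.new("RGB", (640, 480)))
print(frame.shape, frame.dtype)  # (336, 336, 3) uint8
```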

What I ruled out:

  • File is not corrupt — 3.0GB file, sizes match between download and models folder, HEF loads successfully

  • Not a prompt format issue — even the most minimal possible prompt fails

  • Not an image size/format issue — same preprocessing that works for Qwen2-VL

  • Not a state/context issue — fresh VDevice and VLM instance per run

  • Not a path issue — absolute path confirmed correct
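On the corruption point: beyond comparing sizes, a quick way to rule out a bad copy is hashing both files. Here is a sketch (the paths in the comment are placeholders for my setup, not actual locations):

```python
import hashlib

def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so a 3.0GB HEF doesn't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# e.g.: sha256sum("download/Qwen3-VL-2B-Instruct.hef") == sha256sum("models/Qwen3-VL-2B-Instruct.hef")
```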

Observations:

  • Qwen3-VL is not referenced anywhere in the hailo-apps 26.3.0 codebase

  • resources_config.yaml has no entry for Qwen3-VL

  • Qwen2-VL (2.2GB) works perfectly; Qwen3-VL (3.0GB) fails every time on the same runtime
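The resources_config.yaml observation is easy to reproduce with a plain text search, no YAML parsing needed (a sketch; it assumes models appear by name in the config, which is how my install's file is laid out):

```python
def models_listed(config_text: str, names: list[str]) -> dict[str, bool]:
    """Report which model names appear anywhere in the config text."""
    lowered = config_text.lower()
    return {n: n.lower() in lowered for n in names}

# e.g.: models_listed(Path("resources_config.yaml").read_text(), ["qwen2-vl", "qwen3-vl"])
sample = "models:\n  - name: qwen2-vl-2b-instruct\n"
print(models_listed(sample, ["qwen2-vl", "qwen3-vl"]))  # {'qwen2-vl': True, 'qwen3-vl': False}
```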

Question: Does Qwen3-VL require a different API call, input format, tokenizer configuration, or preprocessing compared to Qwen2-VL? Is there something not yet exposed in the public Python GenAI API in 5.3.0 that Qwen3-VL depends on? Or is this model simply not ready for public use yet despite being available on the Model Explorer?

Any guidance appreciated. Happy to provide additional logs if needed.