Same input buffer + same HEF → output changes between runs (determinism issue?) Hailo-8 + CM4 + InferModel API

Hi Hailo community,
I’m seeing what looks like non-deterministic inference output on Hailo-8 when running the exact same input multiple times.

Environment

  • Device: Hailo-8

  • Platform: Raspberry Pi CM4 / Linux

  • API: HailoRT C++ InferModel + ConfiguredInferModel

  • Batch size: 1

  • Inference mode: synchronous (configured_model->run())

What I Already Tried

  • Switched from async (run_async) to sync (configured_model->run())

  • Ensured batch=1 and that the same bindings are used

  • Removed preprocessing completely, forcing the same static input buffer

  • Increased the timeout

  • Verified the model works normally on other pipelines / sensors (the issue appears only in this specific case)

Questions

  1. Is deterministic output guaranteed on Hailo-8 for identical input (same bytes)?

  2. Could this be related to buffer allocation method (mmap + MemoryView) or cache/DMA coherency on ARM (CM4)?

  3. Should I always use DmaMappedBuffer / VStreams instead of raw mmap buffers?

  4. Any known issues if requesting output as FLOAT32 when HEF output is UINT8 (format conversion path)?

Hi @gje_gje,

This shouldn’t be happening and suggests something is off. I’d look closely at the format conversion. How did you do it? Did you set the network output to float32?

Yes, I am explicitly setting the output format to FLOAT32:

for (const auto &output_name : infer_model->get_output_names()) {
    infer_model->output(output_name)->set_format_type(HAILO_FORMAT_TYPE_FLOAT32);
}

This exact same inference code (InferModel API, synchronous run(), same HEF, same post-processing) works correctly and deterministically on multiple other edge systems we use.
The issue only appears on one specific platform / setup.

That’s why I suspect platform-specific behavior rather than a general misuse of the API.

That definitely should not be happening. Could you please share some details on the setup?