Does the output for the same input data vary?

candy24910 · June 17, 2025, 1:06am

Hello,

I have a quick question regarding the Hailo-8, and I apologize if it’s a naive one as I don’t have a deep understanding of NPU principles.

If I activate the exact same HEF file (e.g., yolov7.hef) on the Hailo-8 and then write the same input data to the device, should I always get the exact same output?
I’ve tested this using yolov7.hef and the vstreams API and noticed that the outputs vary. Is this expected behavior?
Is this variability a characteristic of the Hailo hardware itself, or is it due to the vstreams API, which includes the conversion between float and UINT8 as well as NMS? Would using the streams API (instead of vstreams) guarantee consistent results for the same input?

Thank you for your time and insights.

omria · June 22, 2025, 10:48am

Hey @candy24910,

Great questions! You’re absolutely right to be curious about this - it’s not naive at all, and understanding determinism in AI hardware is important.

To answer your questions directly:

Yes, you should get identical outputs - but only if you’re looking at the raw hardware output. The Hailo-8 accelerator itself is completely deterministic. Give it the same HEF and identical input tensors, and the silicon will produce bit-for-bit identical results every single time.
The variability is coming from vstreams, not the hardware. You’ve hit the nail on the head with your second question. The inconsistency you’re seeing is entirely due to the vstreams API’s convenience features.

Here’s what’s happening under the hood:

vstreams adds several processing layers that introduce variability:

Quantization/dequantization: Converting between your float inputs and the device’s fixed-point format involves rounding operations
NMS processing: Non-Max Suppression runs on the CPU and can handle boxes with identical confidence scores in slightly different orders
Multi-threaded I/O: Background threading can affect processing order
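
To illustrate the NMS point above: if boxes with identical scores can come back in different orders, one way to compare runs is to put the detections into a canonical order first. The snippet below is only a sketch in plain Python. The detection layout (`class_id`, `score`, `box`) and the helper name `canonical_order` are assumptions made for illustration and are not part of HailoRT.

```python
import random

def canonical_order(detections):
    """Sort by descending score, breaking ties on class id and box coordinates."""
    return sorted(detections, key=lambda d: (-d["score"], d["class_id"], d["box"]))

# Hypothetical detections: three of them share the same confidence score.
detections = [
    {"class_id": 0, "score": 0.90, "box": (10, 20, 50, 60)},
    {"class_id": 2, "score": 0.75, "box": (5, 5, 30, 30)},
    {"class_id": 1, "score": 0.75, "box": (40, 40, 80, 80)},
    {"class_id": 1, "score": 0.75, "box": (12, 8, 33, 29)},
]

# Simulate a second run that returns the same boxes in a different order.
shuffled = detections[:]
random.Random(0).shuffle(shuffled)

# After canonical ordering, both runs compare equal element by element.
same = canonical_order(detections) == canonical_order(shuffled)
print(same)
```

Because every field takes part in the sort key, ties on the score no longer depend on the order in which the boxes arrived.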

If you need perfect repeatability:

Switch to the raw streams API - this gives you direct access to the hardware’s fixed-point output with zero post-processing
Handle quantization/dequantization yourself using InputTransformContext/OutputTransformContext
You’ll get bit-exact results every time
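
A quick way to check the bit-exact claim is to fingerprint the raw output bytes of each run and only dequantize afterwards. The following is a minimal sketch that does not call HailoRT at all: the byte strings stand in for raw UINT8 output buffers, and `scale` and `zero_point` are made-up values (the real ones would come from the quantization info of your model). The helpers `fingerprint` and `dequantize` are hypothetical names.

```python
import hashlib

def fingerprint(raw: bytes) -> str:
    """Return a SHA-256 hex digest of a raw output buffer."""
    return hashlib.sha256(raw).hexdigest()

def dequantize(raw: bytes, scale: float, zero_point: int) -> list:
    """Affine dequantization of UINT8 values: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in raw]

# Stand-ins for raw output buffers captured from three inference runs.
run_a = bytes([0, 64, 128, 255])
run_b = bytes([0, 64, 128, 255])
run_c = bytes([0, 64, 129, 255])  # differs from run_a in one byte

print(fingerprint(run_a) == fingerprint(run_b))  # identical buffers
print(fingerprint(run_a) == fingerprint(run_c))  # one byte differs
print(dequantize(run_a, scale=0.5, zero_point=128))
```

Comparing digests of the fixed-point data keeps float conversion out of the comparison, so any mismatch points at the raw output rather than at post-processing.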

If you want to stick with vstreams:

Use the tf_nms_format=True flag on your OutputVStream to get consistent NMS ordering
Accept that tiny floating-point rounding differences are normal
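
If you stay with float outputs, comparing runs with a tolerance instead of strict equality is one way to accept small rounding differences. This is a sketch using only the Python standard library. The function name `outputs_match` and the tolerance values are assumptions and should be tuned to your model.

```python
import math

def outputs_match(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if two float sequences agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )

# Hypothetical confidence scores from three runs on the same input.
run_a = [0.912345, 0.500000, 0.031250]
run_b = [0.912346, 0.500000, 0.031250]  # differs only in the last digit
run_c = [0.912345, 0.480000, 0.031250]  # a real difference

print(outputs_match(run_a, run_b))
print(outputs_match(run_a, run_c))
```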

The hardware itself is rock solid - all the variability comes from the software convenience layers. Raw streams will give you the deterministic behavior you’re looking for!
