I have a quick question regarding the Hailo-8, and I apologize if it’s a naive one as I don’t have a deep understanding of NPU principles.
If I activate the exact same HEF file (e.g., yolov7.hef) on the Hailo-8, and then write the same input data to the device, should I always get the exact same output?
I’ve tested this using yolov7.hef and the vstreams API and noticed that the outputs vary. Is this expected behavior?
Is this variability a characteristic of the Hailo hardware itself, or is it due to the use of the vstreams API, which includes the conversion between float and UINT8, and NMS? Would using the streams API (instead of vstreams) guarantee consistent results for the same input?
Great questions! You’re absolutely right to be curious about this - it’s not naive at all, and understanding determinism in AI hardware is important.
To answer your questions directly:
Yes, you should get identical outputs - but only if you’re looking at the raw hardware output. The Hailo-8 accelerator itself is completely deterministic. Give it the same HEF and identical input tensors, and the silicon will produce bit-for-bit identical results every single time.
The variability is coming from vstreams, not the hardware. You’ve hit the nail on the head with your second question. The inconsistency you’re seeing is entirely due to the vstreams API’s convenience features.
Here’s what’s happening under the hood:
vstreams adds several processing layers that introduce variability:
Quantization/dequantization: Converting between your float inputs and the device’s fixed-point format involves rounding operations
NMS processing: Non-Max Suppression runs on the CPU and can handle boxes with identical confidence scores in slightly different orders
Multi-threaded I/O: Background threading can affect processing order
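To give a feel for how large the rounding in the quantization step is, here is a minimal sketch of a generic affine quantize/dequantize round trip in plain Python. It is not HailoRT code: the function names, the scale and the zero point are all hypothetical values chosen for illustration, and the real parameters would come from the quantization info attached to your streams.

```python
# Generic affine quantization sketch (NOT HailoRT code).
# SCALE and ZERO_POINT are hypothetical; real values come from the model's
# quantization info.
SCALE = 0.05
ZERO_POINT = 10


def quantize(x, scale, zero_point):
    """Map a float to a uint8 code: q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range


def dequantize(q, scale, zero_point):
    """Map a uint8 code back to a float: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


values = [0.0, 0.12, 1.337, 2.5]
for x in values:
    q = quantize(x, SCALE, ZERO_POINT)
    x_back = dequantize(q, SCALE, ZERO_POINT)
    # For values inside the representable range, the round-trip error
    # is at most half a quantization step (SCALE / 2).
    print(f"x={x:<6} q={q:<3} x_back={x_back:.4f} error={abs(x - x_back):.4f}")
```

The point of the sketch is only that the float values you read back are snapped to a grid with spacing `SCALE`, so small differences relative to the original floats are expected whenever this conversion is in the path.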
If you need perfect repeatability:
Switch to the raw streams API - this gives you direct access to the hardware’s fixed-point output with zero post-processing
Handle quantization/dequantization yourself using InputTransformContext/OutputTransformContext
You’ll get bit-exact results every time
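One way to check the bit-exact claim on your own setup is to hash the raw output buffer of each run and compare the digests. The sketch below uses only the Python standard library. The byte strings are placeholders standing in for raw device output, the helper names are hypothetical, and how you obtain the buffers from the streams API is deliberately left out.

```python
import hashlib


def digest(raw_output: bytes) -> str:
    """Return a SHA-256 hex digest of one raw output buffer."""
    return hashlib.sha256(raw_output).hexdigest()


def all_bit_identical(outputs) -> bool:
    """True when every buffer in `outputs` contains exactly the same bytes."""
    return len({digest(buf) for buf in outputs}) <= 1


# Placeholder buffers standing in for the raw output of three runs.
run_a = bytes([12, 37, 60, 255])
run_b = bytes([12, 37, 60, 255])
run_c = bytes([12, 37, 61, 255])  # differs from run_a in a single byte

print(all_bit_identical([run_a, run_b]))         # True
print(all_bit_identical([run_a, run_b, run_c]))  # False
```

Logging one digest per inference makes it easy to spot the first run that diverges without storing every output tensor.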
If you want to stick with vstreams:
Use the tf_nms_format=True flag on your OutputVStream to get consistent NMS ordering
Accept that tiny floating-point rounding differences are normal
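If you stay with vstreams, a practical way to compare two runs is to ignore detection order and allow a small tolerance on the float values. This is a minimal sketch under stated assumptions: the detection layout `(class_id, score, x_min, y_min, x_max, y_max)`, the function names and the tolerance are all hypothetical, so adapt them to whatever your NMS output actually looks like.

```python
import math


def canonical(detections):
    """Sort by class, then descending score, then box coordinates (tie-break)."""
    return sorted(detections, key=lambda d: (d[0], -d[1], d[2:]))


def same_detections(a, b, tol=1e-5):
    """Compare two detection lists, ignoring order and differences below `tol`."""
    if len(a) != len(b):
        return False
    for da, db in zip(canonical(a), canonical(b)):
        if da[0] != db[0]:  # class ids must match exactly
            return False
        for x, y in zip(da[1:], db[1:]):
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=tol):
                return False
    return True


# Hypothetical detections: (class_id, score, x_min, y_min, x_max, y_max)
run_a = [
    (0, 0.9, 10.0, 10.0, 50.0, 50.0),
    (0, 0.9, 60.0, 60.0, 90.0, 90.0),
    (1, 0.5, 5.0, 5.0, 20.0, 20.0),
]
# Same boxes, with the two tied detections swapped and a tiny float difference.
run_b = [
    (0, 0.9, 60.0, 60.0, 90.0, 90.0),
    (0, 0.9, 10.0, 10.0, 50.0000001, 50.0),
    (1, 0.5, 5.0, 5.0, 20.0, 20.0),
]

print(same_detections(run_a, run_b))  # True
```

One caveat: if two boxes of the same class have scores that differ by less than the tolerance, sorting may pair them up differently between runs, so a stricter matcher (for example, matching boxes by overlap) would be needed in that case.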
The hardware itself is rock solid - all the variability comes from the software convenience layers. Raw streams will give you the deterministic behavior you’re looking for!