Hey @SAN
It sounds like you’re experiencing a discrepancy between the outputs generated by the .har file (used on Ubuntu with native and FP optimized SDKs) and the .hef file (used on the Hailo8L with the Raspberry Pi). This issue may stem from a few key factors related to model optimization, buffer size handling, and the nature of the .hef file conversion.
1. Difference Between .har and .hef Models
The .har file is a Hailo Archive, used for inference and emulation on a host machine (like your Ubuntu system), whereas the .hef file is a Hailo Executable File compiled specifically for the Hailo hardware, such as the Hailo8L accelerator on the Raspberry Pi. The key difference lies in the hardware optimization applied during the compilation of the .hef file.
Does the model structure change when compiling to .hef?
Yes, when you compile a .har file to a .hef using the runner.compile() method, the model is optimized specifically for the Hailo hardware. This may involve:
- Quantization: If your model was floating-point-based during training, it is quantized to INT8 when compiled to .hef so it can run efficiently on Hailo’s architecture.
- Layer Fusion: Some layers may be fused or optimized during compilation, which could alter the structure of the model slightly.
- Memory Optimization: The memory handling and buffer sizes may be adjusted for efficiency on the Hailo chip.
These optimizations can lead to slight variations in the model’s behavior and output when compared to the original .har file.
2. Input Buffer Size Mismatch
The error you’re encountering regarding input buffer size:
[HailoRT] [error] CHECK failed - Input buffer size 9830400 is different than expected 1228800 for input 'net_9conv/input_layer1'
This indicates that the buffer you are sending differs in size from what the compiled model expects. The root cause is the data type of your input array:
- The .hef expects UINT8 input (1 byte per element); the expected 1,228,800 bytes corresponds, for example, to a 640 × 640 × 3 image.
- Your buffer is 9,830,400 bytes, exactly 8× larger, which points to float64 (NumPy’s default floating-point dtype, 8 bytes per element). A float32 buffer would be 4× larger (4,915,200 bytes). A floating-point input is correct for .har (native/FP) emulation, but not for the .hef.
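To make the 8× gap concrete, here is a quick sanity check in pure NumPy (the 640 × 640 × 3 shape is an assumption inferred from the expected byte count, not something your error message states):

```python
import numpy as np

# Hypothetical input shape inferred from the expected buffer size:
# 640 * 640 * 3 = 1,228,800 elements.
shape = (640, 640, 3)

for dtype in (np.uint8, np.float32, np.float64):
    buf = np.zeros(shape, dtype=dtype)
    print(dtype.__name__, buf.nbytes)
```

The float64 buffer comes out at exactly 9,830,400 bytes, matching the size reported in your error, while the uint8 buffer matches the 1,228,800 bytes the .hef expects.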
Solution:
To resolve the buffer size issue, ensure that:
- You preprocess the input for .hef as UINT8 instead of a floating-point dtype. In your preprocessing step, convert the input image data to uint8 before feeding it into the .hef model.
For example (assuming image holds floats already normalized to [0, 1]):

import numpy as np

input_data = (image * 255.0).astype(np.uint8)  # scale [0, 1] floats to [0, 255] uint8
3. Output Differences
The reason why the output differs between .har and .hef files is likely due to the quantization applied during the conversion to .hef. In quantized models, there is a known difference in output behavior compared to floating-point models due to precision loss during quantization.
- Floating-point (FP) models retain a higher level of accuracy, but the Hailo8L itself only executes quantized models; FP inference is available only in emulation on the host.
- Quantized (INT8) models are faster but may result in slightly different outputs due to reduced precision.
You can run the quantized .har model in emulation with the Hailo SDK and compare its output to the .hef version; if those two agree, the remaining gap versus the FP output comes from quantization.
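To build intuition for how much precision 8-bit quantization costs, you can round-trip a float tensor through a simple per-tensor affine quantization and measure the error. This is an illustrative sketch only, not the exact scheme the Hailo compiler uses:

```python
import numpy as np

def quantize_dequantize(x: np.ndarray) -> np.ndarray:
    """Round-trip x through a simple per-tensor 8-bit affine quantization."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = x.min()
    q = np.clip(np.round((x - zero_point) / scale), 0, 255).astype(np.uint8)
    return q.astype(np.float32) * scale + zero_point

rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
err = np.abs(x - quantize_dequantize(x))
print("max abs round-trip error:", err.max())  # bounded by roughly scale / 2
```

The per-element error is small but nonzero, and in a deep network these small errors accumulate layer by layer, which is why the .hef output will never match the FP .har output bit-for-bit.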
4. Ensure Consistent Preprocessing
To minimize differences in output, ensure that you’re applying the same preprocessing steps for both .har and .hef models:
- Resize and normalize the images in the same way.
- Ensure that the input format (e.g., RGB/BGR) is consistent across both models.
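One way to guarantee identical preprocessing is to route both pipelines through a single function and vary only the output dtype at the end. A minimal sketch using NumPy only; resizing, color-order handling, and the exact normalization are assumptions you should match to your model, using whatever library you already have (e.g. OpenCV):

```python
import numpy as np

def preprocess(image: np.ndarray, quantized: bool) -> np.ndarray:
    """Shared preprocessing: same math for both paths, only the dtype differs.

    `image` is assumed to be an HxWx3 uint8 array already resized to the
    model's input resolution and in the channel order the model expects.
    """
    if quantized:
        # .hef path: the device expects raw uint8, 1 byte per element.
        return np.ascontiguousarray(image, dtype=np.uint8)
    # .har FP path: normalize to [0, 1] float32, 4 bytes per element.
    return image.astype(np.float32) / 255.0

# Stand-in for a real camera frame.
frame = np.random.default_rng(0).integers(0, 256, (640, 640, 3), dtype=np.uint8)
hef_input = preprocess(frame, quantized=True)
har_input = preprocess(frame, quantized=False)
print(hef_input.dtype, hef_input.nbytes)  # 1 byte per element
print(har_input.dtype, har_input.nbytes)  # 4 bytes per element
```

Because both paths share everything except the final cast, any remaining output difference can be attributed to quantization rather than to a preprocessing mismatch.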
Conclusion:
- Model Structure: The model structure can change when converting .har to .hef due to quantization and hardware optimization.
- Input Buffer Size: Ensure that the input buffer matches the expected data type (uint8 for .hef models); this should resolve the buffer size mismatch error.
- Output Differences: These are likely due to quantization effects. You may want to compare the quantized .har model’s output with the .hef output to understand how quantization affects your results.
Let me know if you need further assistance, and I’ll be happy to help!
Best regards,
Omri