We are running the Hailo8 M.2 chip in our camera. During stress testing that loads the CPU, GPU, camera capture, and the Hailo8, we sometimes see the messages below in dmesg.
It even seems like these messages occur more frequently since we also started taking power/temperature measurements for the Hailo8. We’ve mostly run with v4.15, but we’ve also tested with the latest driver + firmware.
This is not something we see for other PCIe devices (e.g., our sensor device), which we also stress.
Should we be worried about these messages? Do they indicate some issue with our setup? Or are they safe to ignore?
Thanks for reaching out about the PCIe errors you’re seeing with your Hailo-8 M.2! First, I need to ask - do you have a heatsink or thermal pad attached to the Hailo? If not, please add one as this is important for proper operation.
Regarding the dmesg errors you’re seeing - these are PCIe “RxErr” (receiver) errors that are being corrected automatically. While not critical, here’s what you should know:
These are normal corrected errors that don’t cause data loss (the PCIe link recovers from them in hardware)
They may increase during heavy testing due to:
High system load/interference
Power fluctuations
Temperature changes
To minimize these errors:
Ensure you have the latest Hailo drivers/firmware
Check cooling and ventilation
Verify stable power supply
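To check whether any of these changes actually reduces the error rate, it can help to track the counters rather than watch dmesg. Below is a minimal Python sketch, assuming a Linux kernel built with PCIe AER statistics support, which exposes per-device counters in the sysfs file `aer_dev_correctable` as `Name count` lines. The helper names and the device address in the usage note are hypothetical; substitute the address that `lspci` reports for your Hailo-8.

```python
from pathlib import Path


def parse_aer_counters(text: str) -> dict[str, int]:
    """Parse 'Name count' lines (e.g. 'RxErr 3') into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_correctable(bdf: str) -> dict[str, int]:
    """Read the correctable-error counters for one PCIe device.

    Assumption: the kernel exposes aer_dev_correctable for this device.
    bdf is the PCI address, e.g. '0000:01:00.0' (hypothetical).
    """
    path = Path("/sys/bus/pci/devices") / bdf / "aer_dev_correctable"
    return parse_aer_counters(path.read_text())


def counter_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

A typical use would be to call `read_correctable("0000:01:00.0")` before and after a stress run and pass both snapshots to `counter_delta`. Comparing the `RxErr` increase with and without the power/temperature polling, or before and after adding a heatsink, shows whether a change makes a measurable difference.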
Since these are corrected errors, they’re generally safe to ignore unless you notice performance issues. Let me know if you need any other guidance!