Hailo8 PCIe error corrected

Hi

We are running the Hailo8 M.2 chip in our camera. During stress testing, where we load the CPU, GPU, camera capture, and the Hailo8 at the same time, we sometimes see the messages below in dmesg.

[ 3023.174299] pcieport 0000:00:01.5: AER: Corrected error received: 0000:05:00.0
[ 3023.181596] hailo 0000:05:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 3023.190945] hailo 0000:05:00.0:   device [1e60:2864] error status/mask=00000001/00006000
[ 3023.199055] hailo 0000:05:00.0:    [ 0] RxErr                  (First)

It even seems like these messages occur more frequently since we also started taking power/temperature measurements for the Hailo8. We’ve mostly run with v4.15, but we’ve also tested with the latest driver + firmware.

This is not something we see for other PCIe devices (e.g. our sensor device), which we also stress.

Should we be worried about these messages? Do they indicate some issue with our setup? Or are they safe to ignore?

Hey @dlp

Welcome to the Hailo Community!

Thanks for reaching out about the PCIe errors you’re seeing with your Hailo-8 M.2! First, I need to ask - do you have a heatsink or thermal pad attached to the Hailo? If not, please add one as this is important for proper operation.

Regarding the dmesg errors you’re seeing - these are PCIe “RxErr” (receiver) errors that are being corrected automatically. While not critical, here’s what you should know:

  1. These are normal corrected errors that don’t cause data loss
  2. They may increase during heavy testing due to:
    • High system load/interference
    • Power fluctuations
    • Temperature changes
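
For reference, the status/mask value in your log can be decoded by hand. Below is a minimal Python sketch, assuming the bit layout of the PCIe AER Correctable Error Status register and the short names the Linux kernel prints for each bit. `AER_CORRECTABLE_BITS` and `decode_correctable` are hypothetical helper names for illustration, not part of any Hailo tool.

```python
# Bit positions in the PCIe AER Correctable Error Status / Mask registers,
# mapped to the short names the Linux kernel uses in dmesg.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(value_hex):
    """Return the names of all bits set in a hex register value."""
    value = int(value_hex, 16)
    return [
        name
        for bit, name in sorted(AER_CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


# Values taken from the "error status/mask=00000001/00006000" line.
status_hex, mask_hex = "00000001/00006000".split("/")
print("status:", decode_correctable(status_hex))
print("mask:  ", decode_correctable(mask_hex))
```

With the values from your log, the status decodes to RxErr only, and the mask shows that reporting of NonFatalErr and CorrIntErr is masked.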

To minimize these errors:

  • Ensure you have the latest Hailo drivers/firmware
  • Check cooling and ventilation
  • Verify stable power supply
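
To judge whether any of these changes actually helps, it can be useful to count the corrected errors per device before and after. The following is a rough Python sketch, assuming the dmesg line format shown in the original post (other kernel versions may word the message differently). `count_corrected` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches root-port lines of the form seen in the original post:
# [ 3023.174299] pcieport 0000:00:01.5: AER: Corrected error received: 0000:05:00.0
CORRECTED_RE = re.compile(
    r"pcieport\s+\S+\s+AER:\s+Corrected error received:\s+"
    r"(?P<dev>[0-9a-fA-F:.]+)"
)


def count_corrected(dmesg_text):
    """Count 'Corrected error received' reports per reporting device."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = CORRECTED_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample input copied from the original post.
sample = """\
[ 3023.174299] pcieport 0000:00:01.5: AER: Corrected error received: 0000:05:00.0
[ 3023.181596] hailo 0000:05:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 3023.190945] hailo 0000:05:00.0:   device [1e60:2864] error status/mask=00000001/00006000
[ 3023.199055] hailo 0000:05:00.0:    [ 0] RxErr                  (First)
"""

for device, count in count_corrected(sample).items():
    print(device, count)
```

Running this on the saved dmesg output of two equally long stress runs gives a simple before/after comparison for each device.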

Since these are corrected errors, they’re generally safe to ignore unless you notice performance issues. Let me know if you need any other guidance!

Hi @omria

Thanks for the quick reply.

We do have a heatsink, and the temperature maxes out at around 34 °C in my tests. But yes, the system load is high.

I’ll probably try to see if I can reproduce it on another platform at some point. But for now we’ll just ignore these error corrections.
