[HailoRT] Got CPU ECC Fatal Event During Stress Test on Hailo-8L with Raspberry Pi 5

Hello Hailo Community,

I encountered a critical issue while running a stress test on the Hailo-8L processor with a Raspberry Pi 5 using the 57.hef AI model. The error message I received is as follows:

[HailoRT] [critical] Got health monitor CPU ECC fatal event. memory_bitmap=4096

Test Details:

  • Hardware:
    • Hailo-8L
    • Raspberry Pi 5
  • AI Model: 57.hef (default model provided by Hailo)
  • Command Used:
    hailortcli run /var/hailo_integration_tool/thermal/HAILO8L/57.hef -t 172800 --measure-temp
    

Questions:

  1. What does the memory_bitmap=4096 indicate in the context of the ECC fatal event?
  2. Is this issue indicative of a hardware fault, or could it be related to software/firmware?
  3. Are there recommended steps to debug or resolve this issue?
  4. Has anyone else encountered similar errors during stress tests, and if so, how were they mitigated?
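
For reference on question 1: 4096 equals 2^12, so the reported bitmap has exactly one bit set, bit 12 (counting from 0). Below is a minimal Python sketch that pulls the value out of the log line above and lists the set bits. The function names are my own, and I do not know which memory region bit 12 corresponds to in the firmware, so the sketch only decodes the number and assumes nothing about its meaning.

```python
import re

# Matches the critical line quoted in this post and captures the bitmap value.
ECC_EVENT = re.compile(r"CPU ECC fatal event\. memory_bitmap=(\d+)")

def set_bits(bitmap):
    """Return the zero-based indices of the bits that are set in bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def parse_ecc_event(line):
    """Return the set bit indices from an ECC fatal event line, or None if no match."""
    match = ECC_EVENT.search(line)
    return set_bits(int(match.group(1))) if match else None

line = "[HailoRT] [critical] Got health monitor CPU ECC fatal event. memory_bitmap=4096"
print(parse_ecc_event(line))  # [12]
```

The same helper could be run over a saved log to check whether the same bit is reported on every failure, which might help separate a repeatable fault from a random one.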

I would greatly appreciate any guidance or insights from the community regarding this issue. If further details are needed, I am happy to provide them.

Thank you for your support!


@omria, is there any update on this issue?

@nina-vilela @Nadav @Omer, can anyone please give an update on this? It has been a week.

I’ve also experienced this problem when the system overheats to around 103-104 degrees Celsius. I’m using the Hailo-8L with a Raspberry Pi 5. Has anyone found a solution to this issue? @Omer, can you suggest a solution?

Hi @prasaanthg, @manikantaj, we are checking on this.

Do you have a fan running? What is the CPU temperature when this occurs? For the record, I’ve been running object detection continuously for months now (surveillance) on an RPi 5 + M.2 Hailo-8L and I’ve never seen this. But my CPU load is quite low, since all the object detection is being done by the Hailo. My CPU temperature is around 50 degrees Celsius.
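
If it helps to capture the Pi's CPU temperature at the moment of failure, here is a rough Python sketch. It assumes the usual Linux sysfs file `/sys/class/thermal/thermal_zone0/temp`, which reports millidegrees Celsius; the path and the function names are assumptions, so adjust them for your system. Note that this reads the Pi's SoC temperature, not the Hailo chip's temperature.

```python
from pathlib import Path

# Assumed location of the CPU temperature on Raspberry Pi OS (millidegrees Celsius).
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")

def millidegrees_to_celsius(raw):
    """Convert a sysfs reading such as '50000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0

def read_cpu_temp_c(path=THERMAL_FILE):
    """Return the CPU temperature in Celsius, or None if it cannot be read."""
    try:
        return millidegrees_to_celsius(path.read_text())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    temp = read_cpu_temp_c()
    print("CPU temperature unavailable" if temp is None else f"CPU temperature: {temp:.1f} C")
```

Calling `read_cpu_temp_c()` periodically while the stress test runs and logging the result with a timestamp would show whether the ECC event lines up with a temperature spike.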