SW is shutting down due to high temperature

Hi,
Our team is testing two RTSP streams with Hailo-8:
./tappas/apps/gstreamer/general/multistream_detection/multi_stream_detection_rtsp.sh

After a while it stops running with the below error:

Setting pipeline to PLAYING …
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (open) Opened Stream
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
Progress: (request) Sent PLAY request
Redistribute latency…
[HailoRT] [warning] Got health monitor notification - temperature reached orange zone. sensor id=0, TS00=103.98878c, TS01=103.634674c
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 400000000 to 350000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 350000000 to 300000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 300000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 200000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [critical] Got health monitor closed streams notification. temperature: TS00=119.97402 c, TS01=119.265816 c, inputs bitfield:1, outputs bitfield:0
[HailoRT] [critical] Channel 0:2 was aborted by an external source!
[HailoRT] [error] CHECK_SUCCESS_AS_EXPECTED failed with status=HAILO_STREAM_ABORTED(62)
[HailoRT] [error] CHECK_EXPECTED failed with status=HAILO_STREAM_ABORTED(62)
ERROR: from element /GstPipeline:pipeline0/GstHailoNet:hailonet0/GstHailoSend:hailosend: Failed writing to input vstream yolov5m_wo_spp_60p/input_layer1, status = 62

What can I do to fix this?

Hi,
This is a thermal issue: the device overheated to the point of shutting down.
The first indication of overheating is this line:

[HailoRT] [warning] Got health monitor notification - temperature reached orange zone. sensor id=0, TS00=103.98878c, TS01=103.634674c
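
If you want to spot this warning in a long log, the sensor readings can be pulled out with a small script. This is only a minimal sketch: it assumes the log line format shown above, and that the trailing "c" means degrees Celsius. The function name extract_temperatures is made up for illustration and is not part of HailoRT.

```python
import re

# Matches readings such as "TS00=103.98878c" and "TS00=119.97402 c".
# Assumption: the trailing "c" denotes degrees Celsius.
TEMP_RE = re.compile(r"TS(\d+)=(\d+(?:\.\d+)?)\s*c")


def extract_temperatures(line):
    """Return {sensor_name: temperature} for one HailoRT log line.

    Hypothetical helper for illustration; not a HailoRT API.
    """
    return {f"TS{idx}": float(val) for idx, val in TEMP_RE.findall(line)}


orange = (
    "[HailoRT] [warning] Got health monitor notification - temperature "
    "reached orange zone. sensor id=0, TS00=103.98878c, TS01=103.634674c"
)
readings = extract_temperatures(orange)
print(readings)
```

Lines without any TSxx reading simply yield an empty dict, so the helper can be run over every line of the log.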

The orange zone means that unless the device reduces its heat, device throttling will kick in. You can see the SW stepping the clock down in these lines:

[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 400000000 to 350000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 350000000 to 300000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 300000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 200000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
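
To make the throttling sequence easier to read, the clock changes can be extracted from the log as a list of steps. Again a rough sketch under assumptions: it relies on the message wording shown above, and it assumes the numbers are in Hz (so 400000000 would be 400 MHz). The name clock_steps is hypothetical.

```python
import re

# Captures the old and new clock values from each notification.
CLOCK_RE = re.compile(r"clock has been changed from (\d+) to (\d+)")


def clock_steps(log_text):
    """Return the (old_mhz, new_mhz) pairs in the order they were logged.

    Hypothetical helper; assumes the logged values are in Hz.
    """
    return [
        (int(old) // 1_000_000, int(new) // 1_000_000)
        for old, new in CLOCK_RE.findall(log_text)
    ]


PREFIX = "[HailoRT] [warning] Got health monitor notification - System’s "
sample_log = "\n".join(
    PREFIX + f"clock has been changed from {old} to {new}"
    for old, new in [
        (400000000, 350000000),
        (350000000, 300000000),
        (300000000, 250000000),
        (250000000, 200000000),
        (200000000, 250000000),
        (250000000, 200000000),
    ]
)

steps = clock_steps(sample_log)
for old_mhz, new_mhz in steps:
    direction = "down" if new_mhz < old_mhz else "up"
    print(f"{old_mhz} -> {new_mhz} MHz ({direction})")
```

In this log the clock drops from 400 to 200 in steps of 50, briefly goes back up to 250, and then drops to 200 again.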

Next, if this isn’t sufficient and the device continues to overheat, the device shuts down the NN core. This is done to avoid harming the device. You can see the indication in these lines:

[HailoRT] [critical] Got health monitor closed streams notification. temperature: TS00=119.97402 c, TS01=119.265816 c, inputs bitfield:1, outputs bitfield:0
[HailoRT] [critical] Channel 0:2 was aborted by an external source!

To fix this issue, you need to follow the thermal guidelines of the selected platform. If you do not have such guidelines, you can refer to the general thermal design consideration document.