Hi,
Our team is testing two RTSP streams with Hailo-8 using ./tappas/apps/gstreamer/general/multistream_detection/multi_stream_detection_rtsp.sh
After a while it stops running with the error below:
Setting pipeline to PLAYING …
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (open) Opened Stream
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
Progress: (request) Sent PLAY request
Redistribute latency…
[HailoRT] [warning] Got health monitor notification - temperature reached orange zone. sensor id=0, TS00=103.98878c, TS01=103.634674c
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 400000000 to 350000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 350000000 to 300000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 300000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 200000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [critical] Got health monitor closed streams notification. temperature: TS00=119.97402 c, TS01=119.265816 c, inputs bitfield:1, outputs bitfield:0
[HailoRT] [critical] Channel 0:2 was aborted by an external source!
[HailoRT] [error] CHECK_SUCCESS_AS_EXPECTED failed with status=HAILO_STREAM_ABORTED(62)
[HailoRT] [error] CHECK_EXPECTED failed with status=HAILO_STREAM_ABORTED(62)
ERROR: from element /GstPipeline:pipeline0/GstHailoNet:hailonet0/GstHailoSend:hailosend: Failed writing to input vstream yolov5m_wo_spp_60p/input_layer1, status = 62
Hi,
This is a thermal issue: the device overheated to the point of shutdown.
The first indication of overheating is this line:
[HailoRT] [warning] Got health monitor notification - temperature reached orange zone. sensor id=0, TS00=103.98878c, TS01=103.634674c
The orange zone means that unless the device reduces its heat, device throttling will kick in. You can see the clock being stepped down in these lines:
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 400000000 to 350000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 350000000 to 300000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 300000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 200000000 to 250000000
[HailoRT] [warning] Got health monitor notification - System’s clock has been changed from 250000000 to 200000000
Next, if this isn’t sufficient and the device continues to overheat, the device shuts down the NN core. This is done in order not to harm the device. You can see the indication in these lines:
[HailoRT] [critical] Got health monitor closed streams notification. temperature: TS00=119.97402 c, TS01=119.265816 c, inputs bitfield:1, outputs bitfield:0
[HailoRT] [critical] Channel 0:2 was aborted by an external source!
To fix this issue, you need to follow the thermal guidelines of the selected platform. If you do not have any, you can refer to the general thermal design consideration document.
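If it helps to see the sequence at a glance, the three stages described above (orange zone, clock changes, stream shutdown) can be pulled out of a saved log with a few regular expressions. This is only a rough sketch: the patterns are modeled on the exact lines quoted in this thread, other HailoRT versions may word them differently, and the function name `parse_thermal_events` is made up for illustration, not part of HailoRT.

```python
import re

# Patterns copied from the wording of the log lines quoted in this thread.
# Assumption: other HailoRT releases may format these messages differently.
ORANGE_RE = re.compile(
    r"temperature reached orange zone.*?TS00=([\d.]+)\s*c,\s*TS01=([\d.]+)\s*c"
)
CLOCK_RE = re.compile(r"clock has been changed from (\d+) to (\d+)")
CLOSED_RE = re.compile(
    r"closed streams notification.*?TS00=([\d.]+)\s*c,\s*TS01=([\d.]+)\s*c"
)


def parse_thermal_events(lines):
    """Return thermal events in log order as (kind, value_a, value_b) tuples.

    orange_zone / streams_closed carry the two sensor readings (TS00, TS01);
    clock_change carries the raw old and new clock values as printed in the log.
    """
    events = []
    for line in lines:
        match = ORANGE_RE.search(line)
        if match:
            events.append(
                ("orange_zone", float(match.group(1)), float(match.group(2)))
            )
            continue
        match = CLOCK_RE.search(line)
        if match:
            events.append(
                ("clock_change", int(match.group(1)), int(match.group(2)))
            )
            continue
        match = CLOSED_RE.search(line)
        if match:
            events.append(
                ("streams_closed", float(match.group(1)), float(match.group(2)))
            )
    return events
```

Feeding it the lines of a captured log (for example `parse_thermal_events(open("run.log", encoding="utf-8"))`, where `run.log` is a placeholder file name) gives an ordered list that makes it easy to see how far the temperature rose between the first warning and the shutdown.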
When I run a stress test on multiple Hailo-8 chips, I hit the same thermal issue. But the error report didn’t show the clock being stepped down; it just killed the kernel.
Hi @1042958010,
The throttling mechanism is implemented in the FW of each device, and as a result it’s unaware of any additional Hailo-8 devices connected to the same system.
One scenario that might have happened is that the thermals are not handled and the temperature rises so rapidly that throttling is skipped and the device goes straight to shut-off.
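To check which of the two cases a given run falls into, one option is to scan the captured log and see whether any clock-change notification appears before the closed-streams notification. The sketch below assumes the messages are worded as in the log quoted earlier in this thread; `classify_shutdown` is a hypothetical helper name, not a HailoRT function. If the host went down before anything was logged, the log alone will not tell you.

```python
def classify_shutdown(lines):
    """Classify a log by whether throttling was seen before the shutdown.

    Returns one of:
      "no_shutdown"                 - no closed-streams notification found
      "shutdown_after_throttling"   - a clock change was logged before it
      "shutdown_without_throttling" - streams were closed with no prior clock change
    Assumption: message wording matches the log quoted in this thread.
    """
    saw_clock_change = False
    for line in lines:
        if "clock has been changed from" in line:
            saw_clock_change = True
        elif "closed streams notification" in line:
            if saw_clock_change:
                return "shutdown_after_throttling"
            return "shutdown_without_throttling"
    return "no_shutdown"
```

Running this per device log (when each Hailo-8 logs separately) would show whether some chips throttled normally while others went straight to shut-off.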