HailoRT driver fails after a few minutes

Hello,

When performing inference using the Hailo8, the process terminates after a certain period due to a driver error.

[ 699.456865] hailo 0000:01:00.0: Failed launch transfer -6
[ 699.468962] Channel 2 state out of sync. num available is 2954, expected 3441
[ 699.476112] hailo 0000:01:00.0: Failed launch transfer -14
[ 699.483156] Channel 2 state out of sync. num available is 2954, expected 3922
[ 699.490335] hailo 0000:01:00.0: Failed launch transfer -14
[ 699.496941] Channel 2 state out of sync. num available is 2954, expected 4403
[ 699.504125] hailo 0000:01:00.0: Failed launch transfer -14
[ 699.510640] Channel 2 state out of sync. num available is 2954, expected 4884
[ 699.517773] hailo 0000:01:00.0: Failed launch transfer -14

I am conducting inference tests using YOLOv7-Tiny with the Hailo8 connected to the PCI slot of the Novatek EVM. The Hailo driver is being used after being compiled directly from the source code obtained from Git. The above error occurs within approximately 30 minutes. Could you possibly identify the cause of this issue?
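
For anyone reading the kernel lines above: the negative numbers after "Failed launch transfer" look like negated Linux errno values. That is an assumption based on the usual kernel convention, not something confirmed from the driver source. Under that assumption, a minimal Python sketch to decode them (the helper name describe_errno is made up for illustration):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a (possibly negated) errno value."""
    n = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(n, "UNKNOWN")
    return f"{name}: {os.strerror(n)}"


# The two codes that appear in the dmesg output above.
for code in (-6, -14):
    print(code, describe_errno(code))
```

On Linux, 6 maps to ENXIO and 14 to EFAULT, so the first failure would be a different kind of error from the repeated ones that follow it.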

@jspark I passed this issue to our R&D team.

Also, can you check whether there are any messages in the output of sudo dmesg right before these failures?
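
If the log is long, one way to pull out just the lines leading up to the first failure is a small filter like the sketch below. It assumes the dmesg output was saved to a text file first; the file name dmesg.txt and the function name are hypothetical.

```python
from collections import deque


def context_before_first_match(lines, marker, n=20):
    """Return up to n lines that precede the first line containing marker."""
    window = deque(maxlen=n)  # keeps only the most recent n lines
    for line in lines:
        if marker in line:
            return list(window)
        window.append(line)
    return []  # marker never seen


# Small in-memory demonstration with made-up lines.
demo = ["boot ok", "pcie link up", "hailo: Failed launch transfer -6", "later"]
print(context_before_first_match(demo, "Failed launch transfer", n=2))

# Usage on a saved log (file name is hypothetical):
# with open("dmesg.txt") as f:
#     for line in context_before_first_match(f, "Failed launch transfer"):
#         print(line, end="")
```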

No, there were no notable messages before this error occurred. The kernel log and the messages left by the application are below.

Kernel dmesg

[19514.944843] hailo 0000:01:00.0: Failed launch transfer -6
[19514.966797] Channel 3 state out of sync. num available is 3108, expected 3355
[19514.973977] hailo 0000:01:00.0: Failed launch transfer -14
[19514.985156] Channel 3 state out of sync. num available is 3108, expected 3596
[19514.992299] hailo 0000:01:00.0: Failed launch transfer -14
[19515.020891] Channel 3 state out of sync. num available is 3108, expected 3837
[19515.028092] hailo 0000:01:00.0: Failed launch transfer -14
[19515.035379] Channel 3 state out of sync. num available is 3108, expected 4078
[19515.042517] hailo 0000:01:00.0: Failed launch transfer -14
[19515.049642] Channel 3 state out of sync. num available is 3108, expected 4319
[19515.056810] hailo 0000:01:00.0: Failed launch transfer -14
[19515.063927] Channel 3 state out of sync. num available is 3108, expected 4560
[19515.071064] hailo 0000:01:00.0: Failed launch transfer -14
[19515.081639] Channel 3 state out of sync. num available is 3108, expected 4801
[19515.088929] hailo 0000:01:00.0: Failed launch transfer -14
[19515.111438] potentially unexpected fatal signal 6.
[19515.116311] CPU: 2 PID: 544 Comm: vstream_async_d Tainted: P B O 4.19.148 #25
[19515.124586] Hardware name: Novatek NA51090 (DT)
[19515.129112] pstate: 00000000 (nzcv daif -PAN -UAO)
[19515.133900] pc : 0000007fa0bc9290
[19515.137206] lr : 0000007fa0bc9220
[19515.140507] sp : 0000007fd8e06ac0
[19515.143857] x29: 0000007fd8e06ac0 x28: 0000000000000000
[19515.149157] x27: 0000000000000000 x26: 0000000000000000
[19515.154463] x25: 0000000023557ee0 x24: 0000000000000000
[19515.159762] x23: 0000000000000024 x22: 00000000235580d0
[19515.165098] x21: 0000000000462d38 x20: 0000007fa2835010
[19515.170399] x19: 0000000000000006 x18: 0000000000000001
[19515.175717] x17: 0000007fa0f1f430 x16: 0000007fa0bb6918
[19515.181016] x15: 000000006474e550 x14: 0000000000000000
[19515.186329] x13: ffffffffffffffff x12: ffffffffffffffff
[19515.191627] x11: ffffffffffffffff x10: ffffffffffffffff
[19515.196936] x9 : ffffffffffffffff x8 : 0000000000000087
[19515.202233] x7 : ffffffffffffffff x6 : ffffffffffffffff
[19515.207542] x5 : 0000007fd8e06ae0 x4 : 0000000000000000
[19515.212841] x3 : 0000000000000008 x2 : 0000000000000000
[19515.218149] x1 : 0000007fd8e06ae0 x0 : 0000000000000000
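
To make the "state out of sync" lines easier to compare between runs, the counters can be extracted with a short script. This is only a sketch that parses the message text exactly as it appears in the dmesg output above; the function names are made up, and it makes no claim about what the counters mean inside the driver.

```python
import re

PATTERN = re.compile(
    r"Channel (\d+) state out of sync\. num available is (\d+), expected (\d+)"
)


def parse_out_of_sync(log_text):
    """Return (channel, num_available, expected) tuples in log order."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(log_text)]


def expected_steps(entries):
    """Differences between consecutive 'expected' values."""
    expected = [e[2] for e in entries]
    return [b - a for a, b in zip(expected, expected[1:])]


# Three lines copied from the kernel log above.
sample = """\
[19514.966797] Channel 3 state out of sync. num available is 3108, expected 3355
[19514.985156] Channel 3 state out of sync. num available is 3108, expected 3596
[19515.020891] Channel 3 state out of sync. num available is 3108, expected 3837
"""

entries = parse_out_of_sync(sample)
print(entries)
print(expected_steps(entries))
```

In both captures, "num available" stays fixed (2954 in the first, 3108 in the second) while "expected" grows by a constant step each time (481 in the first, 241 in the second).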

Application Error Log

[HailoRT] [error] CHECK failed - Failed launch transfer errno: 6
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Non-recoverable Async Infer Pipeline error. status error code: HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Shutting down the pipeline with status HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36) - Can’t handle infer request since Pipeline status is HAILO_DRIVER_FAIL(36).
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK failed - Failed launch transfer errno: 14
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)
[HailoRT] [error] Infer request callback failed with status = HAILO_DRIVER_FAIL(36)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_DRIVER_FAIL(36)

It looks like something is going wrong in the interaction between the Hailo driver and the kernel. Does this happen with every network, or only with YOLOv7-Tiny?

I haven't tested other networks yet. To be precise, the network in question is a modified YOLOv7-Tiny in which the activation function was changed to SiLU, trained with 1 class. The HEF file we tested included NV12-to-RGB conversion as preprocessing, and that is the configuration that produced the issue. I plan to test the model again after removing the NV12-to-RGB conversion.