I am currently using Infer() to run inference with an ML model; the model is executed periodically. Initially, inference worked as expected, but when I run it at 20 Hz, after about 1 to 5 seconds (this duration varies significantly) the following errors occur.
[HailoRT] [error] CHECK failed - Failed in fw_control, errno:4
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18) - Failed to send fw control
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18) - Failed to reset context switch state machine
[HailoRT] [error] Failed deactivating core-op (status HAILO_FW_CONTROL_FAILURE(18))
[HailoRT] [error] Failed to deactivate low level streams with HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] Failed deactivating core-op (status HAILO_FW_CONTROL_FAILURE(18))
[HailoRT] [error] Failed deactivate HAILO_FW_CONTROL_FAILURE(18)
What could be causing this? I would appreciate your help. Thank you.
Here is some additional information. The OS version is 24.04, and the hardware is a Hailo-8 M.2 key M module. The model I am trying to run is openpilot's supercombo.onnx, as mentioned here.
It seems the issue you're experiencing may be related to a timeout; the first error line you're seeing suggests this could be the case.
If you’re running the inference process continuously at a higher rate, does the problem still occur? Adjusting the rate at which you run the inference might help pinpoint the root cause and resolve the timeout issue.
Thank you for your response. Based on my investigation, I have obtained additional information to share.
First of all, the Hailo inference is integrated into a larger system and runs as one of several processes. When I stopped the other processes and ran only the inference process, no such errors occurred, regardless of the inference cycle settings.
However, when I ran the inference together with the process that acquires images and sends them to the inference process via msgq, the error occurred. Checking the journal log, I found that a segmentation fault had also occurred.
The cause has finally been identified. For implementation convenience, I had created a wrapper class for inference that separated the initialization function from the inference function, and I was calling network_group.activate() on every inference.
By calling activate() only once, during initialization, the error no longer occurs.
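For reference, the fixed structure looks roughly like the sketch below. This is a minimal Python illustration of the pattern only: HailoWrapper, StubNetworkGroup, and the infer() body are hypothetical stand-ins I made up for this example, not the actual HailoRT objects or the real Infer() call.

```python
class StubNetworkGroup:
    """Stand-in for a configured HailoRT network group (illustrative only)."""
    def __init__(self):
        self.activation_count = 0

    def activate(self):
        # In the real API, activation arms the device's context-switch state
        # machine; doing this on every frame is what triggered the
        # fw_control failures shown in the log above.
        self.activation_count += 1
        return object()  # placeholder for the activation handle

class HailoWrapper:
    """Wrapper that activates the network group exactly once, at init."""
    def __init__(self, network_group):
        self.network_group = network_group
        # Activate once and keep the handle alive for the wrapper's lifetime,
        # instead of re-activating inside every inference call.
        self._activation = self.network_group.activate()

    def infer(self, frame):
        # No activate() here: just run inference on the already-active group.
        return f"result for frame {frame}"

wrapper = HailoWrapper(StubNetworkGroup())
for i in range(100):  # e.g. frames arriving at 20 Hz
    wrapper.infer(i)
print(wrapper.network_group.activation_count)  # activated only once
```

The key point is simply that the activation handle is created in the constructor and reused, so repeated infer() calls never touch the device's context-switch state machine.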