HAILO_FW_CONTROL_FAILURE(18) error

I am currently using Infer() to run inference with an ML model. The model is executed periodically. Initially, inference worked as expected, but when I run it at 20 Hz, the following error occurs after about 1 to 5 seconds (the duration varies significantly).

[HailoRT] [error] CHECK failed - Failed in fw_control, errno:4
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18) - Failed to send fw control
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] CHECK_SUCCESS failed with status=HAILO_FW_CONTROL_FAILURE(18) - Failed to reset context switch state machine
[HailoRT] [error] Failed deactivating core-op (status HAILO_FW_CONTROL_FAILURE(18))
[HailoRT] [error] Failed to deactivate low level streams with HAILO_FW_CONTROL_FAILURE(18)
[HailoRT] [error] Failed deactivating core-op (status HAILO_FW_CONTROL_FAILURE(18))
[HailoRT] [error] Failed deactivate HAILO_FW_CONTROL_FAILURE(18)

What could be causing this? I would appreciate your help. Thank you.

Here is some additional information. The OS version is 24.04, and the hardware is a Hailo-8 M.2 Key M module. The model I am trying to run is openpilot’s supercombo.onnx, as mentioned here.

Hey @koki.igari,

It seems the issue you’re experiencing may be related to a timeout. The first error code you’re seeing suggests this could be the case.

If you’re running the inference process continuously at a higher rate, does the problem still occur? Adjusting the rate at which you run the inference might help pinpoint the root cause and resolve the timeout issue.

Hi @omria

Thank you for your response. Based on my investigation, I have obtained additional information to share.

First of all, the Hailo inference is currently integrated into the system and operates as one of multiple processes. When I stopped the other processes and ran only the inference process, no such errors occurred regardless of the inference cycle settings.

However, when I ran the inference together with the process that acquires images and sends them to the inference process via msgq, this error occurred. Upon checking the journal log, I found that a segmentation fault had occurred.

I have finally identified the cause. For implementation convenience, I had created a wrapper class for inference that separates the initialization function from the inference function, and I was calling network_group.activate() on every inference. After moving the activate() call into initialization so that it is called only once, the error no longer occurs.
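For reference, here is roughly what the corrected pattern looks like with the HailoRT Python API. This is a minimal sketch, not my exact code: the wrapper class name, the HEF path, and the zero-filled dummy inputs are placeholders.

```python
import numpy as np
from contextlib import ExitStack
from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams,
                            FormatType)


class HailoInferWrapper:
    """Configure and activate the network group once, then reuse it for every inference."""

    def __init__(self, hef_path):
        self._stack = ExitStack()
        self.hef = HEF(hef_path)
        self.device = self._stack.enter_context(VDevice())
        configure_params = ConfigureParams.create_from_hef(
            hef=self.hef, interface=HailoStreamInterface.PCIe)
        self.network_group = self.device.configure(self.hef, configure_params)[0]
        self.input_params = InputVStreamParams.make(
            self.network_group, format_type=FormatType.FLOAT32)
        self.output_params = OutputVStreamParams.make(
            self.network_group, format_type=FormatType.FLOAT32)
        # Activate exactly once here, not on every call to infer().
        self._stack.enter_context(
            self.network_group.activate(self.network_group.create_params()))

    def infer(self, input_data):
        # input_data: dict mapping input vstream names to numpy arrays.
        with InferVStreams(self.network_group, self.input_params,
                           self.output_params) as pipeline:
            return pipeline.infer(input_data)

    def close(self):
        # Deactivate and release the device only at shutdown, not between inferences.
        self._stack.close()


# Usage: construct once at startup, then call infer() at the desired rate (e.g. 20 Hz).
# "supercombo.hef" and the zero-filled inputs are placeholders.
runner = HailoInferWrapper("supercombo.hef")
dummy_inputs = {info.name: np.zeros((1, *info.shape), dtype=np.float32)
                for info in runner.hef.get_input_vstream_infos()}
outputs = runner.infer(dummy_inputs)
runner.close()
```

As far as I understand, on recent HailoRT releases with the model scheduler enabled, network groups are activated automatically and the explicit activate() call above is not needed at all; in that case it can simply be dropped.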