Async_detection_inference example with v4l2 camera input

So, I’m trying to run async_detection_inference with a v4l2 camera as the input.

To be more specific, our own hardware platform supports DMA buffer export on v4l2 devices, which returns a virtual address as a void *. So what I am trying to do is hook that camera image data up to the application -via set_buffer(MemoryView())- to run it, and I’m having some setbacks.

  1. The example code declares frame_count promises and futures (frames_promises and frames_futures), so that each frame has its own promise/future pair.

In the case of v4l2 devices -a camera, for instance- there is no set number of frames to run the application with. So I decided to declare the std::vector variables with just 1 element each, and when it comes down to

frames_promises[i].set_value(org_frame);

in hailo_status run_preprocess(), the following error occurred:
[screenshot: "Promise already satisfied" error]

Obviously, my approach of using promise/future vectors of size 1 failed.

So my question comes down to this: how should I modify your example to run with an indefinite number of real-time frames -such as v4l2 camera images-?

  1. Is there any other point in the code that I should look into more closely?


Here are the code snippets that I modified.

modified code snippet:
[screenshot]
original code snippet:
[screenshot]
By the way, your example automatically sets frame_count to -1 if the input path has not been given, which does not make sense for my setup.

modified:
[screenshot]
original:
[screenshot]


Hey @ksj,

To modify the async_detection_inference example to work with a V4L2 camera that supports DMA buffer export and to allow indefinite frame capture, here are the recommended adjustments. These changes ensure that frames are handled dynamically without a pre-set frame count and adapt the promises and futures logic accordingly.

Key Adjustments

  1. Dynamic Frame Handling:

    • Since V4L2 doesn’t have a fixed frame count, avoid setting a predefined size for frames_promises and frames_futures. Instead, manage frames dynamically by creating a new promise and future for each frame as it is captured, which allows for flexible frame handling without limiting the frame count.
  2. Setting Up Promises and Futures for Each Frame:

    • Use a loop to handle promises and futures dynamically for each incoming frame. Here’s an example structure to manage this:

      std::vector<std::promise<cv::Mat>> frames_promises;
      std::vector<std::future<cv::Mat>> frames_futures;
      
      while (capture_frames) {  // Main loop for continuous frame capture
          frames_promises.emplace_back(); // Create a new promise for each frame
          frames_futures.push_back(frames_promises.back().get_future()); // Get the corresponding future
          
          // Capture frame (DMA buffer from V4L2 device)
          cv::Mat frame = capture_from_camera();  // Capture the frame using your camera API
      
          frames_promises.back().set_value(frame); // Set the captured frame
      
          // Process the frame using Hailo's inference functions
          auto preprocessed_image = frames_futures.back().get();
          process_frame(preprocessed_image); // Run inference
      
          frames_promises.pop_back(); // Remove processed promise to prevent overflow
          frames_futures.pop_back();   // Remove processed future to prevent overflow
      }
      
  3. Using DMA Address in Buffer Setup:

    • In the code, ensure that set_buffer(MemoryView()) receives the DMA buffer’s virtual address. Replace references to frame data with buf_vaddr, the buffer’s address from the camera’s DMA:

      auto status = bindings->input(input_name)->set_buffer(MemoryView(reinterpret_cast<void*>(d_buf->buf_vaddr), input_frame_size));
      
  4. Modify run_preprocess() for Continuous Frames:

    • Adjust the run_preprocess() function to handle frames in a loop, removing any checks or conditions related to a fixed frame count. This will support a continuously running application.
  5. Avoid Overflow with Promises/Futures:

    • To prevent memory overflow, pop the processed frames out of frames_promises and frames_futures after each processing loop. This approach keeps memory usage stable by removing old frames as new ones are added.

Summary of Changes

  1. Dynamically create frames_promises and frames_futures as new frames arrive.
  2. Pass the DMA address (buf_vaddr) to set_buffer(MemoryView()).
  3. Adjust run_preprocess() to handle frames continuously without a fixed count.
  4. Clear processed promises and futures from the vectors to prevent memory overflow.

With these changes, your async_detection_inference setup should be able to run continuously with your V4L2 camera, processing frames indefinitely. Let me know if there are further questions or specific issues with the implementation!

There are several points I would like to make.

First of all,
you proposed a scheme that pops the vector back and forth every time I get an image frame. That would certainly avoid the error, but the whole point of the async API is that set_value() on the promise and get() on the future are called on different threads. Therefore, if I call get_future() on one thread, that future would be useless on the actual inference thread, as the original code is structured. I need a better way to tackle this problem, other than using std::vector, or maybe something other than std::promise and std::future.

Second of all,
as I stated, the final image data that I get my hands on is a void *, which means I don’t have to declare -if I really have to use them at all- my promise and future with cv::Mat.

Hey @ksj,

Sorry if I didn’t catch what you meant exactly.

So, here’s how the example app works when using a camera as the input:

    cv::VideoCapture capture;
    double frame_count;
    if (input_path.empty()) {
        capture.open(0, cv::CAP_ANY);
        if (!capture.isOpened()) {
            throw "Error in camera input";
        }
        frame_count = -1;
    }
    else {
        capture.open(input_path, cv::CAP_ANY);
        if (!capture.isOpened()){
            throw "Error when reading video";
        }
        if (!image_num.empty()){
            if (input_path.find(".avi") == std::string::npos && input_path.find(".mp4") == std::string::npos){
                frame_count = std::stoi(image_num);
            }
            else {
                frame_count = capture.get(cv::CAP_PROP_FRAME_COUNT);
                image_num = "";
            }
        }
        else {
            frame_count = capture.get(cv::CAP_PROP_FRAME_COUNT);
        }
    }

    double org_height = capture.get(cv::CAP_PROP_FRAME_HEIGHT);
    double org_width = capture.get(cv::CAP_PROP_FRAME_WIDTH);

    capture.release();

In this setup, if frame_count is set to -1, the app will keep processing frames indefinitely when it’s using a camera input (or if the input is empty, it just assumes it’s a camera). Basically, it captures each frame one at a time and treats them as single images, so there’s no “video” handling here. To keep it running indefinitely, just make sure frame_count is -1.

To get images from the DMA buffer, you can tweak the code like this:

        // Get DMA buffer pointer directly
        void* dma_buffer = frames_futures[i].get();

        // Set input from DMA buffer
        for (const auto &input_name : infer_model->get_input_names()) {
            size_t input_frame_size = infer_model->input(input_name)->get_frame_size();
            auto status = bindings->input(input_name)->set_buffer(MemoryView(dma_buffer, input_frame_size));
            if (HAILO_SUCCESS != status) {
                std::cerr << "Failed to set infer input buffer, status = " << status << std::endl;
                return status;
            }
        }
And change the promise/future parameter types in the function signature accordingly:

 std::vector<std::promise<void*>>& frames_promises,    // Changed to void* for DMA
 std::vector<std::future<void*>>& frames_futures,      // Changed to void* for DMA

Alternatively:

Here’s an untested bit of code that skips the whole “futures and promises” part to work directly with the DMA buffer, which might simplify things:

    while (true) {
        try {
            // Set input buffer using DMA buffer directly
            for (const auto &input_name : infer_model->get_input_names()) {
                size_t input_frame_size = infer_model->input(input_name)->get_frame_size();
                auto status = bindings->input(input_name)->set_buffer(MemoryView(dma_buffer, input_frame_size));
                if (HAILO_SUCCESS != status) {
                    std::cerr << "Failed to set infer input buffer, status = " << status << std::endl;
                    return status;
                }
                
                // Store DMA buffer pointer in guards
                input_buffer_guards.push_back(std::make_shared<void*>(dma_buffer));
            }

            // Handle output buffers
            std::vector<std::pair<uint8_t*, hailo_vstream_info_t>> output_data_and_infos;
            for (const auto &output_name : output_names) {
                size_t output_frame_size = infer_model->output(output_name)->get_frame_size();
                output_buffer = page_aligned_alloc(output_frame_size);
                
                auto status = bindings->output(output_name)->set_buffer(MemoryView(output_buffer.get(), output_frame_size));
                if (HAILO_SUCCESS != status) {
                    std::cerr << "Failed to set infer output buffer, status = " << status << std::endl;
                    return status;
                }

                output_data_and_infos.push_back(std::make_pair(
                    bindings->output(output_name)->get_buffer()->data(),
                    infer_model->hef().get_output_vstream_infos().release()[0]
                ));

                output_buffer_guards.push_back(output_buffer);
            }

            auto status = configured_infer_model->wait_for_async_ready(std::chrono::milliseconds(1000));
            if (HAILO_SUCCESS != status) {
                std::cerr << "Failed to run wait_for_async_ready, status = " << status << std::endl;
                return status;
            }

            auto job = configured_infer_model->run_async(bindings.value(),
                [&inferred_data_queue, output_data_and_infos, output_buffer](const hailort::AsyncInferCompletionInfo& info) {
                    inferred_data_queue.push(output_data_and_infos);
                    (void)output_buffer;
                });

            if (!job) {
                std::cerr << "Failed to start async infer job, status = " << job.status() << std::endl;
                return job.status();
            }

            job->detach();
            if (frame_count != -1 && current_frame == frame_count - 1) {
                last_infer_job = job.release();
                break;
            }
            
            current_frame++;
            
        } catch (const std::exception& e) {
            std::cerr << "Error during inference: " << e.what() << std::endl;
            break;
        }
    }

Hopefully, this clears things up! If you have any more questions, don’t hesitate to reach out.

Best Regards,
Omri

Let’s start from the beginning, shall we?
What I meant by obtaining v4l2 image data with a DMA buffer is that we directly query the v4l2 buffer with an ioctl call, and then export that buffer with another ioctl call into DMA’s virtual/physical address scheme, so that the exported buffer can be imported and sent directly to the GPU and the several other components that our custom board utilizes for our purpose.

In other words, we don’t have to use cv::VideoCapture to obtain the data. That’s an entirely different story.

With that being said, setting frame_count to -1 just like your example does would be a huge problem, because
[screenshot]
it’s initially a double,

and then gets cast to size_t,

which means that by the time it comes to this part
[screenshot]
the number of elements inside the vector would be the maximum number that size_t allows, and that would definitely make the program fail, as far as I know.

Another thing is, with std::promise and std::future -especially std::promise, which is a sort of one-way state machine- I can call set_value() only one time per std::promise object.

So, at first, what you proposed -having vectors of std::promise and std::future and dynamically popping them back and forth with each frame- does seem to avoid the
"Promise already satisfied"
error, but again, since we have to get the actual data from the future on another thread, it’s not feasible either.

Right now, I’m doing it with a std::queue instead of a std::vector, and using a std::shared_future to call get(), mixed with a condition variable; but since it has to wait for the condition variable to signal a change in the queue’s state, the performance is not that great…