Async_detection_inference example with v4l2 camera input

So, I’m trying to run async_detection_inference with a v4l2 camera as the input.

To be more specific, our own hardware platform supports DMA buffer export on v4l2 devices, which returns a virtual address as a void *. So what I am trying to do is hook that camera image data up to the application -via set_buffer(MemoryView())- to run it, and I’m having some setbacks.

  1. The example code declares frame_count promises and futures (frames_promises and frames_futures), so that each frame has its own promise/future pair.

In the case of v4l2 devices -a camera, for instance- there is no set number of frames to run the application with. So I decided to declare the std::vector variables with just 1 element each, and when it comes down to

frames_promises[i].set_value(org_frame);

in hailo_status run_preprocess(), the following error occurred:
[screenshot: "Promise already satisfied" error]

Obviously, my approach of using promise/future vectors of size 1 failed.

So my question comes down to this: how should I modify your example to run with an indefinite number of real-time frames -such as v4l2 camera images-?

  1. Is there any other point in the code that I should look into more closely?


Here are the code snippets that I modified.

modified code snippet:
[screenshot]
original code snippet:
[screenshot]
By the way, your example automatically sets frame_count to -1 if the input path has not been given, which does not make sense for my setup.

modified:
[screenshot]
original:
[screenshot]


Hey @ksj,

To modify the async_detection_inference example to work with a V4L2 camera that supports DMA buffer export and to allow indefinite frame capture, here are the recommended adjustments. These changes ensure that frames are handled dynamically without a pre-set frame count and adapt the promises and futures logic accordingly.

Key Adjustments

  1. Dynamic Frame Handling:

    • Since V4L2 doesn’t have a fixed frame count, avoid setting a predefined size for frames_promises and frames_futures. Instead, manage frames dynamically by creating a new promise and future for each frame as it is captured, which allows for flexible frame handling without limiting the frame count.
  2. Setting Up Promises and Futures for Each Frame:

    • Use a loop to handle promises and futures dynamically for each incoming frame. Here’s an example structure to manage this:

      std::vector<std::promise<cv::Mat>> frames_promises;
      std::vector<std::future<cv::Mat>> frames_futures;
      
      while (capture_frames) {  // Main loop for continuous frame capture
          frames_promises.emplace_back(); // Create a new promise for each frame
          frames_futures.push_back(frames_promises.back().get_future()); // Get the corresponding future
          
          // Capture frame (DMA buffer from V4L2 device)
          cv::Mat frame = capture_from_camera();  // Capture the frame using your camera API
      
          frames_promises.back().set_value(frame); // Set the captured frame
      
          // Process the frame using Hailo's inference functions
          auto preprocessed_image = frames_futures.back().get();
          process_frame(preprocessed_image); // Run inference
      
          frames_promises.pop_back(); // Remove processed promise to prevent overflow
          frames_futures.pop_back();   // Remove processed future to prevent overflow
      }
      
  3. Using DMA Address in Buffer Setup:

    • In the code, ensure that set_buffer(MemoryView()) receives the DMA buffer’s virtual address. Replace references to frame data with buf_vaddr, the buffer’s address from the camera’s DMA:

      auto status = bindings->input(input_name)->set_buffer(MemoryView(reinterpret_cast<void*>(d_buf->buf_vaddr), input_frame_size));
      
  4. Modify run_preprocess() for Continuous Frames:

    • Adjust the run_preprocess() function to handle frames in a loop, removing any checks or conditions related to a fixed frame count. This will support a continuously running application.
  5. Avoid Overflow with Promises/Futures:

    • To prevent memory overflow, pop the processed frames out of frames_promises and frames_futures after each processing loop. This approach keeps memory usage stable by removing old frames as new ones are added.

Summary of Changes

  1. Dynamically create frames_promises and frames_futures as new frames arrive.
  2. Pass the DMA address (buf_vaddr) to set_buffer(MemoryView()).
  3. Adjust run_preprocess() to handle frames continuously without a fixed count.
  4. Clear processed promises and futures from the vectors to prevent memory overflow.

With these changes, your async_detection_inference setup should be able to run continuously with your V4L2 camera, processing frames indefinitely. Let me know if there are further questions or specific issues with the implementation!

There are several points I would like to make.

First of all,
you proposed a scheme that pops the vector back and forth every time I get an image frame. That would certainly avoid the error, but the whole point of the async API is that set_value() on the promise and get() on the future are called on different threads. Therefore, if I call get_future() on one thread, that future would be useless on the actual inference thread, as the original code is structured. I need a better way to tackle this problem, other than using std::vector, or maybe something other than std::promise and std::future.

Second of all,
as I stated, the final image data that I get my hands on is a void *, which means I don’t have to declare -if I really have to use them at all- my promise and future with cv::Mat.

Hey @ksj,

Sorry if I didn’t catch what you meant exactly.

So, here’s how the example app works when using a camera as the input:

    cv::VideoCapture capture;
    double frame_count;
    if (input_path.empty()) {
        capture.open(0, cv::CAP_ANY);
        if (!capture.isOpened()) {
            throw "Error in camera input";
        }
        frame_count = -1;
    }
    else {
        capture.open(input_path, cv::CAP_ANY);
        if (!capture.isOpened()){
            throw "Error when reading video";
        }
        if (!image_num.empty()){
            if (input_path.find(".avi") == std::string::npos && input_path.find(".mp4") == std::string::npos){
                frame_count = std::stoi(image_num);
            }
            else {
                frame_count = capture.get(cv::CAP_PROP_FRAME_COUNT);
                image_num = "";
            }
        }
        else {
            frame_count = capture.get(cv::CAP_PROP_FRAME_COUNT);
        }
    }

    double org_height = capture.get(cv::CAP_PROP_FRAME_HEIGHT);
    double org_width = capture.get(cv::CAP_PROP_FRAME_WIDTH);

    capture.release();

In this setup, if frame_count is set to -1, the app will keep processing frames indefinitely when it’s using a camera input (or if the input is empty, it just assumes it’s a camera). Basically, it captures each frame one at a time and treats them as single images, so there’s no “video” handling here. To keep it running indefinitely, just make sure frame_count is -1.

To get images from the DMA buffer, you can tweak the code like this:

        // Get DMA buffer pointer directly
        void* dma_buffer = frames_futures[i].get();

        // Set input from DMA buffer
        for (const auto &input_name : infer_model->get_input_names()) {
            size_t input_frame_size = infer_model->input(input_name)->get_frame_size();
            auto status = bindings->input(input_name)->set_buffer(MemoryView(dma_buffer, input_frame_size));
            if (HAILO_SUCCESS != status) {
                std::cerr << "Failed to set infer input buffer, status = " << status << std::endl;
                return status;
            }
        }
And change the promise/future parameter types in the function signature accordingly:

 std::vector<std::promise<void*>>& frames_promises,    // Changed to void* for DMA
 std::vector<std::future<void*>>& frames_futures,      // Changed to void* for DMA

Alternatively:

Here’s an untested bit of code that skips the whole “futures and promises” part to work directly with the DMA buffer, which might simplify things:

    while (true) {
        try {
            // Set input buffer using DMA buffer directly
            for (const auto &input_name : infer_model->get_input_names()) {
                size_t input_frame_size = infer_model->input(input_name)->get_frame_size();
                auto status = bindings->input(input_name)->set_buffer(MemoryView(dma_buffer, input_frame_size));
                if (HAILO_SUCCESS != status) {
                    std::cerr << "Failed to set infer input buffer, status = " << status << std::endl;
                    return status;
                }
                
                // Store DMA buffer pointer in guards
                input_buffer_guards.push_back(std::make_shared<void*>(dma_buffer));
            }

            // Handle output buffers
            std::vector<std::pair<uint8_t*, hailo_vstream_info_t>> output_data_and_infos;
            for (const auto &output_name : output_names) {
                size_t output_frame_size = infer_model->output(output_name)->get_frame_size();
                output_buffer = page_aligned_alloc(output_frame_size);
                
                auto status = bindings->output(output_name)->set_buffer(MemoryView(output_buffer.get(), output_frame_size));
                if (HAILO_SUCCESS != status) {
                    std::cerr << "Failed to set infer output buffer, status = " << status << std::endl;
                    return status;
                }

                output_data_and_infos.push_back(std::make_pair(
                    bindings->output(output_name)->get_buffer()->data(),
                    infer_model->hef().get_output_vstream_infos().release()[0]
                ));

                output_buffer_guards.push_back(output_buffer);
            }

            auto status = configured_infer_model->wait_for_async_ready(std::chrono::milliseconds(1000));
            if (HAILO_SUCCESS != status) {
                std::cerr << "Failed to run wait_for_async_ready, status = " << status << std::endl;
                return status;
            }

            auto job = configured_infer_model->run_async(bindings.value(),
                [&inferred_data_queue, output_data_and_infos, output_buffer](const hailort::AsyncInferCompletionInfo& info) {
                    inferred_data_queue.push(output_data_and_infos);
                    (void)output_buffer;
                });

            if (!job) {
                std::cerr << "Failed to start async infer job, status = " << job.status() << std::endl;
                return job.status();
            }

            job->detach();
            if (frame_count != -1 && current_frame == frame_count - 1) {
                last_infer_job = job.release();
                break;
            }
            
            current_frame++;
            
        } catch (const std::exception& e) {
            std::cerr << "Error during inference: " << e.what() << std::endl;
            break;
        }
    }

Hopefully, this clears things up! If you have any more questions, don’t hesitate to reach out.

Best Regards,
Omri

Let’s start from the beginning, shall we?
What I meant by obtaining v4l2 image data with a DMA buffer is that we directly query the v4l2 buffer with an ioctl call, and then export that buffer with another ioctl call into DMA’s virtual/physical address scheme, so that the exported buffer can be imported and sent directly to the GPU and the several other components that our custom board utilizes for our purpose.

In other words, we don’t have to use cv::VideoCapture to obtain the data. That’s an entirely different story.

With that being said, setting frame_count to -1 just like your example does would be a huge problem, because
[screenshot]
it’s initially a double,

and then gets cast to size_t,

which means that by the time it comes to this part
[screenshot]
the number of elements inside the vector would be the maximum number that size_t allows, and that would definitely make the program fail, as far as I know.

Another thing is, with std::promise and std::future -especially std::promise, which is a sort of one-way state machine- I can call set_value() only one time per std::promise object.

So, at first, what you proposed -having vectors of std::promise and std::future and dynamically popping them back and forth with each frame- does seem to avoid the
"Promise already satisfied"
error, but again, since we have to get the actual data from the future on another thread, it’s not feasible either.

Right now, I’m doing it with a std::queue instead of a std::vector, and using a std::shared_future to call get(), mixed with a condition variable; but since it has to wait for the condition variable to signal a change in the queue’s state, the performance is not that great…