Guide: PaddleOCR Text Detection + Recognition Pipeline on Hailo-8 and Hailo-10H

This guide provides a high-level overview of the newly added PaddleOCR application, focusing on its internal structure and advanced functionality. The app performs end-to-end text recognition using a two-stage OCR pipeline accelerated by Hailo-8 and Hailo-10H devices.

The pipeline combines:

  • A text detector to locate text regions

  • A text recognizer to decode the text inside each region

Example Runs

Single Image:

python3 paddle_ocr.py -n ocr_det.hef ocr.hef -i ocr_img1.png

Folder of Images:

python3 paddle_ocr.py -n ocr_det.hef ocr.hef -i ./my_images/

Video File:

python3 paddle_ocr.py -n ocr_det.hef ocr.hef -i input.mp4

Camera:

python3 paddle_ocr.py -n ocr_det.hef ocr.hef -i camera

Optional: Spell Correction

You can optionally improve OCR text accuracy with dictionary-based spelling correction, powered by symspellpy:

python3 paddle_ocr.py … --use-corrector

Full Pipeline Description

The PaddleOCR app uses a multi-threaded, queue-based pipeline to process input efficiently and asynchronously across multiple stages.

Preprocessing
   ↓
Text Detector (HEF 1)
   ↓
Detection Postprocess → [No Text] → Visualize
   ↓
Text Recognizer (HEF 2)
   ↓
OCR Postprocess
   ↓
Visualization
   ↓ [Output]

1. Preprocessing

  • Input source can be:

    • A single image

    • A folder of images

    • A video file

    • A live camera stream

  • Each frame is:

    • Resized and padded to fit the detector’s input size (while preserving aspect ratio)

    • Batched (if batch_size > 1)

  • Outputs:

    • input_frame (for visualization)

    • preprocessed_frame (ready for inference)

  • Sent to: detector_hailo_infer via det_input_queue
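
The resize-and-pad and batching steps above can be sketched in plain Python. This is only an illustration of the geometry, not the app's actual code: the function names, the 640x640 detector input size, and the choice to center the padding are all assumptions.

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: padding is split evenly around the resized image. The real
    app may pad on one side only.
    """
    # One scale factor for both axes keeps the aspect ratio unchanged.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    # Whatever space is left over becomes padding.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def make_batches(frames, batch_size):
    """Split a list of frames into consecutive chunks of at most batch_size."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]


# A 1280x720 frame fitted into a hypothetical 640x640 detector input.
geometry = letterbox_geometry(1280, 720, 640, 640)
batches = make_batches(["f0", "f1", "f2", "f3", "f4"], 2)
```

Keeping the scale and padding values around is useful later, because boxes found in the padded image have to be mapped back to original-frame coordinates before cropping.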

2. Text Detection (HEF 1)

  • Uses the first HEF model to detect text regions

  • Runs asynchronously using HailoInfer.run()

  • On inference completion, triggers a callback:

    • Packs (original_frame, raw_output_tensor)

  • Sent to: det_postprocess_queue
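
The callback pattern described above can be illustrated with the standard library alone. The sketch below does not use the Hailo API: fake_detector is a hypothetical stand-in for the detector HEF, and a ThreadPoolExecutor stands in for the asynchronous inference engine.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fake_detector(preprocessed_frame):
    """Hypothetical stand-in for the detector; returns a dummy 'heatmap'."""
    return [value * 2 for value in preprocessed_frame]


def on_done(original_frame, out_queue, future):
    """Completion callback: pack (original_frame, raw_output) for postprocessing."""
    out_queue.put((original_frame, future.result()))


def run_detection(frames, out_queue):
    """Submit every (original, preprocessed) pair without waiting on each result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        for original, preprocessed in frames:
            future = pool.submit(fake_detector, preprocessed)
            # Bind the original frame so the callback can forward it downstream.
            future.add_done_callback(partial(on_done, original, out_queue))
    # Leaving the 'with' block waits for all submitted jobs to finish.


det_postprocess_queue = queue.Queue()
run_detection([("frame0", [1, 2]), ("frame1", [3])], det_postprocess_queue)
results = sorted(det_postprocess_queue.get() for _ in range(2))
```

Binding the original frame into the callback is what lets the postprocessing stage receive the frame and the raw output together, even though inference jobs may complete out of order.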

3. Detection Postprocessing

  • Converts the raw heatmap into bounding boxes using DBPostProcess

  • For each box:

    • Crops the region from the original frame

    • Resizes it to fit the OCR model’s expected input size (with padding)

    • Attaches metadata: frame ID and box location

  • If no boxes are detected:

    • Sends the original frame with empty OCR results directly to visualization

  • Otherwise:

    • Sends: (frame, [resized_crop], (frame_id, box)) to ocr_input_queue
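
The routing decision above (bypass the recognizer when no text is found, otherwise emit one item per box) might look roughly like this. It is a simplified sketch: frames are nested lists, boxes are assumed to be axis-aligned (x0, y0, x1, y1) tuples, the resize step is omitted, and returning the box count is an illustrative choice rather than something the source specifies.

```python
import queue


def crop(frame, box):
    """Crop a row-major 2D list; box is (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def route_detections(frame_id, frame, boxes, ocr_input_queue, vis_output_queue):
    """Send crops to the recognizer, or bypass it when no text was found."""
    if not boxes:
        # No text: forward the frame with empty results straight to visualization.
        vis_output_queue.put((frame, [], []))
        return 0
    for box in boxes:
        # One queue item per box, tagged with the frame ID and box location.
        ocr_input_queue.put((frame, [crop(frame, box)], (frame_id, box)))
    return len(boxes)


frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
ocr_q, vis_q = queue.Queue(), queue.Queue()
empty_count = route_detections(0, frame, [], ocr_q, vis_q)
text_count = route_detections(1, frame, [(1, 0, 3, 2)], ocr_q, vis_q)
```

The returned count is what a later stage would need in order to know how many recognition results to wait for on that frame.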

4. Text Recognition (HEF 2)

  • Uses the second HEF model to recognize text in each cropped region

  • Also runs asynchronously using HailoInfer.run()

  • On completion, a callback sends:

    • (frame_id, original_frame, ocr_result, box) to ocr_postprocess_queue

5. OCR Postprocessing

  • Collects all OCR outputs for a given frame (tracked by frame_id)

  • Keeps track of how many boxes are expected for that frame

  • Once all OCR results are collected:

    • Groups them into one bundle: (frame, list_of_results, list_of_boxes)

    • Sends to: vis_output_queue for visualization

  • Cleans up memory (removes processed frame_id entries)
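
The bookkeeping described above can be sketched as a small collector keyed by frame_id. The class and method names here are hypothetical; the point is only to show how per-frame expected counts let results arriving one box at a time be regrouped into a single bundle.

```python
class FrameCollector:
    """Groups per-box OCR results back into one bundle per frame."""

    def __init__(self):
        self._expected = {}  # frame_id -> number of boxes to wait for
        self._pending = {}   # frame_id -> (results so far, boxes so far)

    def expect(self, frame_id, num_boxes):
        """Record how many OCR results this frame should produce."""
        self._expected[frame_id] = num_boxes

    def add(self, frame_id, frame, ocr_result, box):
        """Store one result; return the full bundle once the frame is complete."""
        results, boxes = self._pending.setdefault(frame_id, ([], []))
        results.append(ocr_result)
        boxes.append(box)
        if len(results) < self._expected[frame_id]:
            return None
        # All boxes are in: drop the bookkeeping entries to free memory.
        del self._expected[frame_id]
        del self._pending[frame_id]
        return (frame, results, boxes)


collector = FrameCollector()
collector.expect(7, 2)
first = collector.add(7, "frame7", "HELLO", (0, 0, 10, 5))
second = collector.add(7, "frame7", "WORLD", (0, 6, 10, 11))
```

Here `first` is None because the frame is still incomplete, and `second` is the bundle that would be placed on vis_output_queue.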

6. Visualization & Rendering

  • Uses inference_result_handler() to:

    • Decode OCR model outputs into readable text

    • (Optionally) apply spell correction using SymSpell if --use-corrector is set

  • Draws the results:

    • Left side: original image

    • Right side: same image with OCR results written inside white boxes

  • Saves each output frame (image or video) to the directory given by --output-dir

  • Optionally displays FPS if --show-fps is enabled
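
To give a feel for the "decode OCR model outputs into readable text" step, here is a greedy CTC-style decoder. This assumes the recognizer emits one score vector per timestep over a character set with a blank symbol at index 0, which is common for PaddleOCR recognition models but is not confirmed by this guide. The function name and the tiny charset are illustrative only.

```python
def ctc_greedy_decode(scores, charset, blank=0):
    """Pick the best class per timestep, merge repeats, then drop blanks.

    scores:  list of per-timestep score lists, one score per class
    charset: list mapping class index -> character (index `blank` is unused)
    """
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars = []
    prev = None
    for idx in best:
        # A character is emitted only when it differs from the previous
        # timestep's class; a blank in between allows genuine double letters.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


charset = ["<blank>", "h", "i"]
scores = [
    [0.1, 0.8, 0.1],    # h
    [0.1, 0.7, 0.2],    # h (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.2, 0.7],    # i
    [0.2, 0.1, 0.7],    # i (repeat, merged)
]
text = ctc_greedy_decode(scores, charset)
```

The decoded string is what an optional spell corrector would then receive as input.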

Threads Overview

Each of these stages runs in a separate thread:

Thread                | Role
preprocess_thread     | Prepares and resizes input
det_thread            | Runs text detection HEF
detection_postprocess | Extracts boxes, crops, resizes
ocr_thread            | Runs text recognition HEF
ocr_postprocess       | Groups and synchronizes OCR results
vis_postprocess       | Handles decoding, correction, and rendering

Internal Queues

Queue Name            | Purpose
det_input_queue       | Holds original + preprocessed frames for the detector inference engine
det_postprocess_queue | Receives detection outputs (raw tensors + original frames) for postprocessing
ocr_input_queue       | Carries cropped text regions + metadata to the OCR inference engine
ocr_postprocess_queue | Receives OCR model outputs along with original frame and box info
vis_output_queue      | Collects final grouped results (frame, texts, boxes) for visualization and output
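
The thread-per-stage, queue-connected layout can be sketched generically as follows. This is not the app's code: the stage functions are placeholders, and the STOP sentinel used for shutdown and the bounded queue size are assumptions made for the illustration.

```python
import queue
import threading

STOP = object()  # sentinel passed down the chain to shut each stage down


def stage(fn, in_q, out_q):
    """Worker loop: read from in_q, apply fn, write to out_q until STOP arrives."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)  # propagate shutdown to the next stage
            return
        out_q.put(fn(item))


def run_pipeline(items, stage_fns, maxsize=4):
    """Run each stage in its own thread, linked by bounded FIFO queues."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stage_fns) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks when the first queue is full
        queues[0].put(STOP)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    # Drain the last queue in the caller so bounded queues never deadlock.
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results


outputs = run_pipeline(range(10), [lambda x: x + 1, lambda x: x * 2])
```

Bounded queues give natural backpressure: a slow stage causes upstream stages to block on put() instead of letting frames pile up in memory.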

Hi @Majed_Abu_Mokh

Is there a GitHub repo or a shared folder that contains these HEF models?

hey @shashi,

The example includes a bash script for downloading the relevant HEF files (download_resources.sh).

@nina-vilela

Thanks. For some reason, I could not see the link to the GitHub repo when I posted my message.

Hi @Majed_Abu_Mokh, do you have models for Hailo-8L?

Hi Aleksei_Markov
I don’t have models for Hailo-8L at the moment.
I’ll update you as soon as they’re released.