Compiling PaddleOCR

Has anyone been able to compile PaddleOCR into a HEF file? I want to use PaddleOCR because it’s reliable and very accurate. Compilation with the DFC succeeds, but the final step reports a noise analysis result of -3dB, which I assume is pretty bad. The cause is probably somewhere in the compilation step: I’m not really sure what should go in the .alls file, or in nms_config if that’s applicable in this case.
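
For context on the -3dB figure: assuming the noise analysis reports a signal-to-noise ratio in decibels between the full-precision output and the quantized output (this definition is an assumption, not something stated in the thread), a negative value would mean the quantization error carries more energy than the output itself. A minimal sketch with made-up numbers:

```python
import math

def snr_db(reference, quantized):
    """SNR in dB between a float reference output and its quantized counterpart."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)

reference = [0.10, 0.80, 0.05, 0.95]   # hypothetical float outputs
good = [0.11, 0.79, 0.05, 0.94]        # small quantization error
bad = [0.90, 0.10, 0.70, 0.20]         # error larger than the signal itself

print(f"good: {snr_db(reference, good):.1f} dB")
print(f"bad:  {snr_db(reference, bad):.1f} dB")
```

Under that reading, a healthy quantized model would show a clearly positive value, and anything below 0 dB points to a preprocessing or calibration mismatch rather than ordinary rounding error.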

The purpose is similar to the LPR use case here, but for a different set of objects, including letters. I also had a lot of trouble with the calibration dataset, since I’m not sure what to pass there. I tried both normalized and unnormalized calibration data and got many errors, shown below:


!my_env/bin/python optimize_model.py
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Loading model script commands to paddleOCR_renamed from string
[info] ParsedPerformanceParam command, setting optimization_level(max=2)
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:01.22)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100% 64/64 [00:58<00:00,  1.10entries/s]
[info] Statistics Collector is done (completion time is 00:01:02.96)
[info] Using dataset with 64 entries for calibration
Calibration: 100% 64/64 [01:23<00:00,  1.31s/entries]
[info] Output layer paddleOCR_renamed/deconv2 with sigmoid activation was detected. Forcing its output range to be [0, 1] (original range was [7.70556746658934e-29, 0.9999967813491821]).
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] No shifts available for layer paddleOCR_renamed/avgpool3/avgpool_op, using max shift instead. delta=1.0000
[info] No shifts available for layer paddleOCR_renamed/reducing_avgpool1/avgpool_op, using max shift instead. delta=2.0000
[info] No shifts available for layer paddleOCR_renamed/reducing_avgpool2/avgpool_op, using max shift instead. delta=1.0000
[info] No shifts available for layer paddleOCR_renamed/avgpool8/avgpool_op, using max shift instead. delta=1.0000
[info] No shifts available for layer paddleOCR_renamed/reducing_avgpool9/avgpool_op, using max shift instead. delta=1.0000
[info] No shifts available for layer paddleOCR_renamed/reducing_avgpool10/avgpool_op, using max shift instead. delta=1.0000
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped

Here is my .alls script. These were the arguments that allowed it to compile, but I’m not sure they are right:

normalization = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool2, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool9, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool10, division_factors=[4, 4])
performance_param(compiler_optimization_level=max)

My primary question is whether anyone has been able to compile PaddleOCR with accurate results, and what data should be in the calibration set. Should it match the original training data of the model, and should it be normalized or unnormalized?

Hi, can you please provide more information?
Which exact model are you compiling?
If you managed to compile, can you share the compilation outputs and the scripts you used? It will help us understand the issue.

Hello,

Thank you for the response. Here’s my end-to-end process. Here’s the link to the model I want to use: PaddleOCR. I chose this specific OCR model because it performs well on both blurry and non-blurry images and has very high accuracy. This is all done on Google Colab; more info on how I compiled it can be found here. I believe the main places of error could be:

  1. Step 6: The end nodes or net_input_shapes when converting to ONNX.
  2. Step 7: Generating a calibration dataset, my .alls script, and lack of nms_config? Not sure if the post process is needed for PaddleOCR.
  3. For my .alls commands, I ran into a lot of issues with all of the avgpooling layers. Adding lines like pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2]) enabled the optimization step to move forward, but I’m not sure that is the right way to resolve the warnings I got, which said that the values lay outside the expected range.

  • The calibration dataset is a little confusing to me, should the calibration dataset be on the same images as the dataset used to train PaddleOCR?
  • If I didn’t normalize the dataset beforehand (values lie between 0-255), do I have to specify normalization = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0]) in the .alls script?
  • Vice versa, if I normalized it beforehand (values lie between 0-1), what do I do here for the .alls?
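
On the two normalization questions above, a rough sketch may help. It assumes the normalization command applies (x - mean) / std per channel on the device, so the calibration data should arrive in whatever range the command expects as input. The helper and the pixel values are illustrative only.

```python
def normalize(pixel, mean, std):
    """Per-channel (x - mean) / std, the operation assumed for the .alls normalization command."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

mean, std = [0.0, 0.0, 0.0], [255.0, 255.0, 255.0]

raw_pixel = [255.0, 128.0, 0.0]                 # unnormalized calibration data (0-255)
scaled_pixel = [x / 255.0 for x in raw_pixel]   # data already divided by 255 (0-1)

# Raw data plus the normalization command: the network sees values in 0-1.
print(normalize(raw_pixel, mean, std))

# Pre-scaled data plus the same command: values are divided by 255 twice
# and collapse toward zero, which would skew the collected statistics.
print(normalize(scaled_pixel, mean, std))
```

Under that assumption, data kept in 0-255 pairs with the normalization command, while data normalized beforehand would pair with a script that omits it (and the same preprocessing would then have to run on the host at inference time).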

Step 1 - Installing Packages:
!pip install paddleocr paddle2onnx colorama ffmpeg-python paddlepaddle

Step 2 - Exporting PaddleOCR to ONNX for conversion:

from paddleocr import PaddleOCR
ocr = PaddleOCR()
!paddle2onnx \
    --model_dir /root/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file /content/ch_PP-OCRv4_det.onnx

Step 3 - Updating all packages and venv for the DFC:

!sudo apt-get update
!sudo apt-get install -y python3-dev python3-distutils python3-tk libfuse2 graphviz libgraphviz-dev
!pip install --upgrade pip virtualenv
!sudo apt install python3.10
!sudo apt install python3.10-venv
!python3.10 -m venv my_env

Step 4 - Installing the DFC 3.29.0:

#Installing the WHL file for Hailo DFC
# Just saved it to my gdrive
!gdown 15ORXdfAgFgN6TloxGxO_ClR_ZNoe6lwX

!my_env/bin/pip install /content/hailo_dataflow_compiler-3.29.0-py3-none-linux_x86_64.whl

Step 5 - Reverting CUDA drivers to 11.8 since it doesn’t work with 12.2:

Pretty long process, can add if needed

Step 6 - Translation Scripts:

with open("translate_model.py", "w") as f:
    f.write("""
from hailo_sdk_client import ClientRunner

# Define the ONNX model path and configuration
onnx_path = "/content/ch_PP-OCRv4_det.onnx"
onnx_model_name = "paddleOCR_renamed"
chosen_hw_arch = "hailo8"  # Specify the target hardware architecture

# Initialize the ClientRunner
runner = ClientRunner(hw_arch=chosen_hw_arch)

#For paddle overwrite
net_input_shapes = {
    "x": [16, 3, 640, 640]  # Replace dimensions if necessary for your model
}
end_node_names = [
    "sigmoid_0.tmp_0"
]
try:
    # Translate the ONNX model to Hailo's format
    hn, npz = runner.translate_onnx_model(
        onnx_path,
        onnx_model_name,
        end_node_names=end_node_names,
        net_input_shapes=net_input_shapes,  # Adjust input shapes if needed
    )
    print("Model translation successful.")
except Exception as e:
    print(f"Error during model translation: {e}")
    raise

# Save the Hailo model HAR file
hailo_model_har_name = f"{onnx_model_name}_hailo_model.har"
try:
    runner.save_har(hailo_model_har_name)
    print(f"HAR file saved as: {hailo_model_har_name}")
except Exception as e:
    print(f"Error saving HAR file: {e}")
    """)

Then running it in the venv:
!my_env/bin/python translate_model.py


Step 7 - Optimizing script:

with open("optimize_model.py", "w") as f:
    f.write("""
import os
from hailo_sdk_client import ClientRunner

# Define your model's HAR file name
model_name = "paddleOCR_renamed"
hailo_model_har_name = f"{model_name}_hailo_model.har"

# Ensure the HAR file exists
assert os.path.isfile(hailo_model_har_name), "Please provide a valid path for the HAR file"

# Initialize the ClientRunner with the HAR file
runner = ClientRunner(har=hailo_model_har_name)

# Define the model script to reduce global average pooling spatial dimensions
model_script = \"\"\"
normalization = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool2, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool9, division_factors=[2, 2])
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool10, division_factors=[4, 4])

performance_param(compiler_optimization_level=max)
\"\"\"

# Load the model script into the ClientRunner
runner.load_model_script(model_script)

# Define a calibration dataset
# Replace '/content/processed_calibration_data.npy' with your actual dataset path
calib_dataset = "/content/paddleocr_calibration_data_1024_255.npy"
assert os.path.exists(calib_dataset), "Calibration dataset not found!"

# Perform optimization with the calibration dataset
runner.optimize(calib_dataset)

# Save the optimized model to a new Quantized HAR file
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)

print(f"Quantized HAR file saved to: {quantized_model_har_path}")
    """)

Then running it:
!my_env/bin/python optimize_model.py

Step 8 - Compilation:

with open("compile_model.py", "w") as f:
    f.write("""
from hailo_sdk_client import ClientRunner
import os

# Define the quantized model HAR file
model_name = "paddleOCR_renamed"
quantized_model_har_path = f"{model_name}_quantized_model.har"
output_directory = "/content/paddle_output"

os.makedirs(output_directory, exist_ok=True)

# Initialize the ClientRunner with the HAR file
runner = ClientRunner(har=quantized_model_har_path)
print("[info] ClientRunner initialized successfully.")

# Compile the model
try:
    hef = runner.compile()
    print("[info] Compilation completed successfully.")
except Exception as e:
    print(f"[error] Failed to compile the model: {e}")
    raise

# Save the compiled model to the specified directory
output_file_path = os.path.join(output_directory, f"{model_name}.hef")
with open(output_file_path, "wb") as f:
    f.write(hef)

print(f"[info] Compiled model saved successfully to {output_file_path}")
""")

This process completes just fine, but it takes a very long time (overnight on a T4 GPU) to compile. It keeps iterating through a “multi-context” flow several times, then eventually compiles into a .hef.

Sorry about the long response, let me know if I can provide anything else, would love your feedback!

I also faced the same set of issues when converting optimised_det.har to .hef for PaddleOCR, and the documentation needed for this is not provided.
In my case the conversion terminates midway, leaving me clueless.

Any help would be appreciated.

Thanks
Regards
Parth Bapaye

Thank you for the very detailed answer, it will be very helpful in our investigation.

Answering your questions:

The calibration dataset is used for statistic collection (getting the activation ranges). Because of that, it needs to accurately represent the data that you will run inference on. The train set works.

The std and mean used in the normalization should be the same used during the training. You need to check out what they used in their repo.
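
To illustrate why the calibration set has to resemble the inference data, here is a toy min/max range collector with a uniform 8-bit quantizer. It is a simplified stand-in, not the DFC's actual algorithm, and the activation values are hypothetical.

```python
def collect_range(samples):
    """Track the min and max seen over all calibration samples."""
    flat = [v for sample in samples for v in sample]
    return min(flat), max(flat)

def fake_quantize(value, lo, hi, levels=256):
    """Clip to the calibrated range, snap to one of `levels` steps, map back to float."""
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)
    return lo + round((clipped - lo) / step) * step

representative = [[0.0, 2.5, 6.0], [0.1, 4.0, 5.8]]   # covers what inference will see
too_narrow = [[0.0, 0.5, 1.0], [0.1, 0.8, 0.9]]       # e.g. calibrated on unrelated images

inference_value = 5.0
for name, calib in [("representative", representative), ("too narrow", too_narrow)]:
    lo, hi = collect_range(calib)
    q = fake_quantize(inference_value, lo, hi)
    print(f"{name}: range=({lo}, {hi}) quantized={q:.3f} error={abs(q - inference_value):.3f}")
```

With the narrow set, the inference value is clipped to the top of the calibrated range, so the error is large no matter how many quantization levels are available.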

Got it, thank you for answering my questions. I have one more, regarding the division factors, which I didn’t completely understand:
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[2, 2])

What impact does the division factor have on the model? I assume the model performance is impacted somehow. Also, I didn’t find many other OCR methods apart from LPRnet, but can we expect PaddleOCR in a future release of the HailoMZ or something of the sort? It is a highly rated OCR model for its efficiency, and performs quite well for text detection/recognition.

As you correctly inferred, the division factor has a negative impact on throughput.
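
To make the division factors more concrete: the reducing_avgpool layers in the optimization log suggest that global_avgpool_reduction inserts a smaller average pool ahead of the global one. The sketch below assumes that behavior (it is not taken from the DFC documentation) and shows that, in floating point, pooling in two stages gives the same result as one global average when the factors divide the feature map evenly.

```python
def global_avg(fmap):
    """Average over every element of a 2-D feature map."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

def reduce_avgpool(fmap, factors):
    """Non-overlapping average pool with kernel and stride equal to `factors` (h, w)."""
    fh, fw = factors
    height, width = len(fmap), len(fmap[0])
    assert height % fh == 0 and width % fw == 0, "factors must divide the map evenly"
    pooled = []
    for top in range(0, height, fh):
        row = []
        for left in range(0, width, fw):
            window = [fmap[top + i][left + j] for i in range(fh) for j in range(fw)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map
fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

direct = global_avg(fmap)
two_stage = global_avg(reduce_avgpool(fmap, (2, 2)))
print(direct, two_stage)
```

The extra layer is additional work for the device, which is consistent with the throughput cost mentioned above.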

PaddleOCR is one of the networks that we are considering adding to our ModelZoo, we don’t have a timeline yet.

We are investigating the accuracy issue that you are facing and will update once we have progress. This may take a while, we appreciate your patience.

Sounds good! Appreciate the help.

Hi! Just want to add that for our workflow, an OCR network/example in the ModelZoo would be greatly appreciated.

I’ve tested the optimization with the correct normalization values and a real calibration dataset and got okay-ish SNR. This is the normalization command I’ve used:
normalization = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])

With this mean and std, it’s not necessary to reduce the GAP ops. But you might have to try some higher optimization levels - I would suggest around 3 epochs of adaround.
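
These values appear to be the widely used ImageNet channel statistics rescaled from the 0-1 range to the 0-255 range. Whether PaddleOCR's detection preprocessing uses exactly these statistics is an assumption here and should be checked against its repository. A quick arithmetic check:

```python
# ImageNet mean/std as commonly quoted for inputs scaled to 0-1 (assumed to
# match the model's training preprocessing; verify against the PaddleOCR config).
mean_01 = [0.485, 0.456, 0.406]
std_01 = [0.229, 0.224, 0.225]

# Rescale to 0-255 so the on-chip normalization can consume raw 8-bit pixels.
mean_255 = [round(m * 255, 3) for m in mean_01]
std_255 = [round(s * 255, 3) for s in std_01]

print(f"normalization = normalization({mean_255}, {std_255})")
```

Since the statistics are expressed in the 0-255 range, the calibration images would be passed as raw 0-255 pixels rather than pre-normalized ones.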

This is just a summary of my questions combined into one post

How did you determine the normalization values? I noticed that most YOLO models in the model zoo use a normalization range of 0–255, but other models have different values, including what you provided. Could you explain how you arrived at those specific values?

Regarding calibration data, what data did you use? I tried finding PaddleOCR’s training data but couldn’t locate it. You mentioned:

I’m implementing a pipeline similar to License Plate Recognition (LPR), sending cropped images to PaddleOCR for text detection/recognition. However, I found that PaddleOCR uses dynamic sizing. The ONNX model input shapes are as follows:
['DynamicDimension.0', 3, 'DynamicDimension.1', 'DynamicDimension.2']
So two questions here:

  1. Since Hailo does not support dynamic input sizes, what input shape should I use for net_input_shapes? Should this be the size of the images that I will be performing inference on? From my understanding, PaddleOCR automatically resizes whatever is detected to a certain aspect ratio while keeping an image height of 32px, so how do I reconcile this in net_input_shapes? Also, I’m pretty sure dynamic batch sizes are supported, hence the -1 in the input shape. This is just what I have right now:
net_input_shapes = {
    "x": [-1, 3, 640, 640]  
}
end_node_names = [
    "sigmoid_0.tmp_0"
]
# Translate the ONNX model to Hailo's format
hn, npz = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
    end_node_names=end_node_names,
    net_input_shapes=net_input_shapes,  # Adjust input shapes if needed
)
print("Model translation successful.")

  2. Should whatever net_input_shape I decide on match the shape of the calibration data? For example, if I set net_input_shape to [16, 3, 640, 640], should I resize all calibration images to this format?
  3. On top of my last question, should the cropped region (detections) also be resized to this [640,640]?
  4. With dynamic resizing, if PaddleOCR automatically does that behind the scenes, is that an operation that is supported by Hailo? Kind of confused on how dynamic resizing is handled if it is.
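
One common way to reconcile variable-size crops with a fixed input shape is to resize each image with its aspect ratio preserved and pad the remainder. The sketch below only computes the geometry; the 640x640 target mirrors the net_input_shapes above, and the helper names are hypothetical. It also assumes calibration arrays are laid out channels-last as (N, H, W, C), which should be confirmed against the DFC documentation.

```python
def letterbox_geometry(src_h, src_w, dst_h=640, dst_w=640):
    """Return the resized size and padding that fit (src_h, src_w) into (dst_h, dst_w)."""
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = max(1, round(src_h * scale))
    new_w = max(1, round(src_w * scale))
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    return {
        "resized": (new_h, new_w),
        "pad_top": pad_h // 2, "pad_bottom": pad_h - pad_h // 2,
        "pad_left": pad_w // 2, "pad_right": pad_w - pad_w // 2,
    }

def calib_shape(net_input_shape, num_images):
    """Map an ONNX-style (N, C, H, W) shape to a channels-last calibration shape."""
    _, channels, height, width = net_input_shape
    return (num_images, height, width, channels)

# A hypothetical 48x320 crop of a line of text
print(letterbox_geometry(48, 320))
print(calib_shape([16, 3, 640, 640], num_images=64))
```

The same preprocessing would then be applied to both the calibration images and the crops sent for inference, so the collected statistics match what the device actually sees.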

Also, I assume you didn’t modify the end node (sigmoid_0.tmp_0).

Apologies if the questions are confusing, please feel free to ask me to clarify if needed.

Thanks again for your help! You’ve been very responsive, and I really appreciate it.

Hello, I am currently converting the PP OCR model to HEF. Doesn’t the PP OCR model include three parts: DET, REC, and CLS? I can convert both the det and CLS models correctly, but when parsing the Rec model to Hef, the following error occurs:
“[error] Mapping Failed (allocation time: 2m 52s)
Compiler could not find a valid partition to contexts. Most commom error is: Automri finished with too many resources on context_4 with 11/68 failures.”
If you could answer my question, I would be extremely grateful. Could we exchange contact information to communicate directly?

It does contain three parts, I just wanted to do one at a time. You can contact me at trieut415@gmail.com, if that doesn’t work we can figure out a better method of communication from there. What commands did you use to get the det and cls models to compile?

  1. Net input shape
  2. What calibration data did you use, and did you resize it to your net input shape?
  3. What were your .alls scripts?
  4. What was your SNR for both?

Hey! Has anyone had some sort of success running the complete PaddleOCR?

Regarding the input shapes, I found this file in the PaddleOCR source code that lists some common dynamic shapes for the recognition model (I think? I haven’t fully read the source code). This could help in determining a suitable shape for the rec model. PaddleOCR/ppocr/utils/export_model.py at main · PaddlePaddle/PaddleOCR · GitHub
