I need to quantize a custom model with RepVGG as the backbone.
I noticed that the Hailo Model Zoo has several models that use a reparameterization architecture such as RepVGG as their backbone. How are these models quantized?
Is the Conv quantized after reparameterization? Or is the network quantized without reparameterization (i.e., with the multiple Conv + BatchNorm branches)?
When you compile a RepVGG-based network through the Hailo Model Zoo (like using hailomz compile … --yaml …), the backend automatically handles the reparameterization before quantization. Here’s what happens under the hood:
The compiler first takes your multi-branch Conv + BatchNorm structure and folds it all down into a single “deploy” convolution. Only after this folding step does it apply quantization. So you’re not actually quantizing each individual training-time branch or BatchNorm layer separately.
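To make that folding step concrete, the math involved is the standard RepVGG reparameterization (this is a generic illustration, not the Hailo compiler's actual code): fold each branch's BatchNorm into its conv, pad the 1x1 and identity branches up to 3x3 kernels, and sum the three branches into a single kernel and bias. A minimal PyTorch sketch, assuming a 3x3 + 1x1 + identity block with groups=1:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv_weight, bn):
    # Fold BatchNorm into the preceding conv: w' = w * gamma/std, b' = beta - mean*gamma/std
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                           # per-output-channel scale (gamma/std)
    fused_w = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_b = bn.bias - bn.running_mean * scale
    return fused_w, fused_b

def reparameterize_repvgg_block(conv3x3, bn3x3, conv1x1, bn1x1, bn_id=None):
    """Collapse the 3x3, 1x1 and identity branches into one 'deploy' 3x3 conv + bias."""
    w3, b3 = fuse_conv_bn(conv3x3.weight, bn3x3)
    w1, b1 = fuse_conv_bn(conv1x1.weight, bn1x1)
    # Pad the 1x1 kernel to 3x3 so the branches can simply be summed
    w1 = nn.functional.pad(w1, [1, 1, 1, 1])
    w, b = w3 + w1, b3 + b1
    if bn_id is not None:  # identity branch exists only when in_ch == out_ch and stride == 1
        out_ch, in_ch = w3.shape[0], w3.shape[1]
        id_kernel = torch.zeros(out_ch, in_ch, 3, 3)
        for c in range(out_ch):
            id_kernel[c, c, 1, 1] = 1.0               # identity expressed as a 3x3 conv
        wi, bi = fuse_conv_bn(id_kernel, bn_id)
        w, b = w + wi, b + bi
    return w, b
```

The resulting single (w, b) pair is what the quantizer ends up seeing.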
The process works like this:
Folding/Re-parameterization: During the full-precision optimization step (optimize_full_precision), all the training-time branches and batch norms get combined into one Conv + Bias layer (the “deploy” kernel). This handles conv + BN folding, conv + add operations, etc.
Quantization: Then in the next optimization pass (optimize), the compiler quantizes this single resulting convolution’s weights and activations, applying whatever weight/activation clipping you’ve configured before converting everything to int8.
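As a rough picture of that second step, here is a generic per-output-channel symmetric int8 weight quantization of the fused deploy kernel. The optional percentile clipping stands in for the weight-clipping knobs mentioned above; the actual thresholds and rounding policy used by the compiler are not shown here:

```python
import torch

def quantize_weights_int8(fused_w, clip_percentile=None):
    """Illustrative symmetric per-channel int8 quantization of the fused deploy kernel."""
    w = fused_w.reshape(fused_w.shape[0], -1)              # (out_ch, in_ch * k * k)
    if clip_percentile is not None:
        # Optional clipping of weight outliers before picking the scale
        thresh = torch.quantile(w.abs(), clip_percentile, dim=1, keepdim=True)
        w = torch.clamp(w, -thresh, thresh)
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0      # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q.reshape(fused_w.shape), scale.squeeze(1)
```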
So yes, you're right: it's the post-reparameterization Conv that actually gets quantized, not the original multi-Conv-plus-BatchNorm graph structure.
Yes, even when you're doing QAT you still train the reparameterized Conv; you're simply fine-tuning it (and any subsequent layers) in the quantized domain so that the final INT8 model recovers as much accuracy as possible.
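To illustrate what "training in the quantized domain" means, here is a minimal fake-quantization wrapper with a straight-through estimator applied to the reparameterized kernel. It's the generic QAT pattern, not Hailo's QAT implementation, and the per-tensor scale is a simplification:

```python
import torch
import torch.nn as nn

class FakeQuantConv2d(nn.Module):
    """Deploy-style conv whose weights are fake-quantized to int8 during QAT."""
    def __init__(self, fused_w, fused_b, stride=1, padding=1):
        super().__init__()
        self.weight = nn.Parameter(fused_w.clone())  # start from the reparameterized kernel
        self.bias = nn.Parameter(fused_b.clone())
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w = self.weight
        scale = w.detach().abs().amax() / 127.0
        w_q = torch.clamp(torch.round(w / scale), -128, 127) * scale
        # Straight-through estimator: quantized values forward, full-precision gradients backward
        w_ste = w + (w_q - w).detach()
        return nn.functional.conv2d(x, w_ste, self.bias, self.stride, self.padding)
```

The optimizer only ever sees this single fused kernel, which is exactly the point above: QAT fine-tunes the deploy Conv, not the training-time branches.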