LayerNorm Layout Optimization Error During Segmentation Model Quantization
Summary
I’m experiencing a persistent layout optimization error when converting an FPN ResNet-18 segmentation model (from Fishial.ai) for Hailo-8. The error occurs during the Statistics Collector phase and appears to be related to LayerNorm operations. I’ve tried multiple ONNX export configurations without success.
Environment
- Hailo DFC Version: 3.33.0
- HailoRT Version: 4.23.0
- Target Hardware: Hailo-8 (26 TOPS)
- Host System: WSL2, Ubuntu 22.04, Python 3.10
- GPU: RTX 4080 (16GB VRAM)
- Docker Image: Based on Hailo AI SW Suite 2025-10
Model Details
- Architecture: FPN (Feature Pyramid Network) with ResNet-18 backbone
- Task: Fish segmentation
- Input Shape: [1, 3, 416, 416] (static)
- Output Shape: [1, 1, 416, 416] (segmentation mask)
- Model Size: 13M parameters
- Source: Fishial.ai segmentator model (TorchScript → ONNX → HAR)
Error Message
E0000 00:00:1761695513.267114 33 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT:
Size of values 0 does not match size of permutation 4 @ fanin shape in
segmentator_fpn_res18_416_1/reduce_mean1_spatial_variance_layer_normalization5_1/
act_op_1/StatefulPartitionedCall/SelectV2-1-TransposeNHWCToNCHW-LayoutOptimizer
When it occurs: During Statistics Collector calibration phase (after LayerNorm Decomposition completes successfully)
Behavior: The error appears once; optimization then continues and completes, but I’m concerned about the performance impact of the failed layout optimization.
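My reading of the message (an interpretation, not something from Hailo's docs): the `meta_optimizer.cc` frame suggests it comes from the TensorFlow layout optimizer the DFC runs internally, which tries to apply a 4-element NHWC→NCHW permutation to a node whose inferred output has rank 0 ("Size of values 0"). A minimal numpy sketch of that rank mismatch:

```python
import numpy as np

# A 4-element permutation maps NHWC -> NCHW on a rank-4 tensor.
perm_nhwc_to_nchw = (0, 3, 1, 2)

nhwc = np.zeros((1, 416, 416, 3))           # rank 4: transpose succeeds
nchw = np.transpose(nhwc, perm_nhwc_to_nchw)
print(nchw.shape)                           # (1, 3, 416, 416)

scalar = np.float32(0.0)                    # rank 0: "size of values 0"
try:
    np.transpose(scalar, perm_nhwc_to_nchw) # permutation rank != tensor rank
except ValueError as e:
    print("layout transpose rejected:", e)
```

If that reading is right, the optimizer simply skips the transpose for that node and falls back, which would match the error appearing once while optimization still completes.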
What I’ve Tried
ONNX Export Variations
All exports were done from TorchScript with the following variations:
- Opset versions: 11, 13, 14, 17
- Dynamic vs static shapes:
   - ✗ dynamic_axes={'input': {0: 'batch_size'}} (original)
   - ✗ dynamic_axes=None (static shapes)
- Constant folding: both True and False
- Training mode: torch.onnx.TrainingMode.EVAL
- Verified: all ONNX models pass onnx.checker validation
Current ONNX Export Settings
torch.onnx.export(
    model,
    dummy_input,
    output_path,
    export_params=True,
    opset_version=17,           # tried 11, 13, 14, 17
    do_constant_folding=False,  # LayerNorm-friendly
    training=torch.onnx.TrainingMode.EVAL,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes=None,          # static shapes
    verbose=False
)
Model Script Configuration
# Optimization Level 2: Standard with Equalization + Finetune
model_optimization_flavor(optimization_level=2, compression_level=1)
model_optimization_config(compression_params, auto_4bit_weights_ratio=0.150)
model_optimization_config(calibration, batch_size=8, calibset_size=512)
post_quantization_optimization(bias_correction, policy=enabled)
post_quantization_optimization(finetune, policy=enabled, batch_size=8, dataset_size=512, epochs=2)
Optimization Pipeline
# Parse ONNX to HAR
hailo parser onnx segmentator.onnx \
  --net-name segmentator_fpn_res18_416 \
  --har-path segmentator.har \
  --hw-arch hailo8 \
  -y

# Optimize (with GPU)
hailo optimize segmentator.har \
  --model-script segmentator_model_script.alls \
  --output segmentator_optimized.har \
  --use-random-calib-set \
  --calib-random-max 512 \
  --work-dir /tmp/hailo_work
Optimization Progress Log
[info] Starting Model Optimization
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Ratio of weights in 4bit is 0.18
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.30)
[info] Starting LayerNorm Decomposition
[info] Using dataset with 512 entries for calibration
Calibration: 100%|██████████| 512/512 [00:40<00:00, 12.54entries/s]
[info] Model Optimization Algorithm LayerNorm Decomposition is done (completion time is 00:00:55.68)
[info] Starting Statistics Collector
[info] Using dataset with 512 entries for calibration
Calibration: 0%| | 0/512 [00:00<?, ?entries/s]
E0000 00:00:1761695513.267114 33 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT:
Size of values 0 does not match size of permutation 4 @ fanin shape in
segmentator_fpn_res18_416_1/reduce_mean1_spatial_variance_layer_normalization5_1/
act_op_1/StatefulPartitionedCall/SelectV2-1-TransposeNHWCToNCHW-LayoutOptimizer
Calibration: 100%|██████████| 512/512 [01:39<00:00, 5.13entries/s]
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:01:45.12)
[info] Starting Fix zp_comp Encoding
[... continues successfully ...]
Questions
1. Is this error blocking or just a warning? The optimization completes, but what’s the FPS impact of the failed layout optimization?
2. Is this a known limitation? The specific LayerNorm implementation in FPN models may not be compatible with Hailo’s NCHW↔NHWC transpose requirements.
3. What’s the recommended approach? Should I:
   - Accept the fallback and test on hardware?
   - Use a different normalization approach (BatchNorm, InstanceNorm)?
   - Contact Hailo support for a potential patch/workaround?
4. Is there a way to see exactly which layers are affected, and what the performance penalty is?
Additional Context
The problematic layer appears to be: reduce_mean1_spatial_variance_layer_normalization5_1, which is part of the FPN segmentation head.
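The layer name suggests the DFC's LayerNorm Decomposition rewrote the op into separate reduce-mean and spatial-variance stages. As a rough numpy sketch of what such a decomposition would compute (my reading of the layer names, not Hailo's actual implementation; the choice of spatial axes is an assumption):

```python
import numpy as np

def layernorm_decomposed(x, gamma, beta, eps=1e-5, axes=(1, 2)):
    """LayerNorm written as the reduce-mean / spatial-variance /
    normalize stages the decomposed layer names hint at.
    x is NHWC; axes=(1, 2) normalizes over H and W."""
    mean = x.mean(axis=axes, keepdims=True)                  # reduce_mean stage
    var = ((x - mean) ** 2).mean(axis=axes, keepdims=True)   # spatial_variance stage
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(1, 416, 416, 1).astype(np.float32)
y = layernorm_decomposed(x, gamma=1.0, beta=0.0)
# After normalization the per-image spatial mean is ~0 and std ~1.
print(float(y.mean()), float(y.std()))
```

If the variance stage is where the rank-0 shape comes from, that might explain why only this one decomposed sub-layer trips the layout optimizer.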
Expected Outcome
I need optimal inference performance on Hailo-8 for real-time fish segmentation. Any FPS degradation from suboptimal layout could impact the real-time capability of the system (target: 15-30 FPS).
Any guidance on:
- Whether this is a critical issue
- How to measure the actual performance impact
- Potential workarounds or fixes
- Whether this is something Hailo engineering has seen before
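For measuring the actual impact on my side, I was planning something like this minimal host-side timing harness (the `infer` callable is a placeholder for whatever HailoRT inference call ends up being used; all names here are illustrative):

```python
import time

def measure_fps(infer, num_frames=100, warmup=10):
    """Time a callable over num_frames iterations after a warmup,
    and return the average frames per second."""
    for _ in range(warmup):        # let caches/pipelines settle
        infer()
    start = time.perf_counter()
    for _ in range(num_frames):
        infer()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Dummy workload standing in for a real network.infer(frame) call:
fps = measure_fps(lambda: sum(range(10_000)))
print(f"{fps:.1f} FPS")  # compare against the 15-30 FPS target
```

Is comparing end-to-end FPS like this the right way to quantify the layout-fallback penalty, or is there a per-layer profiling tool in the toolchain that would be more precise?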
Thank you for any insights!
Model source: GitHub - fishial/fish-identification: Fish Detection (Segmentation) & Classification models and training scripts
HEF target: Raspberry Pi 5 + Hailo AI HAT+ (26 TOPS)