LayerNorm Layout Optimization Error During Segmentation Model Quantization: "Size of values 0 does not match size of permutation 4 @ fanin shape" in segmentator fpn_res18 model

Summary

I’m experiencing a persistent layout optimization error when converting an FPN ResNet-18 segmentation model (from Fishial.ai) to Hailo-8. The error occurs during the Statistics Collector phase and appears to be related to LayerNorm operations. I’ve tried multiple ONNX export configurations without success.

Environment

  • Hailo DFC Version: 3.33.0
  • HailoRT Version: 4.23.0
  • Target Hardware: Hailo-8 (26 TOPS)
  • Host System: WSL2, Ubuntu 22.04, Python 3.10
  • GPU: RTX 4080 (16GB VRAM)
  • Docker Image: Based on Hailo AI SW Suite 2025-10

Model Details

  • Architecture: FPN (Feature Pyramid Network) with ResNet-18 backbone
  • Task: Fish segmentation
  • Input Shape: [1, 3, 416, 416] (static)
  • Output Shape: [1, 1, 416, 416] (segmentation mask)
  • Model Size: 13M parameters
  • Source: Fishial.ai segmentator model (TorchScript → ONNX → HAR)

Error Message

E0000 00:00:1761695513.267114      33 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT: 
Size of values 0 does not match size of permutation 4 @ fanin shape in
segmentator_fpn_res18_416_1/reduce_mean1_spatial_variance_layer_normalization5_1/
act_op_1/StatefulPartitionedCall/SelectV2-1-TransposeNHWCToNCHW-LayoutOptimizer

When it occurs: During Statistics Collector calibration phase (after LayerNorm Decomposition completes successfully)

Behavior: The error is printed once, then optimization continues and completes, but I’m concerned about the performance impact of the failed layout optimization.
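For anyone reading along, here is my understanding of what the message itself means (a minimal pure-Python sketch of the shape/permutation mismatch, not Hailo internals): the layout optimizer tries to apply a rank-4 permutation (NHWC→NCHW) to every fan-in of the node, and one fan-in of the decomposed LayerNorm's `SelectV2` apparently resolves to a rank-0 (scalar/unknown) shape, so a length-4 permutation cannot be applied to it.

```python
# Sketch of the failing check: a rank-4 axis permutation applied to fan-in shapes.
# (Illustrative only; the actual check lives in TensorFlow's meta_optimizer.)

NHWC_TO_NCHW = (0, 3, 1, 2)  # rank-4 permutation used by layout optimizers


def permute_shape(shape, perm):
    """Apply an axis permutation to a shape tuple."""
    if len(shape) != len(perm):
        raise ValueError(
            f"Size of values {len(shape)} does not match size of permutation {len(perm)}"
        )
    return tuple(shape[axis] for axis in perm)


# A proper rank-4 activation permutes cleanly:
print(permute_shape((1, 416, 416, 3), NHWC_TO_NCHW))  # -> (1, 3, 416, 416)

# A scalar fan-in (rank 0) reproduces the error text:
try:
    permute_shape((), NHWC_TO_NCHW)
except ValueError as e:
    print(e)  # Size of values 0 does not match size of permutation 4
```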

What I’ve Tried

ONNX Export Variations

All exports were done from TorchScript with the following variations:

  1. Opset versions: 11, 13, 14, 17
  2. Dynamic vs Static shapes:
    • dynamic_axes={'input': {0: 'batch_size'}} (original)
    • dynamic_axes=None (static shapes)
  3. Constant folding: Both True and False
  4. Training mode: torch.onnx.TrainingMode.EVAL
  5. Verified: All ONNX models pass onnx.checker validation

Current ONNX Export Settings

torch.onnx.export(
    model,
    dummy_input,
    output_path,
    export_params=True,
    opset_version=17,                          # Tried 11, 13, 14, 17
    do_constant_folding=False,                 # LayerNorm-friendly
    training=torch.onnx.TrainingMode.EVAL,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes=None,                         # Static shapes
    verbose=False
)

Model Script Configuration

# Optimization Level 2: Standard with Equalization + Finetune
model_optimization_flavor(optimization_level=2, compression_level=1)
model_optimization_config(compression_params, auto_4bit_weights_ratio=0.150)
model_optimization_config(calibration, batch_size=8, calibset_size=512)
post_quantization_optimization(bias_correction, policy=enabled)
post_quantization_optimization(finetune, policy=enabled, batch_size=8, dataset_size=512, epochs=2)

Optimization Pipeline

# Parse ONNX to HAR
hailo parser onnx segmentator.onnx \
  --net-name segmentator_fpn_res18_416 \
  --har-path segmentator.har \
  --hw-arch hailo8 \
  -y

# Optimize (with GPU)
hailo optimize segmentator.har \
  --model-script segmentator_model_script.alls \
  --output segmentator_optimized.har \
  --use-random-calib-set \
  --calib-random-max 512 \
  --work-dir /tmp/hailo_work

Optimization Progress Log

[info] Starting Model Optimization
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Ratio of weights in 4bit is 0.18
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.30)

[info] Starting LayerNorm Decomposition
[info] Using dataset with 512 entries for calibration

Calibration: 100%|██████████| 512/512 [00:40<00:00, 12.54entries/s]

[info] Model Optimization Algorithm LayerNorm Decomposition is done (completion time is 00:00:55.68)

[info] Starting Statistics Collector
[info] Using dataset with 512 entries for calibration

Calibration:   0%|          | 0/512 [00:00<?, ?entries/s]

E0000 00:00:1761695513.267114      33 meta_optimizer.cc:966] layout failed: INVALID_ARGUMENT: 
Size of values 0 does not match size of permutation 4 @ fanin shape in
segmentator_fpn_res18_416_1/reduce_mean1_spatial_variance_layer_normalization5_1/
act_op_1/StatefulPartitionedCall/SelectV2-1-TransposeNHWCToNCHW-LayoutOptimizer

Calibration: 100%|██████████| 512/512 [01:39<00:00,  5.13entries/s]

[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:01:45.12)

[info] Starting Fix zp_comp Encoding

[... continues successfully ...]

Questions

  1. Is this error blocking or just a warning? The optimization completes, but what’s the FPS impact of the failed layout optimization?

  2. Is this a known limitation? The specific LayerNorm implementation in FPN models may not be compatible with Hailo’s NCHW↔NHWC transpose requirements.

  3. What’s the recommended approach? Should I:

    • Accept the fallback and test on hardware?
    • Use a different normalization approach (BatchNorm, InstanceNorm)?
    • Contact Hailo support for a potential patch/workaround?
  4. Is there a way to see exactly which layers are affected and what the performance penalty is?
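On question 4, one low-tech interim approach (my own workaround, not an official Hailo tool) is to scrape the optimization log for `layout failed` messages and extract the layer scope from each one, so at least the affected layers can be enumerated:

```python
# Extract layer paths from "layout failed" messages in the optimization log.
# The LOG string below is the single error line from my run, joined back together.
import re

LOG = (
    "E0000 00:00:1761695513.267114 33 meta_optimizer.cc:966] layout failed: "
    "INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 "
    "@ fanin shape in "
    "segmentator_fpn_res18_416_1/reduce_mean1_spatial_variance_layer_normalization5_1/"
    "act_op_1/StatefulPartitionedCall/SelectV2-1-TransposeNHWCToNCHW-LayoutOptimizer"
)


def affected_layers(log_text):
    """Return the layer paths named in 'layout failed' messages."""
    pattern = re.compile(r"layout failed:.*?@ fanin shape in\s+(\S+)", re.DOTALL)
    return pattern.findall(log_text)


for layer in affected_layers(LOG):
    print(layer)
```

In my run this yields exactly one affected scope, the LayerNorm in the FPN head; it obviously doesn't answer the performance-penalty half of the question.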

Additional Context

The problematic layer appears to be: reduce_mean1_spatial_variance_layer_normalization5_1, which is part of the FPN segmentation head.
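For context, the layer name (`reduce_mean1` + `spatial_variance`) suggests the decomposition rewrites LayerNorm as a spatial mean, a spatial variance, and a normalize step. A minimal sketch of that computation (my reading of the layer names, not Hailo's actual decomposition):

```python
# Per-channel spatial LayerNorm over a flattened H*W map, decomposed into the
# reduce_mean / spatial_variance / normalize steps the layer names hint at.
import math


def spatial_layer_norm(channel, eps=1e-5):
    """Normalize one flattened spatial map (list of floats) to zero mean, unit variance."""
    mean = sum(channel) / len(channel)                          # reduce_mean over H*W
    var = sum((x - mean) ** 2 for x in channel) / len(channel)  # spatial variance
    return [(x - mean) / math.sqrt(var + eps) for x in channel]


normalized = spatial_layer_norm([1.0, 2.0, 3.0, 4.0])
print(round(sum(normalized), 6))  # output mean is ~0 after normalization
```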

Expected Outcome

I need optimal inference performance on Hailo-8 for real-time fish segmentation. Any FPS degradation from suboptimal layout could impact the real-time capability of the system (target: 15-30 FPS).

Any guidance on:

  • Whether this is a critical issue
  • How to measure the actual performance impact
  • Potential workarounds or fixes
  • Whether this is something Hailo engineering has seen before

Thank you for any insights!


Model source: GitHub - fishial/fish-identification: Fish Detection (Segmentation) & Classification models and training scripts

HEF target: Raspberry Pi 5 + Hailo AI HAT+ (26 TOPS)