Can't compile .hef after retraining fcn8 on ADE20K

Command:

import os

dataset_name = "ade20k"
onnx_path = f"fcn8-{dataset_name}_simplify.onnx"

calibration_images_path = "ot-segmentation-data/datasets/ade20k/images/validation"
yaml_path = f"hailo_model_zoo/hailo_model_zoo/cfg/networks/fcn8_resnet_v1_18_{dataset_name}.yaml"
command = (f"hailomz compile --ckp {onnx_path} --calib-path {calibration_images_path} --resize 512 512"
           f" --hw-arch hailo8 --yaml {yaml_path}")
os.system(command)
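As an aside, `os.system` only returns the exit status and never raises on failure, so a compile error can go unnoticed. The same invocation rebuilt with `subprocess` (a sketch; the paths and flags are the ones from the command above) fails loudly instead:

```python
import subprocess

# Same hailomz invocation as above, but as an argument list: this avoids
# shell quoting issues, and check=True raises CalledProcessError on a
# non-zero exit status instead of silently ignoring it.
dataset_name = "ade20k"
cmd = [
    "hailomz", "compile",
    "--ckp", f"fcn8-{dataset_name}_simplify.onnx",
    "--calib-path", "ot-segmentation-data/datasets/ade20k/images/validation",
    "--resize", "512", "512",
    "--hw-arch", "hailo8",
    "--yaml", f"hailo_model_zoo/hailo_model_zoo/cfg/networks/fcn8_resnet_v1_18_{dataset_name}.yaml",
]
# subprocess.run(cmd, check=True)  # uncomment to actually run the compiler
print(" ".join(cmd))
```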

Output:

<Hailo Model Zoo INFO> Start run for network fcn8_resnet_v1_18_ade20k ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model fcn8_resnet_v1_18_ade20k
[info] Restored ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:07.31)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:07.68)
[info] Start nodes mapped from original model: 'test_input': 'fcn8_resnet_v1_18_ade20k/input_layer1'.
[info] End nodes mapped from original model: 'ArgMax_82'.
[info] Translation completed on ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:08.21)
[info] Saved HAR to: /output_folder/fcn8_resnet_v1_18_ade20k.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to fcn8_resnet_v1_18_ade20k from /output_folder/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/fcn8_resnet_v1_18.alls
[info] Appending model script commands to fcn8_resnet_v1_18_ade20k from string
[info] Mapping the argmax layer argmax1 from the neural core to CPU due to availability of resources
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.02)
[info] create_layer_norm skipped
[info] Starting Stats Collector
[info] Using dataset with 10 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:27<00:00,  2.78s/entries]
[info] Stats Collector is done (completion time is 00:00:29.42)
[info] Bias Correction skipped
[info] Adaround skipped
[info] Fine Tune skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /output_folder/fcn8_resnet_v1_18_ade20k.har
[info] Loading model script commands to fcn8_resnet_v1_18_ade20k from /output_folder/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/fcn8_resnet_v1_18.alls
[info] Adding an output layer after resize1
[info] Loading network parameters
[warning] Output order different size
[info] Starting Hailo allocation and compilation flow
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%

Validating context_0 layer by layer (100%) ... ● Finished

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 0: Mapping prepost...           cluster_4  cluster_5  cluster_6  cluster_7  prepost 
Context:0/0 Iteration 4: Trying parallel mapping...   cluster_4  cluster_5  cluster_6  cluster_7  prepost 
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  V          *          V          V          V          *          V          V          V       
 worker1                                                                                                  
 worker2                                                                                                  
 worker3                                                                                                  

[info] Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 12.5%               | 10.9%               | 34.4%              |
[info] | cluster_2 | 56.3%               | 51.6%               | 93%                |
[info] | cluster_3 | 6.3%                | 6.3%                | 30.5%              |
[info] | cluster_4 | 6.3%                | 7.8%                | 29.7%              |
[info] | cluster_6 | 81.3%               | 70.3%               | 93%                |
[info] | cluster_7 | 68.8%               | 84.4%               | 59.4%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 28.9%               | 28.9%               | 42.5%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 3m 30s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 6.0 Mbps, outputs: 300.0 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
invalid input format 4 in edge fcn8_resnet_v1_18_ade20k/resize1 in op fcn8_resnet_v1_18_ade20k/argmax_logits_postprocess1

[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: invalid input format 4 in edge fcn8_resnet_v1_18_ade20k/resize1 in op fcn8_resnet_v1_18_ade20k/argmax_logits_postprocess1

How I retrained the model:
From the fcn-retraining docker container I run:

tools/dist_train.sh configs/fcn/fcn8_r18_ade20k_hailo.py 2;

where fcn8_r18_ade20k_hailo.py is:

# model settings
_base_ = [
    './fcn_r18_ade20k_hailo.py',
]
model = dict(
    decode_head=dict(
        in_channels=[128, 256, 512],
        in_index=[1, 2, 3],
    ),
)

and fcn_r18_ade20k_hailo.py is:

# model settings
_base_ = [
    '../_base_/datasets/ade20k_scene_parsing.py', '../_base_/default_runtime.py',
]

# optimizer
optimizer = dict(type='Adam', lr=0.001, weight_decay=1e-5)
optim_wrapper = dict(type='OptimWrapper', optimizer=optimizer, clip_grad=None)

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.2, by_epoch=False, begin=0, end=7440),
    dict(
        type='CosineAnnealingLR', begin=7440, by_epoch=False, end=59520)
]

# runtime settings
train_cfg = dict(type='IterBasedTrainLoop', max_iters=59520, val_interval=1488)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

# default hooks - logger & checkpoint configs
default_hooks = dict(

    # print log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100, log_metric_by_epoch=False),

    # enable the parameter scheduler.
    param_scheduler=dict(type='ParamSchedulerHook'),

    # save checkpoint every 5 epochs.
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=7440),
)

# tensorboard vis
vis_backends = [dict(type='LocalVisBackend'),
                dict(type='TensorboardVisBackend')]

# data preprocessing
norm_cfg = dict(type='SyncBN', requires_grad=True)
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    seg_pad_val=255,
    size=crop_size)

model = dict(
    type='EncoderDecoder',
    pretrained='torchvision://resnet18',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 1, 1),
        strides=(1, 2, 2, 2),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='FCNGenHead',
        in_channels=[256, 512],
        input_transform='multiple_select',
        in_index=[2, 3],
        channels=512,
        num_convs=0,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=True,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'),
    infer_wo_softmax=True)

and ../_base_/datasets/ade20k_scene_parsing.py is:

# dataset settings
dataset_type = 'ADE20KDataset'
data_root = '/base/datasets/ade20k_scene_parsing'
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(
        type='RandomResize',
        scale=(2048, 512),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 512), keep_ratio=True),
    dict(type='CenterCrop', crop_size=crop_size),
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='PackSegInputs')
]
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='TestTimeAug',
        transforms=[
            [
                dict(type='Resize', scale_factor=r, keep_ratio=True)
                for r in img_ratios
            ],
            [dict(type='CenterCrop', crop_size=crop_size)],
            [
                dict(type='RandomFlip', prob=0., direction='horizontal'),
                dict(type='RandomFlip', prob=1., direction='horizontal')
            ], [dict(type='LoadAnnotations')], [dict(type='PackSegInputs')]
        ])
]
train_dataloader = dict(
    batch_size=24,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/training', seg_map_path='annotations/training'),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))
test_dataloader = val_dataloader

val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator

The command I used to generate the .onnx file is:

python tools/pytorch2onnx.py mmsegmentation/configs/fcn/fcn8_r18_ade20k_hailo.py --checkpoint output/iter_7440.pth --shape 512 512 --out_name output/fcn8-ade20k.onnx --work-dir output
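Note that the export writes output/fcn8-ade20k.onnx, while the compile step at the top loads fcn8-ade20k_simplify.onnx, so a simplification pass presumably sits between the two. A hedged sketch of that bridging step, assuming onnx-simplifier (the `onnxsim` package) is the tool in use:

```python
# Hypothetical bridging step between pytorch2onnx and hailomz compile:
# simplify the exported ONNX graph with onnx-simplifier (assumption --
# the post does not show how *_simplify.onnx was produced).
import os

src = "output/fcn8-ade20k.onnx"
dst = "fcn8-ade20k_simplify.onnx"

if os.path.exists(src):
    import onnx
    from onnxsim import simplify  # pip install onnxsim

    model, ok = simplify(onnx.load(src))
    assert ok, "simplified model failed the onnxsim consistency check"
    onnx.save(model, dst)
else:
    print(f"{src} not found; run pytorch2onnx first")
```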

and fcn8_resnet_v1_18_ade20k.yaml is:

base:
- base/base.yaml
info:
  source: external
preprocessing:
  network_type: segmentation
  meta_arch: fcn_resnet
postprocessing:
  device_pre_post_layers:
    softmax: false
    argmax: true
    bilinear: true
    nms: false
  ext_upsample: 8
parser:
  nodes:
  - zero_padding2d/Pad
  - ArgMax
  normalization_params:
    normalize_in_net: true
    std_list:
    - 58.395
    - 57.12
    - 57.375
    mean_list:
    - 123.675
    - 116.28
    - 103.53
evaluation:
  classes: 150
  dataset_name: ade20k
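One thing worth double-checking when retraining: the yaml sets normalize_in_net: true, which (as I understand the Hailo toolchain) makes the compiled model apply (x - mean) / std on-chip, so its mean_list/std_list must match the training-time data_preprocessor, or calibration sees differently scaled inputs. A quick consistency check, with the values copied from the two configs above:

```python
# Values from the mmsegmentation training config's data_preprocessor
train_mean = [123.675, 116.28, 103.53]
train_std = [58.395, 57.12, 57.375]

# Values from the model-zoo yaml's normalization_params
yaml_mean = [123.675, 116.28, 103.53]
yaml_std = [58.395, 57.12, 57.375]

# With normalize_in_net: true the compiled model normalizes on-chip,
# so the two sets of constants must agree exactly.
assert yaml_mean == train_mean and yaml_std == train_std
print("normalization params are consistent")
```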

Moreover, note that the .har file is generated correctly, as you can see from the log above.

Hi @andrea.bonvini,

This is a known issue that is currently being worked on by R&D.
As a workaround, use these alls commands when running the optimization step:

logits_layer1 = logits_layer(fcn8_resnet_v1_18_ade20k/resize1, argmax, -1, cpu)
resize1_output_reshape = format_conversion(resize1, f8cr_to_hailo_rgb)

Regards,

Thanks @Omer for your fast response,
If I change my fcn8_resnet_v1_18.alls script
from

normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
model_optimization_config(calibration, batch_size=1, calibset_size=64)

to

normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
model_optimization_config(calibration, batch_size=1, calibset_size=64)
logits_layer1 = logits_layer(fcn8_resnet_v1_18_ade20k/resize1, argmax, -1, cpu)
resize1_output_reshape = format_conversion(resize1, f8cr_to_hailo_rgb)

I obtain the following error message:

<Hailo Model Zoo INFO> Start run for network fcn8_resnet_v1_18_ade20k ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model fcn8_resnet_v1_18_ade20k
[info] Restored ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:01.09)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:01.42)
[info] Start nodes mapped from original model: 'test_input': 'fcn8_resnet_v1_18_ade20k/input_layer1'.
[info] End nodes mapped from original model: 'ArgMax_82'.
[info] Translation completed on ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:01.92)
[info] Saved HAR to: /my/project/fcn8_resnet_v1_18_ade20k.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to fcn8_resnet_v1_18_ade20k from /my/project/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/fcn8_resnet_v1_18-bugfix.alls
[info] Appending model script commands to fcn8_resnet_v1_18_ade20k from string
Traceback (most recent call last):
  File "/my/venv//bin/hailomz", line 33, in <module>
    sys.exit(load_entry_point('hailo-model-zoo', 'console_scripts', 'hailomz')())
  File "/my/project/hailo_model_zoo/hailo_model_zoo/main.py", line 122, in main
    run(args)
  File "/my/project/hailo_model_zoo/hailo_model_zoo/main.py", line 111, in run
    return handlers[args.command](args)
  File "/my/project/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 250, in compile
    _ensure_optimized(runner, logger, args, network_info)
  File "/my/project/hailo_model_zoo/hailo_model_zoo/main_driver.py", line 91, in _ensure_optimized
    optimize_model(
  File "/my/project/hailo_model_zoo/hailo_model_zoo/core/main_utils.py", line 319, in optimize_model
    optimize_full_precision_model(runner, calib_feed_callback, logger, model_script, resize, input_conversion, classes)
  File "/my/project/hailo_model_zoo/hailo_model_zoo/core/main_utils.py", line 305, in optimize_full_precision_model
    runner.optimize_full_precision(calib_data=calib_feed_callback)
  File "/lib/python3.10/site-packages/hailo_sdk_common/states/states.py", line 16, in wrapped_func
    return func(self, *args, **kwargs)
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1597, in optimize_full_precision
    self._optimize_full_precision(calib_data=calib_data, data_type=data_type)
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/runner/client_runner.py", line 1600, in _optimize_full_precision
    self._sdk_backend.optimize_full_precision(calib_data=calib_data, data_type=data_type)
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1284, in optimize_full_precision
    model, params = self._apply_model_modification_commands(model, params, update_model_and_params)
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/sdk_backend.py", line 1200, in _apply_model_modification_commands
    model, params = command.apply(model, params, hw_consts=self.hw_arch.consts)
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/sdk_backend/script_parser/model_modifications_commands.py", line 492, in apply
    hailo_nn = logits_layers_adder.add_logits_layers()
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/tools/logits_layer_addition.py", line 47, in add_logits_layers
    FuserHelper.add_logits_as_postprocess_layer_to_hn(self._hn, self._layers, self._activation_type, self._axis,
  File "/my/venv//lib/python3.10/site-packages/hailo_sdk_client/tools/fuser/fuser_helper.py", line 70, in add_logits_as_postprocess_layer_to_hn
    raise UnsupportedPostprocessLayerError(
hailo_sdk_client.model_translator.exceptions.UnsupportedPostprocessLayerError: Unable to find output layer for layer fcn8_resnet_v1_18_ade20k/resize1 when trying to add fcn8_resnet_v1_18_ade20k/logits_layer1

I’m not quite sure this is the right way to apply the workaround though.

Hi @andrea.bonvini,
I see. Please contact me via email at omerw@hailo.ai and provide the model; I'll have a look and try to resolve the issue.

Regards,