Command:
import os

dataset_name = "ade20k"  # network: fcn8_resnet_v1_18_ade20k (see log below)
onnx_path = f"fcn8-{dataset_name}_simplify.onnx"
calibration_images_path = "ot-segmentation-data/datasets/ade20k/images/validation"
yaml_path = f"hailo_model_zoo/hailo_model_zoo/cfg/networks/fcn8_resnet_v1_18_{dataset_name}.yaml"
command = (f"hailomz compile --ckpt {onnx_path} --calib-path {calibration_images_path} --resize 512 512"
           f" --hw-arch hailo8 --yaml {yaml_path}")
os.system(command)
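(Aside, not part of my original run: the same invocation can be made with subprocess so that a non-zero exit status raises immediately instead of only being reflected in a return value — a minimal sketch reusing the variables above:)

import subprocess

# check=True raises CalledProcessError if hailomz exits non-zero,
# which os.system would only report via its return code.
subprocess.run(
    ["hailomz", "compile",
     "--ckpt", onnx_path,
     "--calib-path", calibration_images_path,
     "--resize", "512", "512",
     "--hw-arch", "hailo8",
     "--yaml", yaml_path],
    check=True,
)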
Output:
<Hailo Model Zoo INFO> Start run for network fcn8_resnet_v1_18_ade20k ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model fcn8_resnet_v1_18_ade20k
[info] Restored ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:07.31)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:07.68)
[info] Start nodes mapped from original model: 'test_input': 'fcn8_resnet_v1_18_ade20k/input_layer1'.
[info] End nodes mapped from original model: 'ArgMax_82'.
[info] Translation completed on ONNX model fcn8_resnet_v1_18_ade20k (completion time: 00:00:08.21)
[info] Saved HAR to: /output_folder/fcn8_resnet_v1_18_ade20k.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to fcn8_resnet_v1_18_ade20k from /output_folder/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/fcn8_resnet_v1_18.alls
[info] Appending model script commands to fcn8_resnet_v1_18_ade20k from string
[info] Mapping the argmax layer argmax1 from the neural core to CPU due to availability of resources
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.02)
[info] create_layer_norm skipped
[info] Starting Stats Collector
[info] Using dataset with 10 entries for calibration
Calibration: 100%|██████████| 10/10 [00:27<00:00, 2.78s/entries]
[info] Stats Collector is done (completion time is 00:00:29.42)
[info] Bias Correction skipped
[info] Adaround skipped
[info] Fine Tune skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /output_folder/fcn8_resnet_v1_18_ade20k.har
[info] Loading model script commands to fcn8_resnet_v1_18_ade20k from /output_folder/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/fcn8_resnet_v1_18.alls
[info] Adding an output layer after resize1
[info] Loading network parameters
[warning] Output order different size
[info] Starting Hailo allocation and compilation flow
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
Validating context_0 layer by layer (100%)
● Finished
[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 0: Mapping prepost...
Context:0/0 Iteration 4: Trying parallel mapping... cluster_4 cluster_5 cluster_6 cluster_7 prepost
         cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost
worker0      V          *          V          V          V          *          V          V         V
worker1
worker2
worker3
[info] Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 12.5% | 10.9% | 34.4% |
[info] | cluster_2 | 56.3% | 51.6% | 93% |
[info] | cluster_3 | 6.3% | 6.3% | 30.5% |
[info] | cluster_4 | 6.3% | 7.8% | 29.7% |
[info] | cluster_6 | 81.3% | 70.3% | 93% |
[info] | cluster_7 | 68.8% | 84.4% | 59.4% |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total | 28.9% | 28.9% | 42.5% |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 3m 30s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 6.0 Mbps, outputs: 300.0 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
invalid input format 4 in edge fcn8_resnet_v1_18_ade20k/resize1 in op fcn8_resnet_v1_18_ade20k/argmax_logits_postprocess1
[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: invalid input format 4 in edge fcn8_resnet_v1_18_ade20k/resize1 in op fcn8_resnet_v1_18_ade20k/argmax_logits_postprocess1
How I retrained the model:
From the fcn-retraining Docker container I ran
tools/dist_train.sh configs/fcn/fcn8_r18_ade20k_hailo.py 2
(the trailing 2 is the number of GPUs), where fcn8_r18_ade20k_hailo.py is:
# model settings
_base_ = [
    './fcn_r18_ade20k_hailo.py',
]
model = dict(
    decode_head=dict(
        in_channels=[128, 256, 512],
        in_index=[1, 2, 3],
    ),
)
and fcn_r18_ade20k_hailo.py is:
# model settings
_base_ = [
    '../_base_/datasets/ade20k_scene_parsing.py', '../_base_/default_runtime.py',
]
# optimizer
optimizer = dict(type='Adam', lr=0.001, weight_decay=1e-5)
optim_wrapper = dict(type='OptimWrapper', optimizer=optimizer, clip_grad=None)
# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.2, by_epoch=False, begin=0, end=7440),
    dict(
        type='CosineAnnealingLR', begin=7440, by_epoch=False, end=59520)
]
# runtime settings
train_cfg = dict(type='IterBasedTrainLoop', max_iters=59520, val_interval=1488)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
# default hooks - logger & checkpoint configs
default_hooks = dict(
    # print log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100, log_metric_by_epoch=False),
    # enable the parameter scheduler.
    param_scheduler=dict(type='ParamSchedulerHook'),
    # save a checkpoint every 7440 iterations.
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=7440),
)
# tensorboard vis
vis_backends = [dict(type='LocalVisBackend'),
                dict(type='TensorboardVisBackend')]
# data preprocessing
norm_cfg = dict(type='SyncBN', requires_grad=True)
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    seg_pad_val=255,
    size=crop_size)
model = dict(
    type='EncoderDecoder',
    pretrained='torchvision://resnet18',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 1, 1),
        strides=(1, 2, 2, 2),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='FCNGenHead',
        in_channels=[256, 512],
        input_transform='multiple_select',
        in_index=[2, 3],
        channels=512,
        num_convs=0,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=True,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'),
    infer_wo_softmax=True)
and ../_base_/datasets/ade20k_scene_parsing.py is:
# dataset settings
dataset_type = 'ADE20KDataset'
data_root = '/base/datasets/ade20k_scene_parsing'
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(
        type='RandomResize',
        scale=(2048, 512),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 512), keep_ratio=True),
    dict(type='CenterCrop', crop_size=crop_size),
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='PackSegInputs')
]
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(
        type='TestTimeAug',
        transforms=[
            [
                dict(type='Resize', scale_factor=r, keep_ratio=True)
                for r in img_ratios
            ],
            [dict(type='CenterCrop', crop_size=crop_size)],
            [
                dict(type='RandomFlip', prob=0., direction='horizontal'),
                dict(type='RandomFlip', prob=1., direction='horizontal')
            ], [dict(type='LoadAnnotations')], [dict(type='PackSegInputs')]
        ])
]
train_dataloader = dict(
    batch_size=24,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='InfiniteSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/training', seg_map_path='annotations/training'),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))
test_dataloader = val_dataloader
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
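(Side note: this eval pipeline matches the --resize 512 512 used in the compile command — images are rescaled and center-cropped to 512x512. Below is a minimal standalone sketch approximating it outside mmseg, with a hypothetical helper name; mmcv's keep_ratio resize also caps the long side at 2048, which this simplification ignores:)

from PIL import Image

def center_crop_512(path, short_side=512, crop=(512, 512)):
    # Approximate test_pipeline: keep-ratio resize so the shorter side
    # is 512, then a 512x512 center crop.
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = short_side / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    cw, ch = crop
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch))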
The command I used to generate the .onnx file is:
python tools/pytorch2onnx.py mmsegmentation/configs/fcn/fcn8_r18_ade20k_hailo.py --checkpoint output/iter_7440.pth --shape 512 512 --out_name output/fcn8-ade20k.onnx --work-dir output
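(Another check, not part of my original flow: inspecting the exported graph's outputs with the onnx package, since the Hailo parser is told to end at an ArgMax node — see the YAML below. A minimal sketch:)

import onnx

# List the declared graph outputs and the last few nodes, to confirm
# the model really terminates in an ArgMax (mapped above as ArgMax_82).
model = onnx.load("output/fcn8-ade20k.onnx")
for out in model.graph.output:
    print("graph output:", out.name)
for node in model.graph.node[-5:]:
    print(node.op_type, node.name, "->", list(node.output))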
The network YAML (fcn8_resnet_v1_18_ade20k.yaml, passed to hailomz above) is:
base:
- base/base.yaml
info:
  source: external
preprocessing:
  network_type: segmentation
  meta_arch: fcn_resnet
postprocessing:
  device_pre_post_layers:
    softmax: false
    argmax: true
    bilinear: true
    nms: false
  ext_upsample: 8
parser:
  nodes:
  - zero_padding2d/Pad
  - ArgMax
  normalization_params:
    normalize_in_net: true
    std_list:
    - 58.395
    - 57.12
    - 57.375
    mean_list:
    - 123.675
    - 116.28
    - 103.53
evaluation:
  classes: 150
  dataset_name: ade20k
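(Finally, a quick consistency check one could run — not part of the original flow: since normalize_in_net is true, the mean/std above are folded into the compiled model, so they should match the training-time SegDataPreProcessor values:)

import yaml

with open(yaml_path) as f:  # yaml_path as defined in the compile snippet above
    cfg = yaml.safe_load(f)

norm = cfg["parser"]["normalization_params"]
assert norm["normalize_in_net"] is True
# Values from the SegDataPreProcessor in the training config above.
assert norm["mean_list"] == [123.675, 116.28, 103.53]
assert norm["std_list"] == [58.395, 57.12, 57.375]
print("YAML normalization matches the training preprocessing")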
Note also that the .har file is generated correctly (as you can see from the log above).