Hi everyone, I'm trying to run a simple model with multiple inputs on a Hailo-8 M.2 module. I'm trying two different scenarios:
def forward(self, x1, x2, x3): ...
and
def forward(self, x: Dict[str, torch.Tensor]): ...
In both cases I get the same error when compiling the model:
Error message:
[error] Mapping Failed (allocation time: 0s)
No successful assignment for: concat1
[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: No successful assignment for: concat1
Full logs:
[info] Translation started on ONNX model MyModelWithDictInput
[info] Restored ONNX model MyModelWithDictInput (completion time: 00:00:00.00)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.03)
[info] Start nodes mapped from original model: 'onnx::Gemm_0': 'MyModelWithDictInput/input_layer1', 'onnx::Gemm_1': 'MyModelWithDictInput/input_layer2', 'onnx::Gemm_2': 'MyModelWithDictInput/input_layer3'.
[info] End nodes mapped from original model: '/linear31/Gemm', '/linear32/Gemm'.
[info] Translation completed on ONNX model MyModelWithDictInput (completion time: 00:00:00.09)
[info] Saved HAR to: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithDictInput_hailo_model.har
Model successfully saved to har: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithDictInput_hailo_model.har
[optimize_model] started …
[prepare_calibration_dataset] started …
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won’t be optimized and compression won’t be used) because there’s less data than the recommended amount (1024), and there’s no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.08)
[info] Layer Norm Decomposition skipped
[info] Starting Stats Collector
[info] Using dataset with 50 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:03<00:00, 13.11entries/s]
[info] Stats Collector is done (completion time is 00:00:04.08)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] matmul_equalization skipped
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Fine Tune skipped
[info] Layer Noise Analysis skipped
[info] The calibration set seems to not be normalized, because the values range is [(0.0006018692, 0.97930187), (0.028501753, 0.9929861), (0.0004638245, 0.92580736)].
Since the neural core works in 8-bit (between 0 to 255), a quantization will occur on the CPU of the runtime platform.
Add a normalization layer to the model to offload the normalization to the neural core.
Refer to the user guide Hailo Dataflow Compiler user guide / Model Optimization / Optimization Related Model Script Commands / model_modification_commands / normalization for details.
[info] Model Optimization is done
[info] Saved HAR to: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithDictInput_hailo_model_quantized_model.har
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[error] Mapping Failed (allocation time: 0s)
No successful assignment for: concat1
[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: No successful assignment for: concat1
[info] Translation started on ONNX model MyModelWithMultipleInputs
[info] Restored ONNX model MyModelWithMultipleInputs (completion time: 00:00:00.00)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.03)
[info] Start nodes mapped from original model: 'x1': 'MyModelWithMultipleInputs/input_layer1', 'x2': 'MyModelWithMultipleInputs/input_layer2', 'x3': 'MyModelWithMultipleInputs/input_layer3'.
[info] End nodes mapped from original model: '/linear31/Gemm', '/linear32/Gemm'.
[info] Translation completed on ONNX model MyModelWithMultipleInputs (completion time: 00:00:00.10)
[info] Saved HAR to: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithMultipleInputs_hailo_model.har
Model successfully saved to har: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithMultipleInputs_hailo_model.har
[optimize_model] started …
[prepare_calibration_dataset] started …
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won’t be optimized and compression won’t be used) because there’s less data than the recommended amount (1024), and there’s no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.08)
[info] Layer Norm Decomposition skipped
[info] Starting Stats Collector
[info] Using dataset with 50 entries for calibration
Calibration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:04<00:00, 12.36entries/s]
[info] Stats Collector is done (completion time is 00:00:04.30)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] matmul_equalization skipped
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Fine Tune skipped
[info] Layer Noise Analysis skipped
[info] The calibration set seems to not be normalized, because the values range is [(0.08511114, 0.989071), (0.040190928, 0.9438765), (0.0030021686, 0.9975469)].
Since the neural core works in 8-bit (between 0 to 255), a quantization will occur on the CPU of the runtime platform.
Add a normalization layer to the model to offload the normalization to the neural core.
Refer to the user guide Hailo Dataflow Compiler user guide / Model Optimization / Optimization Related Model Script Commands / model_modification_commands / normalization for details.
[info] Model Optimization is done
[info] Saved HAR to: /local/shared_with_docker/tpu_investigation/har_models/MyModelWithMultipleInputs_hailo_model_quantized_model.har
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[error] Mapping Failed (allocation time: 0s)
No successful assignment for: concat1
[error] Failed to produce compiled graph
[error] BackendAllocatorException: Compilation failed: No successful assignment for: concat1
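Side note: the optimizer log also recommends adding performance_param(compiler_optimization_level=max) to the model script (and adding a normalization layer). I haven't applied either yet, since compilation fails before that should matter, but as far as I understand the Dataflow Compiler docs, the hint would be applied roughly like this before optimize() (untested sketch; the command text is copied from the log's own message, and I'm assuming ClientRunner.load_model_script is the right entry point):

# Untested sketch: apply the model-script hint quoted in the log before optimizing.
runner = ClientRunner(har=har_path)
runner.load_model_script("performance_param(compiler_optimization_level=max)\n")
runner.optimize(calib_dataset, data_type=CalibrationDataType.np_array)
runner.save_har(path_to_quantized_har)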
My code:
import sys
PROJECT_DIR_PATH = '/local/shared_with_docker/tpu_investigation/'
sys.path.append(PROJECT_DIR_PATH)
from typing import Dict
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Linear, Module
import numpy as np
import onnx
from hailo_sdk_client import ClientRunner
from hailo_sdk_client.exposed_definitions import CalibrationDataType
class MyModelWithMultipleInputs(Module):
    def __init__(self):
        super().__init__()
        self.linear11 = Linear(3, 8)
        self.linear12 = Linear(5, 8)
        self.linear13 = Linear(10, 8)
        self.linear2 = Linear(24, 128)
        self.linear31 = Linear(128, 1)
        self.linear32 = Linear(128, 1)

    def forward(self, x1, x2, x3):
        z1 = self.linear11(x1)
        z2 = self.linear12(x2)
        z3 = self.linear13(x3)
        z = torch.cat((z1, z2, z3), dim=1)
        z = F.relu(z)
        z = F.relu(self.linear2(z))
        return self.linear31(z), self.linear32(z)
class MyModelWithDictInput(Module):
    def __init__(self):
        super().__init__()
        self.linear11 = Linear(3, 8)
        self.linear12 = Linear(5, 8)
        self.linear13 = Linear(10, 8)
        self.linear2 = Linear(24, 128)
        self.linear31 = Linear(128, 1)
        self.linear32 = Linear(128, 1)

    def forward(self, batch: Dict[str, torch.Tensor]):
        z1 = self.linear11(batch['x1'])
        z2 = self.linear12(batch['x2'])
        z3 = self.linear13(batch['x3'])
        z = torch.cat((z1, z2, z3), dim=1)
        z = F.relu(z)
        z = F.relu(self.linear2(z))
        return self.linear31(z), self.linear32(z)
def parse_model_to_har(onnx_path, onnx_model_name):
    print(f'[parse_model_to_har] started ...')
    hailo_model_har_name = f"{onnx_model_name}_hailo_model"
    path_to_har = os.path.join(os.path.join(PROJECT_DIR_PATH, "har_models"), hailo_model_har_name + '.har')
    if os.path.exists(path_to_har):
        print(f'Model [HAR] exists at {path_to_har}. Skip ...')
        return path_to_har, hailo_model_har_name
    chosen_hw_arch = "hailo8"
    runner = ClientRunner(hw_arch=chosen_hw_arch)
    hn, npz = runner.translate_onnx_model(
        onnx_path,
        onnx_model_name
    )
    runner.save_har(path_to_har)
    print(f'Model successfully saved to har: {path_to_har}')
    return path_to_har, hailo_model_har_name
def prepare_calibration_dataset_multiple_inputs(model_name, batch_size=50):
    print(f'[prepare_calibration_dataset] started ...')
    x1 = np.random.rand(batch_size, 3)
    x2 = np.random.rand(batch_size, 5)
    x3 = np.random.rand(batch_size, 10)
    calib_data = {
        f'{model_name}/input_layer1': x1,
        f'{model_name}/input_layer2': x2,
        f'{model_name}/input_layer3': x3
    }
    return calib_data
def optimize_model(model_name, har_path, har_model_name):
    print(f'[optimize_model] started ...')
    har_quantized_model_name = har_model_name + '_quantized_model.har'
    path_to_quantized_har = os.path.join(os.path.join(PROJECT_DIR_PATH, "har_models"), har_quantized_model_name)
    if os.path.exists(path_to_quantized_har):
        print(f'Model [HAR] exists at {path_to_quantized_har}. Skip ...')
        return path_to_quantized_har, har_quantized_model_name
    runner = ClientRunner(har=har_path)
    calib_dataset = prepare_calibration_dataset_multiple_inputs(model_name)
    runner.optimize(calib_dataset, data_type=CalibrationDataType.np_array)
    runner.save_har(path_to_quantized_har)
    del calib_dataset
    return path_to_quantized_har, har_quantized_model_name
def compile_model_to_hef(har_path, har_model_name):
    hef_model_name = har_model_name + '.hef'
    path_to_hef = os.path.join(os.path.join(PROJECT_DIR_PATH, "hef_models"), hef_model_name)
    if os.path.exists(path_to_hef):
        print(f'Model [HEF] exists at {path_to_hef}. Skip ...')
        return path_to_hef, hef_model_name
    runner = ClientRunner(har=har_path)
    hef = runner.compile()
    with open(path_to_hef, "wb") as f:
        f.write(hef)
    har_model_name = har_model_name + '_compiled_model.har'
    path_to_compiled_har = os.path.join(os.path.join(PROJECT_DIR_PATH, "har_models"), har_model_name)
    runner.save_har(path_to_compiled_har)
    return path_to_hef, hef_model_name
def onnx_to_hef_dict():
    model = MyModelWithDictInput()
    batch = {
        'x1': torch.randn(1, 3),
        'x2': torch.randn(1, 5),
        'x3': torch.randn(1, 10),
    }
    y = model(batch)
    print(f'[MyModelWithDictInput] y: {y}')
    path_to_onnx = os.path.join(PROJECT_DIR_PATH, "onnx_models/MyModelWithDictInput.onnx")
    torch_model_name = 'MyModelWithDictInput'
    torch.onnx.export(
        model,
        {"batch": batch},
        path_to_onnx,
        verbose=True,
    )
    onnx_model = onnx.load(path_to_onnx)
    onnx.checker.check_model(onnx_model)
    input_layer_names = [input.name for input in onnx_model.graph.input]
    # Print the input layer names
    print("Input Layer Names:", input_layer_names)
    # path_to_onnx, torch_model_name = export_to_onnx(policy)
    path_to_har, hailo_model_har_name = parse_model_to_har(path_to_onnx, torch_model_name)
    path_to_quantized_har, har_quantized_model_name = optimize_model('MyModelWithDictInput', path_to_har, hailo_model_har_name)
    path_to_hef, hef_model_name = compile_model_to_hef(path_to_quantized_har, har_quantized_model_name)
def onnx_to_hef_multiple_inputs():
    model = MyModelWithMultipleInputs()
    dummy_x = (torch.randn(1, 3), torch.randn(1, 5), torch.randn(1, 10))
    y = model(*dummy_x)
    print(y)
    path_to_onnx = os.path.join(PROJECT_DIR_PATH, "onnx_models/MyModelWithMultipleInputs.onnx")
    torch_model_name = 'MyModelWithMultipleInputs'
    torch.onnx.export(
        model,
        dummy_x,
        path_to_onnx,
        verbose=True,
        input_names=["x1", "x2", "x3"],
        output_names=["y1", "y2"],
    )
    onnx_model = onnx.load(path_to_onnx)
    onnx.checker.check_model(onnx_model)
    input_layer_names = [input.name for input in onnx_model.graph.input]
    # Print the input layer names
    print("Input Layer Names:", input_layer_names)
    # path_to_onnx, torch_model_name = export_to_onnx(policy)
    path_to_har, hailo_model_har_name = parse_model_to_har(path_to_onnx, torch_model_name)
    path_to_quantized_har, har_quantized_model_name = optimize_model('MyModelWithMultipleInputs', path_to_har, hailo_model_har_name)
    path_to_hef, hef_model_name = compile_model_to_hef(path_to_quantized_har, har_quantized_model_name)
if __name__ == "__main__":
    onnx_to_hef_multiple_inputs()
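One more detail that might matter: in the dict-input variant I export without explicit input/output names, so the ONNX inputs come out as onnx::Gemm_0/1/2 (as seen in the translation log above). I assume that adding input_names/output_names, like in the multi-input export, would only change the naming and not the graph itself; an untested sketch of that export:

# Untested sketch: the same dict-input export, but with explicit names.
# I'm assuming input_names is applied to the traced graph inputs in order
# (the dict values x1, x2, x3); this should only affect naming.
torch.onnx.export(
    model,
    {"batch": batch},
    path_to_onnx,
    verbose=True,
    input_names=["x1", "x2", "x3"],
    output_names=["y1", "y2"],
)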