hi @KlausK
Many thanks for the prompt reply, here are the requested information:
Model from hailo_model_zoo
hailortcli parse-hef yolov8s_hmz.hef
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s, Multi Context - Number of contexts: 3
Network name: yolov8s/yolov8s
VStream infos:
Input yolov8s/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s/yolov8_nms_postprocess FLOAT32, HAILO NMS BY CLASS(number of classes: 80, maximum bounding boxes per class: 100, maximum frame size: 160320)
Operation:
Op YOLOV8
Name: YOLOV8-Post-Process
Score threshold: 0.200
IoU threshold: 0.70
Classes: 80
Max bboxes per class: 100
Image height: 640
Image width: 640
Our model(we have only one object class):
hailortcli parse-hef yolov8s_max_optimization.hef
Architecture HEF was compiled for: HAILO8L
Network group name: yolov8s, Multi Context - Number of contexts: 3
Network name: yolov8s/yolov8s
VStream infos:
Input yolov8s/input_layer1 UINT8, NHWC(640x640x3)
Output yolov8s/yolov8_nms_postprocess FLOAT32, HAILO NMS BY CLASS(number of classes: 1, maximum bounding boxes per class: 100, maximum frame size: 2004)
Operation:
Op YOLOV8
Name: YOLOV8-Post-Process
Score threshold: 0.200
IoU threshold: 0.70
Classes: 1
Max bboxes per class: 100
Image height: 640
Image width: 640
I see the difference in maximum frame size - I don’t know what does that mean?
Here are the machine details, but that shouldn’t matter because I am comparing both models on the same machine?
02:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Physical Slot: 0-2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at 380800004000 (64-bit, prefetchable) \[size=16K\]
Region 2: Memory at 380800008000 (64-bit, prefetchable) \[size=4K\]
Region 4: Memory at 380800000000 (64-bit, prefetchable) \[size=16K\]
Capabilities: \[80\] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: \[e0\] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: \[f8\] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: \[100 v1\] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: \[108 v1\] Latency Tolerance Reporting
Max snoop latency: 15728640ns
Max no snoop latency: 15728640ns
Capabilities: \[200 v2\] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 04000001 0000000f 86f80000 54f14c95
Kernel driver in use: hailo
Kernel modules: hailo_pci
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 14
On-line CPU(s) list: 0-13
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) Ultra 7 265K
CPU family: 6
Model: 198
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 1
Stepping: 2
BogoMIPS: 7756.43
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcn
t tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rd
seed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni wbnoinvd arat umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid bus_lock_detect movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 448 KiB (14 instances)
L1i: 448 KiB (14 instances)
L2: 56 MiB (14 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-13
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS Not affected; BHI BHI_DIS_S
Srbds: Not affected
Tsa: Not affected
Tsx async abort: Not affected
Vmscape: Not affected
Maybe the difference is in the YOLOv8 export itself? Could you tell me how to generate profiler results - these are included in hailo model zoo, I could compare the networks and look for the differences there.
Best regards,
Chuck