hailo_pci.ko: UBSAN: array-index-out-of-bounds

I see a message from the undefined-behavior sanitizer (UBSAN) reporting an out-of-bounds array index when the module is loaded and the firmware is written to the device. My test system runs Ubuntu 24.04; further details are below.

Regards, Jörg

dmesg:

[ 40.051253] hailo: Init module. driver version 4.18.0
[ 40.051342] hailo 0000:06:00.0: Probing on: 1e60:2864…
[ 40.051347] hailo 0000:06:00.0: Probing: Allocate memory for device extension, 11632
[ 40.051361] hailo 0000:06:00.0: enabling device (0000 -> 0002)
[ 40.051581] hailo 0000:06:00.0: Probing: Device enabled
[ 40.051624] hailo 0000:06:00.0: Probing: mapped bar 0 - 000000000c259acd 16384
[ 40.051636] hailo 0000:06:00.0: Probing: mapped bar 2 - 000000002cb4db7e 4096
[ 40.051644] hailo 0000:06:00.0: Probing: mapped bar 4 - 00000000f0cea0d2 16384
[ 40.051650] hailo 0000:06:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[ 40.051729] hailo 0000:06:00.0: Probing: Enabled 64 bit dma
[ 40.051731] hailo 0000:06:00.0: Probing: Using specialized dma_ops=iommu_dma_ops
[ 40.051738] hailo 0000:06:00.0: Probing: Using userspace allocated vdma buffers
[ 40.051743] hailo 0000:06:00.0: Disabling ASPM L0s
[ 40.051775] hailo 0000:06:00.0: Successfully disabled ASPM L0s
[ 40.054091] ------------[ cut here ]------------
[ 40.054097] UBSAN: array-index-out-of-bounds in /share/opt/hailo/linux/pcie/…/…/common/pcie_common.c:473:53
[ 40.054106] index 840 is out of range for type ‘u8 [*]’
[ 40.054110] CPU: 6 PID: 2345 Comm: insmod Tainted: G OE 6.8.0-38-generic #38-Ubuntu
[ 40.054116] Hardware name: Undefined Undefined/MSC C6C-RLP-1365URE ES1, BIOS X1.10c BETA 05/17/2024
[ 40.054119] Call Trace:
[ 40.054122]
[ 40.054127] dump_stack_lvl+0x76/0xa0
[ 40.054139] dump_stack+0x10/0x20
[ 40.054143] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 40.054152] hailo_write_app_firmware+0x100/0x110 [hailo_pci]
[ 40.054177] hailo_pcie_write_firmware+0x71/0x160 [hailo_pci]
[ 40.054196] hailo_activate_board+0x465/0x6f0 [hailo_pci]
[ 40.054216] hailo_pcie_probe+0x360/0xa30 [hailo_pci]
[ 40.054232] local_pci_probe+0x44/0xb0
[ 40.054238] pci_call_probe+0x55/0x1a0
[ 40.054243] pci_device_probe+0x84/0x120
[ 40.054248] really_probe+0x1c4/0x410
[ 40.054255] __driver_probe_device+0x8c/0x180
[ 40.054260] driver_probe_device+0x24/0xd0
[ 40.054266] __driver_attach+0x10b/0x210
[ 40.054272] ? __pfx___driver_attach+0x10/0x10
[ 40.054278] bus_for_each_dev+0x8a/0xf0
[ 40.054282] driver_attach+0x1e/0x30
[ 40.054287] bus_add_driver+0x156/0x260
[ 40.054292] driver_register+0x5e/0x130
[ 40.054299] __pci_register_driver+0x5e/0x70
[ 40.054303] hailo_pcie_module_init+0x74/0xff0 [hailo_pci]
[ 40.054320] ? __pfx_hailo_pcie_module_init+0x10/0x10 [hailo_pci]
[ 40.054335] do_one_initcall+0x5b/0x340
[ 40.054344] do_init_module+0xc0/0x2c0
[ 40.054350] load_module+0xba1/0xcf0
[ 40.054354] ? security_kernel_post_read_file+0x75/0x90
[ 40.054361] init_module_from_file+0x96/0x100
[ 40.054365] ? init_module_from_file+0x96/0x100
[ 40.054370] idempotent_init_module+0x11c/0x2b0
[ 40.054375] __x64_sys_finit_module+0x64/0xd0
[ 40.054380] x64_sys_call+0x1d6e/0x25c0
[ 40.054384] do_syscall_64+0x7f/0x180
[ 40.054390] ? syscall_exit_to_user_mode+0x89/0x260
[ 40.054395] ? do_syscall_64+0x8c/0x180
[ 40.054399] ? irqentry_exit_to_user_mode+0x7e/0x260
[ 40.054404] ? irqentry_exit+0x43/0x50
[ 40.054408] ? exc_page_fault+0x94/0x1b0
[ 40.054412] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 40.054419] RIP: 0033:0x71fa7c52725d
[ 40.054447] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[ 40.054451] RSP: 002b:00007ffc1e3af2d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 40.054457] RAX: ffffffffffffffda RBX: 00005d0a90bf5720 RCX: 000071fa7c52725d
[ 40.054460] RDX: 0000000000000000 RSI: 00005d0a9098ce52 RDI: 0000000000000003
[ 40.054463] RBP: 00007ffc1e3af390 R08: 0000000000000040 R09: 00005d0a9098d0b0
[ 40.054466] R10: 000071fa7c603b20 R11: 0000000000000246 R12: 00005d0a9098ce52
[ 40.054468] R13: 0000000000000000 R14: 00005d0a90bf9690 R15: 0000000000000000
[ 40.054473]
[ 40.054474] ---[ end trace ]---
[ 40.229775] hailo 0000:06:00.0: Firmware was loaded successfully
[ 40.242546] hailo 0000:06:00.0: Probing: Added board 1e60-2864, /dev/hailo0

robot@robot-Undefined:~$ modinfo ./hailo_pci.ko
filename: /home/robot/./hailo_pci.ko
version: 4.18.0
license: GPL v2
description: Hailo PCIe driver
author: Hailo Technologies Ltd.
import_ns: DMA_BUF
srcversion: 7789EF8A5BB8A490BF16983
alias: pci:v00001E60d000043A2sv*sd*bc*sc*i*
alias: pci:v00001E60d000045C4sv*sd*bc*sc*i*
alias: pci:v00001E60d00002864sv*sd*bc*sc*i*
depends:
retpoline: Y
name: hailo_pci
vermagic: 6.8.0-38-generic SMP preempt mod_unload modversions
parm: o_dbg:int
parm: no_power_mode:Disables automatic D0->D3 PCIe transactions (invbool)
parm: force_allocation_from_driver:Determines whether to force buffer allocation from driver or userspace (int)
parm: force_desc_page_size:Determines the maximum DMA descriptor page size (must be a power of 2) (int)
parm: force_hailo15_legacy_mode:Forces work with Hailo15 in legacy mode(relevant for emulators) (bool)
robot@robot-Undefined:~$ cat /etc/issue
Ubuntu 24.04 LTS \n \l

robot@robot-Undefined:~$ uname -a
Linux robot-Undefined 6.8.0-38-generic #38-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 7 15:25:01 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Welcome to the Hailo Community!

Ubuntu 24 is not yet supported. It is currently planned for later this year.

Hello Klaus,

This is not an operating system issue but a vulnerability in Hailo's PCIe driver code.

I also see these critical warnings on Yocto 4.3 (aka nanbield) with kernel 6.6.

Kind regards,
Jörg

Hi Klaus,
I found a strange declaration in fw_validation.h, line 44:

typedef struct {
    u32 key_size;
    u32 content_size;
    u8 certificates_data[0];
} secure_boot_certificate_t;

Declaring an array with size zero is not valid in standard C as far as I know; it is only accepted as a GNU extension. Even if the compiler does not complain, other checks such as UBSAN can flag it. This line should be:
u8 certificates_data[];
as a flexible array member (an array whose length is determined at run time) is wanted here.
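For illustration, here is a minimal userspace sketch of the same structure declared with a C99 flexible array member (empty brackets). The `u8`/`u32` typedefs are stand-ins for the kernel types, and `cert_alloc` is a hypothetical helper that does not exist in the Hailo driver; it only shows how such a structure is normally sized.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's fixed-width types. */
typedef uint8_t  u8;
typedef uint32_t u32;

typedef struct {
    u32 key_size;
    u32 content_size;
    u8  certificates_data[];   /* C99 flexible array member, not [0] */
} secure_boot_certificate_t;

/*
 * Hypothetical helper (not part of the Hailo driver): allocate a
 * certificate header followed by key_size + content_size payload bytes.
 */
static secure_boot_certificate_t *cert_alloc(u32 key_size, u32 content_size)
{
    size_t payload = (size_t)key_size + content_size;
    secure_boot_certificate_t *cert = malloc(sizeof(*cert) + payload);

    if (cert == NULL)
        return NULL;

    cert->key_size = key_size;
    cert->content_size = content_size;
    /* The flexible array member addresses the bytes after the header. */
    memset(cert->certificates_data, 0, payload);
    return cert;
}
```

Note that `sizeof(secure_boot_certificate_t)` covers only the two header fields, so the payload length always has to be tracked and checked separately.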

The structure is overlaid on a part of the firmware file to access the contents inside the firmware. Access to the data is currently only checked against maximum length values, not against the actual size (end) of the firmware file. This is a major security issue, because reads beyond the end of the loaded firmware data are possible.
I think the declaration has to be corrected, and an additional check against the end of the file data has to be added inside the function
int FW_VALIDATION__validate_cert_header(uintptr_t firmware_base_address,
size_t firmware_size, u32 *outer_consumed_firmware_offset, secure_boot_certificate_t **out_firmware_cert)
located in fw_validation.c, starting at line 87.
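A rough sketch of the kind of end-of-file check meant here, written as standalone userspace C. The function `cert_bounds_check` and its signature are hypothetical and are not the driver's actual code; the sketch only assumes the header layout quoted above (two `u32` fields followed by the payload).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's fixed-width types. */
typedef uint8_t  u8;
typedef uint32_t u32;

typedef struct {
    u32 key_size;
    u32 content_size;
    u8  certificates_data[];
} secure_boot_certificate_t;

/*
 * Hypothetical helper (not part of the Hailo driver).
 * Returns 0 and advances *offset past the certificate if header and
 * payload lie entirely inside the firmware image; returns -1 otherwise
 * and leaves *offset unchanged.
 */
static int cert_bounds_check(const u8 *fw, size_t fw_size, size_t *offset)
{
    const size_t hdr = offsetof(secure_boot_certificate_t, certificates_data);
    u32 key_size, content_size;
    uint64_t payload;
    size_t remaining;

    /* The header itself must fit into what is left of the image. */
    if (*offset > fw_size || fw_size - *offset < hdr)
        return -1;

    /* Copy the fields out to avoid unaligned access into the image. */
    memcpy(&key_size, fw + *offset, sizeof(key_size));
    memcpy(&content_size,
           fw + *offset + offsetof(secure_boot_certificate_t, content_size),
           sizeof(content_size));

    /* 64-bit sum so key_size + content_size cannot wrap around. */
    payload = (uint64_t)key_size + content_size;
    remaining = fw_size - *offset - hdr;
    if (payload > remaining)
        return -1;

    *offset += hdr + (size_t)payload;
    return 0;
}
```

The point is that the sizes read from the file are untrusted, so they are compared against the bytes actually remaining in the image rather than only against fixed maximum values.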

This is not an Ubuntu or kernel issue; it is a general driver bug and a possible vector for buffer overflow attacks. It is only being reported now because the kernel's security checks keep getting better.

Thank you for the information. I have created a JIRA ticket for our R&D engineers to investigate.