Removing power to Hailo-8 module to reduce power consumption

Hello,

In my project, which runs on an i.MX6 processor, I need the lowest possible power consumption (under 20 mW) when the Hailo-8 M.2 module is not in use. Removing power seems to be the only way to meet that requirement.
When I apply power back to the module and release the reset signal, the system detects the Hailo-8 module:

[   32.641536] pci 0000:01:00.0: [1e60:2864] type 00 class 0x0b4000
[   32.648276] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit pref]
[   32.655984] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00000fff 64bit pref]
[   32.664285] pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[   32.674706] pci 0000:01:00.0: PME# supported from D3hot
[   32.680648] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[   32.721266] pci 0000:01:00.0: BAR 0: assigned [mem 0x01100000-0x01103fff 64bit pref]
[   32.729432] pci 0000:01:00.0: BAR 4: assigned [mem 0x01104000-0x01107fff 64bit pref]
[   32.737571] pci 0000:01:00.0: BAR 2: assigned [mem 0x01108000-0x01108fff 64bit pref]

However, when I load the hailo_pci.ko module, the kernel crashes after the firmware is loaded (I’ve added some prints to the hailo_pci driver to track progress):

[   38.073130] hailo: Init module. driver version 4.21.0
[   38.082784] hailo 0000:01:00.0: Probing on: 1e60:2864...
[   38.089193] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 12344
[   38.097806] hailo 0000:01:00.0: Enabling bridge
[   38.102789] hailo 0000:01:00.0: Enable device: BARS mask 0x15
[   38.108935] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[   38.115393] hailo 0000:01:00.0: Probing: Device enabled
[   38.121247] hailo 0000:01:00.0: Probing: mapped bar 0 - (ptrval) 16384
[   38.128265] hailo 0000:01:00.0: Probing: mapped bar 2 - (ptrval) 4096
[   38.135160] hailo 0000:01:00.0: Probing: mapped bar 4 - (ptrval) 16384
[   38.142093] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[   38.151296] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[   38.157350] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[   38.165183] hailo 0000:01:00.0: Disabling ASPM L0s
[   38.170448] hailo 0000:01:00.0: Successfully disabled ASPM L0s
[   38.178226] hailo 0000:01:00.0: Writing file hailo/hailo8_fw.bin
[   38.431746] hailo 0000:01:00.0: File hailo/hailo8_fw.bin written successfully
[   38.439533] hailo 0000:01:00.0: Writing file hailo/hailo8_board_cfg.bin
[   38.447596] hailo 0000:01:00.0: Error -2. Ignore non-mandatory file hailo/hailo8_board_cfg.bin
[   38.457111] hailo 0000:01:00.0: Writing file hailo/hailo8_fw_cfg.bin
[   38.464162] hailo 0000:01:00.0: Error -2. Ignore non-mandatory file hailo/hailo8_fw_cfg.bin
[   38.473083] hailo 0000:01:00.0: Triggering firmware boot...
[   38.479090] hailo 0000:01:00.0: Firmware triggered
[   43.520782] hailo 0000:01:00.0: Firmware completion timeout, getting boot status...
[   43.528671] 8<--- cut here ---
[   43.531751] Unhandled fault: imprecise external abort (0x1406) at 0x7f000000
[   43.538855] [7f000000] *pgd=1263b811, *pte=126250cf, *ppte=1262521e
[   43.545225] Internal error: : 1406 [#1] PREEMPT ARM
[   43.550160] Modules linked in: hailo_pci(O)
...

The call stack:

[   43.924605]  hailo_resource_read_buffer [hailo_pci] from read_memory+0xc0/0x1c4 [hailo_pci]
[   43.933279]  read_memory [hailo_pci] from hailo_get_boot_status+0x48/0x70 [hailo_pci]
[   43.941352]  hailo_get_boot_status [hailo_pci] from hailo_activate_board+0x1054/0x15e0 [hailo_pci]
[   43.950573]  hailo_activate_board [hailo_pci] from hailo_pcie_probe+0x618/0x828 [hailo_pci]
[   43.959166]  hailo_pcie_probe [hailo_pci] from pci_device_probe+0x8c/0x118
[   43.966312]  pci_device_probe from really_probe+0xc8/0x2ec
[   43.971914]  really_probe from __driver_probe_device+0x88/0x19c
[   43.977891]  __driver_probe_device from driver_probe_device+0x30/0x104
[   43.984481]  driver_probe_device from __driver_attach_async_helper+0x48/0x98
[   43.991587]  __driver_attach_async_helper from async_run_entry_fn+0x1c/0xd0
[   43.998632]  async_run_entry_fn from process_one_work+0x1bc/0x3ec
[   44.004821]  process_one_work from worker_thread+0x84/0x5dc
[   44.010450]  worker_thread from kthread+0xd0/0x100
[   44.015335]  kthread from ret_from_fork+0x14/0x28

By debugging with printk, I determined that after the firmware is loaded, the driver sends the activate command and waits for the firmware to boot by calling wait_for_firmware_completion(). That call times out, and the driver then calls hailo_get_boot_status(). That function ends up calling ioread32(), which is where it crashes.
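One possible reading of the abort is that the endpoint is no longer answering PCIe reads by the time hailo_get_boot_status() touches its BAR (for example, because the link dropped while the firmware was booting). When a probe fails without taking the kernel down, a quick userspace check is whether config-space reads still return the Hailo vendor ID (1e60, as in the enumeration log above) rather than all ones. A minimal sketch using the standard sysfs `config` file; the helper names `pci_vendor_id` and `pci_device_responds` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: check whether a PCIe endpoint still answers
# config-space reads, via the sysfs "config" file.

# pci_vendor_id <config-file>: print the 16-bit vendor ID (offset 0x00,
# little-endian) as four hex digits, e.g. "1e60" for a Hailo-8.
pci_vendor_id() {
    cfg="$1"
    # od prints the first two bytes in file order, e.g. " 60 1e".
    set -- $(od -A n -t x1 -N 2 "$cfg" 2>/dev/null)
    [ $# -eq 2 ] || return 1
    printf '%s%s\n' "$2" "$1"
}

# pci_device_responds <config-file>: succeed unless the read fails or
# returns all ones (ffff), which is what a non-responding device yields.
pci_device_responds() {
    vid=$(pci_vendor_id "$1") || return 1
    [ "$vid" != "ffff" ]
}
```

For example, `pci_device_responds /sys/bus/pci/devices/0000:01:00.0/config && echo alive`. `lspci -s 01:00.0 -vv` additionally shows the current link state on its LnkSta line.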

The hailo_pci.ko driver loads fine after a cold boot but fails after I cycle the (GPIO-controlled) power and reset lines to the module.

Any ideas on how to make it work?
Thank you.

Hey @user01,

Welcome to the community!

So you’re trying to hit that sub-20 mW budget when your i.MX6 isn’t actively using the Hailo-8 M.2 module. Yeah, you’re going to need to physically cut power to that thing; software alone just isn’t going to get you there.

Software power management (D3 states, ASPM, etc.) might save you a few hundred milliwatts if you’re lucky, but to reliably get under 20 mW you need board-level power gating: a way to completely cut the 3.3 V rail to the module when you don’t need it.
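For a sense of scale, and assuming the 20 mW budget applies to the module’s 3.3 V supply (the original post does not say which rail it is measured on), the budget corresponds to an average current of only about 6 mA:

$$
I_{\max} = \frac{P_{\max}}{V_{\text{rail}}} = \frac{20\ \text{mW}}{3.3\ \text{V}} \approx 6.1\ \text{mA}
$$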

To remove and reload the device:

# Unload the driver first so that "load your driver" below actually reloads it
rmmod hailo_pci

# Remove the device from the PCI bus (the sysfs "remove" attribute expects 1)
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove

# Let things settle
sleep 0.01

# Rescan the bus (re-enumerates the device and reassigns its BARs)
echo 1 > /sys/bus/pci/rescan

# Now load your driver
modprobe hailo_pci

This makes the kernel re-enumerate the device and reassign its resources before your driver touches the hardware.
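A single fixed `sleep` assumes the link is already up when the one rescan runs; after power has actually been removed and restored, that may not hold yet. A small helper (hypothetical name `rescan_until_present`) can retry the rescan until the device reappears in sysfs, with a timeout, before the driver is loaded:

```shell
#!/bin/sh
# Hypothetical helper: rescan the PCI bus until a device reappears, instead of
# relying on one fixed sleep. PCI_ROOT can be overridden for testing.
PCI_ROOT=${PCI_ROOT:-/sys/bus/pci}

# rescan_until_present <BDF> [timeout_s]: 0 once the device is in sysfs, 1 on timeout.
rescan_until_present() {
    dev="$1"
    tries=$(( ${2:-5} * 10 ))          # one attempt every 100 ms
    while [ "$tries" -gt 0 ]; do
        echo 1 > "$PCI_ROOT/rescan" || return 1
        [ -e "$PCI_ROOT/devices/$dev" ] && return 0
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}
```

Usage: `rescan_until_present 0000:01:00.0 && modprobe hailo_pci`. Rescanning a bus whose devices are already present does not add them twice, so retrying is harmless.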

Hope that helps clarify things!

The problem is that the driver fails to load once I add power removal to the process. After unloading the hailo driver and removing the device (echo 1 > /sys/bus/…/remove), I remove power using a GPIO-controlled regulator. When power is applied again and a rescan is issued, the device appears to be recognized. When I then load the hailo_pci driver, it appears to load the Hailo firmware into the module but times out in the call to wait_for_firmware_completion():

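    /* Wait for the firmware-loaded completion, using the first-stage timeout for this board type. */
    if (!wait_for_firmware_completion(&board->fw_boot.fw_loaded_completion,
            hailo_pcie_get_loading_stage_info(board->pcie_resources.board_type, FIRST_STAGE)->timeout)) {
        /* On timeout, read the boot status back from the device for the error message. */
        boot_status = hailo_get_boot_status(&board->pcie_resources);
        hailo_dev_err(dev, "Timeout waiting for NNC firmware file, boot status %u\n", boot_status);
        return -ETIMEDOUT;
    }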
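The unload/remove/power-off/power-on/rescan sequence described in this post could be scripted roughly as below. This is a sketch under assumptions: `hailo_power_off`, `hailo_power_on` and `gpio_write` are hypothetical helpers, the GPIO numbers are placeholders, both lines are assumed to be exported through the legacy sysfs GPIO interface and configured as outputs, and the 100 ms delays are modelled on the PCIe CEM power-up timing (rail stable before PERST# is released), not on anything Hailo-specific.

```shell
#!/bin/sh
# Sketch: power-cycle an M.2 module behind a GPIO-switched 3.3 V regulator.
# ASSUMPTIONS (hypothetical, adjust to your board):
#   - the regulator enable and PERST# lines are exported through the legacy
#     sysfs GPIO interface (gpio$PWR_GPIO, gpio$RST_GPIO) and set as outputs;
#   - writing 1 to the power GPIO turns the rail on;
#   - writing 0 to the reset GPIO asserts PERST# (active low).
PCI_DEV=${PCI_DEV:-0000:01:00.0}
PCI_ROOT=${PCI_ROOT:-/sys/bus/pci}
GPIO_ROOT=${GPIO_ROOT:-/sys/class/gpio}
PWR_GPIO=${PWR_GPIO:-100}   # hypothetical GPIO number
RST_GPIO=${RST_GPIO:-101}   # hypothetical GPIO number

# gpio_write <number> <0|1>
gpio_write() {
    echo "$2" > "$GPIO_ROOT/gpio$1/value"
}

hailo_power_off() {
    # Detach the device from the PCI core first so nothing touches its BARs.
    if [ -e "$PCI_ROOT/devices/$PCI_DEV/remove" ]; then
        echo 1 > "$PCI_ROOT/devices/$PCI_DEV/remove" || return 1
    fi
    gpio_write "$RST_GPIO" 0 || return 1   # assert PERST# before cutting power
    gpio_write "$PWR_GPIO" 0
}

hailo_power_on() {
    gpio_write "$PWR_GPIO" 1 || return 1
    sleep 0.1                              # rail stable before releasing reset
    gpio_write "$RST_GPIO" 1 || return 1   # release PERST#
    sleep 0.1                              # settle time before config accesses
    echo 1 > "$PCI_ROOT/rescan"
}
```

Following the order in this post: `rmmod hailo_pci`, then `hailo_power_off`, and later `hailo_power_on` before loading the driver again. The power-then-reset order on the way up mirrors what the first post describes.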

Here is the terminal printout:

# echo 1 > /sys/bus/pci/rescan
pci 0000:01:00.0: BAR 0: assigned [mem 0x01100000-0x01103fff 64bit pref]
pci 0000:01:00.0: BAR 4: assigned [mem 0x01104000-0x01107fff 64bit pref]
pci 0000:01:00.0: BAR 2: assigned [mem 0x01108000-0x01108fff 64bit pref]
# insmod /lib/modules/hailo_pci.ko
hailo: Init module. driver version 4.20.1
hailo 0000:01:00.0: Probing on: 1e60:2864...
hailo 0000:01:00.0: Probing: Allocate memory for device extension, 12344
hailo 0000:01:00.0: enabling device (0000 -> 0002)
hailo 0000:01:00.0: Probing: Device enabled
hailo 0000:01:00.0: Probing: mapped bar 0 - a0c10000 16384
hailo 0000:01:00.0: Probing: mapped bar 2 - a08ee000 4096
hailo 0000:01:00.0: Probing: mapped bar 4 - a0c18000 16384
hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
hailo 0000:01:00.0: Probing: Enabled 64 bit dma
hailo 0000:01:00.0: Probing: Using specialized dma_ops=arm_dma_ops
hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
hailo 0000:01:00.0: Disabling ASPM L0s
hailo 0000:01:00.0: Successfully disabled ASPM L0s
hailo 0000:01:00.0: Writing file hailo/hailo8_fw.bin
hailo 0000:01:00.0: File hailo/hailo8_fw.bin written successfully
hailo 0000:01:00.0: Writing file hailo/hailo8_board_cfg.bin
Failed to write file hailo/hailo8_board_cfg.bin
hailo 0000:01:00.0: File hailo/hailo8_board_cfg.bin written successfully
hailo 0000:01:00.0: Writing file hailo/hailo8_fw_cfg.bin
Failed to write file hailo/hailo8_fw_cfg.bin
hailo 0000:01:00.0: File hailo/hailo8_fw_cfg.bin written successfully
hailo 0000:01:00.0: Timeout waiting for NNC firmware file, boot status 0
hailo 0000:01:00.0: FW loaded, took 5140 ms
hailo 0000:01:00.0: Firmware load failed
hailo 0000:01:00.0: Failed activating board -110
hailo: probe of 0000:01:00.0 failed with error -110
#

Any other tricks to try?