The hailo-dfc docker script fails to recognize non-VGA-compatible GPUs like H100/L40S

Hi team,

I found that the hailo-dfc docker script (hailo_ai_sw_suite_docker_run.sh) doesn't recognize non-VGA-compatible GPUs like L40S/H100.

Cause

The original script detects GPU availability with this one-liner: readonly NVIDIA_GPU_EXIST=$(lspci | grep "VGA compatible controller: NVIDIA"). This matches only VGA-compatible GPUs such as the RTX/GTX series, because lspci reports non-VGA-compatible GPUs as a "3D controller" instead, like below:

18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
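
To make the mismatch concrete, here is a minimal sketch that runs the original pattern and a broadened one over lspci-style text. The helper name nvidia_gpu_lines and the VGA sample line are made up for illustration; the two 3D controller lines are the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Hailo script): filter lspci-style text
# on stdin for NVIDIA devices in either PCI display class.
nvidia_gpu_lines() {
    grep -E '(VGA compatible controller|3D controller): NVIDIA'
}

# Two lines from the report above plus one invented VGA-class line.
sample='18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 0000 (rev a1)'

echo "original pattern:"
# Matches only the VGA-class line; the 3D controller lines are skipped.
printf '%s\n' "$sample" | grep 'VGA compatible controller: NVIDIA'

echo "broadened pattern:"
# Matches the VGA-class line and both 3D controller lines.
printf '%s\n' "$sample" | nvidia_gpu_lines
```

On a real machine the same filter would be fed from lspci directly, for example lspci | nvidia_gpu_lines.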

Workaround

I replaced that line with the following, which resolved the problem:

readonly NVIDIA_GPU_EXIST=$(nvidia-smi &> /dev/null && echo true || echo false)

The above uses the exit status of the nvidia-smi command to check whether NVIDIA GPUs are available (I think this is a more general solution than the original grep-based one). Please consider adopting this approach if you prefer. Thanks.

NOTE: the workaround only works when nvidia-smi is available.
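
For hosts where nvidia-smi may be missing, a possible variant is sketched below. It is only a sketch under assumptions: has_nvidia_gpu is a hypothetical helper name, and I have not checked how the rest of the script consumes NVIDIA_GPU_EXIST. The original grep-based value was empty when no GPU was found, so this version keeps the variable empty in that case rather than setting it to the string false, which would matter if the script tests the variable for emptiness.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Hailo script): succeed when an NVIDIA
# GPU appears to be present, without assuming nvidia-smi is installed.
has_nvidia_gpu() {
    if command -v nvidia-smi > /dev/null 2>&1; then
        # nvidia-smi exits non-zero when it cannot talk to the driver.
        nvidia-smi > /dev/null 2>&1
    elif command -v lspci > /dev/null 2>&1; then
        # Fallback: look for NVIDIA devices in either PCI display class.
        lspci | grep -qE '(VGA compatible controller|3D controller): NVIDIA'
    else
        # Neither tool is available, so report no GPU.
        return 1
    fi
}

# Non-empty only when a GPU is found, like the original grep-based value.
if has_nvidia_gpu; then
    NVIDIA_GPU_EXIST=true
else
    NVIDIA_GPU_EXIST=
fi
readonly NVIDIA_GPU_EXIST
echo "NVIDIA_GPU_EXIST='${NVIDIA_GPU_EXIST}'"
```

Note that the lspci fallback only shows that the card is on the PCI bus; it does not confirm that the driver is working, so the nvidia-smi path is tried first.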

One more note: you need to recreate the container by running hailo_ai_sw_suite_docker_run.sh --override. Please back up your data before recreating the container.

hey @mineto_tsukada

It looks like the Hailo-DFC Docker script (hailo_ai_sw_suite_docker_run.sh) is failing to detect non-VGA-compatible GPUs like H100 and L40S. This happens because the original script relies on:

readonly NVIDIA_GPU_EXIST=$(lspci | grep "VGA compatible controller: NVIDIA")

This command only detects VGA-compatible GPUs (like the RTX/GTX series). However, high-performance GPUs such as H100 and L40S show up differently in the lspci output, like this:

18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)

Workaround

You can resolve the issue by modifying the script to use nvidia-smi, which provides a more reliable way to detect NVIDIA GPUs. Replace the original line with:

readonly NVIDIA_GPU_EXIST=$(nvidia-smi &> /dev/null && echo true || echo false)

This change ensures that the script checks for GPU availability using nvidia-smi.


Things to Keep in Mind

  1. Ensure nvidia-smi is installed:

    • The workaround will only work if NVIDIA drivers and nvidia-smi are installed and accessible.
  2. Recreate the Docker Container:

    • After modifying the script, you need to recreate the container by running:
      ./hailo_ai_sw_suite_docker_run.sh --override
      
    • Back up your data before recreating the container to avoid data loss.

This solution provides a more general method to detect both VGA-compatible and non-VGA-compatible GPUs. Let me know if this resolves the issue or if you need further assistance!