{{Note|<code>hardware.graphics.enable</code> was named <code>hardware.opengl.enable</code> '''until NixOS 24.11'''.}}
{{Note|Since driver version 560, you will also need to decide whether to use the open-source or proprietary kernel modules by setting the <code>hardware.nvidia.open</code> option to <code>true</code> or <code>false</code> respectively.<br><br>Open-source kernel modules are preferred and planned to steadily replace the proprietary modules<ref>https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/</ref>, although they only support GPUs of the Turing architecture or newer (from the GeForce RTX 20 series and GeForce GTX 16 series onwards). Data center GPUs starting from Grace Hopper or Blackwell must use the open-source modules, as the proprietary modules no longer support them.<br><br>Make sure to allow [[Unfree software|unfree software]] even when using the open module, as the user-space part of the driver is still proprietary. Other unfree NVIDIA packages include <code>nvidia-x11</code>, <code>nvidia-settings</code>, and <code>nvidia-persistenced</code>.
}}{{Warning|If you use a laptop with both dedicated and integrated GPUs, remember to [[#Hybrid_graphics_with_PRIME|configure PRIME]] in order to make your dedicated NVIDIA GPU work properly with your integrated GPU. Your configuration '''might not work''' if you skip this step.}}{{file|configuration.nix|nix|<nowiki>
{
  # ...
}
</nowiki>}}
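Unfree software can be allowed globally, or selectively with a predicate. A minimal sketch of the selective approach, listing the unfree NVIDIA package names mentioned in the note above (adjust the list to what your configuration actually pulls in):
{{file|configuration.nix|nix|<nowiki>
{ lib, ... }:
{
  # Allow only the unfree NVIDIA packages mentioned above.
  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [
      "nvidia-x11"
      "nvidia-settings"
      "nvidia-persistenced"
    ];
}
</nowiki>}}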
==== Common setup ====
All PRIME configurations require setting the PCI bus IDs of the two GPUs. One easy way to find their IDs is by running <code>lspci</code> from the <code>pciutils</code> package, and then finding devices that are classified as VGA controllers. After double-checking that the listed devices are indeed your integrated and dedicated GPUs, you can find the PCI IDs at the beginning of each line. Exact results may vary, but an example output might look like:<syntaxhighlight lang="console">
$ nix shell nixpkgs#pciutils -c lspci -D -d ::03xx
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
</syntaxhighlight>Before putting them into your configuration, however, '''they must first be reformatted''': assuming the bus address is <code><domain>:<bus>:<device>.<func></code>, convert all numbers from hexadecimal to decimal; the formatted string is then <code>PCI:<bus>@<domain>:<device>:<func></code>. For example, the NVIDIA GPU above at <code>0000:01:00.0</code> becomes <code>PCI:1@0:0:0</code>, and a GPU at <code>0000:2d:00.0</code> would become <code>PCI:45@0:0:0</code> (hexadecimal <code>2d</code> is decimal <code>45</code>). The IDs can be set under <code>intelBusId</code>, <code>nvidiaBusId</code>, or <code>amdgpuBusId</code> in <code>hardware.nvidia.prime</code>, depending on the manufacturer of the GPU:{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.prime = {
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
==== Offload mode ====
{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.prime = {
    offload.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
</nowiki>}}
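With offload mode, applications render on the integrated GPU unless explicitly offloaded. nixpkgs can generate an <code>nvidia-offload</code> wrapper command for this via the <code>hardware.nvidia.prime.offload.enableOffloadCmd</code> option; the sketch below shows roughly what such a wrapper does, using the standard PRIME render offload environment variables (a sketch, not the exact script nixpkgs generates):
{{file|configuration.nix|nix|<nowiki>
{ pkgs, ... }:
{
  # Roughly equivalent to hardware.nvidia.prime.offload.enableOffloadCmd = true;
  environment.systemPackages = [
    (pkgs.writeShellScriptBin "nvidia-offload" ''
      export __NV_PRIME_RENDER_OFFLOAD=1
      export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
      export __GLX_VENDOR_LIBRARY_NAME=nvidia
      export __VK_LAYER_NV_optimus=NVIDIA_only
      exec "$@"
    '')
  ];
}
</nowiki>}}
Running <code>nvidia-offload glxinfo -B</code> should then report the NVIDIA GPU as the renderer.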
==== Sync mode ====
{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.prime = {
    sync.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
</nowiki>}}
==== Reverse sync mode ====
{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.prime = {
    reverseSync.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
</nowiki>}}
=== Wayland ===
==== Requirements ====
Wayland requires kernel mode setting (KMS) to be enabled:
{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.modesetting.enable = true;
}
</nowiki>}}
==== Supported Compositors ====
* '''GNOME (Wayland)''': fully supported on recent drivers (≥ 535 recommended, ≥ 555 strongly recommended).
* '''KDE Plasma (Wayland)''': usable since Plasma 6 with recent NVIDIA drivers, though some issues may remain.
* '''Hyprland''': generally works with recent NVIDIA drivers, but support is not officially guaranteed; regressions may occur after driver or compositor updates.
==== PRIME and Wayland ====
* PRIME '''sync''' and '''reverse sync''' modes are '''X11-only''' and do not work under Wayland.
* PRIME '''offload''' works under Wayland, but application offloading may behave differently depending on the compositor.
==== Explicit Sync ====
Drivers ≥ 555 introduce explicit sync support, which greatly improves frame pacing and reduces flickering and stuttering under Wayland. For the best Wayland experience, recent NVIDIA drivers are strongly recommended.
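Since these improvements are tied to the driver version, it can be worth selecting a newer driver series explicitly. A minimal sketch (the <code>latest</code> attribute exists in current nixpkgs, but attribute names may change; see also [[#Running Specific NVIDIA Driver Versions|Running Specific NVIDIA Driver Versions]]):
{{file|configuration.nix|nix|<nowiki>
{ config, ... }:
{
  # Other common choices are `stable`, `beta`, and `production`.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.latest;
}
</nowiki>}}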
== Tips and tricks ==
=== Multiple boot configurations ===
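Specialisations can be used to add boot entries that use different NVIDIA settings. A minimal sketch, assuming a PRIME sync setup that should switch to offload when on the go (the option names are real; which values you flip depends on your setup):
{{file|configuration.nix|nix|<nowiki>
{ lib, ... }:
{
  specialisation."on-the-go".configuration = {
    system.nixos.tags = [ "on-the-go" ];
    hardware.nvidia.prime = {
      offload.enable = lib.mkForce true;
      offload.enableOffloadCmd = lib.mkForce true;
      sync.enable = lib.mkForce false;
    };
  };
}
</nowiki>}}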
After rebuilding and rebooting, you'll see in your boot menu under each generation an "on-the-go" option, which will let you boot into the on-the-go specialisation for that generation.
See also the [https://github.com/NixOS/nixos-hardware/blob/master/common/gpu/nvidia/prime.nix nixos-hardware] implementation of a similar idea.
=== Using GPUs on non-NixOS ===
If you're using Nix-packaged software on a non-NixOS system, you'll need a workaround to get everything up and running. The [https://github.com/guibou/nixGL nixGL project] provides wrappers to use GL drivers on non-NixOS systems. You need to have GPU drivers installed on your distro (for the kernel modules). With nixGL installed, you run <code>nixGL foobar</code> instead of <code>foobar</code>.
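For example, to check that the wrapper picks up your driver (a sketch; the exact invocation depends on the nixGL version and how you installed it):
<syntaxhighlight lang="console">
$ nix run --impure github:guibou/nixGL -- glxinfo -B
</syntaxhighlight>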
=== CUDA and using your GPU for compute ===
See the [[CUDA]] wiki page.
=== Multi-Process Service (MPS) ===
[https://docs.nvidia.com/deploy/mps/index.html NVIDIA Multi-Process Service (MPS)] allows multiple CUDA processes to share a single GPU context. NixOS does not provide a dedicated module for MPS, so a custom systemd service is required:
{{file|configuration.nix|nix|<nowiki>
{ config, pkgs, ... }:
{
  systemd.services.nvidia-mps = {
    description = "NVIDIA CUDA Multi-Process Service";
    after = [ "nvidia-persistenced.service" ];
    requires = [ "nvidia-persistenced.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ config.hardware.nvidia.package.bin ];
    serviceConfig = {
      Type = "forking";
      ExecStart = "${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control -d";
      ExecStop = "${pkgs.writeShellScript "nvidia-mps-stop" ''
        echo quit | ${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control
      ''}";
      Restart = "on-failure";
      RestartSec = 5;
    };
  };
}
</nowiki>}}
{{Warning|The <code>path</code> option is required. The MPS control daemon uses <code>execlp</code> to spawn <code>nvidia-cuda-mps-server</code>, which must be in the service's <code>PATH</code>. Without it, the daemon appears to start normally but silently fails to spawn the server process. CUDA clients will receive Error 805 (<code>cudaErrorMpsConnectionFailed</code>).}}
To use MPS from [[Docker]] containers, the MPS pipe directory must be mounted and the host IPC namespace must be shared:
<syntaxhighlight lang="yaml">
services:
  gpu-worker:
    ipc: host
    volumes:
      - /tmp/nvidia-mps:/tmp/nvidia-mps
    environment:
      CUDA_MPS_PIPE_DIRECTORY: /tmp/nvidia-mps
</syntaxhighlight>
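To check that the control daemon is reachable, you can query it over its control interface; <code>get_server_list</code> is one of the documented <code>nvidia-cuda-mps-control</code> commands and prints the PIDs of any active MPS servers:
<syntaxhighlight lang="console">
$ echo get_server_list | nvidia-cuda-mps-control
</syntaxhighlight>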
=== Running Specific NVIDIA Driver Versions ===
<code>hardware.nvidia.powerManagement.enable = true</code> can also sometimes fix this issue; it is <code>false</code> by default.
{{Note|When the <code>hardware.nvidia.powerManagement.enable</code> option is enabled, the driver saves video memory to <code>/tmp</code> by default. If <code>/tmp</code> is backed by tmpfs (RAM) and the GPU VRAM usage exceeds the available space, the system will not resume and you will see a blank screen instead.
To resolve this, redirect the temporary file to a storage location with sufficient capacity (e.g., <code>/var/tmp</code>) using kernel parameters:
{{file|configuration.nix|nix|<nowiki>
{
  boot.kernelParams = [ "nvidia.NVreg_TemporaryFilePath=/var/tmp" ];
}
</nowiki>}}
}}
If you have a modern NVIDIA GPU (Turing [https://en.wikipedia.org/wiki/Turing_(microarchitecture)#Products_using_Turing] or later), you may also want to investigate the <code>hardware.nvidia.powerManagement.finegrained</code> option: [https://download.nvidia.com/XFree86/Linux-x86_64/460.73.01/README/dynamicpowermanagement.html]
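A minimal sketch of enabling it (assumption: fine-grained power management generally requires PRIME offload mode to be configured as described above):
{{file|configuration.nix|nix|<nowiki>
{
  # Only effective on Turing or newer GPUs; typically combined with
  # hardware.nvidia.prime.offload.enable = true;
  hardware.nvidia.powerManagement.finegrained = true;
}
</nowiki>}}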
[https://discourse.nixos.org/t/suspend-resume-cycling-on-system-resume/32322/12 A potential fix] interrupts the gnome-shell in time so that it is not trying to access the graphics hardware.<ref>https://discourse.nixos.org/t/suspend-resume-cycling-on-system-resume/32322/12</ref> Its entire purpose is to manually "pause" the GNOME Shell process just before the system sleeps and "un-pause" it just after the system wakes up.
----
If you have graphical corruption upon waking from suspend, and the above causes the system to go back to sleep ~20-30 seconds after wakeup, the following may solve both issues:
{{file|configuration.nix|nix|<nowiki>
{ pkgs, ... }:
{
  # https://discourse.nixos.org/t/black-screen-after-suspend-hibernate-with-nvidia/54341/6
  # https://discourse.nixos.org/t/suspend-problem/54033/28
  systemd = {
    # Uncertain if this is still required or not.
    services.systemd-suspend.environment.SYSTEMD_SLEEP_FREEZE_USER_SESSIONS = "false";
    services."gnome-suspend" = {
      description = "suspend gnome shell";
      before = [
        "systemd-suspend.service"
        "systemd-hibernate.service"
        "nvidia-suspend.service"
        "nvidia-hibernate.service"
      ];
      wantedBy = [
        "systemd-suspend.service"
        "systemd-hibernate.service"
      ];
      serviceConfig = {
        Type = "oneshot";
        ExecStart = ''${pkgs.procps}/bin/pkill -f -STOP ${pkgs.gnome-shell}/bin/gnome-shell'';
      };
    };
    services."gnome-resume" = {
      description = "resume gnome shell";
      after = [
        "systemd-suspend.service"
        "systemd-hibernate.service"
        "nvidia-resume.service"
      ];
      wantedBy = [
        "systemd-suspend.service"
        "systemd-hibernate.service"
      ];
      serviceConfig = {
        Type = "oneshot";
        ExecStart = ''${pkgs.procps}/bin/pkill -f -CONT ${pkgs.gnome-shell}/bin/gnome-shell'';
      };
    };
  };
  # https://discourse.nixos.org/t/black-screen-after-suspend-hibernate-with-nvidia/54341/23
  hardware.nvidia.powerManagement.enable = true;
}
</nowiki>}}
=== Black screen or 'nothing works' on laptops ===
On some laptops with both an AMD iGPU and an NVIDIA dGPU, blacklisting the <code>amdgpu</code> kernel module can help:
<syntaxHighlight lang="nix">
boot.kernelParams = [ "module_blacklist=amdgpu" ];
</syntaxHighlight>
=== NVIDIA Docker Containers ===
See: [[Docker#NVIDIA Docker Containers]]
== Disabling ==