NVIDIA
{{Note|<code>hardware.graphics.enable</code> was named <code>hardware.opengl.enable</code> '''until NixOS 24.11'''.}}
{{Note|Since driver version 560, you will also need to decide whether to use the open-source or proprietary kernel modules by setting the <code>hardware.nvidia.open</code> option to <code>true</code> or <code>false</code> respectively.<br><br>Open-source kernel modules are preferred over proprietary modules and are planned to steadily replace them<ref>https://developer.nvidia.com/blog/nvidia-transitions-fully-towards-open-source-gpu-kernel-modules/</ref>, although they only support GPUs of the Turing architecture or newer (from the GeForce RTX 20 series and GeForce GTX 16 series onwards). Data center GPUs starting from Grace Hopper or Blackwell must use the open-source modules; the proprietary modules are no longer supported there.<br><br>Make sure to allow [[Unfree software|unfree software]] even when using the open module, as the user space part of the driver is still proprietary. Other unfree NVIDIA packages include <code>nvidia-x11</code>, <code>nvidia-settings</code>, and <code>nvidia-persistenced</code>.
}}{{Warning|If you use a laptop with both dedicated and integrated GPUs, remember to [[#Hybrid_graphics_with_PRIME|configure PRIME]] in order to make your dedicated NVIDIA GPU work properly with your integrated GPU. Your configuration '''might not work''' if you skip this step.}}{{file|configuration.nix|nix|<nowiki>
{
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
</syntaxhighlight>Before putting them into your configuration, however, '''they must first be reformatted''': assuming the bus address is <code><domain>:<bus>:<device>.<func></code>, convert all numbers from hexadecimal to decimal; the formatted string is then <code>PCI:<bus>@<domain>:<device>:<func></code>. They can be set under <code>intelBusId</code>, <code>nvidiaBusId</code>, or <code>amdgpuBusId</code> in <code>hardware.nvidia.prime</code>, depending on the manufacturer of the GPU:{{file|configuration.nix|nix|<nowiki>
{
  hardware.nvidia.prime = {
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
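The hexadecimal-to-decimal reformatting described above can be sketched in a few lines of Python. The function name is illustrative only and not part of any NixOS tooling:

```python
# Convert a bus address as printed by `lspci -D` (hexadecimal
# <domain>:<bus>:<device>.<func>) into the decimal
# PCI:<bus>@<domain>:<device>:<func> string expected by hardware.nvidia.prime.
def lspci_to_prime(addr: str) -> str:
    domain, bus, dev_func = addr.split(":")
    device, func = dev_func.split(".")
    return f"PCI:{int(bus, 16)}@{int(domain, 16)}:{int(device, 16)}:{int(func, 16)}"

print(lspci_to_prime("0000:00:02.0"))  # PCI:0@0:2:0
print(lspci_to_prime("0000:0a:00.0"))  # PCI:10@0:0:0 (bus 0a is 10 in decimal)
```

Note the second example: a bus number such as <code>0a</code> becomes <code>10</code>, which is easy to get wrong when copying the address by hand.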
    offload.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
    sync.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
    reverseSync.enable = true;
    intelBusId = "PCI:0@0:2:0";
    nvidiaBusId = "PCI:1@0:0:0";
    # amdgpuBusId = "PCI:5@0:0:0"; # If you have an AMD iGPU
  };
}
After rebuilding and rebooting, you'll see an "on-the-go" option in your boot menu under each generation, which lets you boot into the on-the-go specialisation for that generation.

See also the [https://github.com/NixOS/nixos-hardware/blob/master/common/gpu/nvidia/prime.nix nixos-hardware] implementation of a similar idea.
=== Using GPUs on non-NixOS ===
If you're using Nix-packaged software on a non-NixOS system, you'll need a workaround to get everything up and running. The [https://github.com/guibou/nixGL nixGL project] provides a wrapper to use GL drivers on non-NixOS systems. You still need the GPU drivers installed through your distro (for the kernel modules). With nixGL installed, you run <code>nixGL foobar</code> instead of <code>foobar</code>.
=== CUDA and using your GPU for compute ===
See the [[CUDA]] wiki page.
=== Multi-Process Service (MPS) ===
[https://docs.nvidia.com/deploy/mps/index.html NVIDIA Multi-Process Service (MPS)] allows multiple CUDA processes to share a single GPU context. NixOS does not provide a dedicated module for MPS, so a custom systemd service is required:
{{file|configuration.nix|nix|<nowiki>
{ config, pkgs, ... }:
{
  systemd.services.nvidia-mps = {
    description = "NVIDIA CUDA Multi-Process Service";
    after = [ "nvidia-persistenced.service" ];
    requires = [ "nvidia-persistenced.service" ];
    wantedBy = [ "multi-user.target" ];
    path = [ config.hardware.nvidia.package.bin ];
    serviceConfig = {
      Type = "forking";
      ExecStart = "${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control -d";
      ExecStop = "${pkgs.writeShellScript "nvidia-mps-stop" ''
        echo quit | ${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control
      ''}";
      Restart = "on-failure";
      RestartSec = 5;
    };
  };
}
</nowiki>}}
{{Warning|The <code>path</code> option is required. The MPS control daemon uses <code>execlp</code> to spawn <code>nvidia-cuda-mps-server</code>, which must be in the service's <code>PATH</code>. Without it, the daemon appears to start normally but silently fails to spawn the server process, and CUDA clients receive Error 805 (<code>cudaErrorMpsConnectionFailed</code>).}}
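To check that the daemon actually came up, you can query its control interface directly; <code>get_server_list</code> is a standard <code>nvidia-cuda-mps-control</code> command that prints the PIDs of running MPS servers:

```shell
# Ask the MPS control daemon for the list of running server PIDs.
# The list stays empty until the first CUDA client connects; an error here
# means the control daemon itself is not running.
echo get_server_list | nvidia-cuda-mps-control

# systemd's view of the service defined above
systemctl status nvidia-mps.service
```

Both commands require the driver and the service from the configuration above to be present, so they only make sense on the target machine.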
To use MPS from [[Docker]] containers, the MPS pipe directory must be mounted and the host IPC namespace must be shared:
<syntaxhighlight lang="yaml">
services:
  gpu-worker:
    ipc: host
    volumes:
      - /tmp/nvidia-mps:/tmp/nvidia-mps
    environment:
      CUDA_MPS_PIPE_DIRECTORY: /tmp/nvidia-mps
</syntaxhighlight>
=== Running Specific NVIDIA Driver Versions ===