=== CUDA and using your GPU for compute ===
See the [[CUDA]] wiki page.
=== Multi-Process Service (MPS) ===
[https://docs.nvidia.com/deploy/mps/index.html NVIDIA Multi-Process Service (MPS)] lets multiple CUDA processes share a single GPU context, so that their work can run concurrently on the same GPU. NixOS does not provide a dedicated module for MPS, so the MPS control daemon has to be run from a custom systemd service:
{{file|configuration.nix|nix|<nowiki>
{ config, pkgs, ... }:
{
  # nvidia-persistenced.service (required below) is only defined when this is enabled
  hardware.nvidia.nvidiaPersistenced = true;

  systemd.services.nvidia-mps = {
    description = "NVIDIA CUDA Multi-Process Service";
    after = [ "nvidia-persistenced.service" ];
    requires = [ "nvidia-persistenced.service" ];
    wantedBy = [ "multi-user.target" ];
    # Puts nvidia-cuda-mps-control and nvidia-cuda-mps-server on the service's PATH
    path = [ config.hardware.nvidia.package.bin ];
    serviceConfig = {
      # "-d" makes the control daemon fork into the background
      Type = "forking";
      ExecStart = "${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control -d";
      # Ask the running control daemon to shut down cleanly
      ExecStop = "${pkgs.writeShellScript "nvidia-mps-stop" ''
        echo quit | ${config.hardware.nvidia.package.bin}/bin/nvidia-cuda-mps-control
      ''}";
      Restart = "on-failure";
      RestartSec = 5;
    };
  };
}
</nowiki>}}
{{Warning|The <code>path</code> option is required. The MPS control daemon uses <code>execlp</code> to spawn <code>nvidia-cuda-mps-server</code>, which must be in the service's <code>PATH</code>. Without it, the daemon appears to start normally but silently fails to spawn the server process. CUDA clients will receive Error 805 (<code>cudaErrorMpsConnectionFailed</code>).}}
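The daemon and its clients locate each other through the <code>CUDA_MPS_PIPE_DIRECTORY</code> environment variable, and the daemon writes its logs to <code>CUDA_MPS_LOG_DIRECTORY</code>; NVIDIA's defaults are <code>/tmp/nvidia-mps</code> and <code>/var/log/nvidia-mps</code>. The following is a minimal sketch, assuming you want those locations stated explicitly rather than implied, so that host and container clients can rely on the same pipe path. It only adds an <code>environment</code> attribute to the <code>nvidia-mps</code> service defined above; the values shown are simply the defaults.
{{file|configuration.nix|nix|<nowiki>
{ ... }:
{
  # Pin the MPS directories explicitly (values are NVIDIA's defaults).
  # Every MPS client must use the same CUDA_MPS_PIPE_DIRECTORY as the daemon.
  systemd.services.nvidia-mps.environment = {
    CUDA_MPS_PIPE_DIRECTORY = "/tmp/nvidia-mps";
    CUDA_MPS_LOG_DIRECTORY = "/var/log/nvidia-mps";
  };
}
</nowiki>}}
If you change the pipe directory here, update the volume mount and <code>CUDA_MPS_PIPE_DIRECTORY</code> of any containers to match.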
To use MPS from [[Docker]] containers, the MPS pipe directory must be mounted into the container and the container must share the host's IPC namespace. For example, with Docker Compose:
<syntaxhighlight lang="yaml">
services:
  gpu-worker:
    ipc: host
    volumes:
      - /tmp/nvidia-mps:/tmp/nvidia-mps
    environment:
      CUDA_MPS_PIPE_DIRECTORY: /tmp/nvidia-mps
</syntaxhighlight>
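On NixOS the same container settings can also be declared with <code>virtualisation.oci-containers</code>. This is a sketch, assuming the Docker backend. The container name <code>gpu-worker</code> mirrors the Compose example and the image name is hypothetical. As in the Compose snippet, the general GPU access setup for the container is not shown.
{{file|configuration.nix|nix|<nowiki>
{ ... }:
{
  virtualisation.oci-containers = {
    backend = "docker";
    containers.gpu-worker = {
      image = "example/gpu-worker:latest"; # hypothetical image name
      # Share the MPS pipe directory with the host daemon
      volumes = [ "/tmp/nvidia-mps:/tmp/nvidia-mps" ];
      environment.CUDA_MPS_PIPE_DIRECTORY = "/tmp/nvidia-mps";
      # Equivalent of "ipc: host" in Compose
      extraOptions = [ "--ipc=host" ];
    };
  };
}
</nowiki>}}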


=== Running Specific NVIDIA Driver Versions ===