K3s
systemd.services.k3s.path = [ pkgs.ipset ];
</syntaxHighlight>
== Nvidia support ==
To use an Nvidia GPU in the cluster, nvidia-container-runtime and runc are needed. To obtain both components, it suffices to add the following to the configuration:
<syntaxHighlight lang=nix>
virtualisation.docker = {
  enable = true;
  enableNvidia = true;
};
environment.systemPackages = with pkgs; [ docker runc ];
</syntaxHighlight>
Note that enabling Docker here is a workaround: it installs nvidia-container-runtime, which then becomes accessible at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>. The runtime is currently not directly accessible in nixpkgs.
You now need to create a new file at <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl</code> with the following content:
<syntaxHighlight lang=toml>
{{ template "base" . }}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime"
</syntaxHighlight>
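If you prefer to keep this template in your NixOS configuration rather than creating it by hand, one possible sketch uses systemd-tmpfiles (the <code>C</code> rule type copies the source file into place if the target does not yet exist; <code>./containerd-config.toml.tmpl</code> is an assumed file next to your configuration, not something provided by nixpkgs):
<syntaxHighlight lang=nix>
# Sketch: copy the containerd template into the k3s agent directory.
# ./containerd-config.toml.tmpl is a hypothetical file in your config repo
# containing the template shown above.
systemd.tmpfiles.rules = [
  "C /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl 0644 root root - ${./containerd-config.toml.tmpl}"
];
</syntaxHighlight>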
Note that this points the nvidia runtime at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>, the path made available by the Docker workaround above.
Now apply the following RuntimeClass to the k3s cluster:
<syntaxHighlight lang=yaml>
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia
handler: nvidia
</syntaxHighlight>
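Assuming the manifest above is saved as <code>runtimeclass.yaml</code> (a file name chosen here for illustration), it can be applied and verified with:
<syntaxHighlight lang=bash>
kubectl apply -f runtimeclass.yaml
kubectl get runtimeclass nvidia
</syntaxHighlight>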
Following [https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm k8s-device-plugin], install the Helm chart with <code>runtimeClassName: nvidia</code> set. In order to pass the Nvidia card through to a container, your deployment's pod spec must contain:
<syntaxHighlight lang=yaml>
runtimeClassName: nvidia
env:
- name: NVIDIA_VISIBLE_DEVICES
  value: all
- name: NVIDIA_DRIVER_CAPABILITIES
  value: all
</syntaxHighlight>
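The Helm installation can be sketched as follows, based on the k8s-device-plugin deployment instructions linked above (the release name <code>nvdp</code> and the namespace are illustrative choices, not requirements):
<syntaxHighlight lang=bash>
# Add the upstream chart repository and install the device plugin,
# telling it to use the nvidia RuntimeClass created earlier.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set runtimeClassName=nvidia
</syntaxHighlight>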
To test that it is working, exec into a pod and run <code>nvidia-smi</code>. For more configurability of Nvidia-related matters in k3s, see the [https://docs.k3s.io/advanced#nvidia-container-runtime-support k3s docs].
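As a concrete check, the settings above can be combined into a minimal pod that runs <code>nvidia-smi</code> once (the pod name and CUDA base image are illustrative assumptions):
<syntaxHighlight lang=yaml>
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test              # illustrative name
spec:
  runtimeClassName: nvidia    # the RuntimeClass created above
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # assumed CUDA base image
    command: ["nvidia-smi"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
</syntaxHighlight>
If the GPU is passed through correctly, <code>kubectl logs gpu-test</code> shows the usual <code>nvidia-smi</code> device table.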
== Troubleshooting ==