K3s: Difference between revisions

imported>Remedan
m Fix missing brace
imported>Atropos112
Added nvidia support
Line 103: Line 103:
   systemd.services.k3s.path = [ pkgs.ipset ];
</syntaxHighlight>
== Nvidia support ==
To use an Nvidia GPU in the cluster, both <code>nvidia-container-runtime</code> and <code>runc</code> are needed. To get the two components, it suffices to add the following to the configuration:
<syntaxHighlight lang=nix>
virtualisation.docker = {
  enable = true;
  enableNvidia = true;
};
environment.systemPackages = with pkgs; [ docker runc ];
</syntaxHighlight>
Note that using Docker here is a workaround: enabling it installs <code>nvidia-container-runtime</code> and makes it accessible at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>. Currently, it is not directly accessible in nixpkgs.

Next, create a new file at <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl</code> with the following content:
<syntaxHighlight lang=toml>
{{ template "base" . }}
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime"
</syntaxHighlight>
Note that the <code>nvidia</code> runtime is pointed at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>, the path made available by the Docker workaround.

Now apply the following runtime class to the k3s cluster:
<syntaxHighlight lang=yaml>
apiVersion: node.k8s.io/v1
handler: nvidia
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia
</syntaxHighlight>
Following [https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm k8s-device-plugin], install the Helm chart with <code>runtimeClassName: nvidia</code> set.

To pass the Nvidia card through to a container, the pod spec of your deployment must set <code>runtimeClassName: nvidia</code>, and the container must define the following environment variables:
 env:
   - name: NVIDIA_VISIBLE_DEVICES
     value: all
   - name: NVIDIA_DRIVER_CAPABILITIES
     value: all
To test that it is working, exec into a pod and run <code>nvidia-smi</code>. For more Nvidia-related configuration options in k3s, see the [https://docs.k3s.io/advanced#nvidia-container-runtime-support k3s docs].
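
As a minimal sketch of how these settings fit together, a throwaway test pod could look like the following. The pod name <code>gpu-test</code>, the container name, and the image tag are assumptions chosen for illustration and are not part of the setup described here; any image that works with the Nvidia runtime should do. The manifest sets <code>runtimeClassName</code> at the pod level and the two environment variables at the container level, then runs <code>nvidia-smi</code> once.
<syntaxHighlight lang=yaml>
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                # hypothetical name
spec:
  restartPolicy: Never          # run nvidia-smi once and stop
  runtimeClassName: nvidia      # the RuntimeClass defined earlier in this section
  containers:
    - name: cuda                # hypothetical name
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image tag, swap in your own
      command: ["nvidia-smi"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
</syntaxHighlight>
If the runtime is wired up correctly, the pod's logs should contain the usual <code>nvidia-smi</code> device listing. If the pod fails to start instead, the containerd template and the runtime class are the first things to re-check.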


== Troubleshooting ==