K3s
systemd.services.k3s.path = [ pkgs.ipset ];
</syntaxHighlight>
== Nvidia support ==
To use an Nvidia GPU in the cluster, nvidia-container-runtime and runc are needed. To obtain both components, it suffices to add the following to the configuration:
<syntaxHighlight lang=nix>
virtualisation.docker = {
  enable = true;
  enableNvidia = true;
};
environment.systemPackages = with pkgs; [ docker runc ];
</syntaxHighlight>
Note that enabling Docker here is a workaround: it installs nvidia-container-runtime, which then becomes accessible at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>. The runtime is currently not directly accessible in nixpkgs.
You now need to create a new file at <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl</code> with the following content:
<syntaxHighlight lang=toml>
{{ template "base" . }}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime"
</syntaxHighlight>
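If you prefer to keep this template in your NixOS configuration rather than creating it by hand, one possible sketch uses systemd-tmpfiles (the <code>C</code> rule type copies the source file into place if the target does not yet exist; <code>./containerd-config.toml.tmpl</code> is an assumed file next to your configuration, not something provided by nixpkgs):
<syntaxHighlight lang=nix>
# Sketch: copy the containerd template into the k3s agent directory.
# ./containerd-config.toml.tmpl is a hypothetical file in your config repo
# containing the template shown above.
systemd.tmpfiles.rules = [
  "C /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl 0644 root root - ${./containerd-config.toml.tmpl}"
];
</syntaxHighlight>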
Note that this points the nvidia runtime at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>, the path made available by the Docker workaround above.
Now apply the following RuntimeClass to the k3s cluster:
<syntaxHighlight lang=yaml>
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia
handler: nvidia
</syntaxHighlight>
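Assuming the manifest above is saved as <code>runtimeclass.yaml</code> (a file name chosen here for illustration), it can be applied and verified with:
<syntaxHighlight lang=bash>
kubectl apply -f runtimeclass.yaml
kubectl get runtimeclass nvidia
</syntaxHighlight>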
Following [https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm k8s-device-plugin], install the Helm chart with <code>runtimeClassName: nvidia</code> set. In order to pass the Nvidia card through to a container, your deployment's pod spec must contain:
<syntaxHighlight lang=yaml>
runtimeClassName: nvidia
env:
- name: NVIDIA_VISIBLE_DEVICES
  value: all
- name: NVIDIA_DRIVER_CAPABILITIES
  value: all
</syntaxHighlight>
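The Helm installation can be sketched as follows, based on the k8s-device-plugin deployment instructions linked above (the release name <code>nvdp</code> and the namespace are illustrative choices, not requirements):
<syntaxHighlight lang=bash>
# Add the upstream chart repository and install the device plugin,
# telling it to use the nvidia RuntimeClass created earlier.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --set runtimeClassName=nvidia
</syntaxHighlight>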
To test that it is working, exec into a pod and run <code>nvidia-smi</code>. For more configurability of Nvidia-related matters in k3s, see the [https://docs.k3s.io/advanced#nvidia-container-runtime-support k3s docs].
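As a concrete check, the settings above can be combined into a minimal pod that runs <code>nvidia-smi</code> once (the pod name and CUDA base image are illustrative assumptions):
<syntaxHighlight lang=yaml>
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test              # illustrative name
spec:
  runtimeClassName: nvidia    # the RuntimeClass created above
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # assumed CUDA base image
    command: ["nvidia-smi"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
</syntaxHighlight>
If the GPU is passed through correctly, <code>kubectl logs gpu-test</code> shows the usual <code>nvidia-smi</code> device table.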
== Troubleshooting ==