  systemd.services.k3s.path = [ pkgs.ipset ];
</syntaxHighlight>
== Nvidia support ==
To use an Nvidia GPU in the cluster, nvidia-container-runtime and runc are needed. To get both components, it suffices to add the following to the configuration:
<syntaxHighlight lang=nix>
virtualisation.docker = {
  enable = true;
  enableNvidia = true;
};
environment.systemPackages = with pkgs; [ docker runc ];
</syntaxHighlight>
Note that using Docker here is a workaround: it installs nvidia-container-runtime, which then becomes accessible at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>. Currently the runtime is not directly accessible in nixpkgs.
You now need to create a new file at <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl</code> with the following content:
<syntaxHighlight lang=toml>
{{ template "base" . }}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime"
</syntaxHighlight>
Note that this points the nvidia runtime at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>, the path provided by the Docker workaround above.
Now apply the following RuntimeClass to the k3s cluster:
<syntaxHighlight lang=yaml>
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia
handler: nvidia
</syntaxHighlight>
Following [https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm k8s-device-plugin], install the Helm chart with <code>runtimeClassName: nvidia</code> set. In order to pass the Nvidia card through to a container, your deployment's pod spec must contain the following (see the examples below):
<syntaxHighlight lang=yaml>
runtimeClassName: nvidia
env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: all
</syntaxHighlight>
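For the Helm install step, the chart exposes the runtime class via its values; a minimal sketch of a <code>values.yaml</code>, assuming the chart version you install accepts this key as described in the link above:
<syntaxHighlight lang=yaml>
# values.yaml for the nvidia-device-plugin Helm chart.
# runtimeClassName is the chart value referred to above; verify it against your chart version.
runtimeClassName: nvidia
</syntaxHighlight>
Pass it with <code>-f values.yaml</code> (or <code>--set runtimeClassName=nvidia</code>) when installing the chart.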
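Putting the pod spec requirements together, a minimal sketch of a test pod; the pod name <code>gpu-test</code> and the CUDA image tag are illustrative placeholders, not prescribed by k3s or NVIDIA:
<syntaxHighlight lang=yaml>
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                 # placeholder name
spec:
  runtimeClassName: nvidia       # the RuntimeClass created above
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # placeholder; any CUDA-enabled image
      command: ["sleep", "infinity"]              # keep the pod alive so you can exec into it
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
</syntaxHighlight>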
To test that it works, exec into the pod and run <code>nvidia-smi</code>, e.g. <code>kubectl exec -it gpu-test -- nvidia-smi</code>; the GPU should be listed. For more configurability of Nvidia-related matters in k3s, see the [https://docs.k3s.io/advanced#nvidia-container-runtime-support k3s docs].
== Troubleshooting ==