[https://kubernetes.io/ Kubernetes] is an open-source container orchestration system for automating software deployment, scaling, and management.

This wiki article extends the documentation in the [https://nixos.org/manual/nixos/stable/#sec-kubernetes NixOS manual].

== [[wikipedia:en:KISS principle|KISS]] ==

If you are new to Kubernetes, you might want to check out [[K3s]] first, as it is easier to set up (fewer moving parts).
== 1 Master and 1 Node ==
Add to your <code>configuration.nix</code>:

<syntaxhighlight lang=nix>
{ config, pkgs, ... }:
let
  # When using easyCerts=true the IP address must resolve to the master on creation,
  # so simply use 127.0.0.1 in that case. Otherwise you will get errors like
  # https://github.com/NixOS/nixpkgs/issues/59364
  kubeMasterIP = "10.1.1.2";
  kubeMasterHostname = "api.kube";
</syntaxhighlight>
Link your <code>kubeconfig</code> to your home directory:

<syntaxhighlight lang=bash>
ln -s /etc/kubernetes/cluster-admin.kubeconfig ~/.kube/config
</syntaxhighlight>
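Alternatively, if you prefer not to symlink into your home directory, <code>kubectl</code> also honors the <code>KUBECONFIG</code> environment variable; a minimal sketch using the path from above:

```shell
# Point kubectl at the cluster-admin kubeconfig without symlinking.
# The path matches the one generated by the easyCerts setup in this article.
mkdir -p ~/.kube
export KUBECONFIG=/etc/kubernetes/cluster-admin.kubeconfig
echo "$KUBECONFIG"
```

Putting the <code>export</code> in your shell profile makes it persistent across sessions.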
Now, executing <code>kubectl cluster-info</code> should yield something like this:

<syntaxhighlight lang=shell>
Kubernetes master is running at https://10.1.1.2
CoreDNS is running at https://10.1.1.2/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
</syntaxhighlight>
You should also see that the master is itself a node, using <code>kubectl get nodes</code>:

<syntaxhighlight lang=shell>
NAME       STATUS   ROLES    AGE   VERSION
direwolf   Ready    <none>   41m   v1.16.6-beta.0
</syntaxhighlight>
Add to your <code>configuration.nix</code>:

<syntaxhighlight lang=nix>
{ config, pkgs, ... }:
let
  kubeMasterIP = "10.1.1.2";
  kubeMasterHostname = "api.kube";
  kubeMasterAPIServerPort = 6443;
in
{
</syntaxhighlight>
According to the [https://github.com/NixOS/nixpkgs/blob/18ff53d7656636aa440b2f73d2da788b785e6a9c/nixos/tests/kubernetes/rbac.nix#L118 NixOS tests], make your node join the cluster:

On the master, grab the API token:

<syntaxhighlight lang=bash>
cat /var/lib/kubernetes/secrets/apitoken.secret
</syntaxhighlight>

On the node, join the cluster with:

<syntaxhighlight lang=bash>
echo TOKEN | nixos-kubernetes-node-join
</syntaxhighlight>
After that, you should see your new node using <code>kubectl get nodes</code>:

<syntaxhighlight lang=shell>
NAME       STATUS   ROLES    AGE    VERSION
direwolf   Ready    <none>   62m    v1.16.6-beta.0
drake      Ready    <none>   102m   v1.16.6-beta.0
</syntaxhighlight>
== N Masters (HA) ==
== Troubleshooting ==
<syntaxhighlight lang=bash>
systemctl status kubelet
</syntaxhighlight>

<syntaxhighlight lang=bash>
systemctl status kube-apiserver
</syntaxhighlight>

<syntaxhighlight lang=bash>
kubectl get nodes
</syntaxhighlight>
If you face issues while running the <code>nixos-kubernetes-node-join</code> script:

<syntaxhighlight lang=shell>
Restarting certmgr...
Job for certmgr.service failed because a timeout was exceeded.
</syntaxhighlight>
Go investigate with <code>journalctl -u certmgr</code>:

<syntaxhighlight lang=shell>
... certmgr: loading from config file /nix/store/gj7qr7lp6wakhiwcxdpxwbpamvmsifhk-certmgr.yaml
... manager: loading certificates from /nix/store/4n41ikm7322jxg7bh0afjpxsd4b2idpv-certmgr.d
</syntaxhighlight>
Restarting cfssl on the <code>master</code> node should help: <code>systemctl restart cfssl</code>

Also, make sure that port <code>8888</code> is open on your master node.
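On NixOS, one way to open that port is via the firewall options in the master's <code>configuration.nix</code>; a minimal sketch (8888 is the cfssl port mentioned above):

<syntaxhighlight lang=nix>
{
  # Open the cfssl port on the master so nodes can request certificates.
  networking.firewall.allowedTCPPorts = [ 8888 ];
}
</syntaxhighlight>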
=== DNS issues ===
Check if coredns is running via <code>kubectl get pods -n kube-system</code>:

<syntaxhighlight lang=shell>
NAME                       READY   STATUS    RESTARTS   AGE
coredns-577478d784-bmt5s   1/1     Running   2          163m
</syntaxhighlight>
Run a pod to check with <code>kubectl run curl --restart=Never --image=radial/busyboxplus:curl -i --tty</code>:

<syntaxhighlight lang=shell>
If you don't see a command prompt, try pressing enter.
[ root@curl:/ ]$ nslookup google.com
Server:    10.0.0.254
Address 1: 10.0.0.254 kube-dns.kube-system.svc.cluster.local
</syntaxhighlight>
In case DNS is still not working, I found that restarting services sometimes helps:

<syntaxhighlight lang=bash>
systemctl restart kube-proxy flannel kubelet
</syntaxhighlight>
Sometimes it helps to have a clean state on all instances:

* comment out the kubernetes-related code in <code>configuration.nix</code>
* <code>nixos-rebuild switch</code>
To do so, I found it necessary to change a few things (tested with <code>rook v1.2</code>):

* you need the <code>ceph</code> kernel module: <code>boot.kernelModules = [ "ceph" ];</code>
* change the root dir of the kubelet: <code>kubelet.extraOpts = "--root-dir=/var/lib/kubelet";</code>
* reboot all your nodes
* continue with [https://rook.io/docs/rook/v1.2/ceph-quickstart.html the official quickstart guide]
* in <code>operator.yaml</code>, help the CSI plugins find the hosts' ceph kernel modules by adding (or uncommenting, they are in the example config) these entries:

<syntaxhighlight lang=yaml>
CSI_CEPHFS_PLUGIN_VOLUME: |
  - name: lib-modules
    hostPath:
      path: /run/current-system/kernel-modules/lib/modules/
CSI_RBD_PLUGIN_VOLUME: |
  - name: lib-modules
    hostPath:
      path: /run/current-system/kernel-modules/lib/modules/
</syntaxhighlight>
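The NixOS-side changes from the list above might look like this in <code>configuration.nix</code> (a sketch; the full option path for the kubelet flag is <code>services.kubernetes.kubelet.extraOpts</code>):

<syntaxhighlight lang=nix>
{
  # Ceph kernel module needed by the Rook CSI plugins.
  boot.kernelModules = [ "ceph" ];

  # Move the kubelet root dir to a persistent location, as described above.
  services.kubernetes.kubelet.extraOpts = "--root-dir=/var/lib/kubelet";
}
</syntaxhighlight>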
=== NVIDIA ===
Make <code>nvidia-docker</code> your default docker runtime:

<syntaxhighlight lang=nix>
virtualisation.docker = {
  enable = true;
</syntaxhighlight>
Apply their DaemonSet:

<syntaxhighlight lang=bash>
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
</syntaxhighlight>
Some applications need enough shared memory to work properly.
Create a new volumeMount for your Deployment:

<syntaxhighlight lang=yaml>
volumeMounts:
  - mountPath: /dev/shm
    name: dshm
</syntaxhighlight>

and mark its <code>medium</code> as <code>Memory</code>:

<syntaxhighlight lang=yaml>
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
</syntaxhighlight>
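If the tmpfs should not grow unbounded, <code>emptyDir</code> also accepts a <code>sizeLimit</code>; a sketch (the <code>1Gi</code> value is only an example):

<syntaxhighlight lang=yaml>
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 1Gi   # example cap on the tmpfs size
</syntaxhighlight>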
=== Arm64 ===

Nix might pull in <code>coredns</code> and <code>etcd</code> images that are incompatible with ARM. To resolve this, add the following to your master node's configuration:

==== etcd ====

<syntaxhighlight lang=nix>
...
services.kubernetes = {...};
systemd.services.etcd = {
  environment = {
    ETCD_UNSUPPORTED_ARCH = "arm64";
  };
};
...
</syntaxhighlight>
==== coredns ====

<syntaxhighlight lang=nix>
services.kubernetes = {
  ...
  # use coredns
  addons.dns = {
    enable = true;
    coredns = {
      finalImageTag = "1.10.1";
      imageDigest = "sha256:a0ead06651cf580044aeb0a0feba63591858fb2e43ade8c9dea45a6a89ae7e5e";
      imageName = "coredns/coredns";
      sha256 = "0c4vdbklgjrzi6qc5020dvi8x3mayq4li09rrq2w0hcjdljj0yf9";
    };
  };
  ...
};
</syntaxhighlight>
There are various community projects that make working with Kubernetes and Nix together easier:

* [https://github.com/saschagrunert/kubernix kubernix]: simple setup of development clusters using Nix
* [https://kubenix.org/ kubenix] - [https://github.com/hall/kubenix GitHub (updated 2023)]
* [https://github.com/justinas/nixos-ha-kubernetes nixos-ha-kubernetes]
== References ==
* [https://logs.nix.samueldr.com/nixos-kubernetes/2018-09-07 IRC (2018-09)]: issues related to DNS
* [https://logs.nix.samueldr.com/nixos-kubernetes/2019-09-05 IRC (2019-09)]: discussion about <code>easyCerts</code> and general setup
[[Category:Applications]]
[[Category:Server]]
[[Category:Container]]
[[Category:NixOS Manual]]