Kubernetes: Difference between revisions
imported>Iceychris m + DNS: setup and debugging
imported>Iceychris m + miscellaneous: nvidia, shared memory
Revision as of 16:58, 9 March 2020
1 Master and 1 Node
Assumptions:
- Master and Node are on the same network (in this example 10.1.1.0/24)
- IP of the Master: 10.1.1.2
- IP of the first Node: 10.1.1.3
Caveats:
- this was only tested on 20.09pre215024.e97dfe73bba (Nightingale) (unstable)
- this is probably not best practice
- for a production-grade cluster you shouldn't use easyCerts
Master
Add to your configuration.nix:
{ config, pkgs, ... }:
let
  kubeMasterIP = "10.1.1.2";
  kubeMasterHostname = "api.kube";
  kubeMasterAPIServerPort = 443;
in
{
  # resolve master hostname
  networking.extraHosts = "${kubeMasterIP} ${kubeMasterHostname}";
  # packages for administration tasks
  environment.systemPackages = with pkgs; [
    kompose
    kubectl
    kubernetes
  ];
  services.kubernetes = {
    roles = ["master" "node"];
    masterAddress = kubeMasterHostname;
    easyCerts = true;
    apiserver = {
      securePort = kubeMasterAPIServerPort;
      advertiseAddress = kubeMasterIP;
    };
    # use coredns
    addons.dns.enable = true;
    # needed if you use swap
    kubelet.extraOpts = "--fail-swap-on=false";
  };
}
Apply your config (e.g. nixos-rebuild switch).
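After the rebuild, it can be worth confirming that the extraHosts line actually ended up in /etc/hosts, since everything else relies on api.kube resolving to the master. The following is a minimal sketch; the function name has_hosts_entry is made up for this example and is not part of NixOS or Kubernetes.

```shell
# Hypothetical helper: succeeds if hosts file $1 maps IP $2 to hostname $3.
has_hosts_entry() {
    awk -v ip="$2" -v host="$3" '
        # only look at lines whose first field is the wanted IP
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage (values taken from the example configuration):
#   has_hosts_entry /etc/hosts 10.1.1.2 api.kube && echo "api.kube resolves to the master"
```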
Link your kubeconfig to your home directory (create ~/.kube first if it does not exist yet):
mkdir -p ~/.kube
ln -s /etc/kubernetes/cluster-admin.kubeconfig ~/.kube/config
Now, executing kubectl cluster-info should yield something like this:
Kubernetes master is running at https://10.1.1.2
CoreDNS is running at https://10.1.1.2/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Running kubectl get nodes should also show that the master is itself a node:
NAME       STATUS   ROLES    AGE   VERSION
direwolf   Ready    <none>   41m   v1.16.6-beta.0
Node
Add to your configuration.nix:
{ config, pkgs, ... }:
let
  kubeMasterIP = "10.1.1.2";
  kubeMasterHostname = "api.kube";
  kubeMasterAPIServerPort = 443;
in
{
  # resolve master hostname
  networking.extraHosts = "${kubeMasterIP} ${kubeMasterHostname}";
  # packages for administration tasks
  environment.systemPackages = with pkgs; [
    kompose
    kubectl
    kubernetes
  ];
  services.kubernetes = let
    # the port is an integer, so it needs toString before string interpolation
    api = "https://${kubeMasterHostname}:${toString kubeMasterAPIServerPort}";
  in
  {
    roles = ["node"];
    masterAddress = kubeMasterHostname;
    easyCerts = true;
    # point kubelet and other services to kube-apiserver
    kubelet.kubeconfig.server = api;
    apiserverAddress = api;
    # use coredns
    addons.dns.enable = true;
    # needed if you use swap
    kubelet.extraOpts = "--fail-swap-on=false";
  };
}
Apply your config (e.g. nixos-rebuild switch).
Following the approach used in the NixOS tests, make your Node join the cluster:
# on the master, grab the apitoken
cat /var/lib/kubernetes/secrets/apitoken.secret
# on the node, join the node with
echo TOKEN | nixos-kubernetes-node-join
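If the token has been copied from the master into a file on the node, the join step can be wrapped so that an empty or missing file is caught before anything is piped into nixos-kubernetes-node-join. This is only a sketch: the function name, the JOIN_CMD override and the path /root/apitoken in the usage line are assumptions made for illustration, and the token is fed on stdin just as in the echo command above.

```shell
# Hypothetical wrapper around the join command shown above.
# $1: file on the node containing the token copied from the master.
# JOIN_CMD is an illustrative override so the wrapper can be tried out
# without touching a real cluster; by default the real command is used.
join_with_token_file() {
    token_file="$1"
    if [ ! -s "$token_file" ]; then
        echo "token file '$token_file' is missing or empty" >&2
        return 1
    fi
    # feed the token on stdin, like `echo TOKEN | nixos-kubernetes-node-join`
    "${JOIN_CMD:-nixos-kubernetes-node-join}" < "$token_file"
}

# Usage on the node (path is a placeholder):
#   join_with_token_file /root/apitoken
```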
After that, you should see your new node using kubectl get nodes:
NAME       STATUS   ROLES    AGE    VERSION
direwolf   Ready    <none>   62m    v1.16.6-beta.0
drake      Ready    <none>   102m   v1.16.6-beta.0
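With more than a couple of nodes, reading the table by eye gets tedious. A small filter over the kubectl get nodes output can list only the nodes that are not Ready. The function name not_ready_nodes is invented for this sketch, and it assumes the default column layout shown above (STATUS in the second column).

```shell
# Hypothetical filter: reads `kubectl get nodes` output on stdin and prints
# the names of nodes whose STATUS is not exactly "Ready".
# Note: a cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
    # NR > 1 skips the header line
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
```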
N Masters (HA)
Debugging
systemctl status kubelet
systemctl status kube-apiserver
kubectl get nodes
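The two systemctl checks above can be combined into one loop that prints a one-line summary per unit. This is a sketch under the assumption that the units are named kubelet and kube-apiserver as above; check_units is a made-up helper name. It relies on systemctl is-active, which prints the unit state and exits non-zero when the unit is not active.

```shell
# Hypothetical helper: prints "<unit>: <state>" for each unit given and
# returns non-zero if at least one of them is not active.
check_units() {
    rc=0
    for unit in "$@"; do
        # is-active prints the state; a non-zero exit means "not active"
        state=$(systemctl is-active "$unit") || rc=1
        echo "$unit: $state"
    done
    return "$rc"
}

# Usage on the master:
#   check_units kubelet kube-apiserver
```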
DNS issues
Check if coredns is running via kubectl get pods -n kube-system:
NAME                       READY   STATUS    RESTARTS   AGE
coredns-577478d784-bmt5s   1/1     Running   2          163m
coredns-577478d784-bqj65   1/1     Running   2          163m
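The same check can be done mechanically by counting the coredns pods that report Running. The sketch below assumes the default kubectl get pods column layout shown above and pod names starting with coredns-; the function name running_coredns_pods is invented for this example.

```shell
# Hypothetical filter: reads `kubectl get pods -n kube-system` output on
# stdin and prints how many coredns pods have STATUS "Running".
running_coredns_pods() {
    awk 'NR > 1 && $1 ~ /^coredns-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage:
#   kubectl get pods -n kube-system | running_coredns_pods
```

A result of 0 suggests looking at the coredns pods before debugging anything else DNS-related.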
Run a pod to check name resolution from inside the cluster with kubectl run curl --restart=Never --image=radial/busyboxplus:curl -i --tty:
If you don't see a command prompt, try pressing enter.
[ root@curl:/ ]$ nslookup google.com
Server: 10.0.0.254
Address 1: 10.0.0.254 kube-dns.kube-system.svc.cluster.local
Name: google.com
Address 1: 2a00:1450:4016:803::200e muc12s04-in-x0e.1e100.net
Address 2: 172.217.23.14 lhr35s01-in-f14.1e100.net
Reset to a clean state
Sometimes it helps to have a clean state on all instances:
- comment out the kubernetes-related code in configuration.nix
- nixos-rebuild switch
- clean up the filesystem:
  rm -rf /var/lib/kubernetes/ /var/lib/etcd/ /var/lib/cfssl/ /var/lib/kubelet/
  rm -rf /etc/kube-flannel/ /etc/kubernetes/
- uncomment the kubernetes-related code again
- nixos-rebuild switch
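Because the cleanup step deletes state irreversibly, it may help to wrap it so that the commands are only printed unless deletion is requested explicitly. The following is a sketch: the function name clean_kube_state and the --force flag are invented here, while the directories are the ones listed in the steps above.

```shell
# State directories named in the cleanup step above.
KUBE_STATE_DIRS="/var/lib/kubernetes/ /var/lib/etcd/ /var/lib/cfssl/ /var/lib/kubelet/ /etc/kube-flannel/ /etc/kubernetes/"

# Hypothetical wrapper: without arguments it only prints the rm commands;
# with --force it actually removes the directories.
clean_kube_state() {
    # $KUBE_STATE_DIRS is intentionally unquoted so it splits on spaces
    for dir in $KUBE_STATE_DIRS; do
        if [ "${1:-}" = "--force" ]; then
            rm -rf "$dir"
        else
            echo "rm -rf $dir"
        fi
    done
}

# Usage (as root, after kubernetes has been disabled and the system rebuilt):
#   clean_kube_state            # review what would be deleted
#   clean_kube_state --force    # delete it
```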
Miscellaneous
NVIDIA
You can use NVIDIA's k8s-device-plugin.
Make nvidia-docker your default docker runtime:
virtualisation.docker = {
  enable = true;
  # use nvidia as the default runtime
  enableNvidia = true;
  extraOptions = "--default-runtime=nvidia";
};
Apply NVIDIA's DaemonSet:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
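Once the device plugin pods are running, GPU nodes should advertise the nvidia.com/gpu resource in their capacity. A quick way to look for it is to filter the kubectl describe nodes output, as sketched below; the function name gpu_resource_lines is made up for this example.

```shell
# Hypothetical filter: reads `kubectl describe nodes` output on stdin and
# prints the lines mentioning the nvidia.com/gpu resource.
# Returns non-zero if the resource does not appear at all.
gpu_resource_lines() {
    grep -F 'nvidia.com/gpu'
}

# Usage:
#   kubectl describe nodes | gpu_resource_lines || echo "no GPU resource advertised"
```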
/dev/shm
Some applications need enough shared memory to work properly. Create a new volumeMount for your Deployment:
...
volumeMounts:
  - mountPath: /dev/shm
    name: dshm
...
Then declare the matching volume and mark its medium as Memory:
...
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
...
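To show where the two fragments sit relative to each other, the sketch below writes a complete Deployment manifest to a file. The volumeMounts entry belongs to the container, the volumes entry to the pod spec. Everything except the two fragments is a placeholder chosen for this example: the name shm-demo, the busybox image and the sleep command are assumptions, not part of the original setup.

```shell
# Hypothetical helper: writes a minimal Deployment manifest to file $1.
# "shm-demo", the busybox image and the sleep command are placeholders.
write_shm_deployment() {
    cat > "$1" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shm-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: shm-demo
  template:
    metadata:
      labels:
        app: shm-demo
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sleep", "3600"]
          # fragment 1: mount the volume at /dev/shm inside the container
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
      # fragment 2: back the volume with memory instead of disk
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
EOF
}

# Usage:
#   write_shm_deployment shm-demo.yaml && kubectl apply -f shm-demo.yaml
```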
Tooling
There are various community projects aimed at facilitating working with Kubernetes combined with Nix:
Sources
- Issue #39327: kubernetes support is missing some documentation
- NixOS Discourse: Using multiple nodes on unstable
- Kubernetes docs
- NixOS e2e kubernetes tests: Node Joining etc.
- IRC (2018-09): issues related to DNS
- IRC (2019-09): discussion about easyCerts and general setup