Proof of concept Kubernetes cluster on Raspberry Pi using K3s

The project

The plan is somewhat simple here: set up a proof-of-concept Kubernetes cluster in a homelab environment, with a twist - service announcement over BGP.

The hardware

You’ll need several pieces of hardware at this stage. I’ve tried this with a Raspberry Pi version 3 model B and feel that the Pi didn’t manage the load well at times - a version 4 might be a better idea.

The Mikrotik RB4011 router is a great choice for this project as it comes with ten gigabit RJ45 ports (on 2 separate switch chips), a great OS (RouterOS) and more capabilities than one can ever dream of utilizing. That being said, it’s overkill. Any router that speaks BGP is sufficient, but the console snippets in this post are from MikroTik’s RouterOS.

  • Mikrotik RB4011, an ethernet router that, among other features, speaks BGP
  • Raspberry Pi 3 Model B
  • MicroSD card (8 GiB seems enough)
  • device capable of writing MicroSD cards
  • power adapter suitable for the Pi
  • ethernet cable

Setting up the network infrastructure

We’ll bridge ports 6 through 10 (1 is used for WAN, 2-5 for the home network) and create a new subnet where the project will take place. Why 5 ports for 1 Pi? At this point, it’s just the convenience of being able to plug it into any of the 5 ports and land on the correct network. Starting at layer 2:

/interface bridge
add name=bridge2
/interface bridge port
add bridge=bridge2 interface=ether6
add bridge=bridge2 interface=ether7
add bridge=bridge2 interface=ether8
add bridge=bridge2 interface=ether9
add bridge=bridge2 interface=ether10

Next, we move on to L3 tasks. The home lab will initially be assigned the 10.0.2.0/23 subnet, and the router will be at 10.0.2.1. This is an important piece of information that we’ll need when setting up BGP.

/ip address
add address=10.0.2.1/23 interface=bridge2 network=10.0.2.0

Then we probably want a DHCP server. The 10.0.2.0/24 portion of the subnet will be reserved for static IPs, and DHCP will only distribute addresses in the 10.0.3.0/24 range. The split exists because it scales to more complex scenarios in the future: we won’t need to touch the static space except for the router.

/ip pool
add name=lab-pool ranges=10.0.3.1-10.0.3.254
/ip dhcp-server network
add address=10.0.2.0/23 dns-server=8.8.8.8,8.8.4.4 gateway=10.0.2.1
/ip dhcp-server
add address-pool=lab-pool disabled=no interface=bridge2 name=dhcp2

Preparing the Raspberry Pi (on macOS)

We need an OS. At this point, experimenting with ARM64 is needless overhead. The same applies to any non-standard OS. The path of least resistance seems to be Raspberry Pi OS (previously called Raspbian).

$ wget https://downloads.raspberrypi.org/raspios_lite_armhf_latest \
    --trust-server-names \
    --timestamping
$ unzip 2020-08-20-raspios-buster-armhf-lite.zip
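
Optionally, it doesn’t hurt to verify the archive before flashing it: the downloads page publishes a SHA-256 checksum next to each image, and the output of the following should match it:

$ shasum -a 256 2020-08-20-raspios-buster-armhf-lite.zip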

Since the DHCP server is under our control (running on the RB4011), it is acceptable for the Pi to obtain a DHCP lease. We’ll then find the IP directly within the DHCP server’s interface. There is no expectation that a screen will ever be connected to the Pi, and Raspberry Pi OS does not ship with SSH enabled by default. Can we somehow enable it at this stage, before the first boot?

According to the docs, the OS looks for a file named ssh on the boot partition. If found, the OS will boot up with SSH enabled (using the default pi username and raspberry password).

$ open 2020-08-20-raspios-buster-armhf-lite.img
$ cd /Volumes/boot
$ touch ssh
$ diskutil unmount /Volumes/boot

We then need to get the modified image onto the MicroSD card used as our main system disk. Since Apple hardware no longer ships with a (Micro)SD card reader, an external reader is our only choice.

First, let’s figure out which device is the SD card using the diskutil command:

$ diskutil list
...

/dev/disk3 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     FDisk_partition_scheme                        *7.9 GB     disk3
   1:             Windows_FAT_32 boot                    268.4 MB   disk3s1
   2:                      Linux                         7.7 GB     disk3s2

Double check the parameters of the device - writing to the wrong device would have catastrophic consequences. To speed up the process, we want to write to the raw device (/dev/rdiskN) instead of the buffered /dev/diskN variant.

$ sudo diskutil unmountDisk /dev/disk3
$ sudo dd if=~/2020-08-20-raspios-buster-armhf-lite.img of=/dev/rdisk3 bs=4m
$ sudo diskutil eject /dev/rdisk3

After ejecting the logical device, it’s time to get on the hardware level and plug the card into the Raspberry Pi. Continuing on the hardware front, we plug the Pi into any of the bridged router ports (6-10) and connect a power adapter.

First test!

The Pi should boot up and obtain a DHCP lease. Let’s consult the router. The output is trimmed to hide other leases and the real hardware address.

/ip dhcp-server lease p
...
 1 D 10.0.3.254 B8:27:EB:00:00:00 rpi dhcp2 bound
...

That means we should be able to reach our Pi via SSH:

$ ssh pi@10.0.3.254
pi@10.0.3.254's password:
Linux k8s-master 5.4.51-v7+ #1333 SMP Mon Aug 10 16:45:19 BST 2020 armv7l

...

pi@rpi:~ $

Success! We’re in, and it’s time to install Kubernetes. The Kubernetes flavor of choice is k3s (github.com/rancher/k3s). K3s is a fairly popular (at least according to its ~15k GitHub stars as of Sep. 2020) minimal Kubernetes distribution from Rancher. For this project, the main value proposition of k3s is minimal resource usage combined with high quality developer tools.

Naively reading the k3s docs, it seems that all we need is to run a single curl command and pipe it into a shell. As root. Sounds very safe, but it’s an experiment - so why not. :) Also note that the Pi was renamed to k8s-master beforehand.
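
A quick sketch of that rename (assuming the stock raspberrypi hostname) - and since SSH came up with well-known credentials, changing the password while we’re at it doesn’t hurt:

$ sudo hostnamectl set-hostname k8s-master
$ sudo sed -i 's/raspberrypi/k8s-master/' /etc/hosts
$ passwd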

root@k8s-master:/home/pi# curl -sfL https://get.k3s.io | sh -
[INFO]  Finding release for channel stable
[INFO]  Using v1.18.9+k3s1 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v1.18.9+k3s1/sha256sum-arm.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v1.18.9+k3s1/k3s-armhf
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
root@k8s-master:/home/pi#

Wow. At this point it’s worth noting that the k3s project kept its promise of a “convenient way to download K3s and add a service to systemd or openrc”. Did we really manage to install Kubernetes (ARM even!) with one command?!

root@k8s-master:/home/pi# kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS              RESTARTS   AGE
kube-system   helm-install-traefik-jstv5               0/1     ContainerCreating   0          65s
kube-system   local-path-provisioner-6d59f47c7-7r5kr   0/1     ContainerCreating   0          63s
kube-system   metrics-server-7566d596c8-xb8mg          0/1     ContainerCreating   0          63s
kube-system   coredns-7944c66d8d-9w9lk                 0/1     ContainerCreating   0          63s

It does seem to be the case. That being said, the Pi isn’t exactly managing the load well. It’s time to order a few Pi 4s while the cluster slowly boots up.

top - 16:45:07 up  4:57,  1 user,  load average: 7.37, 3.57, 1.82
Tasks: 168 total,   2 running, 166 sleeping,   0 stopped,   0 zombie
%Cpu0  :  25.6/8.4    34[||||||||||||||||||||||||||||||||||                                                                  ]
%Cpu1  :  37.3/6.0    43[|||||||||||||||||||||||||||||||||||||||||||                                                         ]
%Cpu2  :  54.5/2.4    57[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                           ]
%Cpu3  :  68.1/4.0    72[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                            ]
MiB Mem :    925.9 total,     21.9 free,    474.7 used,    429.2 buff/cache
MiB Swap:    100.0 total,     79.2 free,     20.8 used.    394.3 avail Mem

After a few minutes pass, this is what we get:

root@k8s-master:/home/pi# kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-6d59f47c7-7r5kr   1/1     Running     0          2m53s
kube-system   metrics-server-7566d596c8-xb8mg          1/1     Running     0          2m53s
kube-system   coredns-7944c66d8d-9w9lk                 1/1     Running     0          2m53s
kube-system   helm-install-traefik-jstv5               0/1     Completed   0          2m55s
kube-system   svclb-traefik-7wwmr                      2/2     Running     0          62s
kube-system   traefik-758cd5fc85-kbbnr                 1/1     Running     0          64s

I’m not exactly happy about the choice of Traefik (but that’s almost material for another post); let’s consider it acceptable for now. It’s time to get MetalLB up and try to advertise a service over BGP. Since we lack any automation on the router side, we need to prepare the router to peer with our MetalLB instance:

/routing bgp peer
add name=peer1 remote-address=10.0.3.254 remote-as=64500 ttl=default

Let’s see how MetalLB setup goes!

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

MetalLB also requires a ConfigMap that configures BGP peers and address pools for service allocation. The pool should sit outside of the DHCP range to avoid possible conflicts. Our choice is therefore 10.0.4.0/23, as that subnet happens to be unused in our network.

$ cat << EOF > metallb-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.2.1
      peer-asn: 65530
      my-asn: 64500
    address-pools:
    - name: default
      protocol: bgp
      avoid-buggy-ips: true
      addresses:
      - 10.0.4.0/23
EOF
$ kubectl apply -f metallb-cm.yaml
configmap/config created
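
As a quick sanity check, the MetalLB controller and per-node speaker pods should be running before we expect any announcements (pod name suffixes will differ):

$ kubectl get pods -n metallb-system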

After setting everything up, it’s time to create a service of type LoadBalancer and see if MetalLB is able to advertise the IP. Actually, wait a second. Since Traefik is in the cluster, don’t we already have one LoadBalancer service?

root@k8s-master:/home/pi# kubectl get svc --all-namespaces
NAMESPACE     NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
default       kubernetes           ClusterIP      10.43.0.1       <none>        443/TCP                      9m55s
kube-system   kube-dns             ClusterIP      10.43.0.10      <none>        53/UDP,53/TCP,9153/TCP       9m51s
kube-system   metrics-server       ClusterIP      10.43.213.246   <none>        443/TCP                      9m44s
kube-system   traefik-prometheus   ClusterIP      10.43.10.242    <none>        9100/TCP                     7m45s
kube-system   traefik              LoadBalancer   10.43.168.74    10.0.3.254    80:31932/TCP,443:30621/TCP   7m44s

Uh-oh. On the positive side, we do happen to have a LoadBalancer service. On the other hand, 10.0.3.254 certainly doesn’t belong to our pool - 10.0.4.0/23. What went wrong?

$ /routing bgp peer p
Flags: X - disabled, E - established
 #   INSTANCE                                               REMOTE-ADDRESS                                                                         REMOTE-AS
 0 E default                                                10.0.3.254                                                                             64500

MetalLB managed to peer with the router successfully. Checking the speaker logs…

$ kubectl logs -n metallb-system speaker-96mxn --tail 100
...
{"caller":"main.go:267","event":"startUpdate","msg":"start of service update","service":"kube-system/traefik","ts":"2020-09-27T15:54:55.90341073Z"}
{"caller":"main.go:293","error":"assigned IP not allowed by config","ip":"10.0.3.254","msg":"IP allocated by controller not allowed by config","op":"setBalancer","service":"kube-system/traefik","ts":"2020-09-27T15:54:55.903652343Z"}
{"caller":"main.go:369","event":"serviceWithdrawn","ip":"","msg":"withdrawing service announcement","reason":"ipNotAllowed","service":"kube-system/traefik","ts":"2020-09-27T15:54:55.903869425Z"}
{"caller":"main.go:294","event":"endUpdate","msg":"end of service update","service":"kube-system/traefik","ts":"2020-09-27T15:54:55.903962654Z"}
...

There seems to be a process interfering with what the MetalLB controller attempts to do, and the external IP toggles between the node IP and our advertised IP.
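
The tug-of-war can be observed by watching the service - the -w flag streams updates, so the EXTERNAL-IP column can be seen flipping back and forth:

$ kubectl get svc -n kube-system traefik -w

Meanwhile, on the router side: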

$ /ip route p
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
 ...
 3 ADb  10.0.4.1/32                        10.0.3.254               20
 ...

The router accepted the route, yet the announcement keeps being withdrawn - so what went wrong? Digging into the k3s docs, it becomes apparent that k3s ships with its own LoadBalancer controller. Luckily, the docs mention that this is an optional component that can be disabled with --disable servicelb - perfect! The docs also mention that it’s possible to disable Traefik with --disable traefik, so let’s combine the two and see where we get. It’s again time to appreciate how powerful the tools shipped with k3s are: there’s a script to uninstall everything and start from scratch!

$ k3s-uninstall.sh

Second attempt, now without Traefik and ServiceLB

With our newly obtained knowledge, let’s get k3s up and running without the components we don’t want:

curl -sfL https://get.k3s.io | sh -s - --disable traefik --disable servicelb

And after a few minutes, this is what we get:

NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE
kube-system   metrics-server-7566d596c8-pdz8w          1/1     Running   0          73s
kube-system   local-path-provisioner-6d59f47c7-t5c8v   1/1     Running   0          73s
kube-system   coredns-7944c66d8d-wj4pw                 1/1     Running   0          73s

Perfect, a minimal cluster! The lack of Traefik also means that there is no pre-allocated LoadBalancer service. Let’s start by deploying the NGINX ingress controller with a service of type LoadBalancer to see if we’re able to obtain an external IP for the service. Since the bare-metal NGINX controller manifest ships with a NodePort service by default, we need to change the service type from NodePort to LoadBalancer (note that BSD sed on macOS would need an empty suffix: sed -i '').

$ wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.35.0/deploy/static/provider/baremetal/deploy.yaml
$ sed -i 's/NodePort/LoadBalancer/' deploy.yaml
$ kubectl apply -f deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
configmap/ingress-nginx-controller created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
service/ingress-nginx-controller-admission created
service/ingress-nginx-controller created
deployment.apps/ingress-nginx-controller created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created
serviceaccount/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created

It’s time to check the state of the LoadBalancer service:

$ kubectl get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller-admission   ClusterIP      10.43.104.173   <none>        443/TCP                      18s
ingress-nginx-controller             LoadBalancer   10.43.136.0     <pending>     80:31657/TCP,443:31381/TCP   18s

Repeating this a few times while also checking the logs, it doesn’t seem that there is anything around to assign the IP. That is expected - we disabled the ServiceLB component of k3s.
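
For the sceptical, kubectl describe shows no IP-assignment events on the service (output omitted here):

$ kubectl describe svc -n ingress-nginx ingress-nginx-controller

Time to try MetalLB again!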

$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
$ kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
$ cat << EOF > metallb-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.2.1
      peer-asn: 65530
      my-asn: 64500
    address-pools:
    - name: default
      protocol: bgp
      avoid-buggy-ips: true
      addresses:
      - 10.0.4.0/23
EOF
$ kubectl apply -f metallb-cm.yaml

After a while, the BGP peering is seen as established on the router side.

$ /routing bgp peer p
Flags: X - disabled, E - established
 #   INSTANCE                                               REMOTE-ADDRESS                                                                         REMOTE-AS
 0   default                                                10.0.3.254                                                                             64500

And the MetalLB controller has assigned an IP address to the service!

$ kubectl get svc -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller-admission   ClusterIP      10.43.104.173   <none>        443/TCP                      10m
ingress-nginx-controller             LoadBalancer   10.43.1.19      10.0.4.1      80:30579/TCP,443:31078/TCP   100s

Hello world!

It’s time to see if the BGP route advertisement really worked and we can reach the service from the home network (assuming that the firewall configuration permits it).
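
For reference, if the forward chain drops traffic between subnets by default, a rule along these lines would open up the pool - a sketch only, adjust to your own firewall layout:

/ip firewall filter
add chain=forward dst-address=10.0.4.0/23 action=accept comment="allow traffic to the MetalLB pool"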

$ curl -vvv 10.0.4.1
*   Trying 10.0.4.1...
* TCP_NODELAY set
* Connected to 10.0.4.1 (10.0.4.1) port 80 (#0)
> GET / HTTP/1.1
> Host: 10.0.4.1
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Server: nginx/1.19.2
< Date: Sun, 27 Sep 2020 16:21:20 GMT
< Content-Type: text/html
< Content-Length: 153
< Connection: keep-alive
<
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.19.2</center>
</body>
</html>
* Connection #0 to host 10.0.4.1 left intact
* Closing connection 0
$ /ip route p
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
 #      DST-ADDRESS        PREF-SRC        GATEWAY            DISTANCE
...
 3 ADb  10.0.4.1/32                        10.0.3.254               20
...

Perfect! Although the server returns a 404, the Server header hints that we’ve reached NGINX. Any cluster service can now be exposed via an Ingress resource.
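
As a final sketch, exposing a hypothetical Service named my-app (listening on port 80) through this controller would look roughly like the following - the hostname is made up and needs a matching DNS or /etc/hosts entry pointing at 10.0.4.1, and the v1beta1 API matches the v1.18 cluster installed above:

$ cat << EOF > my-app-ingress.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: my-app.lab.example
    http:
      paths:
      - path: /
        backend:
          serviceName: my-app
          servicePort: 80
EOF
$ kubectl apply -f my-app-ingress.yaml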

And that’s it for the day! The Raspberry Pi 3 B is somewhat overloaded, so it’s time to wait for the Pi version 4 to arrive before experimenting further. All in all, this has been a great experience with the K3s distribution - the ease of setup for a project like this is perfect.