My Homelab for 2024 Part 2

Series - Homelab 2024

The Kubernetes cluster, where a bunch of services live…

Kubernetes Logo

1.26 was the best logo ever, right?…

The cluster consists of 6 Virtual Machines running on the Proxmox Cluster:

  • 3 control plane nodes with 2 cores and 6 GB of RAM
  • 3 worker nodes with 6 cores and 20 GB of RAM

At the time I’m writing this, they are running Debian 11 as the OS and Kubernetes 1.28. I will update them to Debian 12, as I have already started migrating some machines to 12 and the hypervisors are already on it. But for Kubernetes I like to stay one version behind the latest; since the project supports the last 3 minor versions, there is no problem doing this.

The cluster is provisioned and updated with the kubeadm tool. I wanted to learn the “vanilla” Kubernetes tooling, and I found it simple enough that I ditched the idea of switching to K3s… even though that product is quite cool too!

It is in HA mode with etcd as the backend, CoreDNS for the cluster DNS, and kube-vip in ARP mode to share a virtual IP between my control plane nodes. I can always reach the API as long as at least one control plane node is still up!
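For reference, here is a minimal sketch of the kubeadm side of that setup: the control plane endpoint points at the kube-vip virtual IP instead of a single node. The IP and pod subnet below are placeholders, not my actual values.

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.1.100:6443"  # placeholder: the kube-vip virtual IP
networking:
  podSubnet: "10.244.0.0/16"                # placeholder: must match the CNI pod CIDR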

I use Calico as my CNI (networking) plugin, deployed with the Calico operator, but I want to try Cilium.
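The operator is driven by an Installation custom resource; a minimal sketch could look like this (the pod CIDR and encapsulation are example values, not necessarily what I run):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.244.0.0/16          # example pod CIDR, must match the kubeadm podSubnet
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled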

Each Proxmox node hosts one worker and one control plane node.

Rancher Longhorn Logo

I wanted shared storage between my nodes, but without making my NAS a single point of failure. So I use the NAS via NFS mounts for music and media that are not used by critical workloads, but for things that I do not want to go down when the NAS is in maintenance, I use Rancher Longhorn. As my workers are connected through their own 10G bridge, Longhorn takes advantage of the faster network.
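For the NFS side, one way to declare such a volume statically looks like this; this is a sketch rather than my exact manifests, and the NAS address and export path are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-media
spec:
  capacity:
    storage: 500G
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10   # placeholder NAS address
    path: /volume1/media   # placeholder export path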

Rancher Longhorn WebUI

Longhorn data lives on a dedicated disk in each VM, and that disk sits on the NVMe storage of the Proxmox host. In case Longhorn fills up the disk, the workloads using Longhorn will crash but the Kubernetes node will not…

I also use local volumes for some workloads, when I prefer to manage availability directly with the product… The most common use case I have for that is Postgres databases: I do not need replicated storage since the replication is handled by the database itself… I also use this for Prometheus metrics, for example, because my Prometheus is in HA mode too, more on this later!

This is the storage class I use for that:

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

And if I want a volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-local-worker3-privatebin
spec:
  capacity:
    storage: 10G
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/postgres-volume/privatebin # Path on the host you want to use
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-worker3

Scheduling
Notice how I make sure that the pod which asks for that volume will be scheduled on the right host: I use nodeAffinity for that. If you do not do that, the pod could be started on a different node each time, with data that is obviously not synced…
Do not use for NON-HA workloads
Because this pattern ties the pod to a particular node, if that node goes down the pod goes down too, since it becomes unschedulable. So you have to make sure that you can afford to lose the pods that use this volume. That is the case for the workloads I choose to run like this!
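To consume such a volume, the claim itself is nothing special; a sketch using the names from the example above (the namespace is just an assumption for illustration):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-privatebin
  namespace: privatebin      # assumed namespace, adjust to the workload
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  volumeName: postgres-local-worker3-privatebin   # bind explicitly to the local PV
  resources:
    requests:
      storage: 10G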

Metallb Logo

To expose LoadBalancer-type services I use MetalLB in ARP mode. My services are tied to a virtual IP, and MetalLB manages that VIP so that it follows the service when the pods move. The downside of this mode is that all traffic is routed through one node, so it cannot be considered true load balancing, but for failover it works like a charm!

In case you need true load balancing
MetalLB can also run in BGP mode, where you are able to split the traffic across nodes. I might try that with BIRD on my Linux router…
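In L2 (ARP) mode the configuration boils down to an address pool and an L2 advertisement; a minimal sketch, with a placeholder address range:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.220   # placeholder range reserved for LoadBalancer services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool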

Traefik Logo

The ingress controller of this cluster is Traefik: when I started to learn Docker and Swarm I found it perfect, so I tried it on Kubernetes and found the same level of happiness with it ^^. I use the Custom Resource Definitions from Traefik, but the “regular” Ingress API object is also enabled so that other software, like Helm charts, operators or cert-manager, can leverage Traefik.

I also use it as my reverse proxy for non-Kubernetes workloads. My NAS, the hypervisor web UIs and the PBS web UI are also served by Traefik. To do this I use external Services defined “by hand”.

Here is the Proxmox web UI example, which uses 3 servers as backends:

apiVersion: v1
kind: Service
metadata:
  name: external-pve
  namespace: traefik
spec:
  clusterIP: None   # We set clusterIP to none as the service is OUTSIDE
  ports:
    - protocol: TCP
      port: 8006
---
kind: Endpoints
apiVersion: v1
metadata:
  name: external-pve
  namespace: traefik
subsets:            # Then we set the endpoints "by hand"
  - addresses:
      - ip: xxx.1
      - ip: xxx.2
      - ip: xxx.3
    ports:
      - port: 8006

Then the “normal” Traefik IngressRoute that uses it:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: external-pve
  namespace: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`xxx.domain.com`)
      kind: Rule
      services:
        - name: external-pve
          port: 8006
          scheme: https         # We use https because the proxmox webui listens on TCP 8006 with TLS
          sticky:
            cookie:
              name: traefik-sticky # And we set stickiness to make the NoVNC console work !
  tls:
    secretName: xxx.domain.com
Monitoring
For monitoring purposes, log collection and metrics are enabled on Traefik.
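With the official Traefik Helm chart, that roughly corresponds to values like these (a sketch; double-check the keys against your chart version):

# values.yaml excerpt for the Traefik Helm chart
logs:
  general:
    level: INFO
  access:
    enabled: true         # access logs for every request
metrics:
  prometheus:
    entryPoint: metrics   # expose Prometheus metrics on a dedicated entry point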

There are some add-ons that I think are really useful in a Kubernetes cluster.

The Kubernetes Metrics Server is there to give you basic metrics on your cluster; it is needed for the top commands to work, like so:

kubectl top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
manager1   176m         8%     3566Mi          61%
manager2   170m         8%     3828Mi          65%
manager3   249m         12%    3548Mi          60%
worker1    978m         16%    14335Mi         71%
worker2    1247m        20%    15032Mi         75%
worker3    3362m        56%    11942Mi         59%

kubectl top pods
NAME                       CPU(cores)   MEMORY(bytes)
traefik-78d8f67fc8-scpv8   6m           115Mi
traefik-78d8f67fc8-sxb4z   13m          220Mi

But this is also what you need for the Horizontal and Vertical Pod Autoscalers to work! This is less true now, because I’m pretty sure I saw that you can connect those autoscalers to custom metrics providers.
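As an illustration, a basic CPU-based HPA built on those metrics could look like this (the target deployment and thresholds are just examples, not something I actually autoscale):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app        # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # scale out above 80% of the requested CPU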

SealedSecrets is mandatory if you want to go full GitOps like me. It allows you to push secrets to a Git repo, even a public one, because your secrets are encrypted and only your Kubernetes cluster can decrypt them! It works like this:

  • You create a “regular” kubernetes secret in mysecret-unsealed.yaml
  • You encrypt that secret with kubeseal -f mysecret-unsealed.yaml -w mysecret.yaml
  • It will write the sealed secret to mysecret.yaml
  • kubectl apply -f mysecret.yaml
  • The sealed-secrets controller will see the SealedSecret object in your cluster and will decrypt it into a regular secret that you can use like any other k8s secret!
  • You can ditch the unsealed file if you want; in any case, that file should NOT be committed to a git repo! The sealed one is made for that!

I use it for every secret: all my secrets are committed to a git repo, and I also use a .gitignore pattern in case I forget unsealed files somewhere…
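To give an idea of what ends up in Git, here is a sketch of the unsealed input and the shape of the sealed output (names, namespace and the encrypted blob are placeholders):

# mysecret-unsealed.yaml -- never committed, covered by .gitignore
apiVersion: v1
kind: Secret
metadata:
  name: example-db
  namespace: example
stringData:
  password: super-secret-password   # placeholder value
---
# mysecret.yaml -- produced by kubeseal, safe to commit
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example-db
  namespace: example
spec:
  encryptedData:
    password: AgB3...               # placeholder for the encrypted blob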

Key management
You will need the kubeseal binary on your machine to encrypt secrets, but the controller manages and rotates the keys for you, and the install is very simple.

Kured is as good as it is simple… Deployed as a DaemonSet, it watches your nodes for a specific file and reboots them when that file is present. So, as I run Debian with automatic upgrades configured, it will reboot a node when a new kernel is installed, for example. You can configure a lot of things (see the sketch after this list), like:

  • A schedule of when nodes are allowed to reboot or not
  • Where to send notifications on reboot (I use Teams)
  • Only draining pods that match a label
  • A Prometheus instance to check, so reboots are aborted if alerts are firing
  • The maximum number of nodes to reboot at the same time…
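Those options map to flags on the Kured DaemonSet container. A sketch of the kind of arguments involved, with flag names as I remember them and example values, so double-check against the Kured docs for your version:

# excerpt of the kured DaemonSet container args (example values)
args:
  - --reboot-sentinel=/var/run/reboot-required   # the file Debian creates when a reboot is needed
  - --start-time=21:00                           # reboot window start
  - --end-time=23:59                             # reboot window end
  - --time-zone=Europe/Paris                     # example time zone
  - --prometheus-url=http://prometheus:9090      # hold reboots while alerts are firing
  - --notify-url=teams://example                 # shoutrrr-style URL for Teams notifications
  - --concurrency=1                              # reboot at most one node at a time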

I used to get regular warnings about my etcd database being fragmented, so I created a CronJob to take care of it and I never got those alerts again:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-defrag-cronjob
  namespace: kube-system
spec:
  schedule: "00 16 * * *"
  successfulJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: etcd
              image: k8s.gcr.io/etcd:3.5.3-0
              securityContext:
                seccompProfile:
                  type: RuntimeDefault
              args:
                - /bin/sh
                - -c
                - etcdctl='etcdctl';
                  export ETCDCTL_API=3;
                  etcdctl --endpoints="https://IP_OF_A_MANAGER_NODE:2379" --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/peer.crt" --key="/etc/kubernetes/pki/etcd/peer.key" defrag --cluster;
              volumeMounts:
                - mountPath: /etc/kubernetes/pki/etcd
                  name: etcd-certs
          nodeSelector:
            node-role.kubernetes.io/control-plane: ''
          volumes:
            - hostPath:
                path: /etc/kubernetes/pki/etcd
                type: DirectoryOrCreate
              name: etcd-certs
          restartPolicy: OnFailure
          tolerations:     # Tell kube that this pod can run on managers !
            - key: "node-role.kubernetes.io/master"
              operator: "Equal"
              value: ""
              effect: "NoSchedule"
            - key: "node-role.kubernetes.io/control-plane"
              operator: "Equal"
              value: ""
              effect: "NoSchedule"
Cert-manager Logo

Cert-manager allows you to request and issue certificates automatically. Several projects rely on it, so I guess everyone needs it in a cluster at some point, unless you manage certs yourself. I also use it to request HTTPS certs for Traefik… I use the HTTP-01 challenge with Let's Encrypt.
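A typical setup for that pairs a ClusterIssuer using the HTTP-01 solver with a Certificate per hostname. A sketch with a placeholder e-mail, reusing the xxx.domain.com host from the IngressRoute above (my real setup may differ in the details):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@domain.com               # placeholder contact e-mail
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # stores the ACME account key
    solvers:
      - http01:
          ingress:
            class: traefik                # answer the challenge through Traefik
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: xxx-domain-com
  namespace: traefik
spec:
  secretName: xxx.domain.com              # the secret referenced by the IngressRoute tls section
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - xxx.domain.com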

Popeye Web UI

I found this little tool called Popeye that can scan a cluster and tell you where you are not following best practices. There is an option to output in HTML format… so I run it directly in the cluster, as a CronJob, and expose the result with a Caddy web server pod. This way I can check the compliance of my cluster with a web browser.

Diun Logo

Diun watches your cluster and sends alerts when a new version of an image you are using is available. As I’m running full GitOps I will switch this to Renovate, but this cool little tool is worth mentioning!

The next part will cover the supporting services running on that cluster.