My Homelab for 2024 Part 3

Series - Homelab 2024

Infrastructure supporting services

Authelia Logo

After deploying the first services I wanted to add authentication everywhere, because some web UIs like Longhorn or Thanos do not implement auth at all. At the beginning I used basic auth with my Traefik ingress, but I quickly started looking for a true Single Sign-On solution. My criteria were:

  • Forward auth compatibility, to use it with Traefik
  • All-in-one solution
  • OpenID provider
  • FIDO key support

So I tried authentik, found it needlessly complicated, and then hit this BUG

Then I found Authelia and now use it for everything. It runs in front of the services that need auth because they do not provide it themselves. Services that are compatible with OpenID use it as a provider, so I can log in once and access everything. I use YubiKeys as my second factor.
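
For the OpenID part, registering a service boils down to declaring a client in Authelia's configuration.yml. Here is a simplified sketch for a hypothetical Grafana client; the id, secret and redirect URI are placeholders and the exact key names vary a bit between Authelia versions:

identity_providers:
  oidc:
    clients:
      - id: grafana                      # hypothetical client id
        description: Grafana
        secret: 'use-a-long-random-secret-here'
        authorization_policy: two_factor # require the YubiKey
        redirect_uris:
          - https://grafana.yourdomain.com/login/generic_oauth
        scopes:
          - openid
          - profile
          - email
          - groups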

Authelia Login banner on proxmox

Here is the Proxmox example.

All my Authelia users are stored in an LDAP database. I use LLDAP to manage them; it is a very lightweight LDAP implementation that is perfect for this use case.

Synology compatibility
Just be aware that LLDAP does not work with Synology due to This. So far it is the only thing I have found that does not work with LLDAP!

I use the following middleware in Traefik for the services that cannot do OpenID:

---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: forwardauth-authelia
  namespace: authelia
  labels:
    app.kubernetes.io/instance: authelia
    app.kubernetes.io/name: authelia
spec:
  forwardAuth:
    address: http://authelia.authelia.svc.cluster.local/api/verify?rd=https://whatever.yourdomain.com/
    authResponseHeaders:
      - Remote-User
      - Remote-Name
      - Remote-Email
      - Remote-Groups
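
To protect a service, the middleware is then referenced from its route. A minimal sketch, assuming a hypothetical IngressRoute for the Longhorn UI; the host, service name and entry point are placeholders:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: longhorn-ui              # hypothetical route, adapt to your service
  namespace: longhorn-system
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`longhorn.yourdomain.com`)
      kind: Rule
      middlewares:
        - name: forwardauth-authelia   # the middleware defined above
          namespace: authelia
      services:
        - name: longhorn-frontend
          port: 80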

I have used Ansible several times for work and found it amazing. We also leverage AWX to manage and launch our playbooks stored in Git, but for the homelab I wanted to try the little brother of AWX: Semaphore. Feature parity is not there, but for the homelab it covers my needs.

Semaphore Web UI

I just managed to launch a playbook with everything stored in my Git!

Git Logo

As said in Part 1, I have a Gitea instance (soon to be replaced by Forgejo) that stores every single YAML file deployed in my Kubernetes cluster. I use Git as the single source of truth for the cluster. I will try to extend that beyond the Kubernetes cluster, to my VMs on the Proxmox nodes and with Ansible.

I use Woodpecker as my CI tool, to build this website for example, as well as some container images. Nothing fancy here, just a standard Woodpecker install that runs in Kubernetes and launches a pod for each step in the pipeline. I use Kaniko to build Docker images in Kubernetes and push them to my private registry.
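
Roughly, such a build step looks like this in a .woodpecker.yml. This is only a sketch, not my exact pipeline: the image tag, context and destination are placeholders, and pushing requires registry credentials (a /kaniko/.docker/config.json) that are omitted here:

steps:
  build-image:
    # depending on the Woodpecker version, the top-level key may be `pipeline` instead of `steps`
    image: gcr.io/kaniko-project/executor:debug   # the debug tag ships a busybox shell
    commands:
      # registry credentials (/kaniko/.docker/config.json) are assumed to be provided separately
      - >-
        /kaniko/executor
        --context=dir://$CI_WORKSPACE
        --dockerfile=Dockerfile
        --destination=yourprivateregistry.domain.com/hugo/hugo-blog:latest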

Woodpecker WebUI

ArgoCD is the tool that syncs everything between my Git repo and my cluster. I deployed it in HA mode with the YAML files of the project. With autosync and self-heal enabled it will fix the cluster if the state drifts from the desired one stored in Git. There is an additional tool that you can use: ArgoCD Image Updater, which will update the image of your app if it finds a new one in the registry.

Not ideal...
The Image Updater project's last release was one year ago and it does not support webhooks… It polls the registry every 3 minutes, so changes are not instant, and if you have a lot of images to watch it will become a problem!

But aside from the image updating part, ArgoCD is a wonderful tool that also provides a nice view of all your apps:

App deployed with ArgoCD

I use it to deploy plain YAML files and Kustomize files, and I recently found that you can deploy a Helm chart from one repo with values from another repo, so I will migrate all my Helm charts to ArgoCD! I should be able to get rid of the Image Updater addon by using Renovate bot… I also love the fact that your ArgoCD Application objects can be deployed as plain YAML too because they are CRDs; Argo stores everything in Kubernetes and does not need a database or persistent volume:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd-image-updater.argoproj.io/allow-tags: regexp:[1-9]+-[1-9]+
    argocd-image-updater.argoproj.io/git-branch: main
    argocd-image-updater.argoproj.io/image-list: yourprivateregistry.domain.com/hugo/hugo-blog
    argocd-image-updater.argoproj.io/update-strategy: name
    argocd-image-updater.argoproj.io/write-back-method: git
    notifications.argoproj.io/subscribe.on-sync-succeeded.teams: geco-teams
  name: hugo-blog-prod
  ...
spec:
  destination:
    namespace: hugo-blog-app
    server: https://kubernetes.default.svc
  project: default
  source:
    path: hugo-blog/overlays/prod
    repoURL: https://yourgitrepo.domain.com/h.campion/hugo-k8s-manifests
    targetRevision: HEAD
  syncPolicy:
    automated:
      selfHeal: true

You can also see there the annotations used to configure the Image Updater addon and the Teams notification on successful sync.
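
For the Helm-chart-with-external-values setup mentioned above, ArgoCD (2.6+) supports multiple sources with a ref. A minimal sketch, where the chart name, versions and repo URLs are placeholders:

---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: some-helm-app          # hypothetical app
spec:
  destination:
    namespace: some-helm-app
    server: https://kubernetes.default.svc
  project: default
  sources:
    - repoURL: https://charts.example.com        # upstream chart repository
      chart: some-chart
      targetRevision: 1.2.3
      helm:
        valueFiles:
          - $values/some-helm-app/values.yaml    # values pulled from the second source
    - repoURL: https://yourgitrepo.domain.com/h.campion/helm-values
      targetRevision: HEAD
      ref: values                                # exposes this repo as $values
  syncPolicy:
    automated:
      selfHeal: true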

Cloud Native Postgres Logo

For sure one of my favorite tools of the stack… This one was suggested to me by someone on Reddit; thanks for that one mate, I do not remember your username anymore, sorry…

CNPG is the solution I use to deploy all my databases. I went with Postgres because of that operator; it just works, and without a single failure since the start for me… And trust me, I broke it every way possible at the beginning and never lost a single bit of data… CNPG will do all of this for you:

  • Bootstrap a replicated PostgreSQL cluster, with 3 members in my case
  • Init the DB and the user
  • Init from a previous backup if you need to
  • Create a scheduled backup of your DB to S3 storage
  • Create the key / certificate combo used to connect to your database
  • Store them in a regular Kube secret so you can use them in your app
  • Deploy an exporter so you can watch your cluster with Prometheus
  • Self-heal your cluster if you destroy a member…
  • More cool stuff that I’m not aware of…

It works well with ArgoCD as you can define your DB cluster as YAML files like so:

---
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: vaultwarden
spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://cnpg"
      endpointURL: "https://your_s3_compatible_endpoint:9000"
      s3Credentials:             # The credentials of your s3 bucket
        accessKeyId:
          name: minio-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: minio-creds
          key: ACCESS_SECRET_KEY
  instances: 3                   # Number of members you want in that cluster
  primaryUpdateStrategy: unsupervised
  storage:
    size: 10G
    storageClass: local-storage  # As explained before,
    # I use local volumes as CNPG will manage
    # HA on its own and I want max IO performance for the database...
  bootstrap:
    initdb:
      database: vaultwarden
      owner: vaultwarden
      secret:
        name: vaultwarden-db-secret
  monitoring:
    enablePodMonitor: true    # Tell cnpg to enable prometheus monitoring
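
The scheduled backup mentioned in the feature list above is a separate CNPG object that references the cluster. A minimal sketch, with an assumed daily schedule (CNPG uses a 6-field cron expression, seconds first):

---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: vaultwarden-daily
spec:
  schedule: "0 0 3 * * *"        # every day at 03:00, seconds field first
  backupOwnerReference: self
  cluster:
    name: vaultwarden            # the Cluster defined above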

Prometheus Infrastructure Schema

I use Prometheus and Thanos, deployed in Kubernetes with the kube-prometheus-stack. It eases the setup of:

  • Prometheus operator that will deploy and configure Prometheus and Alertmanager
  • Prometheus instances, 3 in my case, with local volumes like CNPG
  • Alertmanager instances
  • Sidecars for the Prometheus instances that Thanos will need
  • Kube State Metrics, which is a Kubernetes exporter
  • Node exporter deployed as a DaemonSet on all Kube nodes
  • All the needed CustomResourceDefinitions
  • Grafana to view dashboards

I found this Helm chart very good; you will need a good amount of time to read the values file and customize it for your needs, but the defaults are killer to get started fast. I added a lot of exporters to the default ones (a scrape config sketch follows the list):

  • Every Linux server outside my cluster, with the Node exporter
  • PowerDNS cluster metrics exported by the recursors and authoritative servers + the Pi-hole exporter
  • Blackbox exporter to monitor HTTP metrics of my services
  • PushGateway, used to receive asynchronous metrics like K8up job completions
  • SNMP exporter to poll SNMP on my NAS and UniFi devices, switches…
  • Speedtest exporter for internet bandwidth checks
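
The exporters running outside the cluster are scraped through static targets. A minimal sketch of how this can be wired into the kube-prometheus-stack values, with hypothetical host names:

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: external-node-exporter   # hypothetical job for out-of-cluster hosts
        static_configs:
          - targets:
              - pve1.yourdomain.com:9100
              - pve2.yourdomain.com:9100
              - nas.yourdomain.com:9100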

I also try to gather metrics from all the services that support it by leveraging ServiceMonitors or PodMonitors (a minimal example follows the list):

  • Prometheus / Thanos components metrics
  • Loki components
  • Grafana
  • Authelia
  • Traefik
  • CNPG clusters
  • Healthcheckio status
  • Longhorn
  • Argo and woodpecker
  • …and everything that can expose a /metrics endpoint ^^
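
A ServiceMonitor is just another Kubernetes object, so it lives in Git like everything else. A minimal sketch for a hypothetical service exposing /metrics; the label the operator selects on depends on your kube-prometheus-stack values:

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik                      # hypothetical monitor
  namespace: traefik
  labels:
    release: kube-prometheus-stack   # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  endpoints:
    - port: metrics                  # name of the port in the Service
      path: /metrics
      interval: 30s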

On top of that I use Thanos to be able to query my 3 Prometheus instances and to ship the data to an S3-compatible MinIO on my NAS. I keep one week of data on the Prometheus instances, so in case of NAS failure I still have 7 days of metrics to look at if needed. Then I keep one year on the NAS, with the compactor downsampling the data to 1-hour resolution after 30 days. In my everyday usage I do my queries with the Thanos web UI, and look at alerts on it too. My Grafana is also connected to the Thanos query frontend.

Thanos query example in the WebUI

I used the Bitnami chart for Thanos to help with the config of:

  • Thanos querier, which fetches metrics from the sidecars and the storage gateway
  • Thanos query frontend, which enhances the query path with caching and retries
  • Thanos compactor, which compacts / downsamples the TSDB blocks in your S3 storage
  • Thanos storage gateway, which makes your S3 storage available as a datasource for the querier!
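
All these components, plus the Prometheus sidecars, read the same small object storage config (passed as a secret or chart value). A sketch for an S3-compatible MinIO endpoint; the bucket, endpoint and credentials are placeholders:

type: S3
config:
  bucket: thanos
  endpoint: your_s3_compatible_endpoint:9000   # MinIO on the NAS
  access_key: YOUR_ACCESS_KEY
  secret_key: YOUR_SECRET_KEY
  insecure: true                               # set to false if the endpoint uses TLS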

Of course Grafana is used because it is the best Open Source visualization product on earth !

In Kubernetes you can configure it to be auto-provisioned from ConfigMaps; that way the datasources and the dashboards are Kubernetes objects that can be stored in my Git and deployed with ArgoCD. I gathered some dashboards from the internet, some come directly from Helm charts or operators, and some are mine. I’m currently in the process of migrating my own dashboards to ConfigMaps to make my Grafana “stateless”.
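
With the kube-prometheus-stack Grafana, this provisioning works through a sidecar that watches for labelled ConfigMaps. A minimal sketch, assuming the default grafana_dashboard label and a trivial dashboard JSON:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"     # the sidecar picks up ConfigMaps carrying this label
data:
  my-dashboard.json: |
    {
      "title": "My dashboard",
      "panels": []
    }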

Here are some examples:

Grafana Overview dashboard

Overview of the kubernetes cluster

Grafana CNPG Dashboard

Overview of one CNPG cluster

Grafana Blackbox Dashboard

BlackBox probe metrics panel

Nice kube dash
Take a look here for nice Kube dashboards that are very useful!

Karma WebUI

I also have an always-on screen with Karma on it, which only displays alerts.

To monitor my cron tasks and backups that are not in Kubernetes, I deployed Healthchecks.io. In a few words: a service that expects the process you are watching to say hello on a defined schedule, and sends you alerts when it does not.

Healthchecksio WebUI

Here we can see that some of my backups are in progress, in yellow. Green ones are already done.

This website uses Umami for analytics. I considered Matomo too, but the lack of Postgres support made me give up on the idea: I did not want to deal with two database operators just for Matomo. Umami is also simpler / lighter. Initially I was using a Hugo theme that was not compatible with Umami, so I hid the script snippet in the footer, but since I switched to an Umami-compatible theme I should remove that and use the regular config.toml…

Umami site dashboard

As the site is not online, the dashboard is pretty empty, but you get the idea.

Configuration
What I like about Umami is that you can tell it not to gather analytics for some domains / from some IPs, so I disable everything for my DEV domain and everything coming from my monitoring system.

Loki Logo

I’m still testing this on my setup because I have a storage issue right now (space) that I have to sort out by migrating, but my previous tests were good. The setup I am evaluating is Loki installed in scalable mode with the official Helm chart. The Grafana Agent is installed in Operator mode and uses Promtail (the Loki agent) under the hood to gather logs from my pods. I managed to gather the access logs of my Traefik ingress controller without any problem.

I used the same technique as for CNPG and Prometheus for storage: local persistent volumes for maximum performance. The logs are all shipped by Loki to my NAS via the S3 protocol.

More on this later…

K8UP Logo

I use K8up to back up all my Kubernetes persistent volumes.
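
K8up works with a Schedule object per namespace that tells it when to back up the PVCs and where to send them. A minimal sketch, assuming the k8up.io/v1 API and placeholder bucket, endpoint and secrets:

---
apiVersion: k8up.io/v1
kind: Schedule
metadata:
  name: backup-schedule
  namespace: vaultwarden          # hypothetical namespace to back up
spec:
  backend:
    repoPasswordSecretRef:        # restic repository password
      name: backup-repo
      key: password
    s3:
      endpoint: https://your_s3_compatible_endpoint:9000
      bucket: k8up
      accessKeyIDSecretRef:
        name: backup-s3-creds
        key: username
      secretAccessKeySecretRef:
        name: backup-s3-creds
        key: password
  backup:
    schedule: '0 3 * * *'         # every night at 03:00
  prune:
    schedule: '0 4 * * 0'
    retention:
      keepDaily: 7
      keepWeekly: 4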

Objects that I back up:

  • The hosts themselves, with the Proxmox Backup Client CLI in a cronjob => to the PBS server
  • The virtual machines' config and drives => to the PBS server
  • Kubernetes PVCs with K8up => to my NAS
  • CNPG databases => to my NAS
  • Longhorn PVCs with the Longhorn backup system => to my NAS
  • Configurations => to my Git, which is itself a VM backed up to the PBS server

The NAS is snapshotted several times a day, as I use Btrfs, and I sync it to an offsite NAS every evening.

The Proxmox Backup Server is also synced to an offsite PBS every day.

The backups are watched by Prometheus or Healthchecks.io, which send Teams notifications in case of failure. I also see the failures in Karma or Grafana.

Grafana Backup Dashboard

Let’s now talk about other services