For the last 50 days, I have wanted to create a Kubernetes cluster with Longhorn as its storage engine. I use MicroK8S as my Kubernetes distro. I’ll be honest with you: this was the steepest learning curve I have had to endure in my brief life thus far. In those 50 days I tried, gave up, tried again and gave up again, over and over. I decided to move on and look for a different storage engine for K8S, but I just couldn’t leave Longhorn alone. It felt like quitting, and quitting is not a good habit. I had to go back and conquer it. I have done it, and this feels like freedom!
What was the Challenge
From the beginning, my aim was simple: create a single-node Kubernetes cluster using MicroK8S and Longhorn. I installed MicroK8S with snap on Ubuntu 20.04 without trouble, and I installed Longhorn by following the longhorn.io docs. But it did not work: the longhorn-driver-deployer pod could not start. I found a workaround on the Longhorn GitHub page, but it caused a mess once the server was rebooted. After every reboot, my mount points vanished, kubelet could not automatically attach new pods to the volumes, and the data in my stateful apps was always missing. Imagine rebooting a server and finding zero databases in your MySQL server!
I also had to mount the pods’ volumes manually by recreating their mount points, which was far from ideal. Meanwhile, the default MicroK8S storage (microk8s-hostpath) worked well, but it is not recommended for production.
This was definitely not a production-worthy setup, and running production workloads on it would have been a real risk.
So I needed a cluster that could dynamically provision persistent volumes and automatically bring the volumes and the pods using them back up after a server reboot. I found the answers this weekend, and this is how I did it.
Prerequisites
- Ubuntu 20.04 server
- Root access to the server
- NFSv4, an iSCSI initiator and snap
1. Update repos and install required packages
For this setup, NFSv4, snap and iSCSI are required.
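The install command for this step is not shown, so here is a minimal sketch. It assumes the usual Ubuntu 20.04 package names: nfs-common for the NFSv4 client, open-iscsi for the iSCSI initiator (it ships the iscsid daemon and iscsiadm), and snapd for snap. The function names install_prereqs and missing_prereqs are hypothetical; run them as root.

```shell
# Hypothetical helper (run as root): refresh the package lists, then
# install the NFSv4 client, the iSCSI initiator and snapd.
install_prereqs() {
  apt-get update &&
    apt-get install -y nfs-common open-iscsi snapd
}

# Hypothetical check: print each required tool that is not on the PATH.
# mount.nfs4 comes from nfs-common, iscsiadm from open-iscsi.
missing_prereqs() {
  local tool
  for tool in mount.nfs4 iscsiadm snap; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Run install_prereqs, then missing_prereqs; empty output means all three tools are on the PATH.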
2. Start and enable nfs-common
On Ubuntu 20.04, nfs-common is installed masked by default. Unmask it first, by removing the mask and reloading systemd, so that it is possible to start and enable it:
rm -f /lib/systemd/system/nfs-common.service
systemctl daemon-reload
Now, start and enable the service.
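The start and enable commands are not spelled out here. This is a minimal sketch that mirrors the iscsid step below and assumes the mask has already been removed as shown above. The name start_nfs_common is hypothetical.

```shell
# Hypothetical helper (run as root, after the mask has been removed):
# start nfs-common now, enable it at boot, then report its state.
start_nfs_common() {
  systemctl start nfs-common &&
    systemctl enable nfs-common &&
    systemctl is-active nfs-common
}
```

If the service came up, the last command should print active.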
3. Start and enable iSCSI
What you need is the iscsid (initiator) service, not the open-iscsi (client) service. You may notice that the open-iscsi service does not start unless you troubleshoot it as shown in this article.
systemctl start iscsid
systemctl enable iscsid
4. Install microk8s
snap install microk8s --classic --channel=latest
5. Enable microk8s dns (CoreDNS) and Ingress (Nginx ingress)
microk8s enable dns ingress
6. Install helm package manager.
snap install helm --classic
7. Install longhorn
Here, we follow the official documentation for Longhorn 1.2.2 to install Longhorn.
- Add the Longhorn Helm repo, then update the Helm repos
helm repo add longhorn https://charts.longhorn.io
helm repo update
- Install Longhorn using Helm 3
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
If the installation above fails because the Kubernetes cluster is unreachable, see Error 1 in the troubleshooting section.
According to the docs, this should be enough for the cluster to run. It is not. A quick look at the pods in the longhorn-system namespace reveals that longhorn-driver-deployer does not start.
See Error 2 in the troubleshooting section for how to fix this and get it working.
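To see this for yourself, list the pods in the longhorn-system namespace. A minimal sketch with two hypothetical helpers. The 300-second timeout is an arbitrary assumption.

```shell
# List the Longhorn pods with their READY/STATUS/RESTARTS columns.
show_longhorn_pods() {
  kubectl get pods -n longhorn-system
}

# Hypothetical helper: wait up to 300 s (an arbitrary choice) for the
# driver deployer rollout; a non-zero exit means it never became ready.
wait_for_driver_deployer() {
  kubectl rollout status deployment/longhorn-driver-deployer \
    -n longhorn-system --timeout=300s
}
```

On a broken install, show_longhorn_pods should show longhorn-driver-deployer stuck at Init:0/1, and wait_for_driver_deployer should give up after the timeout.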
Troubleshooting
1. Error 1: Cannot install the Helm chart because the Kubernetes cluster is unreachable
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
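Helm finds no kubeconfig, so it falls back to http://localhost:8080, where nothing is listening. Create a kubeconfig file as follows: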
- Create a .kube folder inside the home directory
mkdir -p ~/.kube
- Create a file called config and set its permissions
- Write the MicroK8S kubeconfig into the file as follows
microk8s kubectl config view --raw > ~/.kube/config
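The steps above can be wrapped in one helper that also covers the permissions step, which has no command of its own. This is a sketch that uses the same microk8s kubectl config view --raw export. The name write_kubeconfig and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: export the MicroK8S kubeconfig for helm/kubectl.
# It overwrites (>) rather than appends (>>), so a rerun does not leave
# two kubeconfigs concatenated in one file.
# umask 077 in the subshell makes a newly created file owner-only, and
# chmod 600 also tightens a config file that already existed.
write_kubeconfig() {
  local dir="${1:-$HOME/.kube}"
  mkdir -p "$dir" &&
    (umask 077 && microk8s kubectl config view --raw >"$dir/config") &&
    chmod 600 "$dir/config"
}
```

Running write_kubeconfig with no argument writes ~/.kube/config, which is where helm looks by default.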
2. Error 2: Longhorn driver does not start
After a successful installation of Longhorn and MicroK8S, longhorn-driver-deployer does not start:
- The longhorn-driver-deployer pod status is Init:0/1 instead of Running.
- The longhorn-ui pod status is CrashLoopBackOff: it keeps crashing and never starts.
The Longhorn manager and UI cannot reach the longhorn-backend service because DNS resolution within the cluster fails.
As this is a DNS issue, check the Kubernetes DNS deployment, CoreDNS.
- Check the coredns pod for errors
kubectl logs coredns-7f9c69c78c-7dsjg -n kube-system
Output like the following indicates a DNS resolution error: CoreDNS cannot reach its upstream DNS servers ("no route to host").
.:53
[INFO] plugin/reload: Running configuration MD5 = be0f52d3c13480652e0c73672f2fa263
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[INFO] 127.0.0.1:35941 - 17701 "HINFO IN 4725281324889338256.7661067425143258365. udp 57 false 512" NOERROR - 0 4.001660981s
[ERROR] plugin/errors: 2 4725281324889338256.7661067425143258365. HINFO: read udp 10.1.73.95:33226->220.127.116.11:53: read: no route to host
[INFO] 127.0.0.1:48855 - 60060 "HINFO IN 4725281324889338256.7661067425143258365. udp 57 false 512" NOERROR - 0 2.00083768s
[ERROR] plugin/errors: 2 4725281324889338256.7661067425143258365. HINFO: read udp 10.1.73.95:37897->18.104.22.168:53: read: no route to host
[INFO] 127.0.0.1:40459 - 55315 "HINFO IN 4725281324889338256.7661067425143258365. udp 57 false 512" NOERROR - 0 0.000247689s
[ERROR] plugin/errors: 2 4725281324889338256.7661067425143258365. HINFO: read udp 10.1.73.95:38124->22.214.171.124:53: read: no route to host
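The pod name coredns-7f9c69c78c-7dsjg is specific to this cluster. This sketch selects the CoreDNS pods by label instead. It assumes they carry the common k8s-app=kube-dns label, which you can check with kubectl get pods -n kube-system --show-labels. Both function names are hypothetical.

```shell
# Hypothetical helper: recent CoreDNS logs, selected by label instead of
# by the generated pod name (assumes the k8s-app=kube-dns label).
coredns_logs() {
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
}

# Hypothetical helper: count upstream lookups that failed with
# "no route to host", the symptom shown in the log above.
coredns_upstream_errors() {
  coredns_logs 2>&1 | grep -c 'no route to host'
}
```

A non-zero count from coredns_upstream_errors means CoreDNS still cannot reach its upstream resolvers.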
To fix this:
- Allow UDP port 53 on firewalls such as ufw or iptables.
- Turn off AppArmor.
- Check whether your cloud provider has a cloud-based firewall in front of your server, and disable it or permit the necessary traffic.
- Check /etc/resolv.conf and set the nameserver to a publicly resolvable IP address, e.g. 126.96.36.199, or your cloud provider’s DNS resolver.
- Reboot the server and check the CoreDNS logs again. The error should be resolved.
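Two of the fixes above can be sketched as small helpers. list_nameservers prints the nameserver entries of a resolv.conf-style file. allow_dns_ufw opens UDP port 53 in ufw and must be run as root. Rules for iptables or a cloud firewall have to be added separately. Both names are hypothetical.

```shell
# Hypothetical helper: print the nameserver entries of a resolv.conf-style
# file (defaults to /etc/resolv.conf), one per line.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper (run as root): allow UDP port 53 in ufw.
# iptables rules or a cloud firewall have to be changed separately.
allow_dns_ufw() {
  ufw allow 53/udp
}
```

On Ubuntu 20.04 with systemd-resolved, /etc/resolv.conf often lists only the local stub 127.0.0.53. The upstream servers it forwards to are in /run/systemd/resolve/resolv.conf, so running list_nameservers /run/systemd/resolve/resolv.conf shows them.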
After the steps above, longhorn-driver-deployer changes status from Init:0/1 to CrashLoopBackOff.
Longhorn fails to get the kubelet root directory because MicroK8S does not use the default kubelet root directory (/var/lib/kubelet). The longhorn-driver-deployer pod log shows the error below.
kubectl logs longhorn-driver-deployer-75f68555c9-hwmwg -n longhorn-system
time="2021-10-31T21:46:05Z" level=error msg="failed to get arg root-dir. Need to specify \"--kubelet-root-dir\" in your Longhorn deployment yaml.: failed to get kubelet root dir, no related proc for root-dir detection, error out"
time="2021-10-31T21:46:05Z" level=fatal msg="Error deploying driver: failed to start CSI driver: failed to get arg root-dir. Need to specify \"--kubelet-root-dir\" in your Longhorn deployment yaml.: failed to get kubelet root dir, no related proc for root-dir detection, error out"
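As with CoreDNS, the generated pod name is cluster-specific. This sketch reads the logs through the deployment instead, where kubectl picks one of its pods, and checks for the root-dir error above. Both function names are hypothetical.

```shell
# Hypothetical helper: read the driver deployer's logs through its
# deployment; kubectl picks one of the deployment's pods.
driver_deployer_logs() {
  kubectl logs deployment/longhorn-driver-deployer -n longhorn-system
}

# Hypothetical check: succeed if the logs contain the kubelet root dir
# error shown above.
needs_kubelet_root_dir() {
  driver_deployer_logs 2>&1 | grep -q 'failed to get kubelet root dir'
}
```

If the container has already restarted, adding --previous to the kubectl logs call shows the output of the crashed instance.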
Set the kubelet root directory on the longhorn-driver-deployer deployment.
For MicroK8S, the kubelet root directory is /var/snap/microk8s/common/var/lib/kubelet
- Edit the longhorn-driver-deployer deployment as follows
kubectl edit deployment/longhorn-driver-deployer -n longhorn-system
- In the container’s env section, just above name: CSI_ATTACHER_IMAGE, add the lines below, then save and close the editor
- name: KUBELET_ROOT_DIR
  value: /var/snap/microk8s/common/var/lib/kubelet
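The same change can also be made non-interactively with kubectl set env. That command adds or overwrites the variable (appending it rather than placing it above CSI_ATTACHER_IMAGE) and triggers a new rollout. A sketch; set_kubelet_root_dir and MICROK8S_KUBELET_ROOT_DIR are hypothetical names.

```shell
# Kubelet root directory used by the MicroK8S snap (from the text above).
MICROK8S_KUBELET_ROOT_DIR=/var/snap/microk8s/common/var/lib/kubelet

# Hypothetical helper: add or overwrite KUBELET_ROOT_DIR on the driver
# deployer without opening an editor; the change triggers a new rollout.
set_kubelet_root_dir() {
  kubectl set env deployment/longhorn-driver-deployer \
    -n longhorn-system KUBELET_ROOT_DIR="$MICROK8S_KUBELET_ROOT_DIR"
}
```

The Longhorn chart appears to expose this setting as csi.kubeletRootDir (check with helm show values longhorn/longhorn). If so, passing --set csi.kubeletRootDir=/var/snap/microk8s/common/var/lib/kubelet to helm install sets it at install time instead of patching the deployment afterwards.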
The longhorn-driver-deployer deployment should then restart itself, and this time it succeeds. All pods are now online and Longhorn is ready for use. After a server reboot, the pods also restart themselves and return to the Running state as expected.
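A throwaway PersistentVolumeClaim can confirm that dynamic provisioning, the goal set out at the start, actually works. This sketch assumes the chart created a StorageClass named longhorn, which you can check with kubectl get storageclass. The function name, claim name and 1Gi size are hypothetical choices.

```shell
# Hypothetical smoke test: print a small PVC manifest that requests a
# volume from the "longhorn" StorageClass.
longhorn_test_pvc() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
}
```

Apply it with longhorn_test_pvc | kubectl apply -f -. If the StorageClass binds immediately, kubectl get pvc longhorn-test-pvc should soon show Bound. Remove it afterwards with kubectl delete pvc longhorn-test-pvc.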