Distributed Filesystems

Introduction

If you are familiar with Docker, you are likely aware of the concept of bind mounts. This is the practice of mapping a folder within the container to a static location on the host. We can actually do that in Rancher as well: mapping a folder on the host can be done with the hostPath directive.
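For reference, here is a minimal sketch of what a hostPath mount looks like in a manifest. The pod name, the caddy:2 image tag, and the /srv/caddy host folder are illustrative assumptions, not values from this series.

```shell
# Write a throwaway Pod manifest that bind-mounts a host folder via hostPath.
# Assumptions: the pod name, image tag and host path are placeholders.
cat > hostpath-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: caddy-hostpath-demo
spec:
  containers:
    - name: caddy
      image: caddy:2
      volumeMounts:
        - name: site
          mountPath: /usr/share/caddy   # folder inside the container
  volumes:
    - name: site
      hostPath:
        path: /srv/caddy                # folder on the node itself
        type: DirectoryOrCreate         # create the folder if it is missing
EOF

# To try it on a cluster you can afford to break:
#   kubectl apply -f hostpath-demo.yaml
```

Note that the data lives in /srv/caddy on whichever node the pod happens to land on, which is the root of the problems described next.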

Now we can do that, but we won’t. Bind mounts have several problems that quickly become apparent in Kubernetes:

  • The permissions of the bind mount depend on the host’s user IDs. When you’re running every container as root, this is less of an issue. However, most production container deployments do not run as root, for security reasons, and that will cause you grief with hostPath.
  • You have to figure out how to atomically back up the volume. This is problematic, especially if you’re not using a snapshot-capable filesystem like Btrfs or ZFS.
  • What happens when you have more than a single node? hostPath is tied to one node, so it is not highly available. If you lose the node the bind mount is on, that data is gone.

These are problems that distributed filesystems solve.

TL;DR

In this article, we will:

  • Deploy the Longhorn distributed filesystem
  • Set up a persistent volume for Caddy to host permanent data

Deploying Longhorn

There are many options for a distributed filesystem. Rancher (the company) also develops one, called Longhorn. Longhorn integrates with Rancher and works on top of existing ext4/xfs filesystems, both of which are good for us. Let’s install it!

  • In Rancher, navigate to Apps & Marketplace, then find Longhorn.

  • On the Longhorn description page, pick Install.

  • There is nothing to change on the first page, so simply hit Next.

  • We have some settings to change for a single-node deployment:
    • Longhorn Default Settings →
      • Check Customize Default Settings
      • Set Default Replica Count to 1
    • Longhorn CSI Driver Settings →
      • Set Longhorn CSI Attacher Replica Count to 1
      • Set Longhorn CSI Provisioner Replica Count to 1
      • Set Longhorn CSI Resizer Replica Count to 1
      • Set Longhorn CSI Snapshotter Replica Count to 1
    • Longhorn Storage Class Settings →
      • Set Storage Class Retain Policy to Retain
      • Set Default Storage Class Replica Count to 1
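If you would rather install with Helm than click through the form, the same single-node settings can be sketched as a values file. The key names below are my best understanding of the upstream Longhorn chart and should be treated as assumptions: check them against the values.yaml of the chart version you actually install.

```shell
# Write a Helm values file mirroring the single-node settings from the form.
# Assumption: these key names match your version of the Longhorn chart.
cat > longhorn-values.yaml <<'EOF'
defaultSettings:
  defaultReplicaCount: 1         # Default Replica Count
csi:
  attacherReplicaCount: 1        # CSI Attacher Replica Count
  provisionerReplicaCount: 1     # CSI Provisioner Replica Count
  resizerReplicaCount: 1         # CSI Resizer Replica Count
  snapshotterReplicaCount: 1     # CSI Snapshotter Replica Count
persistence:
  reclaimPolicy: Retain          # Storage Class Retain Policy
  defaultClassReplicaCount: 1    # Default Storage Class Replica Count
EOF

# Possible installation, assuming Helm is set up against your cluster:
#   helm repo add longhorn https://charts.longhorn.io
#   helm install longhorn longhorn/longhorn \
#     --namespace longhorn-system --create-namespace \
#     -f longhorn-values.yaml
```

A replica count of 1 only makes sense on a single node. With more nodes you would normally leave the defaults alone so that Longhorn can keep several copies of each volume.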

You can see me do this below:

  • Once that’s all done, press Install!

After a while, you should get a SUCCESS message. Once you see it, Longhorn is available for creating volumes. You should also have a new Longhorn management section on the sidebar.

You may notice that a chunk of your CPU is now reserved. This is because Longhorn really, really doesn’t want to fail, and reserves 12% (by default) of your CPU to keep the distributed filesystem operational even at full load.

Assigning Persistent Storage to our container

Now that we have persistent storage to provision, let’s modify our deployment to actually use it.

  • Back in the Rancher cluster explorer, navigate to Deployments. Use the three dots at the right of the deployment and choose Edit Config.

  • Under Storage, choose Create Persistent Volume Claim.

  • We need to set the following:
    • The Persistent Volume Name to caddy
    • Use a Storage Class to provision a new Persistent Volume
    • Access Mode of Single Node Read/Write
    • Set the Storage Class to longhorn
    • Set the capacity to 2GB (or whatever you like)
    • Set the Mount Point to /usr/share/caddy
    • Save
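The form above roughly corresponds to the two files sketched below: a claim named caddy against the longhorn storage class, and a patch that mounts it at /usr/share/caddy. The deployment name, the container name caddy, and the volume name caddy-data are assumptions for illustration, so adjust them to match your own deployment.

```shell
# 1. The claim: 2Gi, Single Node Read/Write (ReadWriteOnce), storage class longhorn.
cat > caddy-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: caddy
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
EOF

# 2. A strategic merge patch that attaches the claim to the deployment.
# Assumptions: the container is named "caddy"; "caddy-data" is a made-up volume name.
cat > caddy-volume-patch.yaml <<'EOF'
spec:
  template:
    spec:
      volumes:
        - name: caddy-data
          persistentVolumeClaim:
            claimName: caddy
      containers:
        - name: caddy
          volumeMounts:
            - name: caddy-data
              mountPath: /usr/share/caddy
EOF

# Possible usage, assuming the deployment is also called "caddy":
#   kubectl apply -f caddy-pvc.yaml
#   kubectl patch deployment caddy --patch-file caddy-volume-patch.yaml
```

Kubernetes writes capacities with binary suffixes, so the 2GB from the form is expressed as 2Gi here.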

If all goes well, your application should redeploy (killing our website in the process). However, the new deployment now has a persistent volume attached. From now on, the /usr/share/caddy folder will persist its data in a Longhorn volume. You can test that by generating a new index.html file using the container shell.

echo "<h1>hello world, this website is persistent!</h1>" > /usr/share/caddy/index.html

Now if you edit or redeploy your container, the webpage will always read “hello world, this website is persistent!”.

Storage Classes, Persistent Volume Claims, and Persistent Volumes

Coming from Docker, this terminology has probably thrown you for a loop. Let’s break down what each term means:

  • A persistent volume claim (PVC) is a request for a volume, made on behalf of a container at launch. The difference between that and a persistent volume is that a claim does not directly generate a volume: it simply asks Kubernetes to provide one. That volume might not even exist yet. Here is the persistent volume claim we just generated:

  • A persistent volume is the storage itself, and the concept also exists in Docker. You can declare a volume in Docker, and storage will be created in /var/lib/docker/volumes. Most people bypass this step and bind mount a host folder instead, but you can technically let Docker handle the volume management. Here is the persistent volume that got created from our Persistent Volume Claim (PVC):

For persistent volumes, you’ll have to change your filter to “Only User Namespaces”. That’s because persistent volumes are cluster-wide objects and do not get assigned a namespace.

  • A storage class is how Kubernetes figures out what persistent volume to give a container when it submits a persistent volume claim. Kubernetes uses the storage class to define the rules that generate persistent volumes. We defined the storage class settings when we installed Longhorn.
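To make the storage class idea concrete, here is a rough sketch of what a Longhorn-backed storage class can look like as a manifest. The name longhorn-single-replica is hypothetical, chosen so it does not clash with the longhorn class the installer already created, and the provisioner and parameter names are what I understand Longhorn to use, so verify them against your cluster before relying on them.

```shell
# Write an illustrative StorageClass that provisions single-replica Longhorn volumes.
# Assumptions: "longhorn-single-replica" is a made-up name; the provisioner and
# parameter names should be checked against `kubectl get storageclass longhorn -o yaml`.
cat > longhorn-single-replica.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica
provisioner: driver.longhorn.io
reclaimPolicy: Retain            # keep the volume when its claim is deleted
allowVolumeExpansion: true
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"          # one copy of the data, single-node setup
EOF

# To see how the three object types relate on a live cluster:
#   kubectl get storageclass
#   kubectl get pvc
#   kubectl get pv
```

Any claim that names this class in storageClassName would get a freshly provisioned volume following these rules, which is the on-the-fly behaviour described below.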

The whole process is essentially the same as in Docker, but the extra steps allow containers to request volumes on the fly. That is useful if you are trying to scale automatically and need to automate volume creation or deletion.

Backing up Volumes

Fantastic! We now have a permanent storage location for our container data and can transfer data both into and out of it. However, we still have a big problem: we don’t have a backup! Our volumes need to be backed up to be useful, so let’s address that in Backing up Rancher.