Provisioning lots of VPSs at the same time with Terraform

Now that we know the basics of how to use Terraform, we need to cover the infrastructure that Kubernetes requires for a HA Cluster.

For the purposes of this tutorial, I am assuming you are completing this process as a learning exercise, and do not need a fully production-ready cluster. If you are aiming to create a production-ready cluster then your requirements will need to be assessed on an individual basis. That is not the aim of this tutorial.

In the previous tutorial I suggested you use Digital Ocean for your cloud provider, as new sign ups get a $100 account credit, to be used over a 60 day period. This is really useful as the cost of running even a small HA cluster for any length of time isn't trivial.

Digital Ocean have a managed Kubernetes offering which works out considerably cheaper than rolling your own Kubernetes infrastructure using their Droplets individually. As mentioned in the previous tutorial, this section on Terraform is optional, and if using dedicated hardware for your cluster nodes, can be largely or entirely skipped. The one place it is still useful, in my opinion, is for provisioning your load balancer VPS.

Kubernetes Roles

When running Kubernetes through Rancher 2 there are three roles that can be assigned to a given node. These are:

  • etcd
  • controlplane
  • worker

As a quick intro, etcd is a highly available key / value store. Kubernetes uses it to store all our configuration, secrets, and important cluster data.

The controlplane role runs all the important Kubernetes master components (with the exception of etcd) such as the kube-apiserver, kube-scheduler, kube-controller-manager and cloud-controller-manager.

The worker role allows us to serve our Docker-based apps to the hordes of waiting consumers.
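For context, here's roughly how those roles end up being expressed when RKE (the Kubernetes installer that Rancher uses under the hood) builds a cluster: each node in `cluster.yml` is assigned one or more roles. This is just a sketch — the addresses are placeholders, and Rancher will largely manage this file for you:

```yaml
# Sketch of an RKE cluster.yml — addresses are placeholders.
nodes:
  - address: <etcd-0 ip>
    user: root
    role: [etcd]
  - address: <controlplane-0 ip>
    user: root
    role: [controlplane]
  - address: <worker-0 ip>
    user: root
    role: [worker]
```

A node can take several roles at once (e.g. `role: [etcd, controlplane, worker]`), which is how the three-node minimum later in this tutorial works.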

At an absolute bare minimum, you will need at least two nodes in your Rancher 2 HA cluster. If you don't have at least two nodes, your Kubernetes cluster will simply refuse to come online.

The documentation strongly recommends that the etcd and controlplane roles be assigned to different nodes.

This means at a minimum, we should have seven nodes in production:

  • 3 x etcd
  • 2 x controlplane
  • 2..n x worker

Depending on your needs, having at least seven nodes sitting around, with most of them doing very little, is not that appealing to self-hosters, or those more budget conscious. I hear you, believe me. If you have these quandaries, it raises the question of whether Kubernetes is right for your needs.

When reading the documentation for Kubernetes you will typically find that a small cluster is considered one with ~50 nodes. Based on this, our cluster could be put in the "microscopic" bracket.

As said above, I'm working on the assumption that we are learning here.

Even so, we will go with seven nodes:

  • 2 x etcd
  • 2 x controlplane
  • 2 x worker
  • 1 x Load Balancer

If you really don't want to spin up that many, then the absolute minimum is three:

  • 2 x etcd / controlplane / worker
  • 1 x Load Balancer

Yes, you're not supposed to put all of those roles on the same node. But you can. Again, don't do that in production.

Minimum Hardware Requirements

The minimum hardware requirements for each node are not super trivial, either.

Each role has a recommendation for minimum hardware requirements. Here's the one for etcd.

For the etcd and controlplane nodes I'm opting for the 16GB, or $80pm, droplets. The documentation for requirements here is really generic. There is a non-trivial number of containers needed to run Kubernetes, but in my opinion we won't come close to troubling even 4GB; the rest is just headroom.

For the workers, I'll go with another two 16GB droplets. In the real world, this is where I put my beef. In other words, this is where I whack in some appropriate hardware. If you're running a bunch of high-intensity workloads, you need the kit to keep them going. We'll just be doing a little demo, so this is ridiculously overkill. Again, I'm just going with the suggested minimums.

Finally, the load balancer is not a typical node. It will not be a part of the cluster. It will reverse proxy traffic, balancing all the incoming traffic over our available nodes. Depending on our container setup, this could help with fault tolerance.

The load balancer will be a $5 node, because there's only going to be us that are troubling it, and it only needs to run the OS and a native NGINX (no container here).
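To make the load balancer's job a little more concrete, here's a sketch of the sort of NGINX stream (TCP passthrough) configuration it will run. The node IPs are placeholders, and this is illustrative only — we'll set up the real config later in the series:

```nginx
# Sketch: TCP passthrough, balancing inbound HTTPS over the nodes.
# Node IPs are placeholders.
stream {
    upstream rancher_nodes {
        server <node-0 ip>:443;
        server <node-1 ip>:443;
    }

    server {
        listen 443;
        proxy_pass rancher_nodes;
    }
}
```

Using the stream module (rather than a plain HTTP reverse proxy) means TLS terminates on the cluster nodes, not on the load balancer.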

If you've got your calculator handy, you'll be well ahead of me when I tell you that my monthly bill is now $485 (six 16GB droplets at $80 each, plus $5 for the load balancer).

Boy, the costs ramp up quickly.

Might I remind you:

docker run -it --rm \
    -v ~/.ssh:/root/.ssh \
    -v $(pwd):/go/src/ \
    -w /go/src/ \
    hashicorp/terraform:0.11.11 \
    destroy # don't forget it!

Terraform Config

Them's all the words. What does this thing look like as config?

variable "digitalocean_token" {}

# Configure the Digital Ocean Provider
provider "digitalocean" {
  token = "${var.digitalocean_token}"
}

#  Resources
## Create a new ssh key
resource "digitalocean_ssh_key" "default" {
  name       = "my ssh key"
  public_key = "${file("~/.ssh/")}"
}

## Create etcd Nodes
resource "digitalocean_droplet" "etcd" {
  count    = 2

  name     = "etcd-${count.index}"
  image    = "ubuntu-16-04-x64"
  size     = "s-1vcpu-1gb"
  region   = "lon1"
  ssh_keys = ["${digitalocean_ssh_key.default.fingerprint}"]
}

## Create controlplane Nodes
resource "digitalocean_droplet" "controlplane" {
  count    = 2

  name     = "controlplane-${count.index}"
  image    = "ubuntu-16-04-x64"
  size     = "s-1vcpu-1gb"
  region   = "lon1"
  ssh_keys = ["${digitalocean_ssh_key.default.fingerprint}"]
}

## Create worker Nodes
resource "digitalocean_droplet" "worker" {
  count    = 2

  name     = "worker-${count.index}"
  image    = "ubuntu-16-04-x64"
  size     = "s-1vcpu-1gb"
  region   = "lon1"
  ssh_keys = ["${digitalocean_ssh_key.default.fingerprint}"]
}

## Create NGINX Load Balancer server
resource "digitalocean_droplet" "loadbalancer" {
  name     = "loadbalancer"
  image    = "ubuntu-18-10-x64"
  size     = "s-1vcpu-1gb"
  region   = "lon1"
  ssh_keys = ["${digitalocean_ssh_key.default.fingerprint}"]
}

I found the droplet size slugs on this page.
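One thing the config above leaves open is how the digitalocean_token variable gets its value. Two standard Terraform mechanisms work here; the token value itself is obviously a placeholder:

```hcl
# terraform.tfvars — do NOT commit this file to version control
digitalocean_token = "<your DigitalOcean API token>"
```

Alternatively, set it as an environment variable (`export TF_VAR_digitalocean_token=<your token>`), which with the Docker approach means passing it through with `-e TF_VAR_digitalocean_token`.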

If you can live with all of your nodes being named node-0 to node-5 then you can combine most of this. I prefer a visual separation when working in the VPS admin panel. It's a matter of personal preference.
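If you did want to combine them, a single resource with a larger count would look something like this sketch, giving you node-0 through node-5:

```hcl
## Sketch: one combined resource instead of three separate ones
resource "digitalocean_droplet" "node" {
  count    = 6

  name     = "node-${count.index}"
  image    = "ubuntu-16-04-x64"
  size     = "s-1vcpu-1gb"
  region   = "lon1"
  ssh_keys = ["${digitalocean_ssh_key.default.fingerprint}"]
}
```

You lose the ability to tell roles apart at a glance, which is exactly why I don't do it.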

The various Rancher / Kubernetes nodes need to run Ubuntu 16.04 due to an older Docker version requirement that I hope gets remedied in a release sometime soon.

The load balancer is running the latest Ubuntu server release (at the time of writing / recording).

Terraform Output Count

One thing that's missing from the config is a way to get the IP addresses of the newly created nodes.

There are various ways we can approach this. The way I'm going to do so is to have one output per node group.

In the previous tutorial the output was simple enough:

# Display the IP address
output "ipv4_address" {
  value = "${digitalocean_droplet.node1.ipv4_address}"
}

But now we have multiple nodes per resource, thanks to the count parameter.

The syntax is slightly different as a result. Here are my changes:

# Display the IP addresses
output "etcd.ipv4_addresses" {
  value = ["${digitalocean_droplet.etcd.*.ipv4_address}"]
}

output "controlplane.ipv4_addresses" {
  value = ["${digitalocean_droplet.controlplane.*.ipv4_address}"]
}

output "worker.ipv4_addresses" {
  value = ["${digitalocean_droplet.worker.*.ipv4_address}"]
}

output "loadbalancer.ipv4_address" {
  value = "${digitalocean_droplet.loadbalancer.ipv4_address}"
}

Notice that the resources that use count have an array-like syntax, with an asterisk (the "splat" operator) to denote each output entry.
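The splat gives you every instance; if you only ever need one node's address, Terraform 0.11 also lets you index an individual instance, or use the element() interpolation function. A sketch, with output names of my own choosing:

```hcl
# First etcd node only
output "etcd_0_ipv4_address" {
  value = "${digitalocean_droplet.etcd.0.ipv4_address}"
}

# Equivalent, using element()
output "etcd_first_ipv4_address" {
  value = "${element(digitalocean_droplet.etcd.*.ipv4_address, 0)}"
}
```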

After running, we should end up with something like:


# ... others

controlplane.ipv4_addresses = [
    <controlplane-0 ip address>,
    <controlplane-1 ip address>
]

# ... others

Where the entries run in sequential order: the first being controlplane-0, the second being controlplane-1.

All of this info, and a whole whack more, is available in the resulting terraform.tfstate file.
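If you fancy pulling values out of terraform.tfstate directly, it's plain JSON: in Terraform 0.11 the outputs live under `modules[0].outputs`. Here's a sketch against a hand-made miniature state file — the IPs are made up, and a real state file contains a great deal more:

```shell
# Create a miniature, hand-made example of a Terraform 0.11 state file.
# (The IPs here are made up for illustration.)
cat > sample.tfstate <<'EOF'
{
  "version": 3,
  "modules": [
    {
      "path": ["root"],
      "outputs": {
        "etcd.ipv4_addresses": {
          "type": "list",
          "value": ["192.0.2.10", "192.0.2.11"]
        }
      }
    }
  ]
}
EOF

# Pull the etcd IPs out with a couple of lines of Python
python3 -c "
import json
state = json.load(open('sample.tfstate'))
for ip in state['modules'][0]['outputs']['etcd.ipv4_addresses']['value']:
    print(ip)
"
```

In practice `terraform output` gives you the same values without touching the state file by hand.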

Dynamic Inventory

There's one further point to discuss here before moving on.

In the next, and subsequent, videos we will cover Ansible. In order for Ansible to work with our servers, it needs to be aware of the servers' IP addresses.

Now, we have all this IP addressing information thanks to Terraform. But in this instance I am not going to overcomplicate things by pulling bits of inventory data from Terraform into Ansible. It can be done. But even though I am no fan of manual steps, in this case I'm willing to concede that less is sometimes more.

For the purposes of this tutorial, and in real life, I simply copy the info from Terraform (most typically the terraform.tfstate file) and plonk it into my Ansible inventory. Yes, it's a chore. Yes, it's potentially error prone. But it's just not worth the extra complexity for something I do so infrequently.
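For reference, the copy / paste ends up as an Ansible INI inventory along these lines. The group and host names are my own choices, and the IPs are placeholders for the Terraform output values:

```ini
# inventory/hosts — IPs are placeholders, copied from the Terraform outputs
[etcd]
etcd-0 ansible_host=<etcd-0 ip>
etcd-1 ansible_host=<etcd-1 ip>

[controlplane]
controlplane-0 ansible_host=<controlplane-0 ip>
controlplane-1 ansible_host=<controlplane-1 ip>

[worker]
worker-0 ansible_host=<worker-0 ip>
worker-1 ansible_host=<worker-1 ip>

[loadbalancer]
loadbalancer-0 ansible_host=<loadbalancer ip>
```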

Your mileage may differ, and you're free to explore and use alternative approaches.

With that said, let's get onto provisioning these new servers.