Welcome to the Build Your Own Heroku on Kubernetes series! Our goal is to build a self-hosted Heroku-style platform on Kubernetes because Heroku is expensive. In the previous post I told you about the working prototype I built of our platform. Today you'll learn how I used Terraform to create the cluster for our platform. I'll share the Terraform files, the conventions, and the system design choices I made for the cluster.

Three ways to create a GKE cluster

You can create a Kubernetes cluster on Google Cloud in three ways. You can click through the Cloud Console. When you want to create a one-off cluster for experiments, the console is the best option. You can create a cluster using the gcloud CLI. The command line tool is great for scripting tasks.

When you want to declaratively provision resources, Terraform is your best option. With Terraform, you declare the infrastructure you want in a configuration file, version control the file with Git, and then provision the resources from that configuration. If you forget what you provisioned, you can review the configuration file. You don't need to click around in the console to inventory your infrastructure.

Let’s organize our infrastructure project

Terraform is a simple tool. Most of the time you use the init, plan, apply, and destroy commands. The most difficult task is choosing the proper folder structure and file organization.

Disorganization will cause rewrites. Chances are high your rewrites will break Terraform's state management, leaving files the CLI can no longer track. There are ways to recover from a broken state, but you can minimize the risk with a setup that grows with your infrastructure requirements.

I have yet to find a definitive guide on project structure, but I've settled on a convention that works well for solo developer projects. The convention is infra/PROVIDER__PROJECT-NAME__SERVICE/RESOURCE-NAME/*.tf.

First, create an infrastructure directory for Terraform. Then create a folder whose name combines the provider, the project name within that provider, and the product you're using from their service catalog. Then create a folder to represent an instance of that service resource. In that folder, place your Terraform files.

In practice the convention looks like /infra/gcp__labs__gke/micro. From the path you know you're on Google Cloud Platform, in a GCP project called labs, and that a Google Kubernetes Engine cluster named micro exists. It's not the best approach, but it's been good enough for side projects. I'm a fan of this format because it's readable, the double underscores prevent deep nesting, and it can be extended to include multiple providers. If you organize your Terraform projects a different way, tweet me @stevennatera and let's chat!

How to write Terraform files

The following patterns are personal preferences. Even though we have multiple files, behind the scenes Terraform concatenates all your files before making infrastructure changes. If you wanted, you could put everything in a single file. However, multiple files help prevent accidental changes to other infrastructure components. Credentials are stored locally during iterations; we'll figure out a better solution for production in later posts.

Image of our Terraform project structure.
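In case the image doesn't load, here is a rough sketch of the micro folder. The file names come from the descriptions below, and terraform.tfstate only appears after your first apply.

infra/
└── gcp__labs__gke/
    └── micro/
        ├── definitions.tf     # variable declarations
        ├── env.tf             # provider and required Terraform version
        ├── main.auto.tfvars   # values assigned to the variables
        ├── main.tf            # cluster and node pool resources
        ├── core.secret.json   # service account credentials (kept local)
        └── terraform.tfstate  # local state, generated by Terraform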

In definitions.tf you'll find the declarations of the variables we want to use within our configuration files. The env.tf file contains the meta information our resources need: the providers and the required Terraform version.
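The post doesn't reprint those two files, but based on the variables referenced in main.tf and main.auto.tfvars, a minimal sketch might look like this. The provider block details, the credentials path, and the version constraint are my assumptions, not the exact contents of the repo.

# definitions.tf (sketch)
variable "core_project_id" {}
variable "core_region" {}
variable "core_zone" {}

variable "gke_cluster_name" {}
variable "gke_node_pool_name" {}
variable "gke_location" {}
variable "gke_version" {}
variable "gke_node_types" {}
variable "gke_node_disk_size" {}
variable "gke_node_is_preemptible" {}
variable "gke_autoscaling_min" {}
variable "gke_autoscaling_max" {}
variable "gke_node_locations" {
  type = list(string)
}

# env.tf (sketch)
terraform {
  required_version = ">= 0.12"
}

provider "google" {
  credentials = file("core.secret.json") # local credentials; see the provisioning section
  project     = var.core_project_id
  region      = var.core_region
  zone        = var.core_zone
}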

The main.auto.tfvars file contains the values we want to assign to our variables. Values in files ending in *.auto.tfvars are automatically assigned to declared variables. The main.tf file contains all the resources we want to provision. Here you'll find the cluster and node pool configurations.

# main.tf

###############################################
# Node pool
###############################################

resource "google_container_node_pool" "micro" {
  name     = var.gke_node_pool_name
  location = var.gke_location
  cluster  = google_container_cluster.micro.name

  # 1 node per zone
  initial_node_count = 1

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    preemptible  = var.gke_node_is_preemptible
    machine_type = var.gke_node_types
    disk_size_gb = var.gke_node_disk_size

    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

#############################################
# GKE control plane
#############################################

resource "google_container_cluster" "micro" {
  name               = var.gke_cluster_name
  location           = var.gke_location
  min_master_version = var.gke_version

  # Here we create the smallest possible default
  # node pool and immediately delete it.

  remove_default_node_pool = true
  initial_node_count       = 1

  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00"
    }
  }
}
Detail view of main.tf

The Kubernetes control plane, where the master nodes reside, is declared in google_container_cluster. The data plane, a separately managed node pool for our workloads, is declared in google_container_node_pool. By creating the cluster with a separate node pool instead of using the default one, we can manage each resource independently.
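Because the pool is its own resource, we could later add, resize, or replace pools without touching the control plane. For example, a hypothetical second pool (the name and machine type here are made up) would simply reference the same cluster:

resource "google_container_node_pool" "batch" {
  name               = "batch"                             # hypothetical extra pool
  location           = var.gke_location
  cluster            = google_container_cluster.micro.name # attach to the existing cluster
  initial_node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-standard-2"
  }
}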

A file not shown is terraform.tfstate. This file contains the current state of your infrastructure. If you delete this file, Terraform won't be able to manage the declared resources. I save these files locally for prototypes; in production we should use a GCS bucket to store the state in a remote location.
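For reference, switching to remote state later should only require a backend block along these lines. The bucket name is a placeholder, and the bucket must already exist before running terraform init:

terraform {
  backend "gcs" {
    bucket = "YOUR_STATE_BUCKET"    # pre-existing GCS bucket for Terraform state
    prefix = "gcp__labs__gke/micro" # path inside the bucket, mirroring our folder convention
  }
}

Now, let's take a closer look at the values we specified.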

# main.auto.tfvars
### Variables to be automatically substituted

core_project_id = "YOUR_PROJECT"
core_region = "us-central1"
core_zone = "us-central1-a"

gke_location = "us-central1-a"
gke_node_types = "e2-standard-4"
gke_node_disk_size = 60
gke_node_is_preemptible = true

gke_autoscaling_min = 0
gke_autoscaling_max = 1

gke_cluster_name = "micro"
gke_node_pool_name = "micro"
gke_version = "1.16"
gke_node_locations = [
  "us-central1-b",
]
Values defined in main.auto.tfvars file.

Since I won't know the resource requirements until the project is complete, I want to iterate on infrastructure cheaply. I started with the least expensive options, then worked my way up until the cluster could install all the components without crashing.

With these values, a single-node zonal cluster is created. You can tell the cluster's master nodes are zonal because gke_location is set to a single zone rather than a region. The value in gke_node_locations tells us we want one node per zone in the list. To save money, we use preemptible nodes. Preemptible nodes are shut down after at most 24 hours of use, but are discounted by up to 80% compared to on-demand instance prices.
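Note that gke_node_locations doesn't appear in the main.tf excerpt above; on the google provider it would typically be wired into the cluster (or the node pool) through the node_locations argument, roughly like this:

# Added inside the google_container_cluster "micro" block (sketch)
node_locations = var.gke_node_locations # zones for worker nodes; one node each with initial_node_count = 1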

The single worker node is of type e2-standard-4 and runs Kubernetes version 1.16. I chose a node type with 4 vCPUs and 16GB of RAM to make sure the platform has enough compute resources. When the platform is stable, we might downsize the requirements and add better autoscaling limits. With these values I've been able to iterate on the infrastructure for less than $1!
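Similarly, gke_autoscaling_min and gke_autoscaling_max aren't referenced in the excerpt above; when we do wire up autoscaling, it would go on the node pool as an autoscaling block, something like:

# Added inside the google_container_node_pool "micro" block (sketch)
autoscaling {
  min_node_count = var.gke_autoscaling_min # 0: allow the pool to scale down to zero
  max_node_count = var.gke_autoscaling_max # 1: cap costs while prototyping
}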

Provision your infrastructure

To follow along, you can download these files from GitHub. Create a free GCP account if you don't have one. Install the gcloud CLI and the Terraform CLI. Create service account credentials for Terraform; the credentials must have the Compute Viewer, Kubernetes Engine Cluster Admin, and Service Account User roles. Rename the credentials file to core.secret.json. Then follow the bash commands in the README.md file, which walks through these steps in more detail.

What’s next

In our next post we'll install a serverless container runtime with Knative and configure Istio as our ingress gateway to receive traffic. Join the mailing list to stay updated with the series!

Additional Reading