Terraform remote state for collaboration

Youssef Naimi - 20 August 2019

Terraform is a great tool for building, changing, and versioning cloud infrastructure safely and efficiently, as we explained in our previous article. Working from a local state file is fine when you are building, changing, and destroying infrastructure from your own machine for testing and development purposes. Real production environments are different: they are usually managed and maintained by a team, and a local state file makes Terraform complicated to use, because each user must make sure they always have the latest state data before running Terraform, and must also make sure no one else runs Terraform at the same time; otherwise the state data might get corrupted.

Terraform has what are called remote backends. A remote backend tells Terraform that the snapshot of your infrastructure should no longer live in the terraform.tfstate file on your local filesystem, but in a remote location instead. This makes team-based workflows easy to support: Terraform uses a shared storage space for the state data, so your team can manage the same infrastructure without worrying about whose copy of the state is current.

This is great, right? And it doesn't stop here. Remote backends also provide a locking mechanism, so that one person's changes don't override another's (for example, S3 combined with a DynamoDB table, or Consul). State data can also be referenced across different environments, which brings the opportunity for a smart separation of concerns in your Terraform configuration.
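As an illustration of referencing state across environments, a separate configuration can read values exported by another state file through the terraform_remote_state data source. This is only a sketch (Terraform 0.12 syntax): the networking key and the subnet_id output are hypothetical names, not part of this article's setup.

```hcl
# Sketch: read outputs from another configuration's state stored in the same
# bucket. The key "networking/terraform.tfstate" and the output "subnet_id"
# are hypothetical examples.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "tf-remote-storage"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Values exported by the networking configuration can then be consumed here:
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.subnet_id
}
```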

Setting up the remote state in AWS

Remote backends support multiple cloud providers; for this example we are going to use AWS. Setting up the remote state with AWS requires two things: a place for Terraform to store the state file, which in our case will be an S3 bucket, and a key-value database to store the state lock information (the locking mechanism we talked about) and provide consistency checking. For the latter, AWS DynamoDB is a perfect choice.

Those two resources need to be created first. One thing to note is that you want your S3 bucket to be as protected as possible, and to have some kind of backup for the state files inside it, because they are the source of truth for your team, and you won't be in a good situation if they get deleted accidentally. AWS has a solution for this: S3 versioning, a form of backup for the objects inside a bucket that works by keeping multiple versions of each object. The other form of protection we can add to our bucket is Terraform's prevent_destroy meta-argument; when set to true, it causes Terraform to reject, with an error, any plan that would destroy the infrastructure object associated with the resource.

Now let's see what the configuration of the remote backend looks like in Terraform code:

# S3 bucket to store the state file
resource "aws_s3_bucket" "tf-remote-state" {
  bucket = "tf-remote-storage"

  versioning {
    enabled = true
  }

  lifecycle {
    prevent_destroy = true
  }
}

# DynamoDB table for storing the lock key
resource "aws_dynamodb_table" "dynamodb-terraform-state-lock" {
  name           = "tf-state-lock"
  hash_key       = "LockID"
  read_capacity  = 20
  write_capacity = 20

  attribute {
    name = "LockID"
    type = "S"
  }
}

Before we can use these, Terraform needs a set of AWS IAM permissions to access the two resources. For the S3 bucket it needs the following IAM permissions:

  • s3:ListBucket on arn:aws:s3:::tf-remote-storage
  • s3:GetObject on arn:aws:s3:::tf-remote-storage/path/to/key
  • s3:PutObject on arn:aws:s3:::tf-remote-storage/path/to/key

The JSON body of this AWS IAM statement looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::tf-remote-storage"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::tf-remote-storage/path/to/key"
    }
  ]
}

DynamoDB Table Permissions

When using the state locking mechanism, Terraform needs the following AWS IAM permissions on our DynamoDB table:

  • dynamodb:GetItem on arn:aws:dynamodb:*:*:table/tf-state-lock
  • dynamodb:PutItem on arn:aws:dynamodb:*:*:table/tf-state-lock
  • dynamodb:DeleteItem on arn:aws:dynamodb:*:*:table/tf-state-lock

In the form of an AWS IAM statement, this looks as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/tf-state-lock"
    }
  ]
}
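Rather than attaching these statements by hand, the same policy can be managed from Terraform itself. Below is a sketch that combines both statements into a single aws_iam_policy resource using jsonencode (Terraform 0.12 syntax); the policy name "terraform-backend-access" is a hypothetical choice, and attaching the policy to your user or role is left out.

```hcl
# Sketch: one managed policy covering the S3 and DynamoDB permissions above.
# The policy name is a hypothetical example.
resource "aws_iam_policy" "terraform_backend" {
  name = "terraform-backend-access"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = "s3:ListBucket"
        Resource = "arn:aws:s3:::tf-remote-storage"
      },
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::tf-remote-storage/path/to/key"
      },
      {
        Effect   = "Allow"
        Action   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
        Resource = "arn:aws:dynamodb:*:*:table/tf-state-lock"
      }
    ]
  })
}
```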

Using the S3 remote state

After we apply the two resources, we can use them to configure the S3 remote backend, which can be achieved with the following code:

terraform {
  backend "s3" {
    encrypt        = true
    bucket         = "tf-remote-storage"
    dynamodb_table = "tf-state-lock"
    key            = "state-lock-storage.keypath"
    region         = "us-east-1"
  }
}

After this, the working directory has to be initialized again by running terraform init; Terraform will detect the new backend and offer to copy the existing local state into it.

The end!

Terraform remote state is a powerful feature. Combined with the advantages of S3 and DynamoDB, it adds an extra layer of safety, from encryption of your DynamoDB table and your state files to versioning, which also lets your team compare previous and current versions of the state files and provides an easy way to back up or roll back in case of incidents. I hope this article was useful for your use cases; any feedback is welcome.
