Cleaning up a Terraform state file — the right way!

We have all been there: terraform apply crashes because someone made a manual change and removed a resource that Terraform expects to exist. You try terraform refresh, but no luck. What do you do at this point? Sometimes the only option is to modify the Terraform state file. This article walks through how to modify a state file, both the wrong way and the right way, so that you can teach others how to make state file changes properly.

This article was originally published on Medium. Link to the Medium article can be found here.

The wrong way

One could easily open up the terraform.tfstate file and manually do “JSON surgery”, but this is not recommended, mainly because of the high chance of human error and of wrecking your state file. That being said, allow me to show you how.

If your state file is stored locally (bad practice), all you need to do is make a backup of terraform.tfstate, open it in your favorite text editor, and begin making changes. However, if your state file is stored remotely, say in an S3 bucket, there are a few more steps to take.

  1. Run terraform init
  2. Comment out the backend configuration
  3. Run terraform init and answer yes to copy your state file locally
  4. Open up the state file (check your .gitignore file if you are unable to see it!)
  5. Begin state file surgery
  6. Save your changes
  7. Add your remote backend configuration back
  8. Run terraform init and answer yes to copy your local state to the remote backend
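
Before step 5 it is worth taking a copy of the file you are about to edit. The following is a minimal sketch of that backup step, not part of Terraform itself; the function name backup_state and the timestamped .bak suffix are assumptions made for illustration.

```python
import json
import shutil
import time
from pathlib import Path


def backup_state(path="terraform.tfstate"):
    """Copy the state file next to itself with a timestamp suffix; return the copy's path."""
    src = Path(path)
    # Parse the file first so a state that is already broken is noticed before editing.
    json.loads(src.read_text())
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return dst
```

Calling backup_state() in the directory that holds terraform.tfstate leaves the original untouched and returns the path of the copy, so a failed edit can be undone by copying the backup back.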

The Terraform state file is in JSON format (see below). As you can tell, all Terraform-defined resources fall under the resources array. So if we wanted to remove the aws_instance resource, we would have to remove the entire { } object that the resource falls under, along with the comma separating it from its neighbor, so that the file remains valid JSON.

{
  "version": 4,
  "terraform_version": "0.12.3",
  "serial": 6,
  "lineage": "4e32218a-6f16-1e51-2523-b3875f604783",
  "outputs": {},
  "resources": [
    {   <------- This is where the aws_instance resource starts
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "provider": "provider.aws",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "ami": "ami-0b898040803850657",
            "arn": "arn:aws:ec2:us-east-1:140040602879:instance/i-0cd2055ad2783a11b",
            "associate_public_ip_address": true,
            "availability_zone": "us-east-1c",
            "cpu_core_count": 1,
            "cpu_threads_per_core": 1,
            "credit_specification": [
              {
                "cpu_credits": "standard"
              }
            ],
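
To make the structure concrete, here is a minimal sketch of the same removal done programmatically instead of by hand. It assumes the version 4 layout shown above; the helper names address_of, list_addresses, and remove_resource are made up for this example, the sample state is trimmed to a few fields, and bumping serial by one is a habit some people follow after a manual edit rather than something this article prescribes.

```python
import json


def address_of(res):
    """Build an address like module.buckets.aws_s3_bucket.demo from one resources entry."""
    parts = []
    if "module" in res:
        parts.append(res["module"])
    if res.get("mode") == "data":
        parts.append("data")  # data sources are addressed as data.TYPE.NAME
    parts += [res["type"], res["name"]]
    return ".".join(parts)


def list_addresses(state):
    """Return one address per entry in the state's resources array."""
    return [address_of(res) for res in state["resources"]]


def remove_resource(state, address):
    """Return a copy of the state without the entry at `address`."""
    kept = [res for res in state["resources"] if address_of(res) != address]
    if len(kept) == len(state["resources"]):
        raise KeyError(f"{address} not found in state")
    # Assumption: increment serial so the edited file counts as newer than the old one.
    return dict(state, resources=kept, serial=state["serial"] + 1)


# Trimmed-down sample mirroring the excerpt above.
state = {
    "version": 4,
    "serial": 6,
    "outputs": {},
    "resources": [
        {"mode": "managed", "type": "aws_instance", "name": "web", "instances": []},
        {"module": "module.buckets", "mode": "managed",
         "type": "aws_s3_bucket", "name": "demo-mod-1", "instances": []},
    ],
}

cleaned = remove_resource(state, "aws_instance.web")
print(list_addresses(cleaned))
print(json.dumps(cleaned, indent=2))
```

Writing the result back with json.dumps sidesteps the stray comma and unbalanced brace mistakes that make hand editing risky, but it is still the wrong way: nothing here checks that the edited state matches your real infrastructure.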

Resources created by modules are also found under the main resources array. If you have a module that creates several resources, expect to find one entry per resource, each carrying the module's address in its module field. Yes, this means you can expect to find many entries with the name of the module you created. So if we declared module "buckets" {} and it creates two aws_s3_bucket resources, expect to find the module entry twice, unless the resources were created with the count parameter, in which case you would find a single entry whose instances array holds one element per index.

A module entry looks like the following:

{
      "module": "module.buckets",
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "demo-mod-1",
      "provider": "provider.aws",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
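
As a quick way to see how many entries each module contributes, the following sketch counts entries by their module field. It assumes the version 4 layout shown above; entries_per_module and the "(root)" label for resources outside any module are names chosen for this example, and the second bucket name is made up.

```python
from collections import Counter


def entries_per_module(state):
    """Count resources entries per module; entries without a module field are root resources."""
    return Counter(res.get("module", "(root)") for res in state["resources"])


# Sample: module "buckets" creating two aws_s3_bucket resources, plus one root resource.
state = {
    "resources": [
        {"mode": "managed", "type": "aws_instance", "name": "web"},
        {"module": "module.buckets", "mode": "managed",
         "type": "aws_s3_bucket", "name": "demo-mod-1"},
        {"module": "module.buckets", "mode": "managed",
         "type": "aws_s3_bucket", "name": "demo-mod-2"},
    ]
}

for module, count in entries_per_module(state).items():
    print(module, count)
```

If you are removing a whole module by hand, the count tells you how many blocks you need to find and delete.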

At this point, if you are removing resources, simply make sure you remove the correct resource blocks. If you are changing a value, the value might also have to be changed manually on the real resource it represents.

An example of this is an AWS account created through Terraform. If you want to change the account name, you have to change the name attribute of the resource in the state file and also change the name manually on the account itself. Why? Because otherwise Terraform will detect the difference and attempt to create another resource, which is not the desired behavior. Both the state change and the manual change are needed.
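
The state-file half of such a change can be sketched as follows. This is an illustration under assumptions: the article does not say which resource type the account uses, so aws_organizations_account, the resource name dev, and the helper set_attribute are all hypothetical, and the manual change on the real account still has to be made separately.

```python
import copy


def set_attribute(state, rtype, name, key, value, module=None):
    """Return a copy of the state with attributes[key] set on every instance of one resource."""
    new_state = copy.deepcopy(state)  # never mutate the state that was loaded
    matched = False
    for res in new_state["resources"]:
        if res["type"] == rtype and res["name"] == name and res.get("module") == module:
            for instance in res["instances"]:
                instance["attributes"][key] = value
            matched = True
    if not matched:
        raise KeyError(f"{rtype}.{name} not found in state")
    return new_state


# Hypothetical sample entry; the real attribute set of an account resource is much larger.
state = {
    "version": 4,
    "serial": 6,
    "resources": [
        {"mode": "managed", "type": "aws_organizations_account", "name": "dev",
         "instances": [{"schema_version": 0, "attributes": {"name": "dev"}}]},
    ],
}

renamed = set_attribute(state, "aws_organizations_account", "dev", "name", "dev-renamed")
print(renamed["resources"][0]["instances"][0]["attributes"]["name"])
```

Working on a deep copy keeps the loaded state intact, so the before and after versions can be compared before anything is written to disk.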

Note: make sure no outputs depend on the resources being removed; if any do, remove those outputs as well. Outputs can be found at the beginning of the state file under the "outputs": {} block.
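
A small sketch of that cleanup follows. The outputs block stores each output's value and type but not which resource it was read from, so deciding which outputs to drop means checking your configuration; the helper drop_outputs and the sample output names are assumptions for illustration.

```python
def drop_outputs(state, names):
    """Return a copy of the state without the outputs listed in `names`."""
    remaining = {
        key: value
        for key, value in state.get("outputs", {}).items()
        if key not in names
    }
    return dict(state, outputs=remaining)


# Hypothetical outputs; web_ip is assumed to read from the removed aws_instance.web.
state = {
    "version": 4,
    "serial": 6,
    "outputs": {
        "web_ip": {"value": "203.0.113.10", "type": "string"},
        "bucket_name": {"value": "demo-mod-1", "type": "string"},
    },
    "resources": [],
}

print(sorted(state["outputs"]))  # review these against your output blocks
cleaned = drop_outputs(state, {"web_ip"})
print(sorted(cleaned["outputs"]))
```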

The right way

The proper way to handle the state file is through the Terraform CLI. The two most common commands for modifying the state file are terraform state mv and terraform state rm.

To rename a single resource (by resource I mean the Terraform resource name, not to be confused with the resource’s name attribute):

$ terraform state mv aws_instance.my-ssh-server aws_instance.foo

To move a resource into a module, give the full destination address, including the resource type and name:

$ terraform state mv aws_instance.my-ssh-server module.servers.aws_instance.my-ssh-server

To move a module into another module, keep in mind that a nested module is addressed as module.PARENT.module.CHILD:

$ terraform state mv module.servers module.aws.module.core

To move a resource or module to another state file:

$ terraform state mv -state-out=otherstatefile.tfstate \
    module.iam module.iam
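
If you script these moves, it helps to build the argument list in one place and preview it before running anything. The sketch below only assembles the command; state_mv_argv is a hypothetical helper, -state-out is the flag used above, and -dry-run is available in recent Terraform releases, so check terraform state mv -help on the version you run.

```python
import shlex


def state_mv_argv(source, destination, state_out=None, dry_run=False):
    """Assemble the argument list for a terraform state mv call without running it."""
    argv = ["terraform", "state", "mv"]
    if dry_run:
        argv.append("-dry-run")  # report what would move, change nothing
    if state_out:
        argv.append(f"-state-out={state_out}")
    argv += [source, destination]
    return argv


argv = state_mv_argv("module.iam", "module.iam", state_out="otherstatefile.tfstate")
print(shlex.join(argv))  # shell-quoted form, handy for logging or review
```

Passing the list to subprocess.run(argv, check=True) would execute it; keeping the arguments as a list avoids shell quoting problems with addresses that contain brackets.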

To remove a single resource:

$ terraform state rm aws_instance.my-ssh-server

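To remove a single resource within a module, use its full address: the module, the resource type, the resource name, and an index if the resource uses count. Quoting the address keeps the shell from interpreting the brackets:

$ terraform state rm 'module.servers.aws_instance.my-ssh-server[0]'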

To remove a module:

$ terraform state rm module.buckets

If there are nested modules, each level gets its own module. prefix. In this example there is a module named “east” inside the parent module “buckets”:

$ terraform state rm module.buckets.module.east
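
The addresses accepted by terraform state mv and terraform state rm follow a fixed pattern: one module.NAME segment per module level, then the resource type and name, then an optional index. The sketch below spells that pattern out; module_address and resource_address are hypothetical helpers, and the bucket name demo is made up.

```python
def module_address(modules):
    """Address of a (possibly nested) module, e.g. ["buckets", "east"]."""
    return ".".join(f"module.{name}" for name in modules)


def resource_address(rtype, name, modules=(), index=None):
    """Address of a resource, optionally inside modules and optionally indexed."""
    parts = [f"module.{m}" for m in modules] + [rtype, name]
    address = ".".join(parts)
    if index is not None:
        # Integer indexes come from count; string keys come from for_each.
        address += f"[{index}]" if isinstance(index, int) else f'["{index}"]'
    return address


print(module_address(["buckets", "east"]))
print(resource_address("aws_instance", "my-ssh-server"))
print(resource_address("aws_s3_bucket", "demo", modules=["buckets"], index=0))
```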

Conclusion

Using the Terraform CLI is a much cleaner and safer way to remove modules and resources, especially if you are removing a module or nested modules. That being said, there are times when the Terraform CLI falls short and you have to resort to manual intervention. Proceed with caution in these scenarios and always make a backup. Ideally you won’t have to resort to these techniques, but sometimes accidents happen on your Infrastructure as Code journey.
