How to add, use, and update `.terraform.lock.hcl` without pain

This is the first guest article in this blog. This is one is by Maxym Vlasov - my co-author of the CatOps channel.

Jun 20, 2023

Also, this is the first time I use AI-generated image for a post.

Pre-history

As you may know, Terraform 1.4.0 has introduced changes, which break the previous unintentional behavior. Previously, you could ignore the lockfile and use cached providers as long as the version constraints in the code were okay with your local cache. Starting from 1.4.0, Terraform always checks the lockfile before going into your cache directory. In practice, it means that if you ignore the lockfile or remove it completely, Terraform will run full init, no matter what is in your TF_CACHE_DIR or the .terraform directory.

So, here are a few options to solve this:

Keep using Terraform 1.3.x as the new 0.11
Set TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true
Start using the lockfile and move on

When all hope was gone that our lovely workflow with **/.terraform.lock.hcl in .gitignore won’t backfire, I chose to try to add .terraform.lock.hcl to all our 289 root modules. You may ask:

Why are these lockfiles needed?

Well, except “highly recommended by Hashicorp way”, which force you to use lockfiles, here are a few additional reasons why you would like to use them - Repeatability and Security.

Repeatability

Imagine that you have aws or kubernetes provider and you trust that the maintainers use SemVer as it is designed. So, you specify:

terraform {
  required_version = "~> 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

Everything works nice, until…

Provider is going to be broken for 3 workdays, because of a lack of testing ¯\_(ツ)_/¯

Provider adds a breaking change in a minor release because they forgot to add it in a major one

Both of these issues happened last month.

Of course, you can set patch versions explicitly. For example, "5.0.0" or "2.19.0" and use Renovate/dependabot or tfupdate pre-commit hook for intentional updates. Yet, with tfupdate you’re forced to use exactly one terraform/provider/module version across all the source code. This way you will avoid the problems above. But there is more.

Security

There are these h1 and zh hashes inside .terraform.lock.hcl:

provider "registry.terraform.io/hashicorp/kubernetes" {
  version     = "2.21.1"
  constraints = ">= 2.21.1, < 3.0.0"
  hashes = [
    "h1:2spGoBcGDQ/Csc23bddCfM21zyKx3PONoiqRgmuChnM=",
    "h1:7cCH+Wsg2lFTpsTleJ7MewkrYfFlxU1l4HlLWP8wzFw=",
    "h1:I1qWLUFmB0Z8+3CX+XzJtkgiAOYQ1nHlLN9lFcPf+zc=",
    "h1:gP8IU3gFfXYRfGZr5Qws9JryZsOGsluAVpiAoZW7eo0=",
    "zh:156a437d7edd6813e9cb7bdff16ebce28cec08b07ba1b0f5e9cec029a217bc27",
    "zh:1a21c255d8099e303560e252579c54e99b5f24f2efde772c7e39502c62472605",
    "zh:27b2021f86e5eaf6b9ee7c77d7a9e32bc496e59dd0808fb15a5687879736acf6",
    "zh:31fa284c1c873a85c3b5cfc26cf7e7214d27b3b8ba7ea5134ab7d53800894c42",
    "zh:4be9cc1654e994229c0d598f4e07487fc8b513337de9719d79b45ce07fc4e123",
    "zh:5f684ed161f54213a1414ac71b3971a527c3a6bfbaaf687a7c8cc39dcd68c512",
    "zh:6d58f1832665c256afb68110c99c8112926406ae0b64dd5f250c2954fc26928e",
    "zh:9dadfa4a019d1e90decb1fab14278ee2dbefd42e8f58fe7fa567a9bf51b01e0e",
    "zh:a68ce7208a1ef4502528efb8ce9f774db56c421dcaccd3eb10ae68f1324a6963",
    "zh:acdd5b45a7e80bc9d254ad0c2f9cb4715104117425f0d22409685909a790a6dd",
    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
    "zh:fb451e882118fe92e1cb2e60ac2d77592f5f7282b3608b878b5bdc38bbe4fd5b",
  ]
}

Terraform uses them to pull exactly the same artifacts for your platform, as they were used during the last terraform init and => terraform apply commands.

It decreases the probability of the supply chain attack when the weakest link in your supply chain is terraform providers.

Preparation for lockfiles addition

Note: In all the examples below I used GitHub Workflows. Yet, you can port it to any other CI.

First of all, you need to have a valid terraform configuration.

You cannot just skip this step if you have a huge terraform codebase: almost certainly there is something broken.

So, let me introduce to you the terraform validate command! Just kidding. Yet, this is exactly what we need here. Sometimes the validation requires the terraform init -backend=false setting which you need to run for all the root modules.

For this case, there is another pre-commit solution, which inits your modules (and fixes existing .terraform if they are outdated/broken), and run validations. To use it you have to:

Install dependencies in any way described in the pre-commit-terraform.
Create a .github/.pre-commit-tf-lockfiles.yaml file with the content as below:

Note: We will use this file to auto-update lockifles in the CI later. .github/ is present in the file path just to hide it from regular users and keep it as close as possible to .github/workflows/

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.81.0
    hooks:
      - id: terraform_validate
        args:
          - --hook-config=--retry-once-with-cleanup=true
          - --tf-init-args=-upgrade
        # files: '^path/to/your/terraform/root/folder/[a-c]'
        exclude: '(\.)?modules/'

      # - id: terraform_providers_lock
      #   args:
      #   - --hook-config=--mode=always-regenerate-lockfile
      #   - --args=-platform=linux_arm64
      #   - --args=-platform=linux_amd64
      #   - --args=-platform=darwin_amd64
      #   - --args=-platform=darwin_arm64
      #   files: '^path/to/your/terraform/root/folder/[a-c]'
      #   exclude: '(\.)?modules/'

If you have huge repo - uncomment this line and specify the correct # files: '^path/to/your/terraform/root/folder/[a-c]'
files and exclude uses a Python re.search regex (docs). By specifying [a-c] at the end, we can limit the number of directories that should be processed by a single run
Run the command and chill for a couple of minutes

pre-commit run -a --config .github/.pre-commit-tf-lockfiles.yaml

Once the command has finished, check that all your root modules pass the validation. If not, fix the errors and rerun pre-commit until all the modules are valid.
Edit your .gitignore to not ignore lockfiles. For example:

!path/to/your/terraform/root/folder/[a-c]*/.terraform.lock.hcl
!path/to/your/terraform/root/folder/[a-c]*/**/.terraform.lock.hcl

Add lockfiles

Go to the previously created .github/.pre-commit-tf-lockfiles.yaml and:
- Uncomment terraform_providers_lock hook
- Set your -platform=’s
- Copy files and exclude sections from terraform_validate to terraform_providers_lock
- Comment terraform_validate hook to save extra time
Run the command below. It takes more time than the first command, so you can do something else in the meantime.

pre-commit run -a --config .github/.pre-commit-tf-lockfiles.yaml

In my tests, it took about ~2,5s per platform per provider per root module. So, for a module with 6 providers with 4 platforms, you may need about 1 minute to generate a lockfile.
Check that all lockfiles have zh hashes for each provider.
Don’t forget to remove empty files generated in the Preparation section for the directories without the terraform code.
If some lockfiles do not have all the required hashes, check the logs. In most cases, it means that you still use something from the Terraform 0.11, which does not support one of the specified platforms (in my case -platform=darwin_arm64 for hashicorp/template and mumoshu/helmfile)
If you also encounter these problems, modify .github/.pre-commit-tf-lockfiles.yaml and rerun pre-commit until everything is Ok:

 - id: terraform_providers_lock
    args:
      - --hook-config=--mode=always-regenerate-lockfile
      - --args=-platform=linux_arm64
      - --args=-platform=linux_amd64
      - --args=-platform=darwin_amd64
      - --args=-platform=darwin_arm64
    exclude: |
      (?x)
        (/(\.)?modules/

        # hashicorp/template 2.2.0 is not available for darwin_arm64
        |^terraform/bootstrap/
        # mumoshu/helmfile 0.14.0 is not available for darwin_arm64.
        |^terraform/helmfiles/
      )      

  # TODO: Rewrite these modules to newer providers
  - id: terraform_providers_lock
    name: Lock terraform provider versions w/o darwin_arm64
    args:
      - --hook-config=--mode=always-regenerate-lockfile
      - --args=-platform=linux_arm64
      - --args=-platform=linux_amd64
      - --args=-platform=darwin_amd64
    files: |
      (?x)
        # hashicorp/template 2.2.0 is not available for darwin_arm64
        (^terraform/bootstrap/
        # mumoshu/helmfile 0.14.0 is not available for darwin_arm64.
        |^terraform/helmfiles/
      )

Note: To save some time, you may need to comment out the last hook section for future lockfiles generation.

Automate lockfile updates in CI

Once you have all the lockfiles, it’s time to automate their updates.

Go to .github/.pre-commit-tf-lockfiles.yaml and:
- Change terraform_validate files sections to:

files: '\.terraform\.lock\.hcl$'

- to limit terraform init run only for dirs with lockflie.
- Remove files sections in the terraform_providers_lock hooks
In the end, you will get something like this:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.81.0
    hooks:
      - id: terraform_validate
        args:
          - --hook-config=--retry-once-with-cleanup=true
          - --tf-init-args=-upgrade
        files: '\.terraform\.lock\.hcl$'

    - id: terraform_providers_lock
        args:
          - --hook-config=--mode=always-regenerate-lockfile
          - --args=-platform=linux_arm64
          - --args=-platform=linux_amd64
          - --args=-platform=darwin_amd64
          - --args=-platform=darwin_arm64
        exclude: |
          (?x)
            (/(\.)?modules/

            # hashicorp/template 2.2.0 is not available for darwin_arm64
            |^terraform/bootstrap/
            # mumoshu/helmfile 0.14.0 is not available for darwin_arm64.
            |^terraform/helmfiles/
          )          

      # TODO: Rewrite these modules to newer providers
      - id: terraform_providers_lock
        name: Lock terraform provider versions w/o darwin_arm64
        args:
          - --hook-config=--mode=always-regenerate-lockfile
          - --args=-platform=linux_arm64
          - --args=-platform=linux_amd64
          - --args=-platform=darwin_amd64
        files: |
          (?x)
            # hashicorp/template 2.2.0 is not available for darwin_arm64
            (^terraform/bootstrap/
            # mumoshu/helmfile 0.14.0 is not available for darwin_arm64.
            |^terraform/helmfiles/
          )

Add a GitHub workflow, which installs all the dependencies and run pre-commit run every Monday. It creates a new PR in the Renovate style:

name: Maintain Terraform lockfile up-to-date
# It is required at least Renovate fixes https://github.com/renovatebot/renovate/issues/22417

on:
  workflow_dispatch: {}

  schedule:
    - cron: '0 4 * * 1' # Execute every Monday at 04:00

permissions:
  contents: write
  pull-requests: write


env:
  # Prevent GH API rate-limit issue
  GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

jobs:
  pre-commit-tf-lockfile:
    runs-on: ubuntu-latest
    container: python:3.11-slim
    steps:
    - name: Install container pre-requirements
      run: |
        apt update
        apt install -y \
            git \
            curl \
            unzip \
            jq \
            nodejs # Needed for Terraform installation
        curl -L https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 > /usr/bin/yq &&\
          chmod +x /usr/bin/yq        
    - name: Checkout
      uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
      with:
        ref: ${{ github.base_ref }}

    - uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
    - run: |
        git config --global --add safe.directory /__w/infrastructure/infrastructure
        git fetch --no-tags --prune --depth=1 origin +refs/heads/*:refs/remotes/origin/*        

    - uses: hashicorp/setup-terraform@633666f66e0061ca3b725c73b2ec20cd13a8fdd1 # v2.0.3
      with:
        terraform_version: ~1.3

    - name: Execute pre-commit
      uses: pre-commit/action@646c83fcd040023954eafda54b4db0192ce70507 # v3.0.0
      with:
        extra_args: >
          --all-files
          --config .github/.pre-commit-tf-lockfiles.yaml
          --color=always --show-diff-on-failure          

    - name: Create Pull Request
      if: failure()
      id: cpr
      uses: peter-evans/create-pull-request@284f54f989303d2699d373481a0cfa13ad5a6666 # v5.0.1
      with:
        commit-message: 'chore(deps): Update terraform lockfiles'
        branch: pre-commit/update-tf-lockfiles
        delete-branch: true
        title: 'chore(deps): Update terraform lockfiles'
        body: >
          This PR update provider versions in Terraform lockfiles to their most resent values

          > **Warning**: Before merge, please, make sure that all Terraform CI runs pass successfully.          
        labels: auto-update
        branch-suffix: timestamp

    - name: Pull Request number and link
      if: failure() && steps.cpr.outputs.pull-request-number > 0
      run: |
        echo "Pull Request Number - ${{ steps.cpr.outputs.pull-request-number }}"
        echo "Pull Request URL - ${{ steps.cpr.outputs.pull-request-url }}"

For 289 root modules with 1180 lockfile provider definitions, it takes 2h 40min or ~2,288s per platform per provider, which is ~0.2s faster than running it locally.

Well, that’s it from the technical perspective. It’s time to deal with the questions like: Why don’t you just use Renovate for this, dude?

Why not Renovate?

Yes, I heard about Renovate and dependabot. Check out, my talks about Renovate at Anton Babenko stream and HUG Kyiv(Ukr).

We don’t use dependabot for the infrastructure repository because it has too many problems with monorepos, you can’t simply force dependabot.yml for the whole organization, it is less configurable than Renovate, etc.

GitHub itself does not use dependabot properly in its repositories...

A newly generated repository from actions/typescript-action contains

Dependabot PR is there for 2 weeks!

How long can a dependabot PR live without any attention

I maintain Renovate for my organization here: Sharable Config Presets for Renovatebot, especially useful for DevOps folks. Also, Renovate has a lockFileMaintenance option but…

For now, Renovate cannot resolve != version constrain (renovate/#22417), so it just fails to create a PR if at least one version constraint with != exists in a repository.
If you don’t have any !=, Renovate creates nice PRs, but it does not respect provider constraints used inside the child modules of your root module.
So you get something like

provider "registry.terraform.io/hashicorp/aws" {
version     = "5.2.0"
constraints = "~> 5.0"

in cases when terraform providers lock command will create something like

provider "registry.terraform.io/hashicorp/aws" {
version     = "5.2.0"
constraints = ">= 2.0.0, >= 3.0.0, >= 3.64.0, >= 4.0.0, >= 4.9.0, >= 4.18.0, >= 4.22.0, >= 4.23.0, >= 4.49.0, ~> 5.0"

And it will work fine until someone inside these modules wouldn’t specify != 5.2.0 or < 5.2.0.
Renovate specifies all the available h1 hashes (all available provider platforms), which is pretty nice. Yet, it does not specify “vanilla” zh hashes, which, in my opinion, are more strict. Thus, I prefer to have zh hashes when it is possible.

And one more thing:

Make sure to add lockfiles to all the new root modules

Just introduce a rule for your terraform configurations:

Run terraform init when you add a new root module.

It adds a basic .terraform.lock.hcl, which you can commit as it is and wait for the next lockfile update job.

Or, you could add .pre-commit-config.yaml with:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.81.0
    hooks:
      # Validate and run `terraform init` which needed for terraform_providers_lock
      - id: terraform_validate
      - id: terraform_providers_lock
        args:
          - --hook-config=--mode=only-check-is-current-lockfile-cross-platform
          - --args=-platform=linux_arm64
          - --args=-platform=linux_amd64
          - --args=-platform=darwin_amd64
          - --args=-platform=darwin_arm64
        ## TODO: Rewrite these modules to newer providers
        # exclude: |
        #   (?x)
        #     (/(\.)?modules/

        #     # hashicorp/template 2.2.0 is not available for darwin_arm64
        #     |^terraform/bootstrap/
        #     # mumoshu/helmfile 0.14.0 is not available for darwin_arm64.
        #     |^terraform/helmfiles/
        #   )

And automate pre-commit executions in PRs like this or this.

Summary

Here are a few takeaways:

It’s better to have lockifles than have no (repeatability, security)
It makes sense to automate all these updates and maintain the same versions across the codebase, and be on the cutting edge without bleeding (Renovate, dependabot, tfupdate via pre-commit)
For a better user experience, you need a source of truth (automated terraform plan in CI, Terratests, etc.) which shows that changes do not break anything. It can be Atlantis, Spacelift, Terraform Cloud, or you can do it in your own CI.

If you want to publish your article here as well, just ping me.

CatOps Newsletter

Discussion about this post

Ready for more?