Testing Infrastructure as Code: A beginner’s guide

This post is for testers who want to learn about testing Infrastructure as Code. It is also meant to help testers who are looking to change their testing domain.

I worked as a functional tester for more than 18 years in various roles. For the past year and a half, I have been a tester in a DevOps team. At one of Qxf2’s clients, we have forty-plus small teams that produce code independently. Our team integrates all the code changes within the product and deploys a distributed stack that all teams can use for their integration testing. My team is responsible for deploying the infrastructure and testing that the deploy went well before handing off the test environment to testers from all the other teams. We run post-deployment checks, troubleshoot infrastructure issues and run a limited set of tests.

I had a rough start to this project. There was quite a lot to learn about deploy tools and how to test the deployed infrastructure. I came across different build and automation tools. My learning happened in bits and pieces. It is pretty hard to Google for stuff like this. There are plenty of resources to learn about DevOps but very few resources to learn how to test the deploy. I am using this post as a brain dump of all the material I have come across. I hope this will serve as a starting point for your explorations. You might stumble across words in this post that you can use to Google and learn more. I am also sharing some of my experiences in the final section so you get a flavour for the kind of work you will do if this field interests you.


Infrastructure as Code

Imagine a distributed application living across several pieces of infrastructure – multiple servers, containers, lambdas, cloud services, databases, data lakes, etc. Creating such distributed infrastructure is a tedious and error-prone process. Making modifications is even tougher. Most software teams in such situations prefer to store their infrastructure as code. This makes deploys more predictable and less error-prone. It gives confidence that if something is not working as expected you can always roll back. Storing your Infrastructure as Code also gives better visibility to your entire team and allows you to make changes quicker. Infrastructure as Code usually doesn’t depend on traditional programming languages like Java or C++. Instead it mostly uses declarative languages and human-readable formats – YAML, JSON, etc.

If you are coming from the land of simple deploys, your usual techniques will not work well. SSH-ing into every node, validating the deployed resources, verifying access control lists and running some connectivity tests work very well when the infrastructure is reasonably limited. But as the number of nodes grows, this manual process becomes time consuming. The chances of human error increase. And worst of all, it is cumbersome! So the solution the industry has come up with is to store infrastructure as code.


Why are we even testing deployed environments?

Wouldn’t testing the application be enough? If we test our application, we will identify any bad deploy and/or misconfiguration anyway.

These days application complexity is growing. Teams need emerging open-source technologies, various frameworks and automation tools to orchestrate operations on complex infrastructure. There are many components such as virtual machines (VMs), containers, serverless functions, security (e.g.: IAM, KMS), networking (e.g.: VPC, subnets, firewalls, load balancing), application performance monitoring (monitoring, logging, alerting), data (queues, brokers), databases (SQL, NoSQL), object stores (like S3), etc.

Provisioning infrastructure with so many components is a complex process. Companies spend a lot of money building multiple environments. The testing teams run their integration tests on this deployed infrastructure. If we deploy the infrastructure without testing it, there is a high chance that testing teams will hit bugs caused by a bad deploy. This wastes the time of all teams involved! Infrastructure testing plays a key role in mitigating problems related to bad deploys and misconfigurations. It gives testers confidence that the issues they discover are because of the application and not the environment. Given that testing the infrastructure and the quality of the deploy saves a lot of time for all teams involved, it makes economic sense to do this step well.


Currently there are many tools available in this space – some are free and open-source while others cost money. This post will list some of the popular tools. Use the list here as a starting point and use Google to dig into details about the various tools.


A brief introduction to tools used in Infrastructure as Code

Please skip this section if you are already familiar and/or regularly work with Infrastructure as code. This section is about the tools and frameworks used for deploys. The tools you will choose for testing might depend on what tools your team uses for deploys.

Storing Infrastructure as Code is a common practice. It helps in the process of provisioning and managing infrastructure. There are network resources, servers (EC2 instances), load balancers, firewalls, databases and so much more that qualify as ‘infrastructure’. Infrastructure as Code is not limited to cloud-based resources. In fact, these techniques work even with non-cloud environments. Configuration files contain the infrastructure specifications, and they are easy to edit and easy to distribute. Infrastructure as Code takes away the majority of the manual intervention. By executing a script you have your infrastructure. Any modern development pipeline should use Infrastructure as Code to handle its infrastructure.

Build Tools and Code Platforms

Here is a list of tools you are likely to hear about in this space. I have tried to add a sentence or two about them. But I am not an expert. I am listing the tools here so you can Google for what interests you and learn more.

  • Terraform:

    HashiCorp’s Terraform is a popular open-source tool, written in Go, with a large user base. It describes infrastructure in a configuration syntax called HCL (HashiCorp Configuration Language). Terraform is used to provision and manage infrastructure across a variety of cloud platforms. Its simple declarative language and CLI let you review and validate a Terraform plan before applying it, which helps ensure the end result meets expectations.

  • AWS CloudFormation:

    AWS CloudFormation is an IaC tool used to model, provision and manage AWS infrastructure as well as third-party resources. Resources are described in YAML or JSON templates. If a deployment runs into problems, CloudFormation can roll back to the previous state. It also works out the dependencies between AWS resources for you, so you do not need to order them by hand.

  • Ansible:

    Ansible is an open-source tool used for provisioning and configuring infrastructure and for managing application deployment. Ansible uses YAML for its playbooks and variable files.

  • Pulumi:

    Pulumi is an open-source tool with multi-language and multi-cloud support. It is used to create, deploy and manage infrastructure.

  • Chef Infra:

    Chef Infra automates deploying and managing infrastructure. It can be used with cloud, non-cloud and hybrid infrastructure.

  • Puppet:

    Puppet is another infrastructure configuration management tool. It uses its own declarative language to model and configure systems.

  • CFEngine:

    CFEngine is another open source tool and supports complex configurations.

  • Azure Resource Manager (ARM) templates:

    ARM templates are used to build infrastructure within the Azure environment. They describe the infrastructure in JSON.


Testing Infrastructure as Code

This is the meat of this blog post. We have discussed Infrastructure as Code and various deployment tools. Now we are going to discuss infrastructure testing. It is the process of validating and verifying that the right infrastructure and resources got set up and configured. As a tester, I found it really hard to Google for tools and techniques used when testing infrastructure stored as code. So, hopefully, the lists and my experience in the next section will help you get started.

Infrastructure as Code testing is an integral part of testing a distributed application spanning multiple teams. It supports application deployment at almost any scale. Writing automated tests for deployed infrastructure requires good programming skills. You might be required to learn different programming languages. For example, Terratest uses Go, terraform-compliance uses Python and awspec uses Ruby. If you are looking to switch domains, please keep this point in mind.

Pre-deploy checks

Before an army of testers descends on a highly distributed application, it makes a lot of sense to test the deployment (not the application!). After deployment, we test the quality of the infrastructure, the relevant configuration and some portions of the application. But some checks can run even earlier, against the infrastructure code itself. This makes testing cost-effective, time-efficient and secure. Especially with cloud applications, teams do not need to spend hundreds of dollars connecting to actual cloud resources just to test in a local environment. That is a potential saving of both cost and time. Most build tools offer options to test the infrastructure code before the actual deployment.

When I was trying to come up with a tool belt, I noticed two options. One, we could work with whatever deploy tools were used. Two, we could use specialized tools.

Let us begin with testing tools that are part of the deploy tool ecosystem itself. Let’s list a few approaches that catch errors early using options already available in build tools.
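
As one illustration of a pre-deploy check, here is a minimal sketch in Python that inspects a Terraform plan before anything is deployed. It assumes the plan has been exported to JSON with `terraform show -json` and relies on the `resource_changes` and `change.actions` fields of that output. The sample plan, the resource names and the "every new resource needs a team tag" rule are all made up for this example and are not from my project.

```python
import json

# Hand-written sample in the shape of `terraform show -json <planfile>` output.
# Resource addresses and tag values are hypothetical.
SAMPLE_PLAN = """
{
  "resource_changes": [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "after": {"tags": {"team": "devops"}}}},
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"actions": ["delete", "create"], "after": {"tags": null}}},
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
"""


def check_plan(plan, required_tag="team"):
    """Return a list of human-readable problems found in a parsed plan."""
    problems = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Flag anything the plan would destroy (including replacements).
        if "delete" in actions:
            problems.append(f"{rc['address']}: plan would destroy this resource")
        # Hypothetical team rule: every created resource carries the required tag.
        after = change.get("after") or {}
        if "create" in actions and required_tag not in (after.get("tags") or {}):
            problems.append(f"{rc['address']}: missing required tag '{required_tag}'")
    return problems


if __name__ == "__main__":
    for problem in check_plan(json.loads(SAMPLE_PLAN)):
        print(problem)
```

A check like this can run in the pipeline right after the plan step and fail the job when the list of problems is not empty.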

Localised tooling and emulators

After static analysis, it is also necessary to test individual files. Each file performs a defined task, so each module should be tested in isolation, without external resources. If necessary, test using in-house simulators or emulators.

Particularly for cloud infrastructure, defining a strategy for unit tests and using local environment tools can save cost and time. As part of unit tests we can deploy, validate and destroy without using actual cloud resources. We can also use some of the techniques mentioned above as part of the unit testing.

Along with testing individual modules, it is necessary to run integration tests, which exercise two or more interrelated modules together to achieve a specific result. Local environment tools and emulators can be used here as well.

  • LocalStack:
    LocalStack is used to test AWS cloud and serverless apps locally. It provides a mocking framework for cloud applications, which makes testing cost-effective, possible offline and independent of the real cloud. It is also easy to use.
  • Moto:
    Moto is a Python library that mocks AWS services, so code written against the AWS SDK for Python can be tested without touching real AWS infrastructure. It is a very convenient library.
  • Azure-functions-core-tools:
    Azure-functions-core-tools is a set of command-line tools that provides a local development experience for developing, testing, running and debugging Azure Functions.
  • Azurite emulator:
    Azurite emulates the Azure Blob, Queue and Table services for local development. You can test your applications against these storage services locally without connecting to actual Azure cloud services.
  • Cosmos DB Emulator:
    The Azure Cosmos DB Emulator emulates the Azure Cosmos DB service so you can test your application locally without creating an Azure subscription.
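
To show the idea behind these emulators without depending on any of them, here is a minimal sketch in plain Python. `FakeObjectStore` and `provision_artifact_store` are hypothetical names I made up for this example. They are not the API of LocalStack, Moto or any Azure tool. The real tools do the same job against real cloud APIs: the code under test talks to a local stand-in instead of the cloud.

```python
class FakeObjectStore:
    """Hypothetical in-memory stand-in for a cloud object store API."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, name):
        if name in self._buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self._buckets[name] = {}

    def put_object(self, bucket, key, body):
        self._buckets[bucket][key] = body

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def list_buckets(self):
        return sorted(self._buckets)


def provision_artifact_store(client, env):
    """Code under test: create the bucket for an environment and write a marker."""
    bucket = f"{env}-artifacts"
    client.create_bucket(bucket)
    client.put_object(bucket, "deployed-by", b"pipeline")
    return bucket


def test_provision_artifact_store():
    # The test runs entirely in memory: no credentials, no network, no cloud bill.
    client = FakeObjectStore()
    bucket = provision_artifact_store(client, "staging")
    assert bucket == "staging-artifacts"
    assert client.list_buckets() == ["staging-artifacts"]
    assert client.get_object(bucket, "deployed-by") == b"pipeline"


if __name__ == "__main__":
    test_provision_artifact_store()
    print("ok")
```

Because the provisioning function receives its client as a parameter, the same function can later be pointed at a real client or at one of the emulators listed above.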

Testing by deploying

Testing Infrastructure as Code is similar to testing application code in terms of a testing pyramid. As you write automated tests for Infrastructure as Code and move from the bottom of the pyramid to the top, the cost, the time and the brittleness go up. Testing the entire environment, similar to “End to End Testing”, means testing the complete deployment workflow. This involves provisioning and configuring infrastructure and deploying the software application. The goal of these End to End tests is to ensure the environment behaves as expected.

Sometime in the year 2003, I remember speaking to a systems administrator who used to prepare servers for test, staging, pre-production and finally production. The systems administrator felt there was no need to test the environment. Because of that, there was no guarantee that the production environment worked as expected. Testing the environments (i.e., infrastructure + configurations + deployed software) gives the application a better chance at doing what it is supposed to do. In the past, we used to perform these tests manually. With that approach there was a high possibility of failure or inconsistent behaviour across environments. Nowadays, automated testing tools play an important role in testing infrastructure. In the section below, let’s go through a few automation tools for testing infrastructure.

  1. Terratest:
     A Go library with deploy, validate and destroy features. It can be used to test infrastructure written with Terraform, Kubernetes, Packer and Docker, as well as servers and the cloud APIs of providers like AWS and Google Cloud. We can use Terratest to execute real IaC tools, deploy real infrastructure, validate it and clean up afterwards. We can also write unit tests, integration tests and end-to-end tests with it.

  2. Chef InSpec:
     An open-source testing framework that works with servers and cloud APIs. It is used for auditing, integration testing, end-to-end testing, and compliance and security testing. It validates the actual state against the desired state and fits into CI/CD and version control processes. Tests are written in Ruby and are easy to read and write. Results can be turned into a report by integrating with Allure. It supports multiple platforms such as Windows, Linux, macOS and Red Hat. GitHub – https://github.com/inspec/inspec

  3. Kitchen-terraform:
     Used for testing Infrastructure as Code written in Terraform only. This tool can deploy, validate and un-deploy. It provides a set of Kitchen plugins to converge Terraform configurations and verify the end result with InSpec controls. It supports various cloud service providers as well.

  4. Serverspec:
     An open-source BDD framework based on the RSpec testing tool, used to check that servers are configured correctly. It tests a server’s actual state by executing commands locally or over SSH, WinRM or the Docker API. Its development is not as active as that of other projects.

  5. Goss:
     A YAML-based alternative to Serverspec, used for validating server configurations. It has no deploy or un-deploy feature.

  6. awspec:
     An RSpec library for testing AWS resources.

  7. Testinfra:
     An open-source plugin for pytest, mostly used for testing server state. It is limited to OS-level testing and cannot be used for cloud services.

  8. Molecule:
     An open-source tool for testing Ansible roles. It supports multiple instances, operating systems and virtualization providers. It is limited to Ansible testing.
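
Several of the tools above follow the same deploy, validate and destroy cycle. Here is a minimal sketch of that cycle in Python. It does not use the API of any tool in the list. `FakeEnvironment`, `run_infra_test` and `validate_ports` are hypothetical names, and the fake environment stands in for real infrastructure so that the example runs anywhere.

```python
class FakeEnvironment:
    """Hypothetical stand-in for real infrastructure; records what happened to it."""

    def __init__(self):
        self.events = []
        self.resources = {}

    def deploy(self):
        self.events.append("deploy")
        self.resources = {"web": {"port": 443}, "db": {"port": 5432}}

    def destroy(self):
        self.events.append("destroy")
        self.resources = {}


def run_infra_test(env, validate):
    """Deploy, run the validation callback, and always tear down afterwards."""
    try:
        env.deploy()
        validate(env)
    finally:
        # Runs even when deploy or validation fails, so nothing is left behind.
        env.destroy()


def validate_ports(env):
    """Example validation: the deployed resources expose the expected ports."""
    assert env.resources["web"]["port"] == 443
    assert env.resources["db"]["port"] == 5432


if __name__ == "__main__":
    env = FakeEnvironment()
    run_infra_test(env, validate_ports)
    print(env.events)
```

The important design choice is the `finally` block: a failed validation should still trigger the teardown, otherwise failed test runs leave resources behind.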


My Experience as a DevOps tester

I am noting my experience coming from the world of purely functional testing. My feeling is that the niche I am testing now has potential. I want to share my experiences just in case it benefits other folks who might be looking to switch streams.

I started by using existing Terraform scripts and Ansible to deploy and configure cloud infrastructure. Both tools can prepare and deploy infrastructure and support deploying the application onto different cloud services. This includes an OS, different types of EC2 instances and the supporting configurations, S3 buckets, RDS, load balancers, etc. The way components are deployed to the cloud varies with the type of infrastructure. I did not dive into testing directly. Instead, I spent some time understanding how the deploys work. The testing happened gradually.

Before the deployment, the application performs a few pre-deploy checks without installing the packages. That gives confidence that the deployment will go fine. This is a mandatory check the deployment will enforce before the actual deployment.

After the infrastructure and application were deployed, I used to run a few manual checks as part of infrastructure verification. The number of manual checks depends on the type of infrastructure deployed. Manual checks involve logging in to the management console or database console, verifying all deployed resources, running connectivity tests, etc. These are time consuming but have to be performed. One of my colleagues (in a different project) wrote some simple Python scripts to perform most of these kinds of checks.
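
To give a feel for what such a script can look like, here is a minimal sketch of a connectivity check using only Python’s standard library. This is my own illustration, not my colleague’s script, and the function names are made up. The demo at the bottom starts a throwaway listener on localhost so the example runs without any real infrastructure.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(host, port): port_is_open(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Demo only: listen on a free local port and check that it is reachable.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    print(check_endpoints([("127.0.0.1", port)]))
    server.close()
```

In a real check the list of endpoints would come from the deployment itself, for example the load balancer and database addresses that the deploy job outputs.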

We also run automated scripts written with InSpec, an open-source tool for testing and auditing applications. We use InSpec to compare the actual service versions (Chart-Kubernetes resources) on a system with the desired service versions. We maintain different test frameworks based on the resources being deployed. For example, cloud-based infrastructure uses one framework, while local deploys use a standalone framework with its own distribution and execution flow. For reporting, InSpec has been integrated with the Allure report framework.
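
The comparison of actual and desired versions is simple to picture. The sketch below shows the idea in plain Python. It is not InSpec code (InSpec controls are written in Ruby), and the service names and version numbers are invented for the example.

```python
def diff_versions(desired, actual):
    """Compare desired and actual service versions; return a dict of mismatches."""
    mismatches = {}
    for service, want in desired.items():
        have = actual.get(service)  # None means the service is not deployed
        if have != want:
            mismatches[service] = {"desired": want, "actual": have}
    # Services that are running but were never asked for are also worth reporting.
    for service in actual.keys() - desired.keys():
        mismatches[service] = {"desired": None, "actual": actual[service]}
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: in practice "actual" would be read from the deployed system.
    desired = {"auth": "1.4.2", "billing": "2.0.0", "search": "0.9.1"}
    actual = {"auth": "1.4.2", "billing": "1.9.8", "legacy-cron": "0.1.0"}
    for service, detail in sorted(diff_versions(desired, actual).items()):
        print(service, detail)
```

An empty result means the environment matches what was asked for. Anything else is worth flagging before the environment is handed over to the other teams.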

We use Packer (an open-source tool) for testing Amazon Machine Images (AMI) that have all software installed and configured. We deploy infrastructure in different ways – like Terraform’s Pass module, some internal scripts, etc. The goal is to provide all teams a quick way to deploy a standardized set of environments and perform tests against deployed infrastructure.


Some issues I observed when testing these automated deploys

As testers, we are drawn to bugs. I felt more comfortable once I started to notice errors and patterns in the errors. In spite of all the testing we do on the infrastructure and deploys, things can still go wrong. Here are some common problems you might encounter that are not necessarily software bugs but rather problems with the underlying hardware, network and people.

  • Timeout issues

    Network problems or high latency often cause timeouts. I do not know of any proper solution to resolve this. We just re-initiate the deployment job. I know that is not good but we have not found a more robust solution so far.

  • Resource crunch

    This issue is mainly seen on test environments when infrastructure is no longer needed and is ready for teardown. If the teardown doesn’t happen properly, old systems keep holding on to resources, and new deployments in that region are likely to hit a resource crunch. To avoid this problem, make sure you tear down the systems that are no longer needed in each region. For example, after terminating nodes, ensure their public IPs are released. I recommend automating the teardown process as a regular practice on test and staging environments. This will save you a lot of money.

  • Improper code merges

    This is not frequent, but I have sometimes seen deployments fail because of a wrong code merge, which causes unnecessary delays.

  • Incorrect parameter values and credentials:

    Wrong cloud credentials can make the infrastructure behave incorrectly, and wrong parameter values can make deployments fail. Both directly impact the test schedule.
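
For the timeout issues described above, re-initiating the job by hand can at least be automated. Here is a minimal sketch in Python that retries a job a limited number of times with a growing pause between attempts. The function name, the number of attempts and the delays are my own assumptions. It does not fix the underlying network problem; it only saves someone from pressing the button again.

```python
import time


def run_with_retries(job, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run job(); on TimeoutError retry with exponential backoff, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: let the failure surface
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_deploy():
        # Hypothetical job that times out twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TimeoutError("simulated timeout")
        return "deployed"

    print(run_with_retries(flaky_deploy, base_delay=0.1))
```

Only timeouts are retried here. Failures caused by wrong parameters or a bad merge should fail fast, because running the same job again will not help.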


In this post, I gave a brief overview of Infrastructure as Code, the tools used to deploy it and the options for building a tool belt to test infrastructure. I also covered the importance of infrastructure testing, my experiences as an infrastructure tester over the last year and a half, and the issues I encountered when testing these automated deploys.


