Consensus AWS Case Study

Summary:

Consensus is a software development professional services company that creates cloud-based solutions for in-store, web, and mobile channels, unifying the complex and interdependent data streams between retailers, manufacturers, digital service providers, underwriters, and network operators. Consensus engaged Taos to help with the design, strategy, and optimization of continuous deployment for its SaaS platforms. Taos played an instrumental role in the solution, which used technologies including AWS, Terraform, and Chef.

Challenge:

Existing deployments at Consensus took as long as 5 hours to complete, and many manual steps were still required afterward to finish building the entire environment.

The environment was built with CloudFormation and OpsWorks. Custom cookbooks were written in each author's scripting language of choice and then executed, rather than using the OpsWorks-flavored Chef resources and tooling as designed. This led to issues such as unexpected application restarts and duplicate entries of the same configuration whenever a custom cookbook was applied more than once.
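The duplicate-entry problem is a loss of idempotency: a script that blindly appends a setting adds another copy on every run, while a declarative resource only acts when the desired state is missing. The following is a minimal sketch of that difference, not the Consensus cookbooks themselves; the function names and the configuration lines are hypothetical.

```python
def append_line(lines, entry):
    """Non-idempotent: blindly appends, so every run adds another copy."""
    return lines + [entry]


def ensure_line(lines, entry):
    """Idempotent: adds the entry only if it is not already present."""
    return lines if entry in lines else lines + [entry]


def apply_twice(step, lines, entry):
    """Simulate running the same configuration step two times."""
    return step(step(lines, entry), entry)


# Hypothetical configuration content, for illustration only.
config = ["port=8080"]
entry = "max_connections=100"

scripted = apply_twice(append_line, config, entry)
converged = apply_twice(ensure_line, config, entry)

print(scripted.count(entry))   # 2: the entry was duplicated
print(converged.count(entry))  # 1: the second run was a no-op
```

Chef resources are designed to behave like `ensure_line`, which is why executing arbitrary scripts from a cookbook gives up that guarantee.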

The objective was a full CI/CD pipeline built on Terraform, Chef, GitHub Enterprise, Jenkins, and Artifactory. Production would alternate (A/B) between regions, replacing the traditional Disaster Recovery approach, and Auto Scaling Groups would ensure that traffic bursts were handled and that capacity scaled back down when no longer needed.
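As a rough illustration of the scale-out and scale-in behavior described here, the sketch below sizes a fleet in proportion to load and clamps the result to the group's bounds. It is a simplified model, not the scaling policy Consensus used (the source does not describe one); the function name, the metric, and all numbers are assumptions.

```python
import math


def desired_capacity(current, metric, target, minimum, maximum):
    """Scale the fleet in proportion to load, clamped to the group's bounds.

    current: instances running now
    metric:  observed average load per instance (e.g. CPU percent)
    target:  load per instance the group should settle at
    """
    current = max(current, 1)  # avoid proposing zero from an empty group
    proposed = math.ceil(current * metric / target)
    return max(minimum, min(maximum, proposed))


# Hypothetical group bounds: 2 to 10 instances, target of 60% load.
burst = desired_capacity(current=4, metric=90, target=60, minimum=2, maximum=10)
quiet = desired_capacity(current=6, metric=20, target=60, minimum=2, maximum=10)

print(burst)  # 6: scale out during a traffic burst
print(quiet)  # 2: scale back in when load drops
```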

Solution:

Taos consultants scaled the Consensus environment by using Terraform and Chef to provision nodes within the AWS production environment. Consensus makes extensive use of S3, EC2, RDS, Data Pipeline, and VPCs, among other services.

At a high level, Terraform provisioned most AWS resources as well as the Chef 'environment' that instances live in, including attribute overrides and other environment specifics.
When Terraform created EC2 instances, it used the Chef provider to 'attach' those instances to the environment it had created earlier, supplied each instance's run list via a Chef role, and then performed a Chef client run that configured the system.

On the Chef side, Taos consultants used Test Kitchen and Serverspec to author cookbooks and verify that they behave correctly. Adequate test coverage helps ensure that future changes do not break current functionality. Jenkins tests each cookbook change and acts as the pipeline that delivers cookbooks to both the Chef server (where the managed nodes pull them from) and a private Supermarket. This provided a way to manage and work with dependent cookbooks via Berkshelf.

At this stage, each managed node runs the Chef client every 25-35 minutes, which provides the management component of configuration management: drift is minimized and necessary updates are made on the fly.
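A run window like 25-35 minutes is typically produced by a fixed interval plus a random "splay", so that nodes do not all contact the Chef server at the same moment. The sketch below models that calculation. The specific values (a 25-minute interval with up to 10 minutes of splay) are an assumption that happens to yield the stated window; the source does not give the actual client settings.

```python
import random


def next_run_delay(interval, splay, rng=random):
    """Seconds until the next run: the base interval plus a random splay."""
    return interval + rng.uniform(0, splay)


INTERVAL = 25 * 60  # assumed base interval, in seconds
SPLAY = 10 * 60     # assumed maximum extra random delay, in seconds

# Sample many delays with seeded generators to show the range they fall in.
delays = [next_run_delay(INTERVAL, SPLAY, random.Random(seed)) for seed in range(1000)]

print(min(delays) >= 25 * 60)  # True
print(max(delays) <= 35 * 60)  # True
```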
 

Results:

The implemented solution provided the following improvements:

  • Created a job that runs a full suite of tests for application- or service-level cookbooks any time a core cookbook is changed
  • Applied Terraform configuration testing
      • Once this is done, Jenkins can perform more tasks automatically
  • Introduced a clustered Chef server
      • Moved from each environment having its own Chef server to a central High-Availability clustered environment
  • Implemented active/passive accounts instead of Disaster Recovery
      • One account is in use as 'production'
      • The other account receives the latest code, and tests are performed against it
      • Once the latest code is verified, the passive account becomes active production and the formerly active account becomes passive; it is then torn down and rebuilt with the current code, where it stays until new code is ready to deploy
  • Implemented Auto Scaling Groups to help manage costs and handle load surges
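
The active/passive rotation in the list above can be sketched as a small state transition. This is an illustrative model only, not the Terraform or Jenkins automation itself; the `Account` class, the `promote` function, and the account names and versions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    code_version: str


def promote(active, passive, tests_pass):
    """Return the (active, passive) pair after a release attempt.

    If the passive account's code is verified, the roles swap and the old
    active account is rebuilt with the now-current code. Otherwise nothing
    changes and production keeps serving the previous version.
    """
    if not tests_pass(passive):
        return active, passive
    rebuilt = Account(active.name, passive.code_version)
    return passive, rebuilt


# Hypothetical accounts: account-a serves production, account-b has new code.
account_a = Account("account-a", "v1")
account_b = Account("account-b", "v2")

active, passive = promote(account_a, account_b, lambda acct: True)
print(active.name, active.code_version)    # account-b v2
print(passive.name, passive.code_version)  # account-a v2
```

If verification fails, `promote` returns the pair unchanged, which is the property that lets this approach stand in for a separate Disaster Recovery environment.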