Eliminate Human-Based Actions With Automated Deployments: Improving Commit-to-Deploy Ratios Along the Way
Remove Toil in roles and responsibilities and drive toward next-gen development lifecycles using DevOps pipelines and integrating with existing observability platforms.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, The Modern DevOps Lifecycle: Shifting CI/CD and Application Architectures.
Thirty years later, I still love being a software engineer. In fact, I recently read Will Larson's "Staff Engineer: Leadership beyond the management track," which has further ignited my passion for solving complicated problems programmatically. Knowing that employers continue to accommodate the staff, principal, and distinguished job classifications provides a breath of fresh air for technologists who want to thrive as engineers.
Unfortunately, with the good sometimes comes the not-so-good. For today's software engineer, the reality isn't quite so ideal, as Toil continues to find a way to disrupt productivity on a routine basis. One common example is deploying our artifacts, especially into production environments.
It's time to place a higher priority on deployment automation.
The Traditional Deployment Lifecycle
The development lifecycle for a software engineer typically centers around three simple steps: develop, review, and merge. Building upon these steps, the following flowchart illustrates a traditional deployment lifecycle:
Figure 1. Traditional development lifecycle
In Figure 1, a software engineer introduces an update to the underlying source code. Once a merge request is created, the continuous integration (CI) tooling executes unit tests and performs static code analysis. If these steps are completed successfully, a second software engineer performs a code review for the changes. If those changes are approved, the original software engineer merges the source code changes into the main branch.
At this point, the software engineer starts a deployment to the development environment (DEV), which is handled by the continuous delivery (CD) tooling. In this example, the release candidate is deployed to DEV and additional tests (like regression tests) are executed. If both steps pass, the software engineer initiates a deployment into the QA environment via the same CD tooling. Next, the software engineer creates a change ticket to release the source code update into the production environment (Prod). Once the approving manager approves the change ticket, the software engineer initiates a deployment into Prod, which instructs the CD tooling to perform the Prod deployment.
Unfortunately, there are several points in the flow where human-based tasks are involved.
Time to Focus on Toil Elimination
Google Site Reliability Engineering's Eric Harvieux defined Toil as noted below:
"Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows."
Software engineers should make a habit of identifying Toil in their roles and responsibilities. Once Toil has been acknowledged, tasks should be established to eliminate these items that do not foster productivity. Many Agile teams reserve a share of sprint capacity (20% is a common figure) for backlog tasks, and Toil elimination is a strong candidate for such work.
In Figure 1, the following tasks were handled manually and should be viewed as Toil:
- Start DEV Deployment
- Start QA Deployment
- Create Change Ticket
- Manager Approve Change Ticket
- Start Prod Deployment
In order to drive toward next-gen deployment lifecycles, it is important to become Toil-free.
DevOps Lifecycle and Deployment Automation
While Toil elimination is an important aspect of next-gen deployment lifecycles, deployment automation via DevOps is equally important. Using DevOps pipelines, we can automate the deployment flow as noted below:
- Create the release candidate image when the merge-to-main event is completed.
- Automate the deployment to DEV when a new release candidate is created.
- Continue to deploy to QA upon successful deployment to DEV.
- Create the change ticket programmatically once QA deployment is successful.
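The steps above can be sketched as a single function that runs when a merge-to-main event arrives. This is only a minimal illustration, not the configuration of any particular CI/CD product: `run_pipeline` and the `build_image`, `deploy`, and `create_change_ticket` callables are hypothetical names standing in for whatever the team's tooling provides.

```python
from typing import Callable, Dict, List, Optional


def run_pipeline(
    commit_sha: str,
    build_image: Callable[[str], str],
    deploy: Callable[[str, str], bool],
    create_change_ticket: Callable[[str], str],
) -> Dict[str, object]:
    """Run the automated flow triggered by a merge-to-main event.

    All three callables are hypothetical hooks into the team's own tooling.
    """
    log: List[str] = []
    ticket: Optional[str] = None

    # Step 1: create the release candidate image for the merged commit.
    image = build_image(commit_sha)
    log.append(f"built {image}")

    # Steps 2 and 3: deploy to DEV, then continue to QA only if DEV succeeded.
    for env in ("DEV", "QA"):
        if not deploy(image, env):
            log.append(f"{env} deployment failed")
            return {"image": image, "ticket": ticket, "log": log}
        log.append(f"deployed to {env}")

    # Step 4: create the change ticket programmatically once QA succeeds.
    ticket = create_change_ticket(image)
    log.append(f"created {ticket}")
    return {"image": image, "ticket": ticket, "log": log}


if __name__ == "__main__":
    # Stub implementations so the sketch can be run on its own.
    outcome = run_pipeline(
        "abc123",
        build_image=lambda sha: f"my-service:{sha}",
        deploy=lambda image, env: True,
        create_change_ticket=lambda image: "CHG-0001",
    )
    for entry in outcome["log"]:
        print(entry)
```

Passing the integrations in as callables keeps the ordering logic separate from any vendor-specific API, so the same flow can be exercised with stubs before it is wired to real tooling.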
In implementing the automation noted above, three of the five human-based tasks are eliminated. In order to mitigate the remaining two tasks, the observability platform can be leveraged.
Service owners often rely on their observability platform to support and maintain applications running in production. By extending the coverage to include the lower environments (like DEV and QA), it is possible for DevOps pipelines to interact with metrics being emitted during the deployment lifecycle using an open-source tool such as Ansible.
This means that as the DevOps pipelines are making changes to an environment, an Ansible Playbook can be created to monitor a given set of metrics in order to know if the deployment is running as expected. If no anomalies or errors surface, the pipeline will continue running. Otherwise, the current task will abort and the prior state of the deployment will be restored.
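The gate described above can be sketched as follows. The article describes an Ansible Playbook for this job; the Python below is a stand-in used purely for illustration, and every name in it (`deploy_with_metric_gate`, `apply_change`, `rollback`, `read_metrics`, and the threshold keys) is an assumption rather than part of Ansible or any observability product.

```python
import time
from typing import Callable, Dict


def deploy_with_metric_gate(
    apply_change: Callable[[], None],
    rollback: Callable[[], None],
    read_metrics: Callable[[], Dict[str, float]],
    thresholds: Dict[str, float],
    checks: int = 3,
    interval_seconds: float = 0.0,
) -> bool:
    """Apply a change, watch metrics, and restore the prior state on an anomaly.

    Returns True when every check stays within the thresholds, False when
    the change was rolled back.
    """
    apply_change()
    for _ in range(checks):
        metrics = read_metrics()
        for name, limit in thresholds.items():
            value = metrics.get(name)
            # A missing metric is treated as an anomaly, not as healthy.
            if value is None or value > limit:
                rollback()
                return False
        time.sleep(interval_seconds)
    return True


if __name__ == "__main__":
    # Stubbed example: the error rate stays under the service owner's limit.
    healthy = deploy_with_metric_gate(
        apply_change=lambda: print("deploying release candidate"),
        rollback=lambda: print("restoring prior state"),
        read_metrics=lambda: {"error_rate": 0.01},
        thresholds={"error_rate": 0.05},
    )
    print("pipeline continues" if healthy else "pipeline aborted")
```

Treating an absent metric as a failure is a deliberate choice in this sketch: a deployment that stops emitting telemetry should not be allowed to pass the gate by default.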
As a result, using a collection of metrics defined by the service owner and the observability platform, the need for manager approval diminishes. This is because the approval of the merge request is where the change was analyzed. Additionally, the approving manager step was often added because a better alternative did not exist. With the manager approval step replaced, the deployment to Prod can be triggered by the same DevOps pipeline.
In taking this approach, the status of the change ticket can reflect the actual status as tasks are completed by the automation. Example statuses include Created, To Be Reviewed, Approved, Started, In Progress, and Completed (or Completed With Errors).
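One way the automation could keep a ticket's status honest is to model the statuses as a small state machine. The status names below come from the list above, but the allowed transitions are an assumption made for this sketch, as are the `TicketStatus`, `ALLOWED_TRANSITIONS`, and `advance` names; a real ticketing system defines its own workflow.

```python
from enum import Enum
from typing import Dict, Set


class TicketStatus(Enum):
    CREATED = "Created"
    TO_BE_REVIEWED = "To Be Reviewed"
    APPROVED = "Approved"
    STARTED = "Started"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    COMPLETED_WITH_ERRORS = "Completed With Errors"


# Assumed ordering: each status may only move to the statuses listed here.
ALLOWED_TRANSITIONS: Dict[TicketStatus, Set[TicketStatus]] = {
    TicketStatus.CREATED: {TicketStatus.TO_BE_REVIEWED},
    TicketStatus.TO_BE_REVIEWED: {TicketStatus.APPROVED},
    TicketStatus.APPROVED: {TicketStatus.STARTED},
    TicketStatus.STARTED: {TicketStatus.IN_PROGRESS},
    TicketStatus.IN_PROGRESS: {
        TicketStatus.COMPLETED,
        TicketStatus.COMPLETED_WITH_ERRORS,
    },
    TicketStatus.COMPLETED: set(),
    TicketStatus.COMPLETED_WITH_ERRORS: set(),
}


def advance(current: TicketStatus, new: TicketStatus) -> TicketStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


if __name__ == "__main__":
    status = TicketStatus.CREATED
    for step in (TicketStatus.TO_BE_REVIEWED, TicketStatus.APPROVED):
        status = advance(status, step)
    print(status.value)
```

Rejecting out-of-order moves means a pipeline bug cannot mark a ticket as completed before the deployment has actually started.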
Next-Gen Deployment Lifecycle
By eliminating Toil and introducing DevOps automation via pipelines, a next-gen deployment lifecycle can be created.
Figure 2. Next-gen deployment lifecycle
In Figure 2, the deployment lifecycle becomes much smaller and no longer requires the approving manager role. Instead, the observability platform is leveraged to monitor the DevOps pipelines.
With the next-gen deployment lifecycle, the software engineer performs the merge-to-main step after the merge request has been approved. From this point forward, the remainder of the process is completely automated. If any errors occur during the CD pipeline steps, the pipeline will stop and the prior state will be restored.
Compared to Figure 1, all of the existing Toil has been completely eliminated and teams can get into the mindset that a merge-to-main event is the entry point to the next production release. What's even more exciting is the improvement that teams will see with their commit-to-deploy ratios in adopting this strategy.
Shattering Unjustified Blockers
When considering next-gen deployment lifecycles, three common thoughts are often raised:
1. We Need to Let the Business Know Before We Can Deploy
Software engineers should strive to enhance or update services in a manner where business-level approval is not a requirement. Feature flags and versioned URIs are examples of how automated releases can be achieved without impacting existing customers. However, it is always a great idea to communicate what features and fixes are planned, along with the expected time frames.
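As a small illustration of the feature flag idea, the sketch below ships new behavior in a release while keeping it switched off until the business is ready. The function, the flag name `free_shipping_over_50`, and the prices are all hypothetical and exist only for this example.

```python
from typing import Mapping


def quote_shipping(order_total: float, flags: Mapping[str, bool]) -> float:
    """Return a shipping cost; the new rule applies only when its flag is on."""
    # New behavior is deployed but stays inactive until the flag is enabled.
    if flags.get("free_shipping_over_50", False) and order_total >= 50:
        return 0.0
    # Existing behavior, unchanged for customers while the flag is off.
    return 5.0


if __name__ == "__main__":
    print(quote_shipping(60.0, {}))
    print(quote_shipping(60.0, {"free_shipping_over_50": True}))
```

Because the flag defaults to off, the code can reach Prod through the automated pipeline without changing what existing customers see.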
2. The Manager Should Know What Is About to Be Deployed
While this is a fair statement, the approving manager's knowledge of the update should be established during the sprint planning stage (or similar). Once a given set of work begins, the expectation is that the work will be completed and deployed during the given development iteration. Like software engineers, managers should adopt the mindset that merge-to-main ultimately results in a deployment to production.
3. At Least One Person Should Approve Changes Before They Are Pushed to Production
This is a valid statement, and it actually occurs during the merge request stage. In fact, the remaining approval in the next-gen deployment lifecycle is where it is for a very good reason. When one or more approvers review a merge request, they are in the best position — at the best point in time — to review and challenge the work that is being completed. Thereafter, it makes far better sense for the observability platform to monitor the DevOps pipelines for any unexpected issues.
Conclusion
The traditional development lifecycle often includes human-based approvals and an unacceptable amount of Toil. This Toil not only becomes a source of frustration but also impacts the productivity and mental health of the software engineer over time. Teams should make it a priority to eliminate Toil in their roles and responsibilities and drive toward next-gen development lifecycles using DevOps pipelines and integrating with existing observability platforms. Taking this approach will allow teams to adopt a "merge-to-main equals deploy-to-Prod" mindset. In doing so, commit-to-deploy ratios will improve as a nice side effect.
Thirty years ago, I found my passion as a software engineer, and 30 years later, I still love being a software engineer. In fact, I am even more excited for the path ahead, free from human-based approvals due to DevOps automation and Toil elimination.
Have a really great day!
Resources:
- "Staff Engineer: Leadership beyond the management track" by Will Larson, 2021
- "Identifying and Tracking Toil Using SRE Principles" by Eric Harvieux, 2020
- "Monitoring as code with Sensu + Ansible" by Jef Spaleta, 2021