October 26, 2021

How to Use GitOps: What Every CIO Needs to Know

By IT Revolution

This post is adapted from the 2021 DevOps Enterprise guidance paper by Chris Hill, Tom Limoncelli, Dr. Gail Murphy, Cornelia Davis, and Dwayne Holmes.


In many organizations, engineers spend too much time waiting for infrastructure change requests. Limited resources cause infrastructure-dependent changes to lag behind, increasing development demand. GitOps presents one solution by repurposing your organization’s existing Git pull-request workflows to permit infrastructure-oriented teams to safely democratize changes.

Distributing control down to the dependent teams improves transparency, velocity, predictability, auditability, and more. In a GitOps world, all infrastructure elements are defined as code. Changes are proposed in the code repository with the same mechanism that developers use when working on traditional applications. When the changes are approved and committed, automation validates them and deploys them into production safely.

Developers are already familiar with your existing Git pull-request workflow, which can make it easy for them to switch to and from the application code they are already developing. Other benefits include collaborative discussions, ephemeral environments, formal change traceability, automated testing, and human-in-the-loop approval—all enabling your infrastructure changes.

In this post we’ll define GitOps, explain where it is best applied and why, and make suggestions on adopting this practice in your organization.

GitOps Definition

GitOps is the application of the Git pull-request workflow in an infrastructure change context. Owners of infrastructure provide Git pull-request workflows as the interface for proposed changes as follows:

  • Proposed changes to the infrastructure are presented in the form of a pull request in a designated Git repository.
  • Owners of the infrastructure collaborate on the pull request and ultimately approve or reject the change.
  • Automation actively tests and deploys the changes in a temporary context to generate data to support the proposed change.
  • Owners of the infrastructure provide documentation and templates to guide those who create pull requests.

Most software developers are already familiar with using Git for source-control management and with the traditional pull-request workflow that follows:

  1. Any developer on any team can pull a new story or bug.
  2. The developer creates a branch and iterates until the change is ready for approval to merge into trunk.
  3. The developer submits a petition to change the trunk branch, otherwise known as a pull request.
  4. The feature branch CI/CD kicks off automated tests and syntax checks.
  5. Other developers collaborate, review, and give feedback until the change is approved.
  6. The pull request is merged, and CI/CD is kicked off from trunk.

The infrastructure repository should allow anyone in your organization to submit a pull request, so that changes are welcome from any team. The petition to change includes a description of the difference between the previously described infrastructure and the newly proposed state of the infrastructure. That description lives in source code files written in a language that defines (or “declares”) the desired shape and configuration of a resource, a practice known as infrastructure as code.

When the merge request/change is approved, automation deploys the change by comparing the existing actual infrastructure state to what is desired and described in code. It then makes the minimum number of changes to “true up” the infrastructure to the desired state.
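The “true up” step can be pictured as a comparison of two descriptions of the same infrastructure. The following is a minimal sketch, not the behavior of any particular tool: the function name `plan_changes`, the record names, and the addresses are all hypothetical, and real reconcilers handle ordering, dependencies, and failures that are ignored here.

```python
from typing import Any

# A resource is modeled as a plain dictionary of settings (an assumption
# made for this sketch; real tools use richer, typed models).
Resource = dict[str, Any]


def plan_changes(
    actual: dict[str, Resource], desired: dict[str, Resource]
) -> list[tuple[str, str]]:
    """Return the (action, resource_name) pairs needed to move `actual`
    to `desired`. Resources that already match produce no action."""
    plan: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            plan.append(("create", name))
        elif actual[name] != desired[name]:
            plan.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            plan.append(("delete", name))
    return plan


# Hypothetical example data: what exists today versus what the code declares.
actual_state = {
    "www": {"type": "A", "value": "10.0.0.1"},
    "old-api": {"type": "A", "value": "10.0.0.9"},
}
desired_state = {
    "www": {"type": "A", "value": "10.0.0.1"},   # unchanged, so no action
    "shop": {"type": "A", "value": "10.0.0.2"},  # declared but not yet present
}

print(plan_changes(actual_state, desired_state))
# [('create', 'shop'), ('delete', 'old-api')]
```

Because unchanged resources generate no action, re-running the plan after a successful deployment yields an empty list.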

GitOps shifts transactional (ticket-based) requests to collaborative (pull-request based) requests. The collaborative nature improves quality as errors are caught earlier and the collaboration provides unparalleled transparency.

Other ways of creating self-service infrastructure require software projects that are expensive and burdensome. Creating a web-based, self-service portal for infrastructure changes is expensive. It requires UX design skills that are rare, and expecting operations engineers to also be UX experts is a losing game. It also requires the web frontend to keep pace with infrastructure changes, which means more life cycles to maintain. In reality, one will lag behind the other, making nobody happy. GitOps creates a better paradigm because it permits safe collaboration within the system that the infrastructure team already uses.

While we describe GitOps as using Git’s pull-request workflow, the technique can be used with other systems. For example, Perforce’s equivalent is its CL (changelist) workflow. BitKeeper and others also have equivalent workflows. While specific CI/CD tools may lend themselves to a GitOps-based narrative, most industry-leading tools can be configured to run the GitOps workflow.

Problems GitOps Aims to Solve

GitOps aims to solve a number of existing problems that are exemplified in the following statements one might commonly hear on a software development team:

  • “We have to babysit every production infrastructure change and make changes on the fly.”
  • “We’re never sure exactly who did what during an infrastructure change.”
  • “Infrastructure changes are impossible to roll back to the previous state; we must roll forward.”
  • “The infrastructure team is always too busy to accept my meeting invites to answer design questions.”
  • “I hope this infrastructure change goes in smoothly. We couldn’t replicate in staging.”
  • “My infrastructure change didn’t follow procedure.”
  • “My infrastructure change ticket got rejected without any reason.”

GitOps is a tool in your DevOps toolbox. Like all such tools, there is a place and time where it has the most impact. Situations where GitOps is most useful include:

  • When the infrastructure or resources (such as Kubernetes, DNS, or network devices) can be described by a declarative or infrastructure-as-code language.
  • When requests for change are standard add/modify/remove requests.
  • When the type of request typically requires intense human scrutiny.
  • When a human in the loop (HITL) is required. In this situation, leveraging the pull-request approval mechanism alleviates the need to reinvent the wheel.
  • When the request can be deemed contentious and might require large amounts of discussion. Left unstructured, engineers can discuss hypotheticals endlessly; framing the request as a pull request focuses the discussion on a concrete proposed change.
  • When requests are infrequent. In this situation, GitOps leverages the existing user interfaces for pull requests, providing an easy-to-use interface without requiring new investment.
  • When pull requests are common practice or used as a mechanism to increase developer productivity.

GitOps is not always appropriate. Situations where we do not recommend GitOps include:

  • When there is no declarative or infrastructure-as-code language for describing the resource or infrastructure and creating one would not have a positive return on investment.
  • When requests are not transactional, such as a request for assistance investigating an anomaly.
  • Where programmatic rollbacks are not possible.

GitOps Tenets

GitOps can be ineffective if it is used without the below foundational tenets. (Since the paradigm is constantly evolving, so is this list.)

Declarative not Imperative

The syntax that defines the infrastructure “desired state” is declarative in nature, not imperative. Declarative in this context means specifying a defined state without including exact steps and/or instructions on how a system should transition to that state. This means that the mechanism for transitioning from an existing “actual state” to the destination “desired state” can be left up to the native functions of the configuration management tool itself (see the idempotency definition later).

You could compare imperative definitions to a blueberry muffin recipe and declarative definitions to the end state after executing the recipe (i.e., end state = nine hot baked blueberry muffins ready for consumption). Even this example is dangerous because “hot” and “ready for consumption” are subjective and open to perception bias.
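The difference can also be shown in a few lines of code. This is an illustrative sketch only: the function names and the `web-N` server naming scheme are invented for the example, and a real configuration management tool does far more. The imperative function encodes steps, so its result depends on where it started; the declarative one takes only the end state and works out the steps itself.

```python
def imperative_add_two_servers(servers: list[str]) -> list[str]:
    """Imperative: a fixed sequence of steps ("add two servers").
    The outcome depends on the starting state and on how often it runs."""
    first = len(servers) + 1
    return servers + [f"web-{first}", f"web-{first + 1}"]


def declarative_converge(servers: list[str], desired_count: int) -> list[str]:
    """Declarative: only the end state is given ("there are N servers").
    The function decides whether to add, remove, or do nothing."""
    kept = servers[:desired_count]
    missing = [f"web-{i}" for i in range(len(kept) + 1, desired_count + 1)]
    return kept + missing


start = ["web-1"]

# Running the imperative steps twice keeps adding servers.
once = imperative_add_two_servers(start)
twice = imperative_add_two_servers(once)
print(len(once), len(twice))  # 3 5

# Declaring "three servers" converges to the same result every time.
converged = declarative_converge(start, desired_count=3)
print(converged)  # ['web-1', 'web-2', 'web-3']
print(declarative_converge(converged, desired_count=3) == converged)  # True
```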

Pull Requests Are the Workflow

For any company that builds software, the pull request should sit at the heart of how software is developed. Another developer reviewing your code should be seen as an effort to make you a better developer and ensure value and quality for your business. By investing in the etiquette, process, and culture of using pull requests, developers help everyone get better as a team and build a story behind the why with historical comments and discussions. Many companies can use the pull-request process to mentor, coach, and grow as a development community.

Pull request feedback can be made in multiple contexts: “Why did you take that approach?”, “Will this scale for the future?” “Will this actually do what it’s intended to do?” “Is that the best use of the language?” “Are these external dependencies safe?” “What is the expected experience or customer journey?” There may be a collective decision to close a merge request as a result of the questions being asked—that’s okay too—the learnings have been captured and can be referenced in the future. All the same benefits, principles, and feedback can be relevant and appropriate for GitOps changes as well.

Infrastructure as Code (IaC)

By its nature, GitOps requires a text representation of your system that teams can collaborate on, manipulate, and automate against. This is known as infrastructure as code, and many tools already exist in the industry, including Terraform, Helm, Chef, Puppet, and Ansible, as well as bespoke languages such as DNSControl, home-grown languages, and YAML.

The defined text is stored in a source code repository (Git) and the trunk branch can be considered the “truth” desired state. Every proposal (source branch in a pull request) to change has a text diff representation of before and after. This typically reserves the cognitive focus for conversations about the what rather than the how. The entire set of activities is auditable and recorded in the history of Git and the pull request itself, creating a ledger of engineering discussions.
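To make the “text diff representation of before and after” concrete, here is a small sketch using Python’s standard `difflib` module. The YAML-like content, the file names, and the `visibility` setting are hypothetical; in practice Git and the pull-request tool produce this diff for you.

```python
import difflib

# Hypothetical desired state on trunk (the current "truth").
trunk = """\
ingress:
  host: hatdash.com
  visibility: internal
"""

# Hypothetical desired state on the proposal's source branch.
proposal = """\
ingress:
  host: hatdash.com
  visibility: public
"""


def render_diff(before: str, after: str) -> str:
    """Return a unified diff of two text definitions, or "" if they match."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="trunk/ingress.yaml",
        tofile="proposal/ingress.yaml",
    )
    return "".join(lines)


print(render_diff(trunk, proposal))
```

The reviewer sees a single removed line and a single added line, which keeps the conversation on what is changing.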

Continuous Integration and Continuous Delivery (CI/CD)

GitOps requires familiarity with the relationship between automation and the code change. Continuous integration is the principle of ensuring that every commit is validated against predefined success criteria as if it were already the truth, before it becomes the truth. Continuous delivery ensures that the system state matches the truth without manual involvement. These mechanisms are vital to understand because they form the foundation that GitOps builds on.
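As a rough illustration of “predefined success criteria,” the sketch below shows the kind of check a CI job might run against a proposed definition before a human reviews it. Everything specific here is an assumption made for the example: the JSON format, the required keys, and the allowed values are hypothetical and not taken from any real tool.

```python
import json

# Hypothetical schema for this sketch.
REQUIRED_KEYS = {"name", "port", "visibility"}
ALLOWED_VISIBILITY = {"internal", "public"}


def validate_proposal(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Syntax errors are reported first, like a linter would.
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]

    port = config.get("port")
    port_ok = isinstance(port, int) and not isinstance(port, bool) and 1 <= port <= 65535
    if "port" in config and not port_ok:
        problems.append("port must be an integer between 1 and 65535")

    visibility = config.get("visibility")
    visibility_ok = isinstance(visibility, str) and visibility in ALLOWED_VISIBILITY
    if "visibility" in config and not visibility_ok:
        problems.append("visibility must be 'internal' or 'public'")

    return problems


good = '{"name": "shop", "port": 443, "visibility": "public"}'
bad = '{"name": "shop", "port": 70000, "visibility": "open"}'
print(validate_proposal(good))  # []
print(validate_proposal(bad))
```

A pipeline would typically fail the job when the returned list is non-empty and post the messages back to the pull request.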

Example: Developer Experience Walkthrough

Sara is a lead software engineer whose team is responsible for the hat-selling ecommerce site HatDash.com. Her team has been preparing for months for the big day when the site will go from beta to completely open to the public. During beta, employees of HatDash have been able to access the work-in-progress site through a backdoor for beta users only in order to actively assist in the quality assurance process. This process has finally come to fruition, and HatDash is now ready to open the site up to their entire prospective customer base.

A different internal software team maintains the main ingress load balancer (think routing switchboard), so Sara follows the process below:

  1. Navigate to the load balancer configuration repo. Read README.md on how to use it.
  2. Create a branch with desired change to open the external ingress infrastructure object to the public internet.
  3. Raise a merge request to the trunk branch.

After Sara raises her merge request, a pipeline kicks off. Within a few minutes she receives a failure notification: a linter failed with a syntax error, and the error message suggests an alternative. Sara fixes this and re-pushes her update. Another pipeline kicks off; the linter passes, and the pipeline proceeds to execute a series of tests ensuring her intent matches the results in a dry run. Fortunately, everything shows up green! The ingress team is notified in Slack of a successful pipeline result from a pending merge request and jumps in to start the review process. This initiates an extended discussion within the merge request about expected volume, DDoS protection, and SLOs/SLAs. Sara is able to satisfy the criteria with her answers and obtains merge request approval, and the change is then merged to the trunk branch.

After the merge hits the trunk branch, another pipeline kicks off, which deploys the new configuration change to the production ingress controller. Traffic for HatDash.com is now being routed to her team’s site. Sara’s team has built a monitoring dashboard so that, when the switch occurs, they can monitor traffic, the navigation experience of clients, and conversions. She is then notified via Slack on how to trigger a back-out with a feature flag toggle in the event that she needs to roll back the deployment.

Strategies

GitOps as a concept can be implemented poorly and can suffer from constraints in other areas of your workflow. Below are a few areas to consider when implementing GitOps.

  • Branching Strategies: Your environments (fixed and/or ephemeral) are directly correlated with the way your source control repository is organized. Environment organization or taxonomy can constrain your branching strategy and vice versa.
  • Environment Integrity: If only part of your environment stack is controlled by GitOps and other dependent areas are not, you’ve created a hybrid mutable state that can lead to inconsistency in results.
  • Merge Request Paradigm: The merge request discussion is one of the most valuable places engineering decisions can be preserved historically, and it is a test harness for “what-ifs.” Establishing a mandatory automation pattern and communication etiquette can help ensure the right questions are asked and verified at the right time.
  • Andon Cord: As external dependencies become increasingly common, it is equally important to track the latest upstream changes and manage the change risk they introduce into your environment. GitOps pipelines can be run at scheduled intervals, or triggered from upstream pipelines, to verify that upstream work did not break anything and to give warning before the next rollout. A break coming from upstream could be considered an automated pull of an andon cord.
  • Policy as Code: Using tools like Open Policy Agent, GitOps can become an enabler for policy enforcement to ensure infrastructure as code falls within policy constraints. This can be extremely beneficial for security-related measures to protect assets from going “out of bounds.”
  • Idempotency: Creating a consistent final actual state is important to ensure that no matter what actual state you started with, or how many times the automation is run, you can predict what the final state will be. Most automation written in GitOps uses idempotent actions to create this consistency.
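The Policy as Code item above can be sketched in plain Python. This is a stand-in for illustration only: it is not Open Policy Agent and does not use its policy language, and the rules, field names (`visibility`, `tls`, `owner`), and resource names are all hypothetical. The idea it shows is that each policy is a small, reviewable function run automatically against every proposed definition.

```python
from typing import Callable, Optional

# A policy inspects one resource and returns a violation message, or None.
Policy = Callable[[dict], Optional[str]]


def no_public_without_tls(resource: dict) -> Optional[str]:
    """Hypothetical security rule: public resources must enable TLS."""
    if resource.get("visibility") == "public" and not resource.get("tls", False):
        return f"{resource['name']}: public resources must enable TLS"
    return None


def must_have_owner(resource: dict) -> Optional[str]:
    """Hypothetical accountability rule: every resource names an owner."""
    if not resource.get("owner"):
        return f"{resource['name']}: an owner label is required"
    return None


POLICIES: list[Policy] = [no_public_without_tls, must_have_owner]


def evaluate(resources: list[dict]) -> list[str]:
    """Run every policy against every resource and collect violations."""
    return [
        message
        for resource in resources
        for policy in POLICIES
        if (message := policy(resource)) is not None
    ]


proposed = [
    {"name": "shop-ingress", "visibility": "public", "tls": False, "owner": "team-sara"},
    {"name": "cache", "visibility": "internal"},
]
for violation in evaluate(proposed):
    print(violation)
```

Because the policies live in the repository alongside the infrastructure definitions, changes to the rules go through the same pull-request review as everything else.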

Call to Action

Getting started with GitOps has a few approaches that could naturally fit into most enterprise workflows.

  • Propose during Incident Review: An incident review might find that the process by which an offending change was made is ripe for improvement. It’s important to ensure that the context fits, but it may be helpful to ask some questions: “Would GitOps have helped make the infrastructure change more predictable?” “What was the experience for this customer to change the infrastructure? Could it be improved?” If the credibility of the paradigm is questioned, there are more resources available to reference, such as the OpenGitOps group at the Cloud Native Computing Foundation (CNCF).
  • Small Batches: GitOps isn’t something that has to be implemented for the entire organization or the entire ecosystem at once. Rolling out GitOps can be completed incrementally using the small batch principle. For example: Start with a known particular resource (Kubernetes, DNS, network configuration, virtual machine OS, etc.) and begin defining it as code. Then make changes to it through that infrastructure as code. Next, use a source code repository to keep track of your resource state. Then use the CI/CD system to push changes instead of a local machine. Establish a written template of what human pull request approvers must do to validate and approve a pull request. Permit pull requests from outside the team that owns the infrastructure or resource by providing “how to” documentation and examples. Accept pull requests for improvements to such documentation, as your users know what they need better than you do.

Next, add automated tests incrementally. Start with basic syntax/lint checking and then automate items from the rubric over time. Strike a balance by automating boring mechanical checks so humans can focus on items that require wisdom and judgement per the compensatory principle.

Finally, institute continuous improvement by adding new tests to the rubric or automation as the need arises. For example, a new check may be envisioned as the result of an incident review.

DevOps vs GitOps

DevOps provides a set of practices and tools that enable organizations to more efficiently build, evolve, and operate a software system or service. Traditionally, the development and the operation of the software were separate, conducted independently within an organization. Many problems tend to result from this separation, including significant lags in moving software from development into use. Adopting a DevOps approach involves integrated development and operation teams that can seamlessly move new software into operation and employ tools to automate as much of this movement as possible.

While DevOps practices and tools help to narrow the gap between software development and operating software, lags can still exist in ensuring the infrastructure to operate the software is appropriate. As the software being built evolves, new requirements are typically placed on the infrastructure and the environment in which the software operates. GitOps practices aim to reduce lags in the speed of needed infrastructure changes, to ensure that development and operation teams are collaborating effectively regarding what the software needs to run. GitOps also ensures that as much of the tedious work of operations as possible is automated.

Conclusion

High-friction infrastructure changes hurt productivity, motivation, and engagement, and encourage people to create more layers to distance themselves from dependent teams. In the end, this hurts the success of your organization.

GitOps repurposes your company’s existing Git pull-request workflows to permit infrastructure-oriented teams to safely democratize their changes. By creating an interface for humans to meet technically, emotionally, and organizationally without ambiguity, GitOps drives a paradigm closer to how software already flows. The result is a process with improved consistency, safety, accountability, and transparency.

Without GitOps, infrastructure organizations either provide an API (which requires coding skills to use) or take all requests via tickets (which requires human involvement at every step). With GitOps, the normal pull-request workflow that developers are used to is repurposed as the interface for making changes. A draft pull request means the proposed change is still a work in progress and can have ongoing automated testing upon each draft edit. Once approved, automated deployment turns the pull request into reality.

The benefits of GitOps include shorter wait times, less back and forth for each change, and fewer opportunities for error. It also creates new opportunities for collaboration and automation.

Getting started is easy and doesn’t require an “all or nothing” deployment. In fact, it may not require any new tools as organizations should utilize their existing CI/CD infrastructure. We recommend picking a specific part of the infrastructure most in need of improvement. Adopt infrastructure-as-code techniques, if not already in use. Evolve to using a pull-request workflow with automated testing for drafts and automated deployments after approval. Once successful, repeat in other areas.

Encourage the use of GitOps when brainstorming both tactical and strategic improvements. Make sure that GitOps success stories are highly visible. When people see the benefits of GitOps, they will be more likely to adopt the practice on their own. Soon you will be on your way to a GitOps-driven organization that is agile, safe, and automated.


Download the entire DevOps Enterprise Forum guidance paper from the IT Revolution library here. Or read all the 2021 DevOps Enterprise Forum papers in the fall issue of the DevOps Enterprise Journal.

About the Authors

IT Revolution

Trusted by technology leaders worldwide. Since publishing The Phoenix Project in 2013, and launching DevOps Enterprise Summit in 2014, we’ve been assembling guidance from industry experts and top practitioners.
