Skip to content

How Google SRE and Developers Collaborate

By Christof Leng, Tracy Ferrell, Alex Bligh, Michal Gefen, Betsy Beyer

This paper describes how Google’s Site Reliability Engineering (SRE) teams collaborate with their developer (Dev) counterparts to improve the reliability, efficiency, and velocity of Google’s products and infrastructure. The authors explain SRE’s mission and guiding principles, and outline three main types of engagements between SRE and Dev teams.

The paper includes case studies illustrating each engagement type. It also discusses anti-patterns to avoid and how to get engagements back on track when challenges arise. While every organization is different, the core SRE principles around specialist expertise, engineering focus, Dev partnership, and adaptive engagement can be applied flexibly in many contexts.

  • Format PDF
  • Pages 22
  • Publication Date May 2022

Features

  • Powerful Insights

    Learn how SRE and Dev teams collaborate effectively to improve reliability and velocity.

  • 3 Key Concepts

    Discover the three main types of SRE engagements and when to use each for maximum impact.

  • Real World Insights

    Gain insights from real-world case studies of successful and challenging SRE-Dev partnerships.

  • Applicable Guidance

    Understand core SRE principles that can be adapted to optimize your organization's approach.

About the Resource

This paper describes how Google’s Site Reliability Engineering (SRE) teams collaborate with their developer (Dev) counterparts to improve the reliability, efficiency, and velocity of Google’s products and infrastructure. The authors explain SRE’s mission and guiding principles, and outline three main types of engagements between SRE and Dev teams:

1. Baseline engagements: Ad-hoc support and consultation from SRE to help Dev teams with reliability best practices and incident response.

2. Assisted engagements: SRE provides proactive, project-focused support to Dev teams, often related to design and implementation of new products or features. Dev retains primary responsibility for the service.

3. Full support engagements: SRE takes on significant operational responsibilities for critical, evolving services that require the highest reliability. SRE and Dev work in close partnership.

The paper includes case studies illustrating each engagement type. It also discusses anti-patterns to avoid and how to get engagements back on track when challenges arise. While every organization is different, the core SRE principles around specialist expertise, engineering focus, Dev partnership, and adaptive engagement can be applied flexibly in many contexts.

Christof Leng
Tracy Ferrell
Alex Bligh
Michal Gefen
Betsy Beyer
Christof Leng

Christof Leng

SRE Engagements Product Area Lead at Google

To Author Archive
Tracy Ferrell

Tracy Ferrell

SRE Manager at Google

To Author Archive
Alex Bligh

Alex Bligh

Technology exec, SRE, engineering leader, entrepreneur

To Author Archive
Michal Gefen

Michal Gefen

Engineering Manager (SRE) at Google

To Author Archive
Betsy Beyer

Betsy Beyer

Technical writer for Google Site Reliability Engineering in NYC. I previously provided documentation for Google Data Center and Hardware Operations teams. Before moving to New York, I was a lecturer in technical writing at Stanford University.

To Author Archive

Similar Resources