Jason Cox is a Director of Systems Reliability Engineering (SRE) and a DevOps Enterprise Summit programming committee member. He is also a coauthor of Investments Unlimited: A Novel About DevOps, Security, Audit Compliance, and Thriving in the Digital Age (available September 13, 2022).
What is Site Reliability Engineering?
Site Reliability Engineering (SRE) is often considered an expression of DevOps. Like DevOps, SRE seeks to bridge the traditional gap between development and operation … [Read more...]
How Google SRE and Developers Work Together
One of the prominent themes in so many DevOps Enterprise Summits has been Site Reliability Engineering, which I've always been dazzled by. There are so many interesting things about these principles and practices that Google pioneered back in 2003. I think it's one of the most incredible examples of how one can create a self-balancing system that helps product teams get features to market quickly, but in a way that doesn't jeopardize the reliability and correctness of the services they create. … [Read more...]
Gene Kim’s Site Reliability Engineering (SRE) Playlist
How Google SRE and Developers Work Together (US 2021)
I love this talk because we hear from two leaders at Google on how Dev and SRE work together, both when things are ideal and when they are not. They talk about SRE funding models, engagement models, what it looks like when things are going well, and, when things are not going well, what they do to try to put things back on track. And you’ll hear hints on what it’s like to run customer-facing properties at Google (e.g., “you never truly understand a system … [Read more...]