February 13, 2024
Google pioneered the concept of Site Reliability Engineering (SRE) back in 2003. After 20 years and thousands of SREs managing Google’s massive infrastructure, they’ve learned a thing or two about running large-scale, reliable systems. Dr. Christof Leng, SRE Engagements Engineering Lead at Google, recently shared 10 insightful lessons that both business leaders and software developers should take note of.
Just like air and food, reliability is easy to take for granted until something goes wrong. There needs to be a voice advocating for it in every organization.
Have standardized, interchangeable components instead of unique and fussy "pet" systems that require special care. This makes scaling and change management easier.
When people aren’t afraid to reveal issues, you can discover weaknesses and fix root causes. Pointing fingers rarely solves anything.
Metrics drive behavior, so ensure they incentivize the right outcomes and iterate if needed. Don’t just blindly follow numbers.
Being on call helps SREs deeply understand systems and build credibility with developers. But don’t play the hero — solve issues as a team.
Automation increases consistency and frees up more time for engineering improvements. Make it a priority, not just a future wish list item.
Roll out changes gradually to limit the blast radius. Never deploy without code review, and never deploy on Fridays. Only let rollouts run without human oversight once they have consistently gone flawlessly.
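The gradual-rollout advice can be sketched in a few lines of Python. This is a minimal illustration, not Google's tooling: the stage fractions in `DEFAULT_STAGES`, the `deploy_and_check` callback, and the helpers `may_deploy` and `staged_rollout` are all hypothetical names chosen for this example. Each stage widens exposure only after the previous stage passes a health check, so a bad change reaches at most the hosts in the stage where it fails.

```python
from datetime import date
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stage sizes: cumulative fraction of the fleet running the change.
DEFAULT_STAGES: Tuple[float, ...] = (0.01, 0.05, 0.25, 1.0)


def may_deploy(today: date, reviewed: bool) -> bool:
    """Gate from the lesson: require code review and skip Fridays (weekday() == 4)."""
    return reviewed and today.weekday() != 4


def staged_rollout(
    fleet_size: int,
    deploy_and_check: Callable[[int], bool],
    stages: Sequence[float] = DEFAULT_STAGES,
) -> Tuple[List[int], Optional[int]]:
    """Widen the rollout stage by stage and stop at the first unhealthy stage.

    deploy_and_check(n) is a hypothetical callback that puts the change on n hosts
    in total and reports whether they are healthy.
    Returns (cumulative host counts of completed stages, host count where the
    rollout halted, or None if every stage succeeded).
    """
    completed: List[int] = []
    for fraction in stages:
        hosts = max(1, round(fleet_size * fraction))  # always expose at least one host
        if not deploy_and_check(hosts):
            return completed, hosts  # blast radius is limited to this stage
        completed.append(hosts)
    return completed, None
```

For a 1,000-host fleet, the change goes to 10, then 50, then 250, then all 1,000 hosts. If problems appear once more than 50 hosts are involved, the rollout stops at 250 rather than reaching the whole fleet.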
Outages will happen, so have fast rollback procedures in place and focus first on restoring service. Collect data, but analyze the root cause later.
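Here is a minimal sketch of "restore first, analyze later." It assumes a hypothetical `Service` class that stores its deployment history as a list of version strings. Rolling back returns the service to the previous version. The evidence goes into `incident_data` so it can be reviewed in the postmortem rather than analyzed during the outage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    """Hypothetical service that tracks deployed versions, newest last."""
    history: List[str] = field(default_factory=list)
    incident_data: List[Dict[str, str]] = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.history[-1]

    def deploy(self, version: str) -> None:
        self.history.append(version)

    def rollback(self, symptom: str) -> str:
        """Restore the previous version first; keep evidence for later root-cause work."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        bad = self.history.pop()
        # Collect data now, analyze later: record what was running and what was observed.
        self.incident_data.append(
            {"rolled_back": bad, "restored": self.current, "symptom": symptom}
        )
        return self.current
```

Here, rollback only pops a version off the history. Restoring service stays fast because no diagnosis happens on that path.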
Keep a written record of actions taken and information discovered so the whole team can quickly get up to speed and help resolve issues.
Monitor and pay down historical issues proactively or risk unmanageable systems no one will want to touch.
By keeping these proven lessons from Google’s SRE team in mind, technology leaders can foster the habits and culture required to run reliable, resilient systems as they scale.
To watch the full presentation, please visit the IT Revolution Video Library here: https://videos.itrevolution.com/watch/872732131
SRE Engagements Product Area Lead at Google
Managing Editor at IT Revolution working on publishing books and guidance papers for the modern business leader. I also oversee the production of the IT Revolution blog, combining the best of responsible, human-centered content with the assistance of AI tools.