September 22, 2022
Is it just me, or does it seem like things have a tendency to break on the weekend, when the parts and service repair companies are closed? This past weekend, as temperatures outside nudged above 100°F, a strange electrical odor started filling the house. By evening, it was getting warm inside. I did a quick check of the air conditioning system. The condenser unit outside was running, but the inside blower was not working. Oh great! Why was the blower not working? Turning on "fan only" mode didn't fix it. I did some quick investigation on our system and discovered we have an ECM blower motor. ECM, which stands for electronically commutated motor, means it is a microprocessor-controlled motor designed to optimize efficiency. It is very cool! I have to admit that by this point I was distracted by the tech and not focused on restoring service. With the room warming up and questions coming from the rest of my family, I was reminded to switch back into incident management mode and focus on addressing the outage.
I took apart the unit and began testing the control circuit and all the voltages. The main 110V feed was reaching the motor. The ECM 24V control lines that program the motor speed were also good. The air handler control system seemed solid, which meant the fault had to be in the motor itself, very likely in the microcontroller that runs it. Of course, some quick online searches revealed that this wasn't going to be an easy replacement, given current supply chain issues and chip shortages. I reached out to several service companies and received a string of "We will get back with you on Monday" responses.
Now, the funny part of this recent adventure is that we had the entire system inspected and serviced earlier this year. We didn't want to end up in the heat of summer, unable to find service or parts to fix any issues. Yet here we are. This is a great, if not warm, reminder that failures will occur despite all our efforts.
Things fail. Reliability engineering teaches us that it is not "if," but "when." Because we know things will fail, we design systems and processes to mitigate those failures and restore service as fast as possible. Done well, failures are addressed quickly, sometimes even automatically, helping us continue to meet our service level objectives for the particular system. With complex systems, it can be challenging to plan for every failure mode that can occur. How do you know what can fail? One way to find out is by probing the weaknesses and safety boundaries of a system through Chaos Engineering.
We often laugh that we don't need Chaos Engineering because the apps we run inject their own chaos! But Chaos Engineering is not about testing known broken parts of the system. If we already know something is broken (like a motor or an app), we should prioritize and fix it. Chaos Engineering is about discovering otherwise unknown weaknesses and limits of a working system. By introducing various degrees of planned failure (chaos) into our system, we learn new ways our service level objectives (SLOs) can be impacted. This gives us the opportunity to learn more about the system and improve it so it recovers faster when it does fail.
Learning is key. Failures, Chaos Engineering, and experimentation are all teachers who can impart wisdom to all who seek their advice. And as I just experienced this past weekend, life is full of lessons. The only true failure is the failure to learn. We are surrounded by a laboratory of learning that can instruct us and make us better. Let's make sure we maximize the opportunities that come our way and implement improvements that make the systems we run better. Oh, and depending on your heat tolerance, you may want to keep a spare ECM blower motor on the shelf.
Jason Cox is the coauthor of the upcoming book Investments Unlimited: A Novel about DevOps, Security, Audit Compliance, and Thriving in the Digital Age.
Jason Cox is a champion of DevOps practices, promoting new technologies and better ways of working. His goal is to help businesses and organizations deliver more value, inspiration, and experiences to our diverse human family across the globe better, faster, safer, and happier. He currently leads SRE teams at Disney and is the coauthor of the book Investments Unlimited. He resides in Los Angeles with his wife and their children.