November 6, 2012
Another of my favorite talks from the Velocity Conference 2012 in June was a 12-minute talk by Mike Brittain (@mikebrittain) of Etsy, where he is Director of Engineering and Operations. In this talk, he described how companies should build user experiences that are resilient to failure and that look less like developer error codes. Haha.
He began with a metaphor of a grocery store. He asked the audience what should happen if the restroom sink flooded. Does the security guard clear out the store, telling everyone they must stop shopping and leave, and that the store will be closed until the problem is fixed?
Obviously, that’s ridiculous. But he suggests that this is what almost all websites and online service properties do. When we have a major error in the application or infrastructure, or even a minor error in a supporting service (our equivalent of the plumbing), we close the entire store and kick the customer out.
And because every service that our sites depend upon creates additional surface area for failure, there are a bewildering number of things that can cause errors. Given this, how do we make the user experience more resilient and insulated from these problems?
Here’s an example of an error message that Brittain finds deplorable. (He joked, “An error page with an application stack trace with the company logo on it isn’t much better.”)
At Etsy, they began by identifying the services that make up the critical path, separating those that provide “primary value” from those that provide only “ancillary value” (i.e., secondary value). For Etsy, the most important areas of the site are the Product Listings, Product Descriptions, Photos, and the Add to Cart feature. Everything else is just gravy.
The first design principle for building a resilient user experience is that when an ancillary service fails, instead of displaying an error page or anything else that interrupts the user experience, simply hide the affected element (e.g., the “like” button, user message inbox, etc.). Casual users will simply continue their experiences unimpeded, usually without realizing anything is wrong. It will only be the extremely savvy users, he joked, who will actually notice the failure.
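As a rough illustration of this principle (not Etsy's actual code), the sketch below renders a page from two groups of modules. The function name `render_page`, the module names, and the choice to log hidden failures are all assumptions made for the example. Primary modules are allowed to fail loudly, while a failing ancillary module is simply left out of the page.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("page")

# A renderer is any zero-argument callable that returns an HTML fragment.
Renderer = Callable[[], str]


def render_page(primary: Dict[str, Renderer], ancillary: Dict[str, Renderer]) -> str:
    """Render primary modules strictly and ancillary modules on a best-effort basis."""
    parts = []

    # Primary modules are the critical path: if one fails, the exception
    # propagates, because the page is not useful without them.
    for name, render in primary.items():
        parts.append(render())

    # Ancillary modules are "gravy": on failure, hide the element and
    # keep a log entry so operators still see the problem.
    for name, render in ancillary.items():
        try:
            parts.append(render())
        except Exception:
            logger.warning("hiding ancillary module %r after failure", name, exc_info=True)

    return "\n".join(parts)
```

Called with a working listing and a broken “like” button, this would return the listing and the other working modules with no trace of the failed one, which is the behavior a casual shopper would experience.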
Etsy favors fast page load times over completeness. Etsy Sponsored Ads, for example, must render within 400ms or they get dropped. Other areas of the site load elements with non-blocking Ajax so that they do not hold up the rest of the page. And yet other areas of the site use the “circuit breaker” pattern described by Michael Nygard in “Release It! Design and Deploy Production-Ready Software” to prevent clients from calling functions that we know are failing, which would only exacerbate the failure. (Ben Christensen of Netflix describes this further in the article “Fault Tolerance in a High Volume, Distributed System.”)
By creating a system where failures are hidden from users and where back-end services can fail independently, Etsy created the grocery store that doesn’t kick people out when the sink overflows. It allows customers to continue doing the most important thing, shopping, even though not all the details made it onto the page.
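The two techniques above, a render-time budget and a circuit breaker, might be sketched as follows. This is a minimal illustration under stated assumptions, not Etsy's or Netflix's implementation: the names `render_within` and `CircuitBreaker`, the failure threshold, and the cool-down period are all hypothetical, and only the 400ms figure comes from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

_pool = ThreadPoolExecutor(max_workers=4)


def render_within(render: Callable[[], str], budget_s: float, fallback: str = "") -> str:
    """Return render()'s output, or the fallback if it is late or raises."""
    future = _pool.submit(render)
    try:
        return future.result(timeout=budget_s)
    except Exception:  # timed out, or render() itself failed
        future.cancel()  # best effort; an already-running call is not interrupted
        return fallback


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures    # assumed threshold
        self.reset_after_s = reset_after_s  # assumed cool-down
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], str], fallback: str = "") -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback  # circuit open: skip the call entirely
            self.opened_at = None  # cool-down over: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback
        self.failures = 0  # a success closes the circuit again
        return result
```

A sponsored-ads module could then be wrapped as `render_within(render_ads, 0.4)`, and a call to a flaky back-end service as `breaker.call(fetch_recommendations)`, where `render_ads` and `fetch_recommendations` stand in for whatever the real page modules are. In both cases the page receives an empty fragment instead of an error.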
“Here’s a very simple hack that everyone should do. Put a beacon on your error page, and generate a graph like this, and show it to the business every day. These are the page views where the users have gotten off track, and will leave the site.”
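One way such a beacon could work is sketched below, assuming a tiny image request as the beacon and an in-memory daily counter as the data behind the graph. The path `/beacon/error.gif`, the function `error_page_html`, and the class `ErrorBeaconLog` are hypothetical names for illustration; the talk does not describe how Etsy implemented theirs.

```python
import html
from collections import Counter
from datetime import date
from typing import List, Tuple
from urllib.parse import urlencode

BEACON_PATH = "/beacon/error.gif"  # hypothetical endpoint that serves a 1x1 image


def error_page_html(status: int, path: str) -> str:
    """Build a friendly error page that embeds a tracking beacon."""
    query = urlencode({"status": status, "path": path})
    src = html.escape(f"{BEACON_PATH}?{query}")  # escape '&' for the HTML attribute
    return (
        "<p>Sorry, something went wrong on our end.</p>"
        f'<img src="{src}" width="1" height="1" alt="">'
    )


class ErrorBeaconLog:
    """Counts beacon hits per day; the beacon endpoint would call record()."""

    def __init__(self) -> None:
        self.per_day: Counter = Counter()

    def record(self, day: date) -> None:
        self.per_day[day] += 1

    def series(self) -> List[Tuple[date, int]]:
        """Return (day, error page views) pairs in date order, ready to plot."""
        return sorted(self.per_day.items())
```

The output of `series()` is the kind of daily count that could be charted and shared with the business, since each hit represents a page view where a user landed on an error page.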
At Etsy, Brittain says, they favor a fast page over a complete page. By focusing on the big picture and keeping customers in mind, they’re building successful UIs that help the business win.
The biggest takeaway here is that after Brittain and his team identified the critical-path services and separated them from background services, they were able to plan for failure scenarios and ensure their customers weren’t “kicked out” of the store because of a failure entirely unrelated to what they were trying to do: purchase goods.
Mike Brittain, Etsy: “Resilient User Experiences”
Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies—he is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.