April 7, 2017
Back in October of 2016, I was at the OpenStack Summit in Barcelona and ran into a long-time friend, Rob Hirschfeld. He surprised me by talking about a problem domain that we have had discussions about for years, reframing it as “the data center underlay problem.”
His provocative statement was that while OpenStack solved many problems, it didn’t address the fundamental challenge of running platforms like OpenStack on actual physical infrastructure. This is a problem domain that is being radically redefined by the container ecosystem.
This is a problem that Rob has been tirelessly working on for nearly a decade, and it was interesting to get his perspective on the emerging ecosystem, including OpenStack, Kubernetes, Mesos, containers, private clouds in general (including Azure Stack), etc. I thought it would be useful to share this with everyone.
For those of you who don’t know, Rob Hirschfeld is a long-time data center automation advocate with a history of blogging about open source, DevOps and Lean Process at robhirschfeld.com. He’s served four terms on the OpenStack board, started the Kubernetes Cluster Ops SIG and co-founded the open Digital Rebar hybrid provisioning project. He’s also known as @zehicle online, and is currently the CEO of RackN.
Gene Kim: Tell me about the landscape of Docker, OpenStack, Kubernetes, etc. How do they all relate, what’s changed, and who’s winning?
Rob Hirschfeld: Containers have dramatically altered the IT landscape because they combine improved distribution/packaging models with improved performance and efficiency. These dual benefits are highly disruptive to platforms built assuming virtualization (like OpenStack, SDN and SAN vendors).
They are also highly disruptive to platforms that assumed slow or controlled software distribution processes (like Chef, Puppet or Cloud Foundry). Even hardware vendors and cloud providers will be impacted because the smaller footprint of containers drives towards commodity hardware and frustration with the “VM tax.”
While very exciting, containers create a management challenge for users and operators. Since they are short-lived and rapidly regenerated, keeping track of containers through a development pipeline and into production requires sophisticated automation such as Kubernetes, Docker Swarm, Mesos, Rancher or other platforms. These platforms also expose gaps in software defined storage (SDS) and networking (SDN). Ironically, OpenStack’s rocky progress in these areas will likely be scavenged to accelerate its successors.
There are real developer productivity and operational efficiency gains from containers and the benefits are accelerating. I expect to see a wave of security and monitoring improvements that will make container packaging required in the near term. That progress comes at the expense of VM platforms like OpenStack.
GK: I recently saw a tweet that I thought was super funny, saying something along the lines of “friends don’t let friends build private clouds.” Obviously, given all your involvement in the OpenStack community for so many years, I know you disagree with that statement. What do you think everyone should know about private clouds that tells the other side of the story?
RH: The world is hybrid, everyone needs to adjust.
Telling everyone to go to public cloud is equally silly: the reality is that each company and workload is different. RackN’s mission is to have our customers not care about which infrastructure runs their application. Part of that is making Google-scale data center automation accessible to everyone AND part is making workloads portable.
[GK note: I love the term for this: “GIFEE”: Google Infrastructure For Everyone Else]
The problem with OpenStack and building private clouds/infrastructure is that we tried to layer complex software on top of inconsistently managed, heterogeneous infrastructure. Even if OpenStack had been amazingly easy to run (and it was far from it), the resulting clouds would have been inconsistent. In my opinion, OpenStack did not solve the real problems that were driving people to AWS.
“Friends don’t let friends build private clouds without strong foundational process and automation” would be a better way to say it. Let’s help people do Ops better before we cloud-shame them.
GK: We talked about how much you loved the book Site Reliability Engineering: How Google Runs Production Systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy, which I also loved. What resonated with you, and how do you think it relates to how we do Ops work in the next decade?
RH: I strongly identify with Operators who write software to improve Operations. The whole RackN team often feels like we are odd ducks because we care about data center operations and running infrastructure, not just pumping out software. The book really speaks to and embraces that mentality as an important mindset.
The book also establishes great ground rules. Justifying the 50/50 coding/Ops balance is critical because so many operators are underwater and cannot find time to improve. We really underestimate the importance of time and focus to improve. The book also does a good job talking about different types of Ops work and the importance of staying connected to running the applications and infrastructure.
Most importantly, it humanizes Google’s Ops teams. Google made incredible investments in Operations. By sharing their lessons learned, we can build tools that focus on the right priorities. By explaining the value of their approach, we can explain why those tools are worth the investment.
GK: Tell me about the work you did with Crowbar, and how that informs the work you’re currently doing with Digital Rebar.
RH: Crowbar was created at the dawn of OpenStack because my team at Dell had learned from other cloud efforts that installation was a system problem (this is a lesson repeated in the SRE book). We needed three degrees of control: over the full node, from BIOS to application; between nodes, to build clusters; and outside of nodes, to interact with data center services like DNS, DHCP, NTP, and PXE. Crowbar, which is still in use by SUSE, was successful when it could fully control those elements.
Unfortunately, no install ever allowed us to control all the elements, so Crowbar was way too brittle in practice. It also lacked the state machinery needed to do upgrades. It turns out that upgrades are much more important (and harder!) than installs.
When we started over with Digital Rebar, we took a very different approach to the problem. We needed a design that could adapt to different site requirements, track state of a running system and make controlled incremental changes. We also wanted the platform to be much more transparent in how it was working so that mortals could troubleshoot issues in the field. Taken together, those are key SRE values.
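To make “track state of a running system and make controlled incremental changes” a little more concrete, here is a minimal Python sketch, assuming each node’s configuration can be modeled as a flat dictionary of settings. The names `plan_changes` and `apply_incrementally`, and the sample data, are hypothetical; this is not how Digital Rebar is implemented. It only illustrates why comparing desired state with observed state turns an upgrade into a short list of targeted changes rather than a reinstall.

```python
from typing import Dict, List, Tuple

# Hypothetical model: node name -> {setting name: value}.
State = Dict[str, Dict[str, str]]
Change = Tuple[str, str, str]  # (node, setting, desired value)


def plan_changes(desired: State, observed: State) -> List[Change]:
    """List only the settings whose observed value differs from the desired one."""
    plan: List[Change] = []
    for node in sorted(desired):
        current = observed.get(node, {})
        for key in sorted(desired[node]):
            if current.get(key) != desired[node][key]:
                plan.append((node, key, desired[node][key]))
    return plan


def apply_incrementally(plan: List[Change], observed: State, batch_size: int = 1) -> List[List[str]]:
    """Apply the plan a few nodes at a time and return the batches in order.

    Updating `observed` stands in for real work on the node; a real system
    would also verify health between batches before continuing.
    """
    nodes = sorted({node for node, _, _ in plan})
    batches = [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]
    for batch in batches:
        for node, key, value in plan:
            if node in batch:
                observed.setdefault(node, {})[key] = value
    return batches


if __name__ == "__main__":
    desired = {
        "node1": {"os": "ubuntu-16.04", "kernel": "4.4"},
        "node2": {"os": "ubuntu-16.04", "kernel": "4.4"},
    }
    observed = {
        "node1": {"os": "ubuntu-16.04", "kernel": "4.2"},  # drifted node
        "node2": {"os": "ubuntu-16.04", "kernel": "4.4"},
    }
    plan = plan_changes(desired, observed)
    print(plan)  # [('node1', 'kernel', '4.4')]
    apply_incrementally(plan, observed)
    print(plan_changes(desired, observed))  # [] once the drift is corrected
```

Only the drifted node appears in the plan, and re-planning after the change yields nothing left to do. Applying one batch at a time is a stand-in for the checks a real rolling upgrade would perform between steps.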
The resulting Digital Rebar platform is a mix of specialized orchestration (annealing), functional work units (roles), and API-driven services. These three aspects of the architecture work together to create a composable platform. That allows us to mix physical and cloud infrastructure, mix Chef, Puppet, Ansible and Bash, and mix different operating systems and hardware types. Most critically, the composable nature allows us to make incremental changes needed for upgrades. Of course, the system is API driven and deployed as microservice containers!
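The “annealing” orchestration described above can be illustrated with a toy Python sketch of dependency-ordered convergence, assuming roles form a dependency graph and that a role may run only once everything it depends on is active. The role names, `anneal`, and `mark_changed` are hypothetical; Digital Rebar’s real annealer, state names, and APIs are not shown here. The point is composability: after one role changes, only that role and the roles that depend on it are re-run.

```python
from typing import Dict, List, Set

# Hypothetical role graph: each role maps to the roles it depends on.
# In a real platform each role would wrap Chef, Puppet, Ansible, or Bash work.
RoleGraph = Dict[str, List[str]]


def anneal(deps: RoleGraph, active: Set[str]) -> List[str]:
    """Repeatedly run every pending role whose dependencies are all active.

    Mutates `active` and returns the order in which roles ran. Raises if some
    roles can never run (for example, a dependency that does not exist).
    """
    order: List[str] = []
    pending = {role for role in deps if role not in active}
    progress = True
    while pending and progress:
        progress = False
        for role in sorted(pending):  # sorted copy, so removing is safe
            if all(dep in active for dep in deps[role]):
                # Placeholder for actually executing the role on a node.
                active.add(role)
                order.append(role)
                pending.remove(role)
                progress = True
    if pending:
        raise RuntimeError(f"blocked roles: {sorted(pending)}")
    return order


def mark_changed(deps: RoleGraph, active: Set[str], changed: str) -> Set[str]:
    """Deactivate a changed role and everything that transitively depends on it."""
    redo = {changed}
    stack = [changed]
    while stack:
        current = stack.pop()
        for role, role_deps in deps.items():
            if current in role_deps and role not in redo:
                redo.add(role)
                stack.append(role)
    active.difference_update(redo)
    return redo


if __name__ == "__main__":
    deps = {
        "provision-os": [],
        "dns-client": ["provision-os"],
        "ntp-client": ["provision-os"],
        "kubelet": ["dns-client", "ntp-client"],
    }
    active: Set[str] = set()
    print(anneal(deps, active))  # full install: every role runs once
    mark_changed(deps, active, "ntp-client")
    print(anneal(deps, active))  # incremental change: only two roles re-run
```

The first pass runs all four roles in dependency order; after `ntp-client` changes, the second pass re-runs only `ntp-client` and `kubelet`, which is the kind of controlled, incremental change an upgrade needs.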
We feel like Digital Rebar provides a solid foundation for SREs to build on. It allows them to (re)use open best-practice automation on their commodity gear. With those advantages, SREs can finally focus on the parts of their job that are actually different from other companies.
###
About Rob Hirschfeld: Rob Hirschfeld is CEO and co-founder of RackN, leaders in physical and hybrid DevOps software. He has been in the cloud and infrastructure space for nearly 15 years, from working with early ESX betas to serving four terms on the OpenStack Foundation Board and becoming an Executive at Dell. As a co-founder of the Digital Rebar project, Rob is creating a new generation of DevOps orchestration that leverages containers and service-oriented ops. He believes that the technology of running data centers and applications on cloud is just part of the bigger story. He trained as an Industrial Engineer and carries a passion for applying Lean and Agile process to software delivery.
Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies—he is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.