The following is an excerpt from a presentation by Dana Finster, Sr. Software Engineer, and Bryan Finster, Staff Software Engineer at Walmart, titled “Scaling Continuous Delivery at Walmart.”
|Dana Finster||I’m Dana Finster. I’m a CD Evangelist and Senior Software Engineer in Information Security.|
|Bryan Finster||I’m Bryan Finster. I’m a Staff Software Engineer and Team Lead for the CD Sherpa Team. We work for a small retailer in northwest Arkansas, you may have heard of it, Walmart.|
|Dana Finster||In 1950, Sam Walton opened his first small store, ‘Walton’s 5 & Dime’. But today, Walmart employs 2.3 million associates who support almost 12,000 stores in 28 countries worldwide, with half a trillion dollars in sales annually. This is our scale.|
|Bryan Finster||On the IT side, we’ve got hundreds of development teams worldwide, deploying to the hundreds of thousands of nodes supporting every business that we have. We have really diverse tech stacks, everything from mainframe to Golang.|
|Dana Finster||We’re here to talk about scaling DevOps to this size, to Walmart size. Let’s start with the first rule of DevOps. Everyone knows the first rule of DevOps, right?|
|Bryan Finster||Don’t talk about DevOps.|
|Dana Finster||DevOps is overloaded. The term is interpreted in many different and often confusing ways. You can’t just go out and “buy the DevOps,” you can’t “hire the DevOps,” but at its core, DevOps is really simple — people collaborating together, using lean processes and heavy automation to deliver quality software, rapidly.
But if we don’t talk about DevOps, what do we talk about? What we do instead is focus in on the outcomes that we’re looking for and foster the culture to attain them. We’re all here looking to deliver quality software rapidly. And the key to the culture change that’s needed to attain that outcome is our people and our teams.
|Bryan Finster||We know from experience that we can grow effective development teams by having them focus on trunk-based continuous integration (real continuous integration), by reducing the delivery increments to keep driving down batch size, and by asking ‘why can’t we deliver today?’ and solving those problems.
When the team is able to solve those problems, they not only become good problem solvers, they also build a lot of teamwork. You get a team that can deliver value very rapidly. For instance, the team that I came from went from zero to 12 production deliveries a day.
|Dana Finster||We started by holding annual DevOps events to educate people about concepts like DevOps and continuous delivery.|
|Bryan Finster||The way we’re approaching scaling this to Walmart, since we can’t go team to team to change it, is by using a community for sharing culture, a unified deploy platform, gamified metrics, and Sherpa guides to help teams with any struggles that they have.|
|Dana Finster||It started by educating people, holding DevOps days to teach people about the concepts of DevOps and continuous delivery. These really got people excited and it started getting the word out.
I went to one a couple of years ago and I was also very excited to bring continuous delivery back to my team. I knew that it would make our lives easier. I knew that it would work better with our business partners and deliver value faster.
The problem I encountered was that I couldn’t find a place within the organization to learn more, to find out what initiatives we were currently working on, and to learn how to implement continuous delivery.
I looked around and found a lot of pockets of really good progress. We had teams that were building pipelines, we had teams that were focused on testing and continuous integration, and many teams were trying to solve exactly the same problems independently.
I had to figure this out, so I decided to host another DevOps day and I brought in leaders and developers from all over the country to share the vision and highlight the progress that they were making within the organization.
I knew that this event was going to generate a whole lot more excitement, and that people were going to learn more and want to accelerate faster. Unfortunately, these same excited people were going to end up just like I did, looking for a central place to guide them on their next steps.
So, I built us a home. I started Continuous Chai, which is a CI/CD user group where people can come to share and learn about continuous integration, continuous delivery, and the myriad of topics that go along with it.
This community of sharing is the first of four initiatives that we want to share with you today.
I believe that when we want to change culture, the best approach is to use the culture itself to teach the culture.
Sharing is a key tenet of DevOps, and it’s important to share early and often. We have old habits and human nature that hold us back. We only want to show people beautiful, shiny, finished products after we’re all done. Then we say, “Look how successful I was.”
What we’ve built in Continuous Chai is a forum where people have the freedom to share off-the-cuff ideas, to share their work in progress, and to highlight not just their successes but also their failures, in a trusting environment where they can speak honestly about them.
By sharing early and openly, teams can learn a lot from each other and avoid wasting time on duplicate efforts, struggling to solve the same problems alone. This active, trusting user community is key to enabling large-scale change.
|Bryan Finster||Having that network in place has been a really valuable tool. As people start onboarding onto the new tech stacks, they start asking the same questions. We see it over and over again in ChatOps: “How do I test this React app?” or “How do I plug Sonar into this thing?”
And we say, “Well, have you asked in Continuous Chai?” They go to the community, and the community dives in and helps them. You get solutions so much faster than you do from Google or Stack Overflow.
In fact, when Dana and I were working on this slide deck together, we went to Continuous Chai for feedback, because we knew not only that it was a trusting environment with friends, but that we would get actual, real feedback, not “Oh yes, it’s wonderful.” It was a little daunting to see all the notes come in, but it absolutely helped us improve this material.
|Dana Finster||Now that we’ve talked a bit about what an asset a user community can be, I hope that if you don’t have an engaged community, you might be thinking about starting one.
But I’ve also learned some things along the way. First of all, a leader of a user community has to have passion. It’s not something that you can just tell someone to take care of and expect it to be finished. Building a community takes ongoing work: engaging associates, maintaining a consistent schedule, and bringing interesting demos and discussion topics to the group.
Secondly, it takes a lot of patience. When we first started, there were many times when I was sitting in a room by myself or with one or two people. Yet even a handful of people can brainstorm ideas and start bringing true value to the group. We have iterated on different formats along the way. We’ve done informal coffee chats, demos and discussions focused on specific topics, and even offsite meetings.
Over time, we’ve found that we have the most success in our environment with meetings built around specific demo topics. We keep iterating on the format and the timing, and we currently have over 600 members and offer weekly demo and discussion sessions. It can’t be built in a day. When momentum does periodically slow, because it will, there’s one fail-safe way to incentivize people to keep showing up: swag and free food.
|Bryan Finster||It’s true. It’s amazing what we’ll do as developers for a t-shirt. But the important thing is that this t-shirt is not something you get for showing up. You only get it for contributing to the community. It’s a badge of honor, and people celebrate it: “Look, I have a Continuous Chai t-shirt!” It’s been really important.
The other thing is a unified platform.
When we first started on this journey, we had several areas that were really digging into CD. They’d go and spin up their own Jenkins instance, or whatever tooling they were using, while other areas weren’t focusing on it at all; they didn’t have the tooling, and they didn’t have the bandwidth to get it done.
But it doesn’t scale for every single area, or every single team, to stand up its own platform. Product teams should be focusing on delivering products. Having a consistent, unified platform that is easy to use is absolutely key.
I work in Software Delivery Enablement, the area that’s responsible for building the CD platform, which is a set of tools. It’s delivery as a service. We want teams to be able to focus on their products and then just use the automation to deliver. We don’t deliver for them; we just build the automation.
We’re using open source tools and scaling them to Walmart, and I’ll tell you, we break a lot of tools. We want to make the right thing the easy thing; we want you to flow downhill to success. The initiative we work on is called Irresistible Developer Experience. We want you to use the tools because you believe they are better and easier to use, and we find that gets more people onto these tools really fast.
|Dana Finster||Our delivery platform is designed to be implemented by all the teams across all the tech stacks in the organization. Having this single pipeline allows for security and code standards to be consistent across all the products in the enterprise.
New tools and controls can be injected, and all teams immediately benefit. Our platform uses simple configuration files that hide the complexity of the implementation from the development teams. Some features are also deliberately not configurable by developers. Things like code scanning and security controls are turned on automatically. Developers don’t have to set those up and, more importantly, they can’t skip them.
We showed this slide at a Continuous Chai presentation, and one of the developers there noted that he’d never seen all the intricacies that go on within the pipeline. To him, it was like magic. He said, “As a developer, it’s almost transparent to me: it goes to Git, it gets built, magic happens.”
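A minimal Python sketch of that behavior, assuming the platform keeps its own list of mandatory stages and merges it into whatever a repository’s configuration requests. The stage names and the `build_pipeline` function are hypothetical illustrations, not Walmart’s implementation.

```python
from typing import Iterable, List

# Hypothetical stage names. The platform team owns this tuple;
# repository configuration cannot remove entries from it.
MANDATORY_STAGES = ("code_scan", "security_scan")


def build_pipeline(requested: Iterable[str], skip: Iterable[str] = ()) -> List[str]:
    """Return the ordered stages to run for one repository.

    Mandatory stages always run first. Optional stages follow in the
    order the developer requested them, minus any the developer skipped.
    Asking to skip a mandatory stage is rejected outright.
    """
    skip = set(skip)
    blocked = skip & set(MANDATORY_STAGES)
    if blocked:
        raise ValueError(f"mandatory stages cannot be skipped: {sorted(blocked)}")
    optional = [s for s in requested if s not in MANDATORY_STAGES and s not in skip]
    return list(MANDATORY_STAGES) + optional


print(build_pipeline(["build", "unit_test", "deploy"]))
# ['code_scan', 'security_scan', 'build', 'unit_test', 'deploy']
```

Because the mandatory list lives in the platform rather than in each repository, adding a new control to `MANDATORY_STAGES` changes every pipeline at once.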
|Bryan Finster||A good example of this is Concord. It’s our workflow orchestration engine. It’s a general automation tool. We use it mostly for our CD pipelines. We also use it for any automation we want to do, including signing people up for classes. It has plugins for all the tools we use, and it’s easily extensible for other plugins that we need. More importantly, developers don’t need to understand the underlying implementation. All they need to understand is how to call those things from Concord.
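The talk doesn’t show Concord’s real plugin interface. The following is only a generic Python sketch of the plugin pattern being described: a flow names a task and passes arguments, and a registry resolves the implementation. `task`, `run_flow`, the `build` and `notify` tasks, and the repository name are all hypothetical. This is not Concord’s API.

```python
from typing import Callable, Dict, List

# Registry mapping a task name to the plugin function that implements it.
TASKS: Dict[str, Callable[..., str]] = {}


def task(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that registers a function as a task plugin under `name`."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TASKS[name] = fn
        return fn
    return register


@task("build")
def build(repo: str) -> str:
    # A real plugin would invoke a build tool; this stub just reports.
    return f"built {repo}"


@task("notify")
def notify(channel: str, message: str) -> str:
    return f"#{channel}: {message}"


def run_flow(steps: List[dict]) -> List[str]:
    """Run each step by task name; the flow never touches plugin internals."""
    results = []
    for step in steps:
        args = dict(step)  # copy so the caller's step is not mutated
        name = args.pop("task")
        results.append(TASKS[name](**args))
    return results


flow = [
    {"task": "build", "repo": "inventory-api"},
    {"task": "notify", "channel": "releases", "message": "build finished"},
]
print(run_flow(flow))
# ['built inventory-api', '#releases: build finished']
```

Adding support for a new tool in this pattern means registering one more task. Existing flows are unchanged.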
Concord was also planned from day one to be released back to the community as open source, which has enabled a good use case for Dana’s team.
|Dana Finster||My team supports our security infrastructure and incident response teams. Because of that, we are on a completely segregated, air-gapped network. Since Concord is designed to be released to the broader community, it’s designed to be very easy to install.
We’re able to implement our enterprise platform in our segregated network and very easily pull in the new features and still be able to take advantage of all the work going on across the enterprise.
Here is an example of how simple it is to configure the tools from the developer’s standpoint.
We simply have a configuration file located right alongside the code using a simple declarative language. This allows for configuring and versioning individual repositories to be very simple.
It hides the complexity from the developers, and each feature is just a simple function call. We can see right here that a single line of code calls Hygieia and publishes the build metrics from this repo.
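The configuration file itself isn’t reproduced in this transcript. As a hedged illustration of the same idea, the sketch below uses a small JSON document, parsed with Python’s standard `json` module, and a hypothetical `expand` function that turns it into concrete pipeline steps. The keys are assumptions, not the real file format; the single `publishMetrics` entry stands in for the one-line Hygieia call.

```python
import json
from typing import List

# Hypothetical per-repository config, versioned alongside the code.
# The one "publishMetrics" line is all a developer writes to get
# build metrics sent to the dashboard.
CONFIG = """
{
  "build": {"tool": "maven"},
  "publishMetrics": "hygieia"
}
"""


def expand(config_text: str) -> List[str]:
    """Expand a short declarative config into the steps the platform runs."""
    cfg = json.loads(config_text)
    steps = ["checkout", f"build with {cfg['build']['tool']}"]
    if "publishMetrics" in cfg:
        steps.append(f"publish build metrics to {cfg['publishMetrics']}")
    return steps


print(expand(CONFIG))
# ['checkout', 'build with maven', 'publish build metrics to hygieia']
```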
|Bryan Finster||Metrics are also super important. If teams have pipelines, but they don’t have goals to deliver to, they have no idea what their outcomes are supposed to be.
Therefore, it’s important to make those goals clear and the metrics clear so they understand how they’re trending against those goals.
To do that, we use Hygieia. If you don’t know about Hygieia, Capital One open-sourced it several years ago. Teams around our building had been using it for years, and now we’ve integrated it into our pipeline. It gives you a real-time view of the CD pipelines and is a really important tool for the teams.
|Dana Finster||The product dashboard gives teams the metrics they need to understand the health of each individual repository with metrics including build stability, the frequency of commits to master, static analysis and test results, as well as code coverage and the frequency of deploys per day within each environment.
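Hygieia computes these figures itself. Purely to make the arithmetic behind a “per day” metric concrete, here is a minimal sketch that assumes pipeline events are available as `(date, kind, environment)` tuples. The tuple shape, the `daily_rates` function, and the sample events are hypothetical and are not Hygieia’s data model.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional, Tuple

Event = Tuple[date, str, Optional[str]]  # (day, kind, environment or None)


def daily_rates(events: List[Event], start: date, end: date) -> Dict[Tuple[str, Optional[str]], float]:
    """Average events per day for each (kind, environment) pair, start and end inclusive."""
    days = (end - start).days + 1
    counts = Counter((kind, env) for day, kind, env in events if start <= day <= end)
    return {key: n / days for key, n in counts.items()}


events = [
    (date(2019, 5, 1), "commit_to_master", None),
    (date(2019, 5, 1), "deploy", "prod"),
    (date(2019, 5, 2), "commit_to_master", None),
    (date(2019, 5, 2), "commit_to_master", None),
    (date(2019, 5, 3), "deploy", "stage"),  # outside the window below
]
print(daily_rates(events, date(2019, 5, 1), date(2019, 5, 2)))
# {('commit_to_master', None): 1.5, ('deploy', 'prod'): 0.5}
```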
We’ve also added scoring to Hygieia. The metrics are weighted and aggregated to give an overall health score. This scoring gamification helps drive improvement and it allows teams to quickly see which code bases are more hardened and which might need a little bit of attention.
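The talk doesn’t give Hygieia’s scoring formula. Here is a minimal sketch that assumes each metric has already been normalized to a 0-100 score and that the overall health score is their weighted average. The metric names, scores, and weights are hypothetical.

```python
from typing import Dict


def health_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-metric scores (each already on a 0-100 scale)."""
    total_weight = sum(weights[name] for name in metrics)
    return sum(score * weights[name] for name, score in metrics.items()) / total_weight


# Hypothetical per-metric scores for one repository.
repo = {
    "build_stability": 90,
    "merge_frequency": 60,
    "static_analysis": 80,
    "test_results": 100,
    "code_coverage": 70,
    "deploy_frequency": 50,
}

equal = {name: 1 for name in repo}
print(health_score(repo, equal))  # 75.0
```

With equal weights, this repository scores 75.0. Raising the weights on `merge_frequency` and `deploy_frequency` to 3 (an arbitrary illustrative value) drops the score to 67.0, because those are its weakest metrics. The weights, not just the raw numbers, decide which improvements move the score.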
Teams can see where they might need to focus by drilling down into each individual metric widget. For example, we can see that most of this repo’s score is determined by the frequency of merges to master, but if teams commit directly to master without using a pull request, they’re going to take a hit on that score.
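Here is one way such a penalty could work, sketched under stated assumptions: only commits that reached master through a pull request count toward merge frequency, and the score is the daily rate relative to a target. The one-merge-per-day target, the commit record shape, and `merge_frequency_score` are hypothetical. This is not Hygieia’s actual rule.

```python
from typing import Dict, List, Optional


def merge_frequency_score(commits: List[Dict[str, Optional[object]]], days: int,
                          target_per_day: float = 1.0) -> float:
    """Score 0-100 for merges to master, counting only commits that came through a PR."""
    # A commit with pull_request=None was pushed straight to master and earns nothing.
    via_pr = sum(1 for c in commits if c["pull_request"] is not None)
    return min(100.0, 100.0 * (via_pr / days) / target_per_day)


history = [
    {"sha": "a1", "pull_request": 12},
    {"sha": "b2", "pull_request": None},  # pushed straight to master
    {"sha": "c3", "pull_request": 13},
    {"sha": "d4", "pull_request": None},  # pushed straight to master
]
print(merge_frequency_score(history, days=4))  # 50.0
```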
|Bryan Finster||I recently spoke to Scott from Columbia Sportswear, who said we should have taken a psychology class to get teams to change. He called it “hacking the biggest undocumented API”: you poke and prod and see what the outcomes are going to be.
Metrics are incredibly dangerous things if used inappropriately. You need to really understand those metrics and how people react to them. Just because you put a metric in place and expect an outcome doesn’t mean you’re going to get it. Go and investigate.
An example of this was when we implemented the scoring and had teams coming to us saying, “Now I have to go and make changes to repositories that don’t currently need changes, just to keep the score up.” Well, that just generates waste. We don’t like waste; we want value.
We made some additional changes to make things better. We created a higher level view on top of Hygieia that aggregates the metrics up and averages them across the team.
We have a tool inside Walmart that tells us how big a development team is, that is, how many engineers are on it. That lets us average the scores by team size and get deploys per day per developer, or commits to master per day per developer. We can find out how teams are doing and say, “Here are our goals, here’s where you are; how do we help you achieve those goals?”
But even then, we currently have all the scores weighted equally, even though commits and deploys are far more important than code coverage. We have teams right now trying to raise their scores by raising their code coverage, which is incredibly easy to do.
What we’ll do in the near future is drop the weighting on code coverage and increase the weighting on commits and deploys to get the outcomes we want.
Furthermore, we have pushed all of the scoring and widget changes back to Capital One, and you can find them on their master branch today.
|Dana Finster||That team view also adds some competitive fun, because you can see the scores of all the teams in the enterprise. I know this firsthand because I have a tech lead who pulls up the dashboard every day to make sure that we’re still winning. Ultimately, it goes a long way toward providing visibility, especially for more competitive teammates and teams.
While Hygieia does give us the metrics to evaluate how we’re doing as a team and where improvements might be needed, sometimes teams need a little extra help to determine how to implement those next steps and get to the next level.
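Here is a minimal sketch of that normalization. It assumes a team’s per-repository daily rates are summed and then divided by the engineer count reported by the internal team-size tool, which is simply passed in as an argument here. `per_developer_rates`, the metric names, and the numbers are hypothetical.

```python
from typing import Dict, List


def per_developer_rates(repo_rates: List[Dict[str, float]], team_size: int) -> Dict[str, float]:
    """Sum each metric across a team's repositories, then divide by team size."""
    if team_size <= 0:
        raise ValueError("team_size must be a positive engineer count")
    totals: Dict[str, float] = {}
    for rates in repo_rates:
        for metric, value in rates.items():
            totals[metric] = totals.get(metric, 0.0) + value
    return {metric: total / team_size for metric, total in totals.items()}


team_repos = [
    {"deploys_per_day": 2.0, "commits_to_master_per_day": 6.0},
    {"deploys_per_day": 1.0, "commits_to_master_per_day": 4.0},
]
print(per_developer_rates(team_repos, team_size=5))
# {'deploys_per_day': 0.6, 'commits_to_master_per_day': 2.0}
```

Dividing by headcount is what lets a five-person team and a fifteen-person team be compared against the same per-developer goal.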
|Bryan Finster||This is why we have Sherpa guides.
We’re a group of developers who’ve done this before, and we can embed with teams and help them out. The team that I lead is the CD Sherpa Team. We have been up and down the mountain. We know where the ravines are, we know where the landmarks are, and we don’t want you to become one of the landmarks, so we help with anything required to get it done.
We do platform support for the tools to make sure you understand how to use them, but we also run tech workshops on domain-driven design or my favorite one, Agile rehab. That was a really popular one.
We do leadership outreach where we explain that this is a change in how teams should work. You need to understand how this impacts how you incentivize the teams to get the outcomes you want.
We do team boot camps where we embed with the team for six weeks and run two-and-a-half-day sprints. It’s very similar to the dojos you may have heard about from other companies. We help the team with whatever their biggest constraint is, to move the needle and show them that improvement is not only possible, but can be really fun and build teamwork.
Finally, we are pi-shaped developers, not T-shaped developers. We have to have both breadth and depth, and it’s really hard to find people for this team. Talk to anybody trying to build teams like this; it’s incredibly difficult.
We tell the teams, we’re not Agile coaches. We can coach you on Agile, because you have to be good at this stuff to get this done. But we will also help you plan out legacy strangulation, help with test architecture if you need it, and pair program with you to teach you how to unit test. We’ll do anything required.
If our team doesn’t know something, we’ve been here long enough that we know people who do; we’ll bring them in and get that knowledge to you as fast as we can.
|Dana Finster||At Walmart, we’re focusing on outcomes and fostering culture change to attain them. It’s not an easy task and it’s taking work from all directions.
Growing an excited base of people to advocate every day is important. We’re reaching out to leadership and developing a strategy that includes a single enterprise deployment pipeline, metrics that are focused on the right outcomes, and teams dedicated to training and enabling others to deliver value safely and quickly.
The single pipeline across all teams and tech stacks makes the right thing the easy thing to do. Standardized metrics make progress visible and understandable at every level. The gamification makes it competitive and fun. And our people really keep the momentum going to enable large-scale change.
|Bryan Finster||Now, this works for us.
We’re seeing a lot of improvement using this process, but context is really important. You need to understand that nothing you see here is a cookie-cutter solution. You need to find out what works in your culture. If people are incentivized by badges, give them badges. If they’re incentivized by certifications, do that. Whatever it takes to move the needle, get it done.
Also, it’s important to understand that you need to give people permission. Dana and I are not management; we are developers. Walmart has a strong culture of grassroots improvement. We didn’t ask for permission. When Dana needed a DevOps Day to learn more, she said, “How do I reserve the auditorium?” Not, “May I have a DevOps Day?”
When she decided to start Continuous Chai, she just booked meeting rooms, spun up a Slack channel, started Continuous Chai, and then got all of the leadership to help.
There are people that are passionate about this in your organization, so make sure they know they have permission. Don’t assume. Find those people, elevate them and give them all the runway to bring everybody else along.
|Dana Finster||To wrap up, we’re going to share some of the outcomes that we have seen so far.
First of all, teams are collaborating. Lots of collaboration between teams is helping to remove duplicated effort and to shorten the learning curve. Teams that are focused on continuous delivery are delivering faster and with higher quality. As they start to see that, they realize that CD removes the drama from delivery and that improvement is addictive. Teams are actively trying to improve and using metrics to measure that progress.
|Bryan Finster||Teams are also having more fun. The motto of my team is “deploy more, sleep better.” When I get to sleep at night, I can find more entertaining ways to get things done at work, and I have time to find joy.
Thanks very much.
|Dana Finster||Thank you.|