March 21, 2019
The following is an excerpt from a presentation by Simmons Lough, IT Specialist, United States Patent and Trademark Office (USPTO), titled “If We Can Do It, You Can Do It!: DevOps Transformation at the US Patent and Trademark Office.”
You can watch the video of the presentation, which was originally delivered at the 2018 DevOps Enterprise Summit in Las Vegas.
I’m a tech lead on a system at USPTO called FPNG. USPTO stands for the ‘United States Patent and Trademark Office.’ We are the agency that grants patents and registers trademarks, and in doing this, we are fulfilling a mandate: Article I, Section 8, Clause 8 of the United States Constitution. When I think about the Constitution and our mandate, and how we’re helping to drive the entire U.S. economy, and to some extent the world economy, I think this is a pretty big deal.
I want to give a general experience report of how, over the past three years, we’ve started applying some DevOps principles at the agency. I’ll sprinkle in some architecture and a little about how we dealt with the product owner, the business, the executives, etc. One thing that is a little unique is that I’m smack dab in the middle of an organization of 15,000 employees, and you don’t have to be an executive to make change happen.
A little bit about the system I’m responsible for: FPNG, which stands for Fee Processing Next Generation. It’s the replacement for a legacy system at USPTO called RAM, which had been running since the early ’80s. One thing to note: USPTO doesn’t take money from Congress or the American taxpayer in the classic sense. We charge a fee for the goods and services that we provide. All told, what rolls through FPNG is a little south of three and a half billion dollars per year.
The workflow works like this: if you have an idea for an invention anywhere in the globe and you want a U.S. patent or a trademark, you submit your application, and then this system calculates how much you owe in fees. Then, of course, that money goes into the USPTO bank.
If you looked at a 24-hour slice of the data, you’d see a large bubble over New York/D.C. That’s where a lot of patent and trademark activity happens. Of course, you’d also have Texas. I’m assuming that’s around Austin. Then, of course, there’d be a big bubble over Silicon Valley.
The other thing you’d notice is that patents and trademarks come through and people make payments 24/7. The activity moves as the sun moves around the globe. My point is that since this is a real government system that deals with three and a half billion dollars, it’s got to be up 24/7. During our heavy times, we collect about a million dollars an hour, so outages or downtime are a big deal to us.
A few years ago, my daughter drew for me a big dinosaur, which may look familiar as to how software has historically been built and delivered in the federal government. Let me explain it to you.
In the belly of the beast, you have Dev and Test. You may have a backlog, you may have a scrum master, and you may meet every morning, and you maybe even have a retrospective. But then after two weeks, everyone looks at each other, and says, “All right. We’re done.”
But they’re not really done; there is still all this important work that has to happen. The way this usually works is you fill out a form in SharePoint. It gets routed to a different group, in this case, the security group.
That gets put in a queue, and a week later, they then say, “Okay, we’ll schedule to run security scans next Tuesday at 2:00.” They run those scans. You get your feedback. Then there’s a negotiation of what you fix and what you don’t fix.
The same thing happens with the coding standards review. This is a third-party group where you fill out a form and ask for a code review. It’s a completely different area of the agency with a completely different set of contractors. They come back, and there’s some negotiation. Of course, all this time you thought you were done. This goes on, and in a lot of agencies, this could be 15, 20, 30 different groups you’re dealing with.
Finally, you come to something called the production readiness review meeting, or ORR (operational readiness review). You’re usually in a room with all 20 or 30 of these people from the different groups. They go around and basically vote, thumbs up and so on, and it’s only at that point that you can actually put your software in production.
These are obviously anti-patterns for DevOps. Guess what? Because there’s so much bureaucracy, and the dinosaur takes so much effort to get through, software releases to production are few and far between.
We did have some good news— there were some building blocks already there. My hunch is that this is the case in other federal agencies and maybe even other large commercial companies. We did have an Agile Dev management tool. There was a place for user stories and tasks. We did have a scrum master.
One of the better things we did was to have what we called a CICM platform. We had a shared repo. This was a huge accomplishment in the federal government just to have source control. We had Jenkins stood up, your typical CICM thing.
We also had automated infrastructure. Our sister agency, NIST, has five essential characteristics that define a cloud, and I don’t think we met any of them. That said, there were some good parts to it. If you compiled software and pushed it to the artifact library, there was a robot that could pick up that artifact and install it. There wasn’t human intervention.
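That “robot” step can be pictured with a small sketch. This is a hypothetical deployer, not USPTO’s actual tooling; the `.war` naming, the directory layout, and the function name are my assumptions. It watches the artifact library and installs anything it hasn’t installed before:

```python
import hashlib
import shutil
from pathlib import Path

def deploy_new_artifacts(artifact_dir: Path, deploy_dir: Path, seen: set[str]) -> list[str]:
    """Install every artifact not yet deployed; return the names deployed this pass."""
    deploy_dir.mkdir(parents=True, exist_ok=True)
    deployed = []
    for artifact in sorted(artifact_dir.glob("*.war")):
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        if digest in seen:
            continue  # this exact build was already installed
        # Stand-in for the real install step: no human in the loop.
        shutil.copy2(artifact, deploy_dir / artifact.name)
        seen.add(digest)
        deployed.append(artifact.name)
    return deployed
```

A real deployer would run this on a schedule or on a repository event, and the `seen` set would be persistent deployment state rather than an in-memory set.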
While these were all good things, we certainly couldn’t go from commit to production in any kind of fast manner, largely because that dinosaur’s bureaucratic tail was getting in our way. We wanted to turn that tail around: instead of having 20 or 30 third-party groups tell me my software was good to go, we wanted the app team to be responsible and able to make that decision. With the help, largely, of a robot running automated tests, they could click a button, and that software would go to production.
We did the oldest sales trick in enterprise software: we created a pilot with a pretty small scope. We thought the pilot was important because, as people started to hear what we planned to do, they were getting pretty upset. There are a lot of people, across agencies and large commercial companies, whose job it is to fill out that questionnaire, go to that meeting, and make it turn green.
In fact, when we first started talking about this, we were all in a meeting with the director in charge of that production readiness release process, explaining to her what we wanted to do. To this day, I’m still not exactly sure what I said, but I think it was something like, “I don’t want to fill out CRQs any more.” She stormed out of the room. It was a contentious situation.
We thought the pilot would at least get our foot in the door. I’ll share with you the pilot: “Deploy, within a 24-hour timeframe, FPNG-approved software fixes for defects found in production.” Let me highlight three of these things there.
Folks around the agency knew what we were trying to do, but at some point, we needed to go and pitch it to the executives, the CIO and CFO. The main reason was that we needed a signed document from them. This is very classic in the government: there are memorandums of understanding (MOUs), policy documents, procedure documents, etc., but we needed this document for two reasons.
One, there were a lot of folks not swimming the same way we were. We needed to be able to show folks the document and say, “Look, I have the authority to not go through the classic process. I can do this a different way.”
Secondly, we go through half a dozen different types of audits every year, and a large part of what the auditors are doing is reviewing our documentation to see how it matches how we make changes in production.
During this pilot, a few things came out. We saw that we’re collecting fees around the clock and around the globe, so we came up with this ‘no outage’ deployment pattern.
We used blue/green.
The way that worked is this: let’s say we have two instances of FPNG running in our data center. One we call “green”; one we call “blue.” Green is version 2.0, and we want to upgrade to version 3.0. We deploy that software to the passive side, where we’re able to kick the tires on it a little bit, and we can do that during the day.
Let’s say, America’s most famous inventor, Thomas Jefferson, is making a payment on the active side. He’s filling out his application. When we’re ready with version 3.0, we can just flip a switch at the load balancer, and traffic is routed to the blue side. Thomas Jefferson has no idea this is happening. We’ve rolled out the code to production during the day, and of course, if there’s an issue, we can simply flip back.
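The flip described above can be modeled in a few lines. The class, the side names, and the version numbers are illustrative, not the actual FPNG load-balancer configuration:

```python
class BlueGreenRouter:
    """Models the load-balancer switch: two sides, only one takes live traffic."""

    def __init__(self, versions: dict[str, str], active: str = "green"):
        self.versions = versions  # e.g. {"green": "2.0", "blue": "2.0"}
        self.active = active

    @property
    def passive(self) -> str:
        return "blue" if self.active == "green" else "green"

    def deploy_to_passive(self, version: str) -> None:
        # Upgrade the side that is NOT taking payments; live traffic is untouched.
        self.versions[self.passive] = version

    def flip(self) -> str:
        # The one switch at the load balancer: traffic now routes to the other side.
        self.active = self.passive
        return self.active

router = BlueGreenRouter({"green": "2.0", "blue": "2.0"})
router.deploy_to_passive("3.0")  # kick the tires on blue during the day
router.flip()                    # payments now flow to blue, running 3.0
# If 3.0 misbehaves, calling flip() again routes traffic back to green (still 2.0).
```

The key property is that the upgrade and the cutover are separate steps, so the cutover itself is near-instant and trivially reversible.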
We actually kept the product owner in the dark. Whether you call him the business or the product owner, I mean the accountants, the finance folks in the Patent and Trademark Office.
This was on purpose because I’ve seen so many failed attempts where you say, “Hey, we’re going to do DevOps,” or “We’ve got continuous integration,” to the product owner and the whole thing doesn’t really make sense to them. I think we wanted to do more of a show versus tell.
At one point I was presenting, and our product owner was in the audience. He came up to me right afterward. I mean, I felt terrible. He was like, “Simmons, what is this blue/green thing you’re working on?” I said, “Okay, now is the time for the meeting.”
So, I wore my best suit, and I sold it. I explained the whole conversation to him, and he was like, “Wait, wait, wait, Simmons. Are you telling me that I can tell you what to do, and then you just do it the next day, and there’s no outage?” I’m like, “Yeah, exactly.”
I concentrated on the continuous delivery piece and the fact that he has control. He tells us what to do, and we’ll do it. So that made a big impact for sure.
It’s also important to keep in mind that with the legacy system, RAM, he was used to saying, “Hey, I want this change.” It would go into some backlog, or however they did it, and then he wouldn’t see it in production for another six months, and when it did go in, it was a 12-hour outage. This was a big change for him.
We collected a ton of data, maybe even too much data. I’ll run through some of it here.
We took all that data after those nine months, went to the top floor, and met back up with the CIO and the CFO, and said, “Hey, we want to expand the scope of the pilot. We don’t want to do just defects any more. We want to do everything related to FPNG.”
Two interesting things happened during that meeting.
The first was that the enterprise started to trust us. We weren’t these rogues in the government. I think the enterprise thought we were bad boys trying to cheat the system and skip the rigor, the same rigor that’s in that dinosaur tail. In fact, our biggest supporter ended up being the director in charge of that release process, the one who had stormed out of our meeting. I think it was originally a misunderstanding: she thought I just didn’t care about the rigor, and I thought she was just married to the process. The truth was we were both passionate about rigor; I just wanted to do it in an automated way. When she saw that, she was good with it.
The second interesting thing was that the CIO at the time decided he wanted to expand the scope of the policy so that it wasn’t just FPNG. We were going to write this policy in such a way that all 100-plus systems in USPTO could do it. If they got to a certain bar, they could opt in and not have to do the dinosaur anymore.
It’s the federal government, so it will be no surprise that we’re dealing with a lot of compliance, audits, assessments, etc., from NIST, the Federal Information Security Management Act (FISMA), the Inspector General, OMB, you name it. But we actually think doing DevOps and having the automation in this policy strengthens our posture. Some of the terms in these procedures are a little weird in the way the policy is written, because of how the auditors are used to seeing them. We think it’s going to make the audit easier because it’s not going to be about interviewing people, “Can you find the email where you said that you ran the security scans,” etc. It’s just going to be the run logs from the robot.
Of course, this particular financial audit is kind of like the government version of a SOX audit. It very much concentrates on production changes: who’s approving them and how that approval is done. Of course, DevOps isn’t a shortcut or a waiver. In fact, I think it’s harder to some extent, but it’s better.
I would say about a year and a half ago we started noticing that some of our build times were taking longer. The test runs were taking longer, and we needed a way to shrink that time down. We started to barely dip our toes into microservices. Our definition of microservices, the key term we use, is ‘independence.’ We ended up having an independent repo, independent build, independent tests, independent deployable artifact, and independent database schema.
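One way to picture that independence (the service names and repo mappings below are made up for illustration): each microservice maps to its own repo, so a commit triggers only that service’s pipeline rather than one monolithic build.

```python
# Each microservice owns its repo, build, tests, artifact, and schema.
SERVICES = {
    "refunds":  {"repo": "fpng-refunds",  "schema": "refunds"},
    "payments": {"repo": "fpng-payments", "schema": "payments"},
}

def pipelines_to_run(changed_repos: set[str]) -> list[str]:
    """Map changed repos to the independent pipelines that must run."""
    return sorted(name for name, svc in SERVICES.items() if svc["repo"] in changed_repos)
```

With a monolith, any change rebuilds and retests everything; here, a change to `fpng-refunds` rebuilds and redeploys only the refunds service, which is what shrinks the build and test times.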
We tried this just with one microservice. We had a new feature coming out where a customer could request a refund. After about a month I think, the developers started saying, “Why don’t we do this with everything?” At this point, we’re pushing 25 different microservices.
Before we were doing DevOps, in 2015, we were lucky to get a deployment to production once every quarter.
When we started the DevOps pilot policy in ’16, the numbers started increasing. Toward the middle of FY17, we did phase two of the policy, where we could do bug fixes and features. At this same point, we also did microservices. Last quarter we had 33 production deployments. If you knock out weekends, on average we’re doing two or three deployments a week, about every other day, and still moving in a good pattern.
Going through a few different challenges:
In just one of our microservices, we have close to 1,500 tests. They ran in seven minutes and 14 seconds. We can then take those same tests and turn a knob to run them as a performance test. Our SLA is that the 95th percentile has to be under a second, and you can see that we’re well under. All of this, from functional to performance testing, runs in under 30 minutes. This historically would have taken us two months. Then we run security scans, and our policy states we’ve got to have zero criticals and zero highs.
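The “turn a knob” idea can be sketched like this (the function names are mine, and the percentile uses the standard nearest-rank method): collect per-request latencies from a performance run of the same tests and check them against the one-second p95 SLA.

```python
def p95(samples_ms: list[float]) -> float:
    """95th-percentile latency by the nearest-rank method, in milliseconds."""
    ranked = sorted(samples_ms)
    rank = -(-95 * len(ranked) // 100)  # integer ceiling of 0.95 * n
    return ranked[rank - 1]

def meets_sla(samples_ms: list[float], limit_ms: float = 1000.0) -> bool:
    # Policy: the 95th-percentile response time has to come in under one second.
    return p95(samples_ms) < limit_ms
```

Gating on the 95th percentile rather than the average means a handful of slow outliers can still pass, but any systematic slowdown fails the build.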