This case study was excerpted from the second edition of The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, John Willis and Nicole Forsgren, PhD.
HMRC, Her Majesty’s Revenue and Customs, is the tax collection agency for the UK government. In 2020, HMRC distributed hundreds of billions of pounds to UK citizens and businesses in an unprecedented financial support package that would eventually see around 25% of the entire UK workforce supported by public money. HMRC built the technology to do this in just four weeks, under conditions of incredible pressure and uncertainty.
HMRC’s challenges went way beyond aggressive timescales. “We knew we would have millions of users, but nobody could actually tell us how many. So whatever we built had to be accessible to everyone and had to be capable of paying out billions into bank accounts within hours of launch. It also needed to be secure, with checks being conducted before money was paid out,” says Ben Conrad, HMRC’s Head of Agile Delivery, at the 2021 DevOps Enterprise Summit-Europe.
And they nailed it. All the services launched on time, with most launching a week or two ahead of expectations without any issues, resulting in a 94% user satisfaction rating. HMRC went from being the least popular of all government departments to the people customers relied on to help out.
To achieve this amazing result, HMRC adopted some key processes and leveraged a mature digital platform, one that had evolved over the previous seven years to allow teams to build digital services rapidly and deliver them at hyper-scale.
HMRC’s platform (the Multichannel Digital Tax Platform or MDTP) is a collection of infrastructure technologies that enable the organization to serve content to users over the internet. Business domains within HMRC can expose tax services to the public by funding a small cross-functional team to build a microservice or a set of microservices on the platform. MDTP removes much of the pain and complexity of getting a digital service in front of a user by providing tenants with a suite of common components that are necessary to develop and run high-quality digital products.
MDTP is the largest digital platform in the UK government and one of the largest platforms in the UK as a whole. It hosts about 1,200 microservices built by more than two thousand people split into seventy teams across eight geographic locations (although, since March 2020, the teams have all worked 100% remotely). These teams make about one hundred deployments into production every day.
“The teams use agile methods, with deliberately lightweight governance, and they’re trusted to make changes themselves whenever and as ever they see fit,” says Matt Hyatt, Technical Delivery Manager with Equal Experts. “It only takes a few seconds to push changes through our infrastructure, so getting products and services in front of users happens really fast.”
Pivotal to the success of the platform has been a constant focus on three key things: culture, tooling, and practices. MDTP’s goal is to make it easy to add teams, build services, and deliver value quickly. The platform has evolved around that goal so that a cross-functional team can be spun up quickly and then use the common tooling to design, develop, and operate a new public-facing service.
In practice, MDTP provides a place for a digital team’s code to live and automated pipelines for the code to be built and deployed through various environments and into production, where the team can get rapid user feedback. Common telemetry tooling is configured to enable a team to monitor its services via automated dashboards and alerting mechanisms so that it always knows what’s going on. The platform provides teams with collaboration tools to help them communicate, internally and between one another, so that they can work effectively, both remotely and in person. This is all made available to a team more or less instantly, with minimal configuration or manual steps required. The idea is to free the digital team so it can focus solely on solving business problems.
With over two thousand people making changes, potentially several times a day, things could get very messy. To avoid this, MDTP uses the concept of an opinionated platform (also known as a paved road platform or guardrails). For instance, if you build a microservice, it must be written in Scala and use the Play framework. If your service needs persistence, it must use Mongo. If a user needs to perform a common action like uploading a file, then the team must use a common platform service to enable that when there is one. Essentially, a little bit of governance is baked into the platform itself. The benefit to the teams of sticking to the rails is that they can deliver services extraordinarily quickly.
But the benefits don’t stop there. By limiting the technology used on the platform, it’s far simpler to support. Moreover, digital teams are prevented from spending time rolling their own solutions to problems that have already been solved elsewhere.
“We can provide common services and reusable components that we know work with all the services. It also allows people to move between services, and, indeed, allows the services to move to new teams without worrying about whether our people have the required skills to do the job,” says Hyatt.
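To make the “governance baked into the platform” idea concrete, the guardrails described above could be pictured as an automated policy check that runs before a service is accepted. The following is only an illustrative sketch under stated assumptions: the `ServiceManifest` type, its fields, and the `guardrail_violations` function are hypothetical names invented for this example, and nothing here describes how MDTP actually enforces its rules. Only the three rules themselves (Scala, the Play framework, Mongo for persistence) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rules taken from the text; how they are enforced here is an assumption.
ALLOWED_LANGUAGE = "scala"
ALLOWED_FRAMEWORK = "play"
ALLOWED_DATASTORE = "mongo"


@dataclass
class ServiceManifest:
    """Hypothetical description of a microservice (not a real MDTP format)."""
    name: str
    language: str
    framework: str
    datastore: Optional[str] = None  # None means the service needs no persistence


def guardrail_violations(manifest: ServiceManifest) -> List[str]:
    """Return a human-readable list of paved-road rules the service breaks."""
    violations = []
    if manifest.language.lower() != ALLOWED_LANGUAGE:
        violations.append(f"{manifest.name}: language must be Scala")
    if manifest.framework.lower() != ALLOWED_FRAMEWORK:
        violations.append(f"{manifest.name}: framework must be Play")
    # Persistence is optional, but if present it must be the approved store.
    if manifest.datastore is not None and manifest.datastore.lower() != ALLOWED_DATASTORE:
        violations.append(f"{manifest.name}: persistence must use Mongo")
    return violations


if __name__ == "__main__":
    on_road = ServiceManifest("claims-frontend", "Scala", "Play", "Mongo")
    off_road = ServiceManifest("side-project", "Python", "Django", "Postgres")
    print(guardrail_violations(on_road))
    print(guardrail_violations(off_road))
```

A check of this kind keeps the rules in one place, so when the platform’s opinions change, only the policy values need updating rather than every team’s process.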
Another key differentiator is that MDTP abstracts the need to care about infrastructure away from digital teams, allowing them to focus solely on their apps. “They can still observe the infrastructure through tools like Kibana and Grafana, but none of the service teams have access to AWS accounts themselves,” Hyatt notes.
Importantly, the opinions of this “opinionated platform” can and do change according to user needs and demands, with a focus on self-service. “A service can be created, developed, and deployed on our platform without any direct involvement from platform teams at all,” Hyatt says.
MDTP played an important role in enabling the rapid delivery of the COVID-19 financial support package, but this isn’t the whole story. A herculean effort by the team was still required. Good decision-making and fast adaptation of processes also proved vital.
“The most important aspect of delivering a system at speed is the ability for engineers to ‘just get on with it,’” says Hyatt. A combination of balanced governance, established best practices, and a mandate from HMRC teams to use proven tooling meant they weren’t risking new technology problems or security concerns that would have ultimately delayed delivery.
To meet the aggressive timescales, the teams also adapted their composition and communication. A platform engineer was embedded into each COVID-19 digital service team to safely “short-circuit” existing processes. This allowed the teams to eliminate risks early and maximize collaboration, particularly on requests for new infrastructure components and for help with performance testing. Those working on the key services were also made easily identifiable via collaboration tooling, like Slack, so that requests for help could be prioritized, not just by the platform teams but by the whole two thousand–strong digital community.
The ability of the teams and business leaders to adapt so quickly in unison and then flex back into place once the services were launched is a great demonstration of the mature DevOps culture at HMRC.