The following is an excerpt from a presentation by Anne Marie Fred, a Senior Engineering Manager at IBM, titled “Compliance and Audit Readiness: The DevOps Killer?”
You can watch the video of the presentation, which was originally delivered at the 2018 DevOps Enterprise Summit in London.
Think about the software development teams that you work with on a regular basis. How many would you say are deploying software at least once per year?
How about once a month? Once a week? Once per day?
Now imagine, and maybe it’s not so hard to imagine, that you’ve built a compliance and audit readiness culture and processes around deploying maybe a few times per year, and suddenly you’re deploying several times per day.
What kind of pain would you experience? That’s what I’m going to share about today.
- I’m going to talk about myself and the group that I work in, as well as our challenges around compliance and audit readiness.
- Then I’ll talk about why DevOps and continuous delivery can make these a little more difficult, and then go into several aspects of compliance. There’s a pattern to what we learned.
A bit about me
I’ve worked for 17 years as a software engineer at IBM and in the last three as a manager. It’s very important to note I am not an attorney. I am not a compliance expert. I am not a consultant. You get what you pay for. These are my personal memories and opinions, and what I hope you will do is take them back, anything that you find interesting, and run them by your own people and see if they’re interested in trying it.
IBM has about 350,000 employees, and several hundred of them work in the Digital Business Group (DBG).
What we do in our group is manage IBM’s digital presence worldwide. That includes websites, pricing information, checkout, provisioning of software-as-a-service offerings when you order them, search engine optimization, analytics, developer outreach programs like developerWorks, and even conferences and events. As you can see, we do a great deal of customer-facing work, but we don’t sell any products ourselves.
If you look at my reporting chain, you’ll see IBM at the top and then the Digital Business Group.
There are about 75 squads in DBG. If you’re not familiar with squads, a squad is basically an autonomous team with a clear mission, and it has everybody it needs to deliver on that mission: its own business owners, designers, developers, project managers, and so on.
I manage four squads, and between them, we’re responsible for roughly 150,000 of the web pages on IBM.com.
Everybody has to care about compliance and audit readiness
But in large enterprises, we have to do it at scale.
We have thousands of applications and services at IBM. For each one of them, we have to ensure compliance. We have to worry about all these things, and it can get a little bit overwhelming.
ATTEND THE DEVOPS ENTERPRISE SUMMIT
Some things that make this interesting.
- First of all, we have very frequent deployments. Any process that relies on doing something before you release is not going to work very well with continuous delivery.
- Secondly, we have very short-lived services. Any process that is very heavyweight, or that depends on a service’s specific IP address or location, will need to change.
- Third, we have very few technical gates to production. Anybody can go get a free trial on some public cloud and deploy an application out there without asking you.
- Finally, we’ve blurred the lines between developers and operations. If you no longer have a separate operations team doing the deployments, you have to teach the developers to take on the responsibilities that team used to have.
Think about something with me
If your CEO asked you today what applications and services you were running right now, would you be able to answer that question? How long would it take you to answer the question? How accurate would it be? How many things are deployed out there that you don’t even know about, necessarily?
What you need is an application and service registration system. We call ours the enterprise application library. It includes information like the system or application name, the business and engineering owners, and other basic information.
If I could change one thing about our library, it would be this
We had this library, and people saw this as a perfect opportunity to make sure that systems were compliant at their first release.
They said, “You have to register before you can release, and you have to be compliant before you can register.” The problem is that meant for some applications, like those that process personal data, it was taking six to eight weeks to register the application.
I think this is a mistake. You should make it very easy to register an application and then follow up on the compliance immediately after that.
Developers are like water: if you make something difficult, they will find a way around it. People were very resistant to registering their applications.
Make registration easy, and then make it very clear what your guidelines are for which applications need to be registered. Finally, you want clearly identified people who are personally responsible for making sure that those applications are registered.
For us, that’s the business owners and our HR managers.
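Here is a minimal Python sketch of the “register first, follow up on compliance right after” idea. The record fields and follow-up task names are hypothetical and are not the schema of IBM’s enterprise application library. The only hard requirement at registration is accountable owners. Compliance work is opened as trackable follow-ups instead of blocking registration.

```python
from dataclasses import dataclass, field

# Hypothetical compliance follow-ups opened automatically at registration time.
FOLLOW_UPS = ["security_checklist", "privacy_assessment", "accessibility_review"]


@dataclass
class AppRecord:
    name: str
    business_owner: str
    engineering_owner: str
    open_follow_ups: list = field(default_factory=list)


registry: dict = {}


def register(name: str, business_owner: str, engineering_owner: str) -> AppRecord:
    """Register immediately; the only hard requirement is accountable owners."""
    if not (name and business_owner and engineering_owner):
        raise ValueError("name, business owner, and engineering owner are required")
    record = AppRecord(name, business_owner, engineering_owner, list(FOLLOW_UPS))
    registry[name] = record
    return record


def complete(name: str, task: str) -> None:
    """Mark one compliance follow-up as done."""
    registry[name].open_follow_ups.remove(task)


def outstanding() -> dict:
    """Every registered app that still has compliance follow-ups open."""
    return {n: r.open_follow_ups for n, r in registry.items() if r.open_follow_ups}
```

The `outstanding()` report is what the accountable owners would be chased with, rather than a gate in front of registration.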
One of the first questions on that registration form
‘What is your business continuity value?’
For this, we ask people to assess how critical the application or service is to IBM’s business.
We ask them to think about things like:
- If your service is down, will we lose money or cause irreparable harm?
- Will we break our contracts or service level agreements?
- Will we even harm our company’s reputation or anger our customers?
Depending on your answers to these questions, you get a business continuity value (BCV) score from 1 to 4.
If you have a high BCV score, you need to be more cautious about how you deploy your service. For those applications, we need offsite data backups, a disaster recovery plan that you’ve actually practiced and tested, an IT support workforce continuity plan, and so on.
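The talk doesn’t describe how the answers turn into a score. This Python sketch therefore invents a simple rule purely for illustration: start at 1 and add a point for each “yes”, capped at 4, treating 4 as the most critical. The threshold for a “high” BCV is also an assumption. The sketch shows the shape of the idea: the score decides which continuity controls an application must have, instead of a judgment call at deploy time.

```python
# Hypothetical scoring rule, for illustration only: start at 1, add a point
# per "yes" answer, cap at 4. Assumes 4 means most critical.
QUESTIONS = (
    "lose_money_or_cause_irreparable_harm",
    "break_contracts_or_slas",
    "harm_reputation_or_anger_customers",
)

# Controls the talk lists for applications with a high BCV score.
HIGH_BCV_CONTROLS = (
    "offsite data backups",
    "tested disaster recovery plan",
    "IT support workforce continuity plan",
)


def bcv_score(answers: dict) -> int:
    """Score 1-4 from yes/no answers keyed by question name."""
    yes_count = sum(1 for q in QUESTIONS if answers.get(q, False))
    return min(1 + yes_count, 4)


def required_controls(score: int, high_threshold: int = 3) -> tuple:
    """Hypothetical threshold: scores at or above it need the extra controls."""
    return HIGH_BCV_CONTROLS if score >= high_threshold else ()
```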
Web and application security is obviously critical
You never want your systems to be exposed to hacking, right? But again, with frequent deployments, we can’t rely on manual processes to enforce security.
Fortunately, there’s now a whole field devoted to this, called DevSecOps.
I also want to mention one requirement that is particular to compliance: the GDPR “secure by design” requirement. Your applications need to be secure by design.
What does that mean? It’s an evolving area, and I think that we’re growing in our understanding of what that means to us, as well.
Here are a few things that we’re doing to make our applications secure by design.
- Education. Everybody in the company gets annual IT security education, and we track that they complete it. This gives us a general level of familiarity with IT security across the company.
- Security Focals. As a best practice, we want a security focal in each squad: somebody who has a moderate amount of security training and knows some of the common attacks. What is a man-in-the-middle attack? What mistakes can I make on my web forms that would make me vulnerable to hacking?
- Experts. Our security architects are experts who work across several squads. IT security is their life’s work. I’ll show you in a second how they get involved.
We have a pretty extensive IT security standards checklist
We use it for your first deployment into production.
It’s a set of a couple dozen questions, and we ask you to fill it out to educate the security architect on how your application works and how it’s secured.
Then you send it to the architect, who reviews it with you so that together you come up with a set of remediation steps you need to take. You complete those, and then everybody signs off that the application is ready for its first deployment.
Then we have to maintain our security on an ongoing basis
One thing that we’re starting to ask people is, “Have you considered the security implications of this change before you make it?”
We ask this early in the planning process, before you write a line of code. We’re just asking people to spend 10 seconds thinking it through.
Secondly, in our code reviews, we’ve trained our developers to ask, “Does this code change that I’m about to put in have any security implications? Has somebody done something silly, like check a private key into our source code repository?” (That never happens…) These are two good ways that we maintain security on an ongoing basis.
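Reviewers can be backed up by a cheap automated check for the “private key in the repository” mistake. Here is a minimal Python sketch. It assumes keys are stored in PEM format, with a `-----BEGIN ... PRIVATE KEY-----` header. It won’t catch other secrets such as passwords or API tokens, so treat it as a safety net rather than a substitute for review. The `files` mapping of path to contents is a hypothetical input. In a real pre-commit hook or build step, you would read the changed files from disk.

```python
import re

# Matches PEM headers such as "-----BEGIN PRIVATE KEY-----",
# "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN OPENSSH PRIVATE KEY-----".
PRIVATE_KEY_PATTERN = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def find_private_keys(files: dict) -> list:
    """Return (path, line_number) for every line that looks like a PEM private key header."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PRIVATE_KEY_PATTERN.search(line):
                hits.append((path, lineno))
    return hits
```

A non-empty result would fail the check and point the reviewer at the exact file and line.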
Another kind of fun thing that we do is periodic external penetration testing
We bring in outside consultants who try to hack into our systems on a regular basis. They write up bug reports for any vulnerabilities they find, or even potential vulnerabilities. The nice thing is that they don’t just throw the reports over the wall; they stay with us and help us fix the problems.
Security automation tools
We have two classes of automation tools that are used pretty broadly. One is static code analysis tools. These can process many different programming languages and find common vulnerabilities or mistakes that people make in their code. They can run in every build, fail the build, and prevent a deployment that would make your application less secure.
We also have web crawlers like IBM AppScan Web that run against our production servers frequently, maybe daily, checking for common exploits and hacks. Again, they can create a report and automatically open a defect against us, and we can fix it very quickly.
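A build step that fails on serious findings might look like this minimal Python sketch. The report format is hypothetical: a JSON object with a `findings` list whose items have `severity`, `rule`, `file`, and `line`. Each static analysis tool has its own output format, so you would adapt the parsing. The function returns a process exit code, and a nonzero result fails the build.

```python
import json

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(report_json: str, fail_at: str = "high") -> int:
    """Return 1 if any finding is at or above `fail_at` severity, else 0."""
    findings = json.loads(report_json)["findings"]
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings if SEVERITY_RANK[f["severity"].lower()] >= threshold]
    for f in blocking:
        print(f"BLOCKING {f['severity']}: {f['rule']} at {f['file']}:{f['line']}")
    return 1 if blocking else 0
```

In a pipeline you might call `sys.exit(gate(report_text))` right after the scanner runs. The `fail_at` parameter lets a team start by blocking only the worst findings and tighten the threshold over time.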
A Quick Definition:
“Access control is the selective restriction of access, whereas permission to access a resource is called authorization.”
In a DevOps world, some of the access controls that we see frequently are API keys, IDs, and passwords.
A couple of rules of thumb that we find useful: use individual credentials for any manual action, so you can trace who made a change. This is great for audit or fraud detection. Never share a password, even within the team, because you lose that auditability and traceability.
Where a credential needs to last a long time, even as people come and go, functional IDs are a really good answer. We also have API keys that can be either long-lived or short-lived, depending on the account they come from. Our managers set up and own the functional IDs, and they encrypt the secrets before putting them into our deployment pipeline.
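The talk doesn’t say how IBM’s API keys are implemented. As an illustration of a short-lived credential tied to a functional ID, here is a Python sketch that uses only the standard library. It signs the ID and an expiry time with HMAC-SHA256. The key format and the handling of `SIGNING_KEY` are assumptions. A real system would load the signing secret from the encrypted secrets in the pipeline and would normally use an established token format.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Assumption: in practice this secret would come from the encrypted secrets in
# the deployment pipeline, never from source code. Generated here for the demo.
SIGNING_KEY = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_key(functional_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a key of the form '<functional_id>:<expiry epoch>:<signature>'."""
    issued_at = time.time() if now is None else now
    payload = f"{functional_id}:{int(issued_at + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def verify_key(key: str, now: Optional[float] = None) -> Optional[str]:
    """Return the functional ID if the key is authentic and unexpired, else None."""
    try:
        functional_id, expiry, signature = key.rsplit(":", 2)
        expires_at = int(expiry)
    except ValueError:
        return None
    # Constant-time comparison avoids leaking the signature through timing.
    if not hmac.compare_digest(signature, _sign(f"{functional_id}:{expiry}")):
        return None
    current = time.time() if now is None else now
    return functional_id if current < expires_at else None
```

Because the functional ID is bound into the signature, every automated change made with the key stays traceable to that ID.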
A couple of things that are special about GDPR
- It actually requires you to have solid access controls.
- It requires you to limit who has administrative access to your applications.
- You have to have a plan for how you will respond quickly in case of fraud or a security breach.
- You need to be able to revoke access quickly when it’s no longer needed. You should have a documented process for revoking access every time somebody leaves the company.
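Revoking access on departure is easy to automate once you keep an inventory of grants. Here is a minimal Python sketch with a hypothetical `Grant` record. It compares every grant against the set of active IDs and makes it easy to list who holds administrative access to a system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str    # individual or functional ID
    system: str  # application or service name
    role: str    # hypothetical roles, e.g. "admin" or "read"


def split_for_revocation(grants: list, active_ids: set) -> tuple:
    """Return (grants to keep, grants to revoke). active_ids must include live functional IDs."""
    keep = [g for g in grants if g.user in active_ids]
    revoke = [g for g in grants if g.user not in active_ids]
    return keep, revoke


def admin_holders(grants: list, system: str) -> set:
    """Who holds administrative access to one system, so the list can be kept short."""
    return {g.user for g in grants if g.system == system and g.role == "admin"}
```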
Global Privacy Assessment
A global privacy assessment is one of the few things that we do require people to complete before delivering to production. It’s another questionnaire, this time about the personal data you collect. How do you process it? How do you store it? Who uses it and why? (The purpose of the processing is very important to GDPR.) What access controls have you put in place? Which countries are involved in storing or processing the data? And so on.
The answers to this questionnaire are then reviewed by our legal and privacy experts in various countries, to make sure that we’re not breaking any local laws. The output of this, just like our IT security testing, is a series of actions that are required to comply with the laws, including GDPR.
We do have two fast passes through the global privacy assessment. One is for applications that don’t store or process any personal data at all.
The other is for applications that only process a very limited type of personal data, which I’ve heard called pseudo-anonymized data: data that isn’t personal in nature but is a reference to a person. Examples are an IP address, or an internal ID number that we use to identify a person and that isn’t their email address. If all you have is some IP addresses in your logs, there’s a fast pass through this assessment, and you can get through it in a day or two. By contrast, the full assessment for applications that do process personal data was the one taking six to eight weeks.
The good news is that everything we talked about earlier sets you up very well for GDPR compliance.
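The choice between the two fast passes and the full assessment can be expressed as a small rule. Here is a Python sketch with hypothetical category labels. In practice the questionnaire drives this decision and legal and privacy experts review it, so the sketch only illustrates the routing described above.

```python
# Hypothetical labels for the data categories an application declares.
PSEUDONYMOUS_ONLY = {"ip_address", "internal_id"}


def assessment_path(data_categories: set) -> str:
    """Route an application to a fast pass or the full global privacy assessment."""
    if not data_categories:
        return "fast pass: no personal data"
    if data_categories <= PSEUDONYMOUS_ONLY:  # subset check
        return "fast pass: pseudo-anonymized data only"
    return "full global privacy assessment"
```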
- You need accountable, easy to find owners for your applications and services. That’s your application registry.
- You need to have clear, documented security standards and compliance with sign-offs that those are completed.
- You need to have strong access controls.
- You need to do global privacy assessments.
- You need to have audit-ready documentation in case somebody comes and claims that you’re not processing their personal data or controlling it correctly.
You also need to ensure that the third party services you’re working with are themselves GDPR compliant. For example, anybody who’s a processor for us, we want to make sure that they’re GDPR compliant.
We also want to make sure they’re up to our IT security standards, and we do it through our procurement process. We are not allowed to pay for a third party service unless they have agreed to abide by our standards. This can make procurement take a longer time, but for us, it’s worth it. We’ve renegotiated so many contracts because of GDPR, and I think many people did as well.
Another thing that’s kind of interesting about GDPR is the data subject access requests (DSAR).
DevOps makes this a little bit more difficult, especially microservices. We have many services with many small databases. You can end up with a proliferation of personal data repositories.
How many of them are storing a copy of some data from the user’s profile, so they don’t have to look it up again later?
To address this, the first step was really to identify where our personal data repositories were. We started with our application registry and went from there.
We said, “Okay, here are a series of questions that will tell you whether or not you’re a personal data repository. If you are, you’re going to participate in the DSAR process when it comes out.” Well, this was a pretty powerful motivating factor for people to get rid of extra personal data repositories.
A couple of other ways we did that
We took the profile data and we centralized it in one place. We said, “If you can, please rewrite your applications so instead of storing a copy of somebody’s profile data, you look it up from the profile service every time.”
Furthermore, the profile APIs ask you for the purpose for which you are going to use the data. The profile service is connected to the consent service, so it can look up what kinds of processing each customer has consented to, and it will only send you the data that you’re allowed to use for that purpose.
Many of our services did that, and then they deleted their copies of the data. We also had many services that had personal data for some kind of fluffy function that we really didn’t care about anymore, so we just got rid of some features. We even shut down entire services because they would have been too difficult to remediate. This dovetailed nicely with a server consolidation project that we were in the middle of.
We have a fast pass for data subject access requests
It’s for that pseudo-anonymized kind of data: those IP addresses and internal ID numbers. It wouldn’t be very helpful if you asked a company what data they have on you and they said, “Well, what’s your IP address?” No.
In fact, even if we tell people what their internal ID number is, it’s not very helpful to them. It’s more helpful for us to say, as a blanket statement, “If you visited our website, we’ve logged your IP address, and we use your internal ID number across our systems to track your sessions and so on.”
The other thing that makes this easier is that we consider that those types of data don’t require consent. We need to keep your IP address in order to keep our systems secure: to prevent denial-of-service attacks and to respond to fraud or security problems.
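Purpose-based access on the profile service could work roughly like this Python sketch. The purposes, the field lists, and the in-memory `profiles` and `consents` dictionaries are hypothetical stand-ins for the profile and consent services. The caller states a purpose and receives only the fields permitted for that purpose. If the customer hasn’t consented to that purpose, the caller receives nothing.

```python
# Hypothetical: which profile fields each processing purpose may use.
FIELDS_BY_PURPOSE = {
    "personalization": {"name", "country"},
    "marketing_email": {"name", "email"},
}


def get_profile(profiles: dict, consents: dict, user_id: str, purpose: str) -> dict:
    """Return only the fields allowed for `purpose`, and nothing without consent."""
    if purpose not in consents.get(user_id, set()):
        return {}
    allowed = FIELDS_BY_PURPOSE.get(purpose, set())
    return {k: v for k, v in profiles.get(user_id, {}).items() if k in allowed}
```

Because callers look the data up on every request instead of keeping copies, a DSAR only has to cover the profile service.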
Separation of Duties
This is the practice of having more than one person who’s required to complete a task. Its intent is to prevent fraud and error.
With DevOps, you might not have a separate operations team to split duties with; it’s the same people.
Separation of duties is required by law and by best practice for some sensitive applications, such as those handling healthcare data, and those do still have separation of duties. But for things like websites and web servers, which is a lot of what we run, it’s overkill.
For us, it’s more about the spirit of the law. How are we going to prevent fraud and errors without actually having many people involved in the deployment process?
- First of all, we want to avoid breaking changes. The most important way to do this in continuous delivery is with really good automated test coverage.
- Secondly, we have code reviews, as another way that we avoid breaking changes. You have to get another committer/owner of the application to agree to the change in the first place.
- Then we have accountability and traceability. If you remember, we talked about using individual credentials where it makes sense, or functional IDs where it’s automated in a build pipeline, so we can see which ID was responsible for each change. We also don’t allow manual changes to our production systems at all in our DevOps environment. It’s not possible to log into the systems and accidentally make a change, because we shut off remote access. Doing all this gives us traceability because we can see all the changes through our source code management system, which in our group is GitHub Enterprise. You have to make a change to the configuration management code in order to make a change to the production systems.
- Fourth is instead of preventing all errors, we just want to have a quick recovery from errors. We do that in a couple of different ways. One is good monitoring, right? We don’t want to wait until a customer reports a problem before we resolve it. We want to have monitors that are fairly sophisticated.
For example, on our web servers we don’t just have ping tests making sure the host is up; we also have tests that render the page and check for certain words on it. We also have visual regression tests. If you haven’t seen those, they work from an image of the page and raise an alert if the page has changed, so you can make sure the change was intentional. And of course, we run our security checks regularly.
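A content check like the one described can be sketched in Python with only the standard library. The URL and required phrases are whatever your team chooses. `check_body` holds the logic so it can be tested without a network, and `check_page` fetches the page. Visual regression testing needs a browser to take screenshots and an image-diff step, so it isn’t shown here.

```python
import urllib.request


def check_body(status: int, body: str, required_phrases: list) -> list:
    """Return a list of problems; an empty list means the page looks healthy."""
    problems = [] if status == 200 else [f"HTTP {status}"]
    problems += [f"missing text: {p!r}" for p in required_phrases if p not in body]
    return problems


def check_page(url: str, required_phrases: list, timeout: float = 10.0) -> list:
    """Fetch the page and check it; network failures are reported, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return check_body(resp.status, body, required_phrases)
    except OSError as exc:  # URLError, HTTPError, and timeouts are OSError subclasses
        return [f"unreachable: {exc}"]
```

A scheduler would run `check_page` every few minutes and alert on any non-empty result, well before a customer notices.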
Accessibility is the design of products, devices, services, or environments for people who experience disabilities.
IBM takes accessibility very seriously. We follow the standards from the World Wide Web Consortium (W3C) and its Web Accessibility Initiative (WAI), and we have our own accessibility standards on top of those. We don’t just require that the applications we sell for government bids are accessible; our internal standard is that all of our websites will be accessible, and even our internal documentation and training. This is something that touches all of us who develop software at IBM.
Now with DevOps, this is a little bit more difficult, because again we have frequent deployments.
In the past, the standard was that you did accessibility tests pretty late in the product development cycle when the UI was fairly settled, but this doesn’t work anymore. We had to develop processes that made this lightweight and fast.
Fortunately, we have a publicly available website, www.ibm.com/able. I strongly suggest that everybody take a look at it. It has a lot of accessibility best practices, and it even has a section on how to streamline your agile DevOps processes. We also have open source tooling that’s available for everyone to use, where you can check your source code and your web pages for accessibility.
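As an example of the kind of check that can run on every build, here is a minimal Python sketch that flags `<img>` tags with no `alt` attribute. It is a stand-in written with the standard library’s `html.parser`, not IBM’s open source accessibility tooling, and it covers only one rule. An empty `alt=""` is deliberately allowed, because that is the accepted way to mark a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collect <img> tags that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; alt="" still counts as present.
        if tag == "img" and "alt" not in dict(attrs):
            line, _ = self.getpos()
            self.problems.append(f"line {line}: <img> without alt attribute")


def missing_alt(html: str) -> list:
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```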
Now automation can’t catch everything
The easiest example is probably screen readers. It’s hard to tell whether something will make sense when it’s read aloud by a screen reader unless you’ve actually tried it. We do very intensive accessibility tests the first time an application is deployed, usually right after it’s deployed.
The whole team may spend a week finding and fixing any accessibility problems. We also do periodic manual checks, where somebody goes back, spot-checks pages, and sees whether they find any problems.
Two other things we do are similar to our security checks. First, we ask people to think about accessibility when they’re doing code reviews.
Second, testing a change to the user interface should include actually bringing it up in the browser with accessibility plugins, like the DAP plugin for Chrome, which show you whether the accessibility looks good. Is your contrast good? Can you tab through the page? And so on.
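The contrast question is one of the few accessibility checks that reduces to a formula. WCAG 2.x defines the contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors. It requires at least 4.5:1 for normal-size text at level AA. Here is a Python sketch of that calculation for six-digit hex colors:

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg: str, bg: str) -> bool:
    return contrast_ratio(fg, bg) >= 4.5
```

For example, `#767676` on white just passes AA (about 4.54:1), while `#777777` on white just fails (about 4.48:1).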
Open source is another fun one with DevOps
Modern package management tools like npm make it trivially easy to pull open source into your project. They’re very widely used in the DevOps environment.
Furthermore, if you pull in just one package, you’ll usually end up automatically pulling in several other packages without even knowing it. We actually have different processes for software that we sell and for internal-use software. Most of what we do continuous delivery on is internal-use software, but not all of it.
For software that we sell, we have a complete sign-off: for each release, teams have to list every single package and version, its license, and whether it was approved by our open source standards committee. For internal-use software, we allow teams to self-certify their compliance, and we give them tools to make that easier.
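To see how one package turns into several, here is a Python sketch that walks a dependency graph breadth-first. It collects everything a project pulls in, directly or indirectly. The package names and the graph are made up, and the sketch ignores version resolution, which real package managers also perform. The `seen` set keeps the walk from looping on shared or cyclic dependencies.

```python
from collections import deque


def transitive_deps(graph: dict, root: str) -> set:
    """Everything `root` pulls in, directly or indirectly (breadth-first)."""
    seen, queue = set(), deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg not in seen:
            seen.add(pkg)
            queue.extend(graph.get(pkg, []))
    return seen
```

A project that declares one dependency can still end up with four packages, each with its own license to check.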
The first step in an open source process
The first step is to educate everybody about open source, right?
- First, everybody who touches open source in any way, which includes pretty much all of our developers but also their managers and project managers and so on, has to take annual open source training, so they understand the concepts and what they need to be looking for.
- Next, we have code reviews. We tell our developers that one of the checks they should do is to see whether there’s a new open source package, what its license is, and whether it’s one of the licenses that we like to use, because not every open source license is okay for commercial use. We have lists of license types that are generally fine, like MIT and Apache. We have ones that are yellow flags and need to be reviewed by our open source experts before you use them, because whether they’re okay depends on how they’re used. And we have ones that are red flags, like proprietary licenses or ones that have gotten us in trouble in the past.
- Then we have a database of all the packages and versions that have been pre-cleared, blacklisted, or require a review.
- Finally, we have a code scanning tool. It can be automated in the builds, and it recognizes several different programming languages and environments. It automatically finds all the packages you’re using and their license files, and classifies each package by its license type. It then tells you whether that license type is one it knows is fine, whether it recognizes that specific package version, whether you need to contact Legal, or what else you need to do. Another useful thing about this tool: if you configure it this way, any time it finds a package it doesn’t recognize, it automatically requests a review of that new package or version via a POST request. Within a day or two, our open source team can give you a thumbs up or thumbs down on the package. This way you don’t spend weeks building something on top of a new package, only to find out that you need to tear it out later.
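The classification step of such a scanner could look like this Python sketch. Treating MIT and Apache as “green” comes from the talk. The yellow and red entries, the package names, and the review-request payload are placeholders, since your legal and open source experts own the actual lists. Instead of sending a POST request, the sketch builds the JSON body that would be sent for each unrecognized package.

```python
import json

GREEN = {"MIT", "Apache-2.0"}  # generally fine, per the talk
YELLOW = {"MPL-2.0"}           # placeholder: needs expert review before use
RED = {"Proprietary"}          # placeholder: do not use


def classify(pkg: str, version: str, license_id: str, precleared: set, blocked: set) -> str:
    key = f"{pkg}@{version}"
    # Blocked versions and red-flag licenses win over everything else.
    if key in blocked or license_id in RED:
        return "red"
    if key in precleared or license_id in GREEN:
        return "green"
    if license_id in YELLOW:
        return "yellow"
    return "unknown"


def scan(packages: list, precleared: set, blocked: set) -> tuple:
    """Classify (name, version, license) tuples; build review requests for unknowns."""
    verdicts, review_requests = {}, []
    for pkg, version, license_id in packages:
        verdict = classify(pkg, version, license_id, precleared, blocked)
        verdicts[f"{pkg}@{version}"] = verdict
        if verdict == "unknown":
            # Hypothetical body for the automatic review request.
            review_requests.append(
                json.dumps({"package": pkg, "version": version, "license": license_id})
            )
    return verdicts, review_requests
```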
I didn’t talk too much about audit in particular
It was kind of woven in there. One thing we have specifically for audit is documentation. It helps to standardize the documentation for all the areas we talked about as much as possible.
A very simple thing that’s working for us is Box folders. Box isn’t an IBM company; it’s a third-party company we use that provides secure shared cloud storage. We have folders that roughly mirror our organizational structure.
For example, there’s one folder for the Commerce platform, which is my boss’s level. Then there’s a subfolder for each squad in that area. Within each squad’s folder, they’re responsible for gathering up the compliance documentation that they need.
The individual pieces of documentation may have additional access controls if they’re sensitive, but at least this gives our managers and our project managers a very easy place to go, in case they have a request for audit.
Another thing that’s useful for audit is using Git for sign-offs.
If you’re requiring somebody to assert that they’re compliant to a standard, you can actually just set up a readme file in a GitHub repository, and they can make a pull request to sign their name to it. It’s a very auditable and traceable sign-off.
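Git supplies the audit trail, because the pull request and commit record who added each line and when. To check that everyone who must sign has done so, a small script can parse the file. Here is a Python sketch that assumes a hypothetical one-line format of `signed-off: <username> <YYYY-MM-DD>`.

```python
import re
from datetime import date

# Hypothetical sign-off line format: "signed-off: <username> <YYYY-MM-DD>"
SIGNOFF_LINE = re.compile(r"signed-off:\s+(?P<user>[A-Za-z0-9-]+)\s+(?P<day>\d{4}-\d{2}-\d{2})")


def parse_signoffs(readme_text: str) -> dict:
    """Map each username to the date on their sign-off line."""
    signed = {}
    for line in readme_text.splitlines():
        match = SIGNOFF_LINE.fullmatch(line.strip())
        if match:
            signed[match["user"]] = date.fromisoformat(match["day"])
    return signed


def missing_signoffs(readme_text: str, required_users: set) -> set:
    """Required signers who have not yet added their line."""
    return required_users - set(parse_signoffs(readme_text))
```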
There is a pattern to all these different compliance things
- Discover what you have today. Get a sense of what’s out there right now. What are your applications and services, and what’s their current state?
- Educate everyone. Educate, educate, educate.
- Share best practices with each other. We had weekly calls about compliance and GDPR, and we learned a lot from each other as we went through this journey.
- Identify the responsible parties and hold them personally accountable for doing this.
- Plan remediation steps as part of your agile process. For us these were regular stories, with the understanding that they were the most important stories to get completed quickly.
- Give people helpful tools to make it easy for them to get their jobs done.
- Report and track progress against your goals.
- Remove roadblocks. If you hear people complaining that something is too difficult, fix it.
- And automate to maintain compliance.