The following is an excerpt from a presentation by Mary Lee, Director, Security Product and Program Management at Salesforce, titled “Securing Your Software Supply Chain.”
You can watch the video of the presentation, which was originally delivered at the 2018 DevOps Enterprise Summit in Las Vegas.
I’m Mary. I’m a director of security, product and program management at Salesforce, and I primarily focus on security tools for developers. This includes static code analysis, open-source scanning, threat modeling, credentials scanning and security workflows for developers.
I’ve been into software security for 11, 12 years, and before that, I was an application engineer focusing on optimizing codecs for different processor architectures.
Last year we had an incident
And our name was up there with malware being spread to 2.3 million users. Can you imagine? You have to think that this attack hit not only the software itself but the whole premise of the company.
It’s not just your software but the motto behind your company. The interesting part is that further research indicated the malware was going to gain future features such as keyloggers. Imagine the depth of the attack if it hadn’t been discovered at that point. The sad or interesting or exciting news is that this is not going to be the end. This article came out shortly thereafter. This spreadsheet and chart come from the 2018 State of the Software Supply Chain Security Report from Sonatype.
There are so many more attacks that have happened and will happen. The thing that’s different now is that, before, a zero-day attack specifically targeted your particular piece of software and your customers. A phishing attack would likewise target a particular person who uses a particular service.
Instead, this type of attack is not as careful. It is taking out entire swaths of companies by changing one particular open-source component, knowing that a lot of groups will have automated builds that pull down components, etc. So, they’ll build it in and all of a sudden, it’s deployed to millions or billions of machines.
The Software Supply Chain
Let’s level set about the software supply chain and how it relates to, and differs from, traditional supply chains.
I want you to imagine that it’s almost Christmas shopping time.
When you’re bringing in millions of bits to build a particular toy or product, and you’re onboarding those suppliers to your manufacturing chain, you go through a supply chain review. You figure out how they handle access controls, how they build components, and how they make sure that the person walking in the door to work that day is actually who they say they are.
We have to do the same level of reviews for the software components that we build into our software as well. It’s not just your software that needs to be secure but all the components you use should be secure as well.
The more difficult part here is scale.
We have hundreds of thousands of components that we build into our different products. For example, if we just pick a number like 800 JARs, all of those will map out to 9,000 transitive dependencies, and no human can properly review all of those 9,000 components on a regular basis in time for you to make sure that the software you ship is secure.
The other troubling part is that we have infrastructure as software. Not only are your software and its components at risk, but Apache, MongoDB, and DNS are also susceptible to the same types of attacks.
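To make the scale point concrete, here is a minimal sketch of how a handful of direct dependencies fans out into a larger set of transitive ones. It is an illustration only, not anything shown in the talk: the graph, the component names, and the function `transitive_dependencies` are all hypothetical.

```python
from collections import deque


def transitive_dependencies(direct, graph):
    """Return every component reachable from the direct dependencies.

    `graph` maps a component name to the components it pulls in.
    The result includes the direct dependencies themselves.
    """
    seen = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited via another path
        seen.add(name)
        queue.extend(graph.get(name, ()))
    return seen


# Hypothetical dependency graph, invented for illustration.
GRAPH = {
    "web-framework": ["http-client", "json-parser"],
    "http-client": ["tls-lib", "json-parser"],
    "json-parser": [],
    "tls-lib": ["crypto-core"],
}

all_components = transitive_dependencies(["web-framework"], GRAPH)
print(sorted(all_components))
```

One declared dependency here turns into five components to review, which is the same effect as 800 JARs mapping out to 9,000.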
In our Salesforce version of the software development life cycle, what happens is that our team is chartered to integrate the security development life cycle into the normal software development life cycle.
It has to be a partnership from the very beginning because this is not something that you can add on as a Band-Aid later. I call this co-engineering.
I know you understand the importance of open-source software. It makes up 80 or 90% of software. If you make sure that the 10% of the code that you write yourself is correct and not vulnerable, you’re still leaving yourself open to a huge attack surface.
How does this affect you?
I wanted to emphasize infrastructure, which is the front line of security. This is where you have DNS, BIND, MongoDB. This is the basis of how people connect to our services.
For developers, this is the core of the services that we build and sell to our customers. The unknown security posture of all of these components affects both administrators and developers.
One of our key components of the security development life cycle is that every engineer is responsible for security. It’s not just a security team that says, “Hey, did you check this? Hey, did you design it this way?” But as an infrastructure person, as an IT person, as a developer, you have to take into account the security and legal risks that you take when you incorporate these components.
Let’s figure out what’s really inside
When you talk about food, you may sometimes wonder, “Is it organic? Is it grass-fed?” Some people may want to be vegan or vegetarian, etc., but you care about the quality of the ingredients of the food that you eat.
When you’re building your house, you may ask, “Well, if I’m going to replace this roof, should I get one with a 20-year warranty or a 50-year warranty?” These are all the types of questions that we ask when we build houses.
I’m a car person. I like knowing that I picked up my car because it has rear-wheel drive, mid-engine, two doors. These are the things that I care about.
Yet, during many of the security reviews that I do, I find open-source components that haven’t been updated since the day they were incorporated. Everything needs maintenance: the food that we eat, the houses that we live in, the cars that we drive. Software should be considered the same.
How do these attacks work?
The tricky part is looking at these attacks and seeing how they’re different from other software attacks.
When you have infrastructure as code, this is something that will attack at a very low level. It doesn’t matter if your service is the most secure on the planet. If someone can attack at a database level or a web server level, they can basically bypass all of the other checks at a higher level.
When you integrate code from these open-source components, a typo in a package name or a person taking over a GitHub account that’s been deleted are things that attackers can use to pose as legitimate sources, to delete components you depend on, and potentially even to cause outages. These are things that you have to be concerned about when you’re relying on third-party components.
The worst case is CCleaner. You’re distributing malware to millions of customers and people, which is a big brand hit. That’s a very difficult price to pay when you’re trying to be as agile as possible.
Another tricky part with open source is also patching. When people find different issues, you not only have to report it to an external party, you have to also make sure that they care about those security vulnerabilities because it’s highly likely that they will not want to update it at all.
As a second step, the difficult part with upgrades is that there are breaking API changes between minor versions and major versions. If you update frequently, you have a smaller code difference to update and keep working. But a major version jump, like one I saw in a recent security review where a team was upgrading from HBase 2 to 3, wasn’t something that they were willing to do in the short amount of time they had before release.
Not only are you impacting functionality but also security.
Finally, there are so many different types of attacks that you have to make sure that the sources that you’re upgrading from are actually also secure. Again, you can use hashes and figure out that the binaries that you’re downloading are actually the versions that you expect to be installing.
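As a minimal sketch of the hash check mentioned above: compute a SHA-256 digest of what you downloaded and compare it with the checksum the project published. The function names and the sample bytes are hypothetical, and in practice the expected value would come from the project’s release page or repository metadata rather than being computed locally.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_expected(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a published SHA-256 checksum."""
    expected = expected_hex.strip().lower()
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_of(data), expected)


# Stand-in for a downloaded binary (hypothetical content).
artifact = b"example artifact bytes"
# Stand-in for the checksum a project would publish next to the binary.
published_checksum = sha256_of(artifact)

print(matches_expected(artifact, published_checksum))
print(matches_expected(artifact + b"tampered", published_checksum))
```

A real pipeline would read the file in chunks and fail the build on a mismatch; this sketch only shows the comparison itself.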
Of course, you have to balance all this out. One of the things that I learned from working in Salesforce is that they prioritize in this order:
Frankly, I’m happy that security is in the top three, but I also have to battle the fact that there are teams that will use whatever programming languages they have available to be able to get code and services out as quickly as possible.
How do we fix the problem?
What are we going to do about this? This is the painful part of what we used to do.
Self-attestation: a lot of teams and companies do this. You have to request approval before you can use these different components. Each approval might take four to six weeks because some security engineer has to search on Google, NIST, MITRE, and look for these different components and potentially do code review. Then for the ones that are higher risk, potentially even do static code analysis. All of this takes time.
Then we have to hand off to legal. Initially, they’ll wait for security to do these reviews first because they think that there’s more likelihood that security will find issues before legal will. Legal has a pretty straightforward matrix of what licenses are acceptable, but it changes based on whether the component is distributed or modified.
Those are the challenges that we have to face both from a security perspective and a legal perspective.
Our mutual, bright, shiny futures — we’re going to talk about what automation looks like crawling, walking and running, but first, we’re going to slither. This is important because you may not even know the landscape of your software infrastructure and environment out there. The key focus here is on awareness because you are trying to learn why people are using the different components they’re using, how often they update, if they never update, etc. to learn more about their capabilities and their release frequencies.
It’s important to ask them to consider the security and legal ramifications of the components that they’re using before they start developing. The additional risk here for legal is that we have compliance requirements. They have to report this out, and if they don’t know about a component, then we take on other risks as well.
The worst part here is about the workflow for open-source ownership. Let’s go through this.
We talked a little bit about how we get patches and verifications out the door. You have to ask, “Does your open-source component have an owner?” If yes, great, let’s update. If it doesn’t have an owner, do you want to own it? This is code that you have integrated into your environment to be agile, and yet if they aren’t willing to take the security ownership of these different requirements, then you have to ask yourself, “Do you want to take that on? Do you want to create a branch? Do you want to fork it and maintain those patches yourself?”
If that’s not something you’re willing to do, then maybe you can take that open-source component and replace it with something else, or maybe you’ve decided that you no longer need that component. You can get rid of it. These are all best-case scenarios.
At any point along the road, you can have a no that will require weeks or months of research to figure out if this is a risk that you’re willing to take on.
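The ownership questions above can be sketched as a small decision function. This is an illustration of the flow as described, not a tool from the talk: the `Action` names and the four yes/no inputs are hypothetical simplifications.

```python
from enum import Enum


class Action(Enum):
    UPDATE = "update to the patched version"
    FORK = "fork it and maintain the patches ourselves"
    REPLACE = "replace it with another component"
    REMOVE = "remove the component"
    RESEARCH = "research whether the risk is acceptable"


def next_action(has_owner, willing_to_own, has_replacement, still_needed):
    """Walk the ownership questions in the order they are asked."""
    if has_owner:
        return Action.UPDATE
    if willing_to_own:
        return Action.FORK
    if has_replacement:
        return Action.REPLACE
    if not still_needed:
        return Action.REMOVE
    # Every easy answer was "no", so someone has to weigh the risk.
    return Action.RESEARCH


print(next_action(has_owner=False, willing_to_own=False,
                  has_replacement=False, still_needed=True).value)
```

The last branch is the expensive one: it stands for the weeks or months of research mentioned above.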
The way we started this was by looking at what was in the source-code repository. Because Nexus IQ Server has a command-line interface, we were able to quickly figure out how we can scan all of this with every single type of source-code repository management system that we have. This was really helpful because it didn’t need any level of integration. We can just sign up as developers in all of these different orgs and start downloading the source code.
Another great feature was the REST APIs. Here, we are able to use this functionality to pull the vulnerability data and the recommended versions to upgrade to: all of the useful information to convince developers.
This particular jQuery vulnerability can cause a directory traversal or remote code execution. It gave them the reason why we were asking them to do this work, so this capability of bringing the information about vulnerabilities straight into our bug management system using the REST APIs was really key in helping us close down these bugs and have a 100% fix rate. It totally surprised me, and I was amazed and happy, so I’m going to give all the credit here.
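As a rough sketch of what “bringing the vulnerability information into the bug management system” can look like, the function below turns one finding into ticket fields. Everything here is an assumption made for illustration: the field names, the sample finding, and `to_bug_ticket` are hypothetical and do not reflect the actual Nexus IQ Server REST API or any particular bug tracker.

```python
def to_bug_ticket(finding):
    """Turn one scanner finding into the fields of a bug ticket."""
    title = (
        f"Upgrade {finding['component']} {finding['version']} "
        f"to {finding['recommended_version']}"
    )
    # Put the "why" first so developers see the reason for the work.
    description = "\n".join([
        f"Why: {finding['summary']}",
        f"Severity: {finding['severity']}",
        f"Recommended version: {finding['recommended_version']}",
    ])
    return {
        "title": title,
        "description": description,
        "labels": ["security", "open-source"],
    }


# Hypothetical finding with placeholder values, not real vulnerability data.
SAMPLE_FINDING = {
    "component": "example-lib",
    "version": "1.0.0",
    "recommended_version": "1.0.1",
    "severity": "high",
    "summary": "placeholder description of the vulnerability",
}

ticket = to_bug_ticket(SAMPLE_FINDING)
print(ticket["title"])
print(ticket["description"])
```

The design point is that the ticket carries the reason and the upgrade target, so a developer does not have to open a separate tool to act on it.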
It’s really helpful to put the bugs where the developers are already located.
Having them log into a separate UI with a separate tool that they have to go into a separate data center to log into isn’t going to work. Every single click that you add is going to reduce the likelihood that developers will fix these things.
One of the things that we were able to do with Nexus IQ Server is integrate it with the build.
We’re challenging our build and integration systems to use this on a regular cadence so that they don’t have to rely on people like the security teams to come scan. It’s already there.
Every time you have a build, you are downloading dependencies, or you have dependencies in either your Nexus Repository or Artifactory, etc. You can integrate all of these different steps straight into existing workflows and not have to worry about covering extra steps. You don’t have to worry about people downloading new stuff because it’s already going to be there.
The other benefit here is that you don’t have to worry about integrating with every single repository management system either. All of that you could integrate into your build and not have to worry about it. At this point, we’re walking, and we’re covering more of our software supply chain.
This is also where we can integrate security into the software development life cycle. Instead of having them do extra steps, it’s exactly where they already are.
One of the things that we are able to do with Nexus Firewall is integrate it with Nexus Repository, where everyone keeps all of their JARs and dependencies. So people no longer have to fill out a request to say, “Hey, I’m going to be using this particular open source by this date” and then wait four to six weeks for an answer.
As soon as you put it into Nexus Repository, it will do security and legal scanning and give an immediate answer. Instead of saying, “What form was that I needed to fill out? Who do I need to talk to?” They don’t have to worry about that. It’s all integrated into existing tooling that they use.
The other benefit is that we have consistent developer environments. Most of our developers are using Eclipse or IntelliJ, and as soon as I refer to this library, it’ll also do scanning at that point and figure out, “Should I be using this version or not?” We don’t have to ask.
I’ve worked with teams where they are developing three different open-source components, and they asked for this functionality up front. They’re becoming more aware of the risks that they’re taking, and they would rather use a library that is more secure than not.
The other cool part here is integrating with Puppet and Yum. We use a lot of this for our deployment to production.
Because we can scan at this point, in case any other part fails, you have this last resort. I remember taking a kickboxing class at one point, and the instructor told us about a 200% solution. If someone’s coming at you with a punch, you can choose to block or duck, or you can do both. By scanning and integrating at these three different points, you really increase your coverage of the whole software supply chain.
Now we have better coverage. We’re looking at it from the source code side, the build side, and the deployment side. From a security perspective, we’re closely integrating to have this co-engineering partnership with developers directly.
- 5 minutes to scan versus 25 days to manually review: When we did our first scan, something that used to take 15 minutes per JAR added up, for those 800 JARs, to something like 25 working days, and that is the best-case scenario. This time will easily increase if you add manual code review, static code analysis, and all these other steps that you can take. Scanning all of these takes such a short amount of time that it was well worth all of this effort. You literally buy back time. Not only that, think about all the extra people that you would have to hire and all the buildings you have to build, etc. It just makes way more sense to use automation like this.
- Comprehensive view of security risk: This is something where not only are you integrating into build, you’ll find that what people were self-attesting to is not quite the complete picture. When you integrate to build, you know exactly what binaries you are pulling in, when they’re doing this, how they’re doing this, which ones, and so you have a much better idea of what your risks look like.
- Continuous scanning: With manual code review, we only reviewed at intake. If you took in an open-source component eight years ago, you never looked at it again. Now, you have continuous scanning that you can do on a regular basis, whether that’s daily, weekly, or per release. At least now you know. I’ve asked teams to build a user story into every one of their releases to review their open-source components, and to give themselves time to do this work so that they’re not surprised when security, build, or I come to them and say, “Hey, guess what? You’re building on insecure, vulnerable libraries, and that’s going to cause problems.” Help them plan. Help them be aware of the risks that they’re taking and then give them opportunities to plan for it, so they can properly staff up, schedule the time, or find an easier way to do this work.
- Single list of legal risk: This is also really important because when you are only scanning or only reviewing at the top-level component, you don’t know about the thousands of other transitive dependencies underneath it. You’ll have a much better view of the actual legal risks that you’re potentially taking.
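The 25-day figure in the first point follows from simple arithmetic, worked through here. The one assumption that is not stated in the talk is an 8-hour working day.

```python
JARS = 800
MINUTES_PER_JAR = 15
HOURS_PER_WORKING_DAY = 8  # assumption: one working day is 8 hours

total_minutes = JARS * MINUTES_PER_JAR              # 800 * 15 = 12,000 minutes
total_hours = total_minutes / 60                    # 12,000 / 60 = 200 hours
working_days = total_hours / HOURS_PER_WORKING_DAY  # 200 / 8 = 25 days

print(working_days)  # 25.0
```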
Automate or Die
(This is a joke)
You can’t scale. You can’t expand. You can’t increase your coverage. You can’t reduce time to market. You can’t win customers. You have to automate.
I would say you also have to automate with the right tool. Our company had tried different tools before. For example, they had been trying to use different open-source detection software for at least six years. They’re really pleased with the type of feedback that we can get in a really short period of time.
- Ask questions: One of my favorite quotes is, “No one knows everything. You just have to keep asking questions.” Ask the five whys, like, “Why are you doing it this way? What components are you using? When are you incorporating them?” Just keep asking questions.
- Verify the integrity of your software supply chain: The point is to verify your whole supply chain, from beginning to end, all the components that you use, all the different people that are being integrated into the process whether it’s 5 developers or 50 teams.
- Increase your awareness of the threats to the software supply chain: Start with just awareness at first. You may not know the scope of the problem that you’re facing, so keep asking questions. You want to look at not only your software development life cycle, but your infrastructure as well. There are attacks that are going to come in with open-source components at every single level. For both developers and for IT administrators, it requires partnership. Right now, I am primarily on the software application layer side, but I’m working closely with our teams that are scanning at the endpoints. We make sure that we see the right things, and we notify the right teams.
- Pilot with manual workflow: You can try it with manual and then see what the workflow looks like, but then go build it directly into the developer and administrator workflows, so that everything can be easily automated.
- Everything needs maintenance: My car has got over 100,000 miles on it because I can’t find another two-door, rear-wheel drive, mid-engine car to replace it with, so I make sure it’s in great shape. I change the tires. I replace the suspension. Treat your software the same way. Everything needs maintenance. Look at the components that you’re using. Determine if it has a life cycle. One of the other functionalities that Nexus IQ Server can do is just say, “This thing’s six months old. Maybe someone should look at it.”
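The “this thing’s six months old” reminder in the last point can be sketched generically as a staleness check. This is not how Nexus IQ Server implements it; the function name, the component names, the dates, and the 183-day threshold are all hypothetical.

```python
from datetime import date, timedelta


def stale_components(last_reviewed, today, max_age_days=183):
    """Return the names of components last reviewed before the cutoff date."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, reviewed in last_reviewed.items() if reviewed < cutoff
    )


# Hypothetical inventory: component name -> date it was last reviewed.
INVENTORY = {
    "lib-a": date(2018, 9, 1),
    "lib-b": date(2017, 12, 1),
    "lib-c": date(2018, 1, 15),
}

needs_a_look = stale_components(INVENTORY, today=date(2018, 10, 22))
print(needs_a_look)
```

Running a check like this on every release is one way to give teams the planned maintenance slot described earlier.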
Overall, however, the greatest part of automation is time savings. We were able to take 800 JARs, look at them, and say, “Okay, only 50 of them need to be upgraded.” Then we were able to work with the research team at Sonatype to figure out that, of those 50, 15 don’t have upgrade paths, and ask what we can do. Those 15 are the only ones that needed manual code review. That’s a huge time saving compared to the 800 that we initially started with.