100M+ Developers, Security, and AI - My Journey from DevOps to an AI-assisted Future

Las Vegas 2023

DevSecOps may be an overused buzzword today, but adoption of the methodology has led to stronger cross-functional collaboration, faster delivery times, and more secure software being released into the market. From day one, agility, fast iteration cycles, ship-to-learn, and DevOps practices have been core to how GitHub ships and operates its products. Like much of the industry, we’ve also been on a journey to build in security at every step. As a developer focused company, our approach to this keeps the developer at the forefront of our decisions, innovations, and product choices. The software development industry is also at an inflection point: it’s estimated that by 2027, 80 percent of all code will be written by AI assistants. As more and more companies adopt AI into their development processes, DevSecOps is more important than ever before. But evolving existing practices to accommodate AI has a seemingly counterintuitive starting point. Hint: it isn’t the security team. GitHub is the home to 100M+ developers and I have the privilege of leading the team that keeps the platform, product, users, and customers safe. But my DevSecOps journey started long before that as a technical leader at NSA. In this talk, you’ll learn a bit about my journey, how my approach to Security and DevOps has evolved through experience leading agency-wide technology initiatives at NSA to protecting the home of all developers at GitHub. You’ll hear about how DevSecOps and Security work at GitHub and how AI is impacting developers through capabilities like GitHub Copilot, how AI is evolving the Security space, and suggestions on how to move security into an AI-assisted future.

Full transcript

Jacob DePriest

Thank you, Gene, and thanks to Gene, Marguerite, and the rest of the IT Revolution team. This has been a great event so far, and I'm really looking forward to chatting with you all today.

I'm going to wait just a second until the notes and the slides come up.

Okay. As Gene said, we're going to talk a little bit today about how we got here, the DevSecOps journey at GitHub itself, and then we're going to shift into security and AI a little bit.

I don't normally talk about my story in talks like this, but this is the DevOps Enterprise Summit, and our journeys are so much about what brings us together as a community and the conversations we're having. So I want to spend a couple minutes to talk about that a little bit, and it'll set the stage for what we're going to talk about later as well.

As Gene mentioned, before GitHub, I spent 15 years at the National Security Agency. I started out in computer and electrical engineering, building real-time software-defined radio systems, frameworks, tools, and always had a passion for developers and creating tools that would help them be more productive.

I was able to do a lot of really, really cool things there. And much like the admiral yesterday, most of them I can't talk about today. But I did get to spend a lot of time working in the developer and DevOps space.

I did this software-defined radio thing for the first eight or so years, and then I got the opportunity to do a stint overseas where I got to spend time learning cloud and Kubernetes and containers and service-oriented architecture. And it was a blast.

I've always been passionate about open source software. I released my master's thesis work in grad school open source. This was way before GitHub and GitLab and all those things. We open sourced the SDR, the software-defined radio framework we built at NSA. It's always been part of my journey.

So when I got back from the overseas stint, I started working on making open source software better at NSA. I did a talk on this a while ago at OSCON, so I'm not going to repeat it here. But the gist was I was working with a team to figure out how to make it easier for NSA employees to contribute to open source projects and then how to release open source projects.

And this is more complicated than you might think, A, because large government organization, and B, because there's a lot of intellectual property challenges with U.S. government employees, a lot of equity concerns when you think about the types of things we were developing as an agency. But inside, the devs are just as passionate about open source software as in industry.

Can we scroll down a little bit? Sorry. Awesome.

So when I was working on this, one thing became really, really apparent. We needed a centralized developer experience team. Without that, improving open source software processes was just a half measure. So that led me to co-founding a program called DevEx Inside. And as Gene mentioned, if you attended DOES 2021, you may recall Virginia Lorenzano giving a talk on this.

We were intrapreneurs: Virginia, myself, one of our colleagues named Paul. We were doing entrepreneurial things inside a large enterprise. We built a pitch deck. We estimated headcount. We built a financial and contract plan. We developed a five-year roadmap. We went to execs all across the agency and pitched this plan, looking for money and people. And it worked.

It resulted in us building an effort spanning five teams, including the DevOps pipeline, developer security, productivity, and over 60 people. We delivered a full DevEx suite to NSA, and we did it at lightning speed. Well, at least for inside the government, lightning speed. It was like six or seven months, which is still really fast.

And along with this, I was still advocating for open source software as the senior executive sponsor for open source.

From there, I went on to help work on an agency-wide effort to design a new internet-facing IT cloud service platform. This was at the beginning of the pandemic. Telework was a big topic, as you might imagine. And historically, NSA operated on classified networks in air-gapped environments, right? Makes sense.

But the evolving NSA mission required increased access to information, software development, cybersecurity collaboration. We needed someplace we could work with industry in the unclassified space on the internet-facing side.

So I was the technical lead for an agency-wide effort to address this. I was responsible for a lot of the cybersecurity controls, risk trade-offs, and cloud security architecture, and applying the NSA-specific controls to those services.

So why do I share all this? Two reasons.

I'm in security today, but my path to this job has been as an engineering leader focused on DevOps, focused on providing software development services to developers.

And the second reason is the leadership journey. When I was talking with Gene, preparing to come here, we talked a lot about leadership and learning how to make risk and business decisions. I was fortunate to receive amazing leadership training in the government as a GS employee. And then, as a senior executive, I worked for some of the most senior leaders at NSA to understand risk at a global scale and then figure out how to apply that to daily technical decisions that we were shipping every single day.

And I was able to learn a lot from DevSecOps to DevOps to security controls and risk management.

So that brings me to GitHub.

About two and a half years ago, I decided to spread my wings after 15 years of service and joined GitHub as a VP of Security Operations. It combined all the things I loved: open source software, supporting developers, security.

So I ran security operations for about a year and then moved up to run the day-to-day of the security department as the Deputy Chief Security Officer.

A little bit about organizational structure here, because I think it's fascinating to set the context for some of the things we're talking about. My boss Mike is the SVP of Engineering and the Chief Security Officer, and he reports to the CEO. So he's responsible for all of engineering and security.

My peers underneath him are the VPs responsible for Copilot, Advanced Security, github.com, pull requests, Actions, all the things you think about when you think about GitHub. And this has been a really interesting construct because it's allowed us to work even closer on planning security remediation. And it's a joint team focused on creating great security and engineering outcomes for our customers.

So day-to-day for me, usually that looks like running the day-to-day of security, operations, governance, risk and compliance, product security, security research. So we keep the product you see safe and the infrastructure and employees behind the scenes safe as well. And it's a really close partnership with engineering, revenue, and product. And I get to talk to customers a lot too, which is one of my favorite things to do.

Can we scroll again? All right, we're just going to wing it. It'll be fine.

So let's talk a little bit about DevSecOps at GitHub and transition into the security part of this. Before we dive in, though, I think it might be helpful to set a little bit of context about GitHub's journey here. Because a lot of people, when they think about GitHub, they think about the code repository side of it. But let's go in the wayback machine here.

You won't be surprised to know that GitHub has been leaning into DevOps principles since day one. Some things have changed, but a lot of things have stayed the same.

Way back in the beginning, 2011, 2012 timeframe, we did not have managers at GitHub. There was very little overhead planning that was happening. But async was really how we worked then, and it's still how we work today. ChatOps still reigns supreme at GitHub. We used something called Hubot for many, many things inside of GitHub, and that was true back in 2011.

We had about two million repositories at the time. And as we talk about DevOps principles and value streams, some of the core principles that were true at the time are still true today.

What are we here to do? It was about developer happiness, both for the GitHub developers that worked there, but also for the devs around the world. Work we did was visible. It was in pull requests, it was in issues, it was in Markdown files.

There's always a question still today at GitHub: if you're in a meeting and somebody asks a question, almost inevitably it'll be followed up with, "Is there an issue for that? Can you send me the link?"

We do it in public, and things like reducing batch sizes are synonymous with GitHub Flow and working with GitHub.

Okay, let's fast-forward a little bit around this time period, about in the middle. Scale-wise, we're at about 21 million repos, six million developers. We really started leaning into continuous integration inside of GitHub, feedback loops, learning culture. We did introduce managers around this time. We're not managerless anymore. I'm a manager.

But the site and the product were shared responsibility between ops and dev. And that's still how it is today.

On the security front, we didn't have GitHub Advanced Security. We didn't have a lot of the cool stuff we have today. But we did internally start to build our own tooling. We built something called Sentinel, which was an internal static analysis tool, and that was starting to run against how we built and shipped software inside of the company.

Now, our AppSec team was still doing manual reviews and security reviews and things like that, which we still do today. But we were trying to automate as much as possible with dependencies and security scanning. And this is really when the security ethos began.

And it's really focused on three things. First, high signal to noise. Second, present that info directly to developers in their flow. So for us, that means right in a pull request, right in an issue, in the right repo where the developers are working every day. And third, do as much as you can in automation. Run it as part of CI.
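The three principles above can be sketched in a few lines of Python. This is only an illustration, not GitHub's internal tooling: the `Finding` type, the confidence score, and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    confidence: float  # 0.0 to 1.0, as scored by a hypothetical scanner


def high_signal(findings, min_confidence=0.8):
    """Principle 1: keep only findings worth a developer's attention."""
    return [f for f in findings if f.confidence >= min_confidence]


def pull_request_comment(findings):
    """Principle 2: render findings as text for the pull request itself."""
    if not findings:
        return "Security scan: no high-confidence findings."
    lines = [f"Security scan: {len(findings)} finding(s) to review"]
    for f in sorted(findings, key=lambda f: (f.path, f.line)):
        lines.append(f"- {f.path}:{f.line} {f.rule}")
    return "\n".join(lines)


def ci_exit_code(findings):
    """Principle 3: a non-zero exit code fails the CI job automatically."""
    return 1 if findings else 0
```

A CI step would call `high_signal` on the scanner output, post the result of `pull_request_comment` to the pull request, and exit with `ci_exit_code`. Whether a finding should block the merge or only annotate it is a policy choice this sketch does not settle.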

Okay, let's fast-forward again. 2018, Microsoft buys GitHub. Many of us thought the world was coming to an end when this happened. Not going to lie. But it's turned out to be a huge boost for GitHub and for developers everywhere.

In this timeframe, we have about 31 million developers using the platform and 100 million repos. GitHub acquires Semmle, acquires Dependabot, and starts investing in secret scanning, which is turning into what GitHub Advanced Security is.

Now let me give you a quick example of the integration of how we think about the developer life, DevOps, and all those things. We believe deeply in using our product in our daily work.

So when you take something like identity and access management, we're like, "Well, how do we do that at GitHub?" Well, we use repository-based access control. It's GitOps. It's a pull request.

So let's say you've switched teams internally at GitHub and you need access to the data logging service for your new team. Well, you find the right folder and the right repo and the right file, and you add your handle to it. You just open a pull request. And then, depending on how sensitive the thing is, it might require more approvals or not. But then it gets approved and you move on. You have that access. It's all automated.

And by the way, our auditors love this because it's all super transparent as well.
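The pull-request-based access flow described above could look roughly like the following Python sketch. Everything in it is hypothetical: the sensitivity tiers, the approval counts, and the `AccessRequest` type are assumptions for illustration, not how GitHub's entitlement tooling is actually built.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers and the number of approvals each requires.
REQUIRED_APPROVALS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class AccessRequest:
    """A pull request that adds one handle to one entitlement file."""
    handle: str
    entitlement: str      # e.g. "data-logging-service"
    sensitivity: str      # "low", "medium", or "high"
    approvers: frozenset  # handles that approved the pull request


def can_merge(request: AccessRequest) -> bool:
    """Return True once enough other people have approved the request."""
    needed = REQUIRED_APPROVALS[request.sensitivity]
    # Self-approval never counts toward the threshold.
    independent = request.approvers - {request.handle}
    return len(independent) >= needed


def apply_request(members: dict, request: AccessRequest) -> dict:
    """Return a new membership map with the handle added, if mergeable."""
    if not can_merge(request):
        raise PermissionError(f"{request.entitlement}: more approvals required")
    updated = {name: set(handles) for name, handles in members.items()}
    updated.setdefault(request.entitlement, set()).add(request.handle)
    return updated
```

Because every grant is a reviewed change to a file in a repository, the commit history doubles as the audit trail, which is the transparency property mentioned above.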

So on the security side of this, this is when we really finalized GitHub Advanced Security into what it is today. I'm not going to spend a bunch of time on the product, but the high level is it's code scanning for SAST with 10 languages, dependency insights. Our GitHub Advisory Database is now one of the most popular places for CVE disclosure. And then secret scanning with push protection.

And so this is really where we are at in the Advanced Security space, which is all those things are also free for public repositories on GitHub.

So that brings us to today: 100 million developers plus on the platform, 330 million repositories, 3.5 billion total contributions on GitHub. Where do we go from here?

We all know the age of AI has begun. We've had some great talks on this. I'm not going to be able to even come close to Patrick's talk yesterday giving the landscape of AI, but this is where we're at.

And so when Gene and I talked, we spent a lot of time discussing how generative AI can make developers more productive. In fact, I told him I want every developer inside GitHub using things like GitHub Copilot because it not only saves them time, it's the best way for them to avoid simple security errors and spend less time on boilerplate code.

So we're going to talk about AI and security. And because I know it best, I'm going to frame it in the context of what GitHub's doing as a reference here.

Developers can't just build software anymore. It's not enough, right? We have to build it accurately. It has to be done scalably. It has to be done quickly, and it has to be done securely. And AI is helping in this space.

In-terminal pair programmers are changing the game, and this is just the beginning of this. GitHub has something we call GitHub Copilot. It's an in-terminal pair programmer that sits there with you, drawing context from the editor that you're working in, and it will autocomplete and provide pretty complex algorithms, sometimes test cases, things like that.

Another major win for pair programmers is AI-based security filtering. This is still in its infancy, but the way this works is before the suggestions even show up in the editor with the developer, they're being filtered through a security lens to be able to weed out insecure suggestions from the beginning.

And if we fast-forward a few years to think about how this is going to get better and better, I think we're going to see some really amazing security capabilities working to the left of the developer before they even see the code in the terminal.
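To make the idea of filtering suggestions before they reach the editor concrete, here is a minimal Python sketch. It assumes a simple pattern-based filter with two illustrative rules. The talk does not say how Copilot's filtering is implemented, so this is a stand-in technique and not the product's method.

```python
import re

# Illustrative patterns only; a production filter would be far more thorough.
INSECURE_PATTERNS = {
    "hardcoded credential": re.compile(
        r"(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
    "SQL built by string formatting": re.compile(
        r"\.execute\(\s*(f['\"]|['\"][^'\"]*['\"]\s*(%|\+))"
    ),
}


def filter_suggestions(suggestions):
    """Split suggestions into those safe to show and those withheld."""
    shown, withheld = [], []
    for text in suggestions:
        reasons = [name for name, pattern in INSECURE_PATTERNS.items()
                   if pattern.search(text)]
        if reasons:
            withheld.append((text, reasons))
        else:
            shown.append(text)
    return shown, withheld
```

With these rules, a parameterized query such as `cursor.execute("... WHERE id = %s", (user_id,))` passes through, while the same query built with an f-string is withheld along with the reason it was flagged.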

Okay, beyond the editor, what else is AI doing in this space? ChatGPT is bringing a lot of capability here, and a lot of the other models are as well. On the GitHub side, we're leaning into the GPT-4 world.

So you can see on the screen, one of my colleagues has some Python code up. It's supposed to parse expenses. It's not working. So we asked Copilot Chat how to fix it, and it went by really fast. But not only did Copilot provide the fix, it explained what it was doing and why it did it.

And then after development, if you get out of the editor even further, our CodeQL and research teams are leaning into AI and large language models as well. We've been applying it to how we build our SAST tools.

In our case, CodeQL needs to be able to scan and recognize APIs as sources and sinks to be able to work when it's looking at your code. And there are thousands of open source packages out there that have to be modeled. We were modeling them manually, looking at what was the most popular and the most used in the marketplace, and modeling those so that it would be supported in the tool.

Well, AI and LLMs are drastically speeding that up, and it's resulting in fewer false negatives.
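The role of source and sink models can be shown with a toy taint tracker in Python. This is a simplified sketch, not CodeQL: the statement format and the API names in `SOURCES` and `SINKS` are assumptions chosen for the example.

```python
# Hypothetical API model: calls that introduce untrusted data (sources)
# and calls that are dangerous when they receive it (sinks).
SOURCES = frozenset({"request.args.get", "input"})
SINKS = frozenset({"cursor.execute", "os.system"})


def find_tainted_flows(statements, sources=SOURCES, sinks=SINKS):
    """Report sinks reached by data that originated at a source.

    Each statement is (target, call, args), meaning `target = call(*args)`.
    Use None as the target when the result is discarded.
    """
    tainted = set()
    flows = []
    for target, call, args in statements:
        tainted_args = [a for a in args if a in tainted]
        if call in sinks and tainted_args:
            flows.append((call, tainted_args))
        # A result is tainted if it comes from a source or from tainted input.
        if target is not None and (call in sources or tainted_args):
            tainted.add(target)
    return flows


program = [
    ("name", "request.args.get", ["'name'"]),
    ("query", "str.format", ["template", "name"]),
    (None, "cursor.execute", ["query"]),
]
```

Running `find_tainted_flows(program)` reports the flow into `cursor.execute`. Passing `sources=frozenset()` reports nothing for the same program, which is the false negative that appears when a library's API has not been modeled yet.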

Okay. But you run the security team, right? Jacob, why are we talking about all this?

As a security leader, I have to invest my time in the products that our customers care about, and that's all our products, not just the normal kind of what you think of as the SCM tools, right?

And right now, I spend a lot of time talking to customers about AI and about Copilot and things like this and about the security around it. And a lot of it is because of the numbers you see on the screen. It's really having a huge effect on developers in a positive way. Developers are going faster, they're using it more, and they're feeling more fulfilled.

But what are the things I think about as a security leader? How do I approach this with my security hat on?

So when I'm evaluating a vendor for leveraging AI inside of GitHub that we would use internally, for instance, intellectual property is one of the biggest things I think about first.

Let's start with outbound. This is intellectual property leaving your company. So code that would be sent to the models, for instance. Who owns the prompts that your employee sends? Does the model get trained on your data? How is the model learning? Where's the telemetry? Where's it being stored?

Anyone who's building AI products, just know that organizations are going to be hypersensitive about how you're using their data to improve your models.

Wait a minute, I think I missed something there. Can we scroll down a little bit? Okay.

Before we go on to inbound, one thing I'll say, like on the Copilot side, the way we're thinking about this is that we decided not to incorporate any of the code that our customers send us into the models at all. We're just not doing it. So we don't retain any of the prompts, the contextual info from the editor, or any of the suggestions that we send back. We don't retain any of that.

And that's the approach we're taking. I'm not saying that's the approach every company should take, but that was our approach in how to deal with this particular challenge of how to think about intellectual property and protecting customers' code that they send us.

On the inbound side, this is IP coming to your company: suggestions from AI. With GitHub Copilot, the autocomplete side uses a model that's trained on natural language, text, and code from public sources, including all the public code on GitHub. And then it uses that to suggest code to the developer.

But what happens if Copilot suggests something that is already under another license or in use by somebody else? First of all, this is generative AI. It's new code. So the core concept here is that it's about the code that Copilot produces, not what it's necessarily trained on. But what if it accidentally does produce something that matches?

So we have something called code referencing. It's a filter that, if enabled, looks at all the suggestions that are coming in and the surrounding code of about 150 characters or so. Any matches, along with information about the matched repo, appear in the suggestion for the developer to look at.

So it's kind of putting the power back in the hands of the developer to make a decision. Now, they may see that that reference match was from a license that's incompatible with the work they're doing, and they can choose to say, "Okay, we're not going to accept that." Or maybe it is compatible, or maybe they just want to learn from it.
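One way to picture this kind of matching is a fixed-width window lookup, sketched below in Python. Only the roughly 150-character figure comes from the talk. The whitespace normalization, the in-memory index, and the `example/repo` name used in the usage note are assumptions, and the real feature may work quite differently.

```python
WINDOW = 150  # approximate match length mentioned in the talk


def normalize(code):
    """Collapse whitespace so formatting differences do not hide a match."""
    return " ".join(code.split())


def build_index(public_files, window=WINDOW):
    """Map every `window`-character slice of known public code to its origin.

    `public_files` is a list of (repo, license, code) tuples.
    """
    index = {}
    for repo, license_name, code in public_files:
        text = normalize(code)
        for start in range(max(len(text) - window + 1, 0)):
            index.setdefault(text[start:start + window], (repo, license_name))
    return index


def find_references(suggestion_with_context, index, window=WINDOW):
    """Return the distinct (repo, license) origins matched by the suggestion."""
    text = normalize(suggestion_with_context)
    matches = []
    for start in range(max(len(text) - window + 1, 0)):
        origin = index.get(text[start:start + window])
        if origin is not None and origin not in matches:
            matches.append(origin)
    return matches
```

Given an index built from a hypothetical `("example/repo", "GPL-3.0", source_code)` entry, `find_references` returns that repo and license whenever a suggestion plus its context reproduces a full window of the indexed code. The developer can then weigh the license before accepting the suggestion.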

So that's great. But really, what happens if a developer just clicks through and doesn't pay attention to that, and I end up with code in there?

Again, this is something we're doing as GitHub, part of Microsoft, but our customers are entitled to IP indemnification from GitHub for unmodified suggestions if all the guardrails and filters are turned on. It's part of our commitment and our belief that this is really changing the way things work and making developers more productive.

Well, how safe are they? What about traditional infrastructure risk?

We have to treat these the same way, with at least the same level of rigor that we treat all the other services that we protect every day. And so for us, Copilot is fully in scope for our security program: vuln management, bug bounty, incident response, things like that. We treat it with the same level of seriousness and security that we would treat pull requests or where we store the code.

What about other things that we hear about? Should we wait until AI is more established? Will this hurt my team? Will they not have enough to do once they adopt all these tools?

Well, I think most of us know, but AI is already a competitive advantage. Not adopting it is going to put teams and products and companies behind. And I've yet to meet an engineering or a security team that has capacity to address everything on their backlog. There's always a huge backlog to work with.

And I want my team to use Copilot. So they're already reporting that they reach for Copilot first when they're working on hard problems, and that Copilot Chat's much faster than going to a web search or man pages.

I'm thrilled they're spending more time on the things that they were trained to do, leveraging their experience and education, versus working on boilerplate code.

Is our job going to change as developers, as security leaders, as security practitioners? I think it will some, right? But all technology change brings change to jobs and roles.

I think that developers are going to start to look more like architects focused on higher-order problems when some of these things like GitHub Copilot start to take away the boilerplate, run-rate things that they were spending time trying to figure out every day. And that's going to help us welcome more people into the software development ecosystem, which I'm really excited about. We need more people.

If we look ahead, what other things are coming? What other things do we think about in the risk space?

Are threat actors using AI? Well, I think they are. Of course they are. Threat actors love to adopt technology sometimes faster than any of our companies that we work for adopt technology, which is challenging for the security folks.

But if you're a security or technical leader, I would recommend some key areas to pay attention to. How's your workforce using this? Where can it make massive productivity gains?

We still have a job to do at GitHub security. It's protecting the home of all developers. And so I want to give my team every advantage they have to go up against threat actors and protect our platform.

What advantages does AI give those threat actors, and how do we need to think about it?

So, on the help I'm looking for, which I talked with Gene about: my team and I are thinking a lot about this, and we're iterating on how to integrate security and AI into the development and operations value streams. What are the risks and opportunities of security and AI?

So if you're also working on these things, I would love to spend some time chatting with you, and we can learn from each other. Because after all, that's what GitHub and open source software are all about.

I really appreciate your time. Thank you for letting me chat with you today and tell my story, GitHub's DevOps story, and a little bit about its security and AI.

If you have more to talk about, we're doing a Q&A breakout session at three o'clock today. I would love to chat with you more then. Thanks so much.