Dear Security, Compliance, and Auditors, We’re Sorry. Love, DevOps.
Stop it with the DevSecAuditComplianceOps buzzwords. Let’s simply talk about Modern Governance.
Great software requires governance. Governance stinks because we do it wrong. I promise to give you the means to go from commit to production with 100% no-human-hands. All while meeting visibility, security, compliance, and audit requirements without fail. Modern Governance applies to standard line-of-business software, machine learning, edge, IoT, and any other software artifact.
DevOps solved the Developer and Operators conflict. It forgot other essential folks of the delivery lifecycle: Security, Compliance, and Audit.
We will talk about Modern Governance. Modern Governance resolves governance toil with a software engineering approach. It is no different than applying Site Reliability Engineering (SRE) principles & practices to the dull, mundane, and toil-ridden governance processes.
Full transcript
Bill Bensing
Hey everybody. How's everybody doing today? I'm Bill Bensing. The only rule I have for today, I call my Beyonce rule: if you like it, tweet on it. Also, if you don't like it, tweet on it. We can have an open conversation. I hope to say things that are nice and appealing today; if not, let's have discourse in public.
I want to talk a bit about a love letter. This is probably the only love letter that you can write around your company and not get in trouble with HR. This was generated here actually as a forum paper; I think Topo was one of the authors. As I go through this presentation today, what I want to do is bring this to life. I want to bring this promise between the DevOps community and our compliance, audit, and governance functions to life.
A bit about myself: my background, I like to say, I come from the world of shadow IT. I was at one time what a lot of people probably in here try to stomp out, creating new things. I argue that empowering shadow IT is a way to empower the business and the non-tech companies. But really what I want to do is make the right thing the easy and default thing to do, and I'll argue that with automated governance, and talk about autonomous governance today, that we can make this happen.
I had the opportunity to co-author Investments Unlimited. Who here has read the book so far? That's a lot of hands. Who here is going to read the book? Is it on the reading list? Yes. So go ahead and check it out. I think there's an author signing here today, later around 6 o'clock or so. Come by and let's chat more about it. If you see me anywhere, feel free. Let's talk. I get geekishly excited about this type of stuff, as my co-authors know, and it was an amazing experience. By the way, Investments Unlimited was based upon a lot of industry experience, as well as a lot of content that reaches back to 2015 with the Unlikely Union of audit and DevOps, and going through the DevOps Audit Defense Toolkit and The DevOps Handbook.
Let's bring the bottom line up front: people should not execute the governance process. Machines must execute the governance process. People, though, are very critical. They need to design, develop, and specifically codify the governance process. When I say governance, what I'm specifically referring to today is security, compliance, and audit governance.
I'll argue governance is the current bottleneck for software delivery, specifically in highly regulated organizations. We must modernize our governance capabilities to address this bottleneck. Modernizing is automating the governance process. But here's the big but: it's more than automation. It's autonomous.
What do I mean by autonomous? I love the SRE book. Who here loves this book? There are a lot of people who read this. Chapter 7, "The Evolution of Automation at Google" -- anybody who's using Kubernetes, go read that. It gives you the history where Kubernetes came from, Borg, and why it solves the problem it does. But as they talk about there, automation is not a panacea. Of course, there are areas where you have to use manual processes, but there's a higher-level system design, which is the idea of autonomous. Modern governance is a higher-level system design. It is autonomous at the end of the day. Modern governance is autonomous governance.
In Beyond the Goal, Dr. Eliyahu Goldratt talked about four questions for adopting new technology. When you adopt autonomous governance, you're adopting new technology. You ask: what is its power, what limitation does it diminish, what are the old rules you operate by, and what are the new rules you need to operate by? We're going to cover the old and new rules today.
A bit of the agenda: we'll talk about the governance problem, solving this problem, a solution -- governance as a service -- combine this with engineering productivity, and then talk about a governance engineering team. I'm going to rip through a ton of content. I have 103 slides and 25 minutes to go, so here we go.
Thank God you can rewatch and slow me down. In most organizations, what is governance? It's toil. Security, compliance, and audit is toil. What is toil? Again, back to the SRE book. This is the Bible of toil: work that is devoid of enduring value and scales linearly as you grow. What I love, too, is looking at your governance process as a bug in your system. Carla talks about: if a human operator needs to touch your system during normal operations, you have a bug. I'd argue most of our systems are bugged; they're just not technical bugs.
Let's talk about two types of toil: governance toil and delivery toil. Governance toil is simply the humans in the middle of the machine cranking it. As you add more software, that scales linearly; you have to add more people there. Delivery toil is the ambiguity caused when you're doing your governance process. For example, somebody says, you don't pass, or here's why you don't pass, and I have to ask why. I have to figure it out. It takes me a week just to come up with a way to resolve it. That in itself is toil.
Because of this toil, what's actually meant to mitigate risk -- our risk-management processes and our change approval boards -- actually increases risk. Don't just take my word for it; I'll give you the proof and the numbers.
Let's say it takes two weeks to get through your governance process. Let's say it takes 16 hours to get one new feature done, and all these changes are independent of each other. Let's assume we're pretty good at what we do, and the probability of failure for any one change is 10%. Basically, any one change, we're 90% successful with.
In most organizations, people start to batch changes and create a release train. They pace the release train to the governance process. The train departs every two weeks, and two working weeks at 16 hours per feature means we have basically a total of five changes or five new features that go through our governance process every two weeks. Remember, the purpose of this governance process is to reduce the probability of failure. We should expect that the success rate of the batched changes is no less than 90% and possibly higher. Does everyone agree on that? No. I like that because you're right: we're wrong. The probability of success is actually 59%, because five independent changes at 90% each multiply out to 0.9 to the fifth power. Anybody who knows normal probability analysis knows where I'm going with this.
Let's give our governance process the benefit of the doubt and say there's one change in there that we identify and fix. Let's just say that change goes from a 90% probability to a 99% probability. Actually, fixing that one change only brings the batch up from 59% to about 65%. So how do we fix this?
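The release-train arithmetic can be checked in a few lines. This is a minimal sketch that assumes, as the talk does, that the five changes are independent; the function name is illustrative. With the stated numbers it gives about 59% for the batch, and about 65% once one change is raised from 90% to 99%.

```python
from math import prod


def batch_success(probabilities):
    """Probability that every independent change in a batch succeeds."""
    return prod(probabilities)


# Five independent changes, each 90% likely to succeed.
baseline = batch_success([0.90] * 5)

# Same batch after governance catches and fixes one change (90% -> 99%).
improved = batch_success([0.90] * 4 + [0.99])

print(f"baseline batch success:  {baseline:.1%}")
print(f"after fixing one change: {improved:.1%}")
```

The batch is only as good as the product of its parts, so improving one change out of five barely moves the result. Shipping each change on its own keeps every release at its individual success rate.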
Let's talk about how we fix this by solving the governance problem. Remember, we used to talk about automating that stuff. Now it's autonomizing: the judicious application of automation. How do we go about autonomizing governance? I've got five guiding principles that you can take away today.
One: collaboration. This can't be done in silos. Small teams work across everybody: development, operations, security, compliance, audit must work together. Two: enabling constraints. Nobody likes constraints, but the idea and the goal is throughput. An enabling constraint is to build a set of constraints around your software delivery that enable more throughput, especially with governance. Three: requiring explicit evidence and an idempotent process. How many times have you gone to an auditor with a set of controls and evidence, two different auditors, and they give you two different outcomes? Maybe you pass; maybe you fail; maybe you fail and they both tell you different things. That's not idempotent. I should be able to go to two auditors with the same evidence and controls and every single time get the same result.
Four: governance is zero trust. I watched one of the plenaries, and it was zero trust. Here I'm going to talk a lot about zero trust today and the zero trust architecture. Five: the implementation must be ephemeral and immutable.
We need to think differently. We need to go from subjective change management to verifiable continuous verification. But first we have to talk about objectivity: how do we get to objective attestation and controls? To achieve this continuous verification, we need to autonomize our human control gates.
This is a pattern I pulled from the DoD Enterprise DevSecOps reference guide. As you look at the process, the little diamonds are the control gates. They sit between the steps, and it's a high-level idea. How do we go about autonomizing these control gates?
We do this with two specific procedures. First, we'll call this evidence and attestation. You run a scan in your CI pipeline. You collect that material. We're going to normalize that material to something we're calling a governance contract. Then we want to sign, cryptographically sign, all our evidence and material just like we'd sign any binaries for our software. We want to establish non-repudiation and make it tamper-evident.
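A minimal sketch of the evidence-and-attestation step, using only the Python standard library. The report layout, the contract field names, the placeholder finding IDs, and the signing key are all assumptions made for illustration, not the speaker's actual format. It also uses an HMAC with a shared key, which makes tampering evident but does not by itself give non-repudiation; for that, a real pipeline would sign with an asymmetric key pair.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. A real pipeline would use an asymmetric key pair
# held in a signing service, not a shared secret in source code.
SIGNING_KEY = b"demo-only-key"


def normalize(scan_report: dict) -> dict:
    """Reduce a tool-specific scan report to a tool-agnostic contract."""
    counts = {"high": 0, "medium": 0, "low": 0}
    for finding in scan_report["findings"]:
        counts[finding["severity"].lower()] += 1
    return {"procedure": "vulnerability-scan", "elements": counts}


def canonical_bytes(contract: dict) -> bytes:
    """Serialize deterministically so the same contract always signs the same."""
    return json.dumps(contract, sort_keys=True).encode()


def attest(contract: dict, key: bytes = SIGNING_KEY) -> dict:
    """Attach a signature to the normalized contract."""
    signature = hmac.new(key, canonical_bytes(contract), hashlib.sha256).hexdigest()
    return {"contract": contract, "signature": signature}


def verify(attestation: dict, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(
        key, canonical_bytes(attestation["contract"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, attestation["signature"])


# Placeholder report from a hypothetical scanner run in CI.
report = {
    "tool": "example-scanner",
    "findings": [
        {"id": "PLACEHOLDER-1", "severity": "High"},
        {"id": "PLACEHOLDER-2", "severity": "Low"},
    ],
}

attestation = attest(normalize(report))

# Simulate someone editing the evidence after it was signed.
tampered = json.loads(json.dumps(attestation))
tampered["contract"]["elements"]["high"] = 0

print("original verifies:", verify(attestation))
print("tampered verifies:", verify(tampered))
```

Signing the normalized contract rather than the raw tool output means the signature covers exactly what the policy will later evaluate.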
Second, we'll have the policy enforcement point. If anybody knows NIST SP 800-207, the zero trust architecture, I'm specifically using that because this is zero trust. At any point of your CI or CD, you should be able to put in a policy enforcement point that goes and gets the applicable policy, retrieves your attestations, looks at the attestations, cryptographically validates them, and then audits.
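The policy enforcement point described here can be sketched as a small class that performs the four steps in order: get the policy, retrieve the attestations, validate them cryptographically, and record an audit entry. Everything named below (the class, the policy layout, the shared demo key) is hypothetical, and NIST SP 800-207 does not prescribe this code. In keeping with zero trust, missing or unverifiable evidence is treated as a denial.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field

KEY = b"demo-only-key"  # hypothetical shared secret, for illustration only


def sign(contract: dict) -> str:
    payload = json.dumps(contract, sort_keys=True).encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()


@dataclass
class Decision:
    allowed: bool
    reasons: list = field(default_factory=list)


class PolicyEnforcementPoint:
    def __init__(self, policies: dict, attestations: dict):
        self.policies = policies          # stage -> {procedure -> {element -> max allowed}}
        self.attestations = attestations  # artifact -> list of signed contracts
        self.audit_log = []

    def decide(self, artifact: str, stage: str) -> Decision:
        policy = self.policies[stage]                      # 1. get the applicable policy
        found = {a["contract"]["procedure"]: a             # 2. retrieve attestations
                 for a in self.attestations.get(artifact, [])}
        reasons = []
        for procedure, limits in policy.items():
            attestation = found.get(procedure)
            if attestation is None:
                reasons.append(f"{procedure}: no attestation found")
                continue
            # 3. validate the signature before trusting the content
            if not hmac.compare_digest(sign(attestation["contract"]),
                                       attestation["signature"]):
                reasons.append(f"{procedure}: signature check failed")
                continue
            elements = attestation["contract"]["elements"]
            for element, limit in limits.items():
                actual = elements.get(element)
                if actual is None or actual > limit:
                    reasons.append(f"{procedure}: {element} is {actual}, limit is {limit}")
        decision = Decision(allowed=not reasons, reasons=reasons)
        self.audit_log.append({"artifact": artifact, "stage": stage,  # 4. audit
                               "allowed": decision.allowed, "reasons": list(reasons)})
        return decision


contract = {"procedure": "vulnerability-scan", "elements": {"high": 0, "medium": 3}}
pep = PolicyEnforcementPoint(
    policies={"production": {"vulnerability-scan": {"high": 0, "medium": 2}}},
    attestations={"app:1.0": [{"contract": contract, "signature": sign(contract)}]},
)
decision = pep.decide("app:1.0", "production")
print(decision.allowed, decision.reasons)
```

Because the decision carries its reasons, a team that is denied learns immediately which element failed and by how much, which removes the delivery toil of asking why.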
We need a new concept that we don't really have right now: we need a governance contract. What is this governance contract? The governance contract defines the semantics and syntax for our governance primitives. It is how we codify our governance specifications. It specifies our governance. In test -- and I'm just using testing as an example of a gate -- we want to codify what is happening there to determine if the machines will let us pass at that diamond.
What does a governance contract look like? I've got a sample of one. At a minimum, it is ubiquitous language. It is devoid of any specific technology. It's technology agnostic. Notice I'm talking about CVEs, Common Vulnerabilities and Exposures, high and medium severities, but I'm not talking about SCAP or Sysdig or tools and looking at those reports. This needs to be understandable by technical and non-technical folks. This is the ubiquitous language we're establishing, and this is also the data exchange format for how autonomous governance happens.
Let's talk through this very simple governance contract. First, you have the governance procedure you agree on when you're working together. We'll call this one common vulnerabilities and exposures. This is the gating that we have to get by. Second, you have the procedure elements: what are the things that we need to evaluate by policy -- high, medium, low severity, and other things in there. You also have the element value: what is the declarative aspect that we're evaluating?
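The three parts named here (procedure, procedure elements, element values) map naturally onto a small data structure. The sample contract from the slides is not reproduced in the transcript, so the class and field names below are assumptions chosen only to mirror that description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcedureElement:
    name: str   # what gets evaluated by policy, e.g. a severity level
    value: int  # the declared fact for that element


@dataclass(frozen=True)
class GovernanceContract:
    procedure: str                          # the agreed governance procedure
    elements: tuple                         # tuple of ProcedureElement

    def to_json(self) -> str:
        """Render the contract as a tool-agnostic exchange document."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


contract = GovernanceContract(
    procedure="common-vulnerabilities-and-exposures",
    elements=(
        ProcedureElement("high_severity", 0),
        ProcedureElement("medium_severity", 2),
        ProcedureElement("low_severity", 7),
    ),
)

print(contract.to_json())
```

Nothing in the structure mentions a scanner or a report format, which is what lets technical and non-technical readers discuss the same document.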
How do we create these governance contracts? This is actually a SCAP scan from a container. All we do is, in the CI/CD pipeline, transform this into a governance contract. As we're doing this, we are externalizing policy execution. Then you take policy as code and apply it to the governance contract. Ultimately what happens is 100% autonomous commit to production: from the first time I've done that pull request and that pull request has been approved, it goes all the way through and determines if I can go into production or not with no human hands in the middle. That is really the nirvana we want to establish. Let's admit it's not easy.
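To illustrate externalized policy execution, the sketch below normalizes reports from two made-up tools into the same contract and then applies one policy to either. The report layouts, key names, and thresholds are assumptions for illustration. In practice the policy might be written in a dedicated policy language instead of Python.

```python
SEVERITIES = ("high", "medium", "low")


def from_findings_report(report: dict) -> dict:
    """Hypothetical tool A: one entry per finding, each with a severity."""
    counts = dict.fromkeys(SEVERITIES, 0)
    for finding in report["findings"]:
        counts[finding["severity"].lower()] += 1
    return {"procedure": "vulnerability-scan", "elements": counts}


def from_summary_report(report: dict) -> dict:
    """Hypothetical tool B: pre-aggregated counts under its own key names."""
    summary = report["summary"]
    return {
        "procedure": "vulnerability-scan",
        "elements": {
            "high": summary["HIGH"],
            "medium": summary["MED"],
            "low": summary["LOW"],
        },
    }


# Policy as code: the maximum count allowed per element. It refers only to
# the contract vocabulary, never to a tool.
POLICY = {"vulnerability-scan": {"high": 0, "medium": 2}}


def evaluate(contract: dict, policy: dict) -> list:
    """Return the list of violations; an empty list means the gate passes."""
    limits = policy[contract["procedure"]]
    return [
        f"{name}: {contract['elements'][name]} found, at most {limit} allowed"
        for name, limit in limits.items()
        if contract["elements"][name] > limit
    ]


tool_a = {"findings": [{"severity": "High"}, {"severity": "Medium"}]}
tool_b = {"summary": {"HIGH": 1, "MED": 1, "LOW": 0}}

contract_a = from_findings_report(tool_a)
contract_b = from_summary_report(tool_b)

print("same contract from both tools:", contract_a == contract_b)
print("violations:", evaluate(contract_a, POLICY))
```

Swapping scanners then means writing one new adapter; the policy and the gate stay untouched.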
To give you an example of business outcome, with SOC 2 and SOC 3, we think about our five Trust Services Criteria. We can use this concept to codify and autonomize the proof, the evidence and proof on a continuous basis of our security, availability, processing integrity, confidentiality, and privacy. I use this as one example; it's not the only example. There are a lot of other governance frameworks and auditing requirements. Imagine if your Type 1 and Type 2 audits were done every time before you went into a production environment. If an auditor came in, you could simply show them all the evidence post hoc and prove explicitly that you're doing exactly what you're saying you're doing.
The gates that you can autonomize include, but are not limited to, these types of things. Anything that you can generate evidence from, that you can take that evidence and create a higher-level abstraction or governance concept -- the semantics and syntax -- you can autonomize.
Let's talk about a solution: governance as a service. Golden paths -- I'm loving this. I saw some stuff about Backstage focused on golden paths. A golden path is that enabling constraint, this thing you provide an organization to get there. The golden paths are not just for development and operations; they're for everybody. Golden paths are not golden cages, though. It's not the only way you can do it. You're giving internal product teams the opportunity to take this easier route or go the existing route with the manual processes.
Let's build a golden path to production. Current state: I think this looks familiar to everybody. Different teams do their builds, tests, their own scanning tools, and then there's the mother-may-I: can I go to production? The production environment tends to be this mythical thing. It's sort of Narnia, but just not quite Narnia.
Future state of governance as a service: remove all the scanning and those tools from there. Have an invocation to governance as a service. Your policy and security and compliance are all codified. Team A is saying, can I go to production? If not, tell me what I can't do and why. You have this gate, and this gate is autonomous. When somebody says, I'm ready to deploy, let's go to market, the gate can ask governance as a service: can this go to market?
What you see there is the zero trust architecture applied to software delivery. This control plane establishes a governance control plane that's abstracting our technology and establishing common semantics and syntax for governance. The policy enforcement point is that gate. Your enterprise resource is your production environment, with all the different context around it. You can bring threat analysis and other things into it as well. You're bringing this context into a standardized autonomous way.
What does governance as a service look like? It fundamentally is another pipeline. You have SAST, DAST, compliance. You have an evidence repository and an attestation repository. Governance professionals are codifying compliance, security, and policy as code. I don't expect overnight that a standard auditor is going to start writing Rego. That's not a reality. But this is where the organization can help to codify, and then the gate asks if we're good to go.
Here's a more detailed example I've been working with my customers throughout the industry: SAST, moving all of this into the governance pipeline and governance service. Making the easy thing the default thing to do. A lot of organizations say, install this tool, run this, and have auditors understand how to look at a SonarQube report. Through research in our book, we talked about one team that did the same SonarQube reports six years in a row, and they submitted that every time they got an audit. That's awesome, and I want to meet those folks and shake their hand, because that takes some gall.
I want to bring this in; it's not quite a tangent. I want to talk about governance and engineering productivity. The State of DevOps Report talked about people stuck in the middle. They're stuck in the middle for a lot of reasons, and a lot of it is knowledge. How do high performers work? Of course, they build these internal development platforms. At the American Airlines speech, they talked about Backstage and the developer runway. The idea of internal development platforms is codifying these golden paths and providing workflow abstractions.
Here's an example of Netflix's internal engineering productivity. You see all the different domains. These are engineering productivity teams. I believe in this video they said for all the 2,000 engineers at that point in time, they had 400 working on these teams. Why not have a governance engineering team?
In the modern technology organization, cloud native on the left, that's what platform teams do. They take this traditional IT, which is all artisanal projects, and go to industrial products by imposing enabling constraints and collapsing organizational complexity. That's what they're paid to do. They're engineers paid to help other engineers figure out how to engineer better.
They also focus on old rules versus new rules. Yes, they build tools. Yes, they build techniques. But they're evaluating: how do we change these rules, and how do we bring these new rule sets into our organization?
If you're familiar with concepts such as platform engineering or internal development platforms, this is how this stuff manifests. We think about a lot of this in terms of developer experience, from the developer's point of view, but what about the development experience? The developer is only one part of this process. While a significant part, they're only one part. You're forgetting ops, compliance, security, and governance. How do we bring these folks in?
What's even more critical is what engineering productivity solves: top-line outcome. Netflix, Facebook, Apple, Amazon -- you see these organizational patterns because this is how they got over the hump to create more top-line outcome and scale. They made the right thing the easy thing and the default thing to do. What this looks like is an internal development platform for highly regulated organizations. It's got a portal, an API gateway, it serves everybody, it's got cross-cutting concerns, and canonical workflows: day-one and day-two workflows, governance as a service, CI/CD, anything you could do, onboarding a new engineer. This is how you codify and bring this, and why I talk about engineering productivity.
That brings me to the modern governance engineering team. Adopt the mindset of engineering productivity. I want to tell you: repurpose your change approval board. Replace your CAB with a governance engineering team.
I'm taking it one step further. What if I told you the site reliability engineering principles are 100% equivalent to governance engineering principles? I don't think people think about this. Governance and governability are a form of reliability. If you truly want to be secure and safe, there's the rubber-stamp secure and safe, and we call ourselves Enron, or there's the real stuff we do, like Volvo running all those tests because they want to prove to themselves they're safer than the standards put out there.
What if we adopted something like four golden signals? I was thinking on the plane here, what would those signals be? Human touch points: what's the quantity of touch points between commit and production? Audit takt time: how long from the time that audit starts to when it ends; how long does that audit take? Control ambiguity: John and Mike were talking about this, the red, green, yellow, and black; black is ambiguous, what controls actually apply and don't apply. There's this black hole a lot of organizations aren't even thinking about. Then control coverage: of the controls that I know apply, what's my quality? What's my coverage?
This now gets you to governance level indicators, governance level objectives, and I'll keep governance level agreements off right now. Here's how you can start to apply reliability concepts to your governance process. What are your indicators? What are your objectives? Think about bringing in error budgets. What if you had a budget for takt time? What if you had a budget for controls? At the end of the day, it's not always going to be zero. There may be some threshold, like five for ambiguous controls, or maybe one or two for controls that are applicable and not passing.
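One way to picture governance level indicators, objectives, and error budgets is a small table of budgets checked against measured values. The indicator names follow the four signals proposed in the talk, and the budgets of five ambiguous controls and two failing controls echo the thresholds mentioned above. The takt-time budget and all measured values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    indicator: str
    budget: float  # largest value still tolerated for this indicator


OBJECTIVES = [
    Objective("human_touch_points", 0),  # commit to production, no human hands
    Objective("audit_takt_days", 10),    # hypothetical budget
    Objective("ambiguous_controls", 5),
    Objective("failing_controls", 2),
]


def budget_report(objectives, measured):
    """Compare each measured indicator with its budget."""
    report = {}
    for objective in objectives:
        remaining = objective.budget - measured[objective.indicator]
        report[objective.indicator] = {
            "remaining": remaining,
            "within_budget": remaining >= 0,
        }
    return report


# Invented measurements for one reporting period.
measured = {
    "human_touch_points": 1,
    "audit_takt_days": 7,
    "ambiguous_controls": 3,
    "failing_controls": 2,
}

report = budget_report(OBJECTIVES, measured)
for name, status in report.items():
    state = "ok" if status["within_budget"] else "over budget"
    print(f"{name}: {status['remaining']} remaining ({state})")
```

As with SRE error budgets, an exhausted budget is a signal to shift effort, for example pausing new golden-path features until ambiguous controls are clarified.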
I went a step further and looked at the modern governance hierarchy. Could I take the reliability hierarchy, which is basically based upon monitoring, incident response as the environment changes, postmortem, and all the way up to product, and relate that to governance? So, monitoring: you have your governance level indicators. You're monitoring your indicators. The biggest thing is I can have an application that doesn't change over time. Guess what? My environment changes, and that's why I think SRE and governance are one-to-one. For example, in governance issue response, something in my environment changed. A new CVE came out. I didn't change my code, but a new vulnerability was discovered. How do we address this? Then go to governance procedure analysis and design, procedure automation, control planning, and development or product.
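The governance issue response scenario, where a new vulnerability appears while the code stays the same, can be sketched as a lookup against an inventory of what is already deployed. The artifact names, component names, versions, and advisory ID below are all placeholders, and a real system would draw the inventory from recorded evidence such as a software bill of materials.

```python
def affected_artifacts(inventory: dict, advisory: dict) -> list:
    """Return deployed artifacts that contain a component the advisory names."""
    vulnerable = {
        (advisory["component"], version)
        for version in advisory["affected_versions"]
    }
    return sorted(
        artifact
        for artifact, components in inventory.items()
        if vulnerable & set(components)
    )


# Placeholder inventory: deployed artifact -> (component, version) pairs.
INVENTORY = {
    "billing-api:4.2": [("libexample", "1.0.3"), ("webframework", "2.1.0")],
    "ledger-batch:1.9": [("libexample", "1.1.0")],
    "portal:7.0": [("webframework", "2.1.0")],
}

# Placeholder advisory published after all three artifacts were deployed.
advisory = {
    "id": "EXAMPLE-0001",
    "component": "libexample",
    "affected_versions": ["1.0.2", "1.0.3"],
    "severity": "high",
}

print(affected_artifacts(INVENTORY, advisory))
```

None of the artifacts changed, yet one of them now fails a control it previously passed, which is why evidence has to be re-evaluated continuously and not only at deploy time.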
Governance and golden paths: golden paths are internal products. The golden path should automate your governance. Also admit your investment happens up front to get your software and your teams onto these golden paths. That's just an expectation, but this is the judicious application of automation. Canonical implementations: how many different ways do you build a Spring Boot application? There'd probably be one to get to production at the end of the day. Everybody's laughing; they know what I'm talking about.
At the end of the day, you're focusing on the 20% of things that drive 80% of heartache, but you still have exception paths, and you have to have these. There will be a process where we need manual evaluation. I have a mainframe application I touch once every 36 months. The ROI is not there. The costs are incurred at each CAB session, and it's appropriate for some situations. We have to admit this and be comfortable with it.
As I end this: no matter the road traveled, just remember you always have to apply the same governance. Modernize your governance with autonomous governance, and autonomize your governance with a governance engineering team. No questions, just conversations. Thank you all very much.