The Operationalism of DevSecOps
Operationalism is a cornerstone of operations management. It rests on the intuition that we do not know the meaning of a concept unless we have a method of measuring it. Percy Williams Bridgman coined operationalism in his book The Logic of Modern Physics (1927). Bridgman's work, specifically his idea of an operational definition, heavily influenced Deming. In The New Economics, Deming wrote, "An operational definition is a procedure agreed upon for translation of a concept into a measurement of some kind." In this presentation, I'll discuss Dr. Deming's work and give insight into how he would have viewed standard DevSecOps metrics.
Full transcript
John Willis
All right. That's me, John Willis. I have this terrible Twitter handle, `@botchagalupe`, which is impossible to spell, but once you get it, you'll remember me. I've done a lot of things and I think there are about twelve books. There was a whole life before I got into DevOps, but I don't list all those books. I'm probably most known for The DevOps Handbook. Gene and I did the audio-only Beyond the Phoenix Project. A lot of what you heard about Investments Unlimited started in 2019 with a paper called the DevOps Reference Architecture, where we scoped out how you can do some of this. I have a book coming out in August, Profound, about a ten-year passion of mine around Dr. Deming.
I want to talk about some of the ideas Dr. Deming thought about and taught, but that are not as well known as the things you may have heard from him. I have also done a bunch of startups: I sold a company to Dell, sold a company to Docker, was early at Chef, spent three years working for Jim Whitehurst at Red Hat, and now I'm at Kosli.
My book has a note from Jim Whitehurst, who hired me at Red Hat, was COO of Delta, and sold Red Hat to IBM for $34 billion. I sent him the book and said he could skim it, and he wrote that he was going to skim it but could not put it down. That is going to become a plaque on my wall.
So, operationalism. In operations management and operations research, we use this heavily when we build toasters and nuclear power plants and manufacture cars. The only place we never use it is IT. Why is that? I've been on a mission: first to write the book, and now to ask what this means for the things we talk about and do.
Operationalism starts with Percy Williams Bridgman. He said we do not know the meaning of a concept unless we have a method of measuring it. He was a Harvard professor, won a Nobel Prize in physics for his work on high pressures, and had trouble working with synthesized diamonds because the gauges kept breaking. That led him to question measurement itself. He was inspired by Einstein's special theory of relativity, where measurement crossed domains. I'm not a physicist, but the point is that this set him on a journey about how to understand measurement. Length might be measured in meters, yards, square footage, or light years. The question is: what is the method of measurement, or what are we actually measuring?
Bridgman also said that to find the length of an object we have to perform certain physical operations; the concept of length is fixed when the operations by which length is measured are fixed. The concept of length involves the set of operations by which length is determined. Keep that in mind for DORA or risk controls.
Dr. Deming comes in here. Most people know Deming through certain principles or familiar Deming cliches, but he was influenced by Henri Fayol, Percy Bridgman, Clarence Irving Lewis, and Walter Shewhart. Fayol wrote about continuous improvement in 1916. Bridgman contributed operationalism. A lot of Deming and Shewhart's PDSA and scientific thinking comes from the American pragmatist C. I. Lewis, not C. S. Lewis. Walter Shewhart contributed statistical process control. When people look at Deming, they usually talk about the fourteen points or the System of Profound Knowledge, but they often miss first-order principles: operational definitions and analytical statistics.
Deming used the example of a sweater specified as 50% wool and 50% cotton. That sounds like a reasonable specification, but the producer could make the front side wool and the back side cotton. That is not what was intended. Deming says an operational definition is a procedure agreed upon for translation of a concept into a measurement of some kind. In Out of the Crisis, the procedure might be that for every ten sweaters you cut a one-inch piece and send it to a lab to check the mixture of cotton and wool. We need clarity about criteria, tests, and decisions.
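The sweater example can be written down as criteria, a test, and a decision. The following is only a minimal sketch: the 5% tolerance, the accept/investigate outcomes, and all names are assumptions made for illustration and do not come from Deming or the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweaterDefinition:
    """Illustrative operational definition of a '50% wool' sweater."""
    target_wool: float = 0.50  # criterion: intended wool fraction
    tolerance: float = 0.05    # criterion: assumed acceptable deviation
    sample_every: int = 10     # test: take a swatch from every tenth sweater

    def sampled(self, sweaters):
        """Test, step 1: pick every `sample_every`-th sweater from the lot."""
        return sweaters[self.sample_every - 1::self.sample_every]

    def conforms(self, lab_wool_fraction):
        """Test, step 2: compare one lab result against the criterion."""
        return abs(lab_wool_fraction - self.target_wool) <= self.tolerance

    def decide(self, lab_results):
        """Decision: accept only if every sampled swatch conforms."""
        ok = all(self.conforms(r) for r in lab_results)
        return "accept" if ok else "investigate"


definition = SweaterDefinition()
lot = list(range(1, 31))               # sweaters numbered 1..30
to_lab = definition.sampled(lot)       # sweaters 10, 20, and 30
blended = definition.decide([0.49, 0.52, 0.51])
front_wool_back_cotton = definition.decide([0.49, 1.0, 0.51])
```

A swatch cut from the all-wool front of a sweater reads as 100% wool, so the second lot is flagged even though that sweater is nominally "50% wool and 50% cotton."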
My friend Mark Burgess, the physicist who created infrastructure as code, says there is no such thing as 42. A physicist's view of a number is not just that it is deterministic and ends there. If I show you a picture and ask how many items are in it, it is an unfair question if some are broken: do you put them back together and count them, or not? If I show you a restaurant and ask how many people are in the room, is the woman at the edge in the room or in a mall? Is the person standing in the back the waiter, and should I count him? Should I count the kitchen staff if we do not know whether the kitchen is separated from the room? The real question is: why are we counting? Is it for food logistics, weight distribution on a cruise ship, or fire code? Without that purpose, the number is not meaningful.
Deming said there is no true value of any characteristic, state, or condition that is defined in terms of measurement or observation. So what are we doing with DORA? I think what DORA has done for our industry is incredible, and I have been involved with Gene and the group on those papers. But we are grownups now. We need to question elite performance recognition for something we call lead time when we do not have a consistent value or definition.
This applies to a lot of complex-system words. Any metric with the word "mean" in it: run. Root cause is a stop sign for learning. Zero defects, zero trust, and hermetically sealed deployments should all make us ask what those words operationally mean. SLAs are another example: if the agreement says three nines and gives you eighty cents back while your commerce system lost a million dollars, what did that number actually mean?
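To make the SLA point concrete, here is a small arithmetic sketch. It assumes "three nines" means 99.9% availability and that the agreement measures it over a fixed window; both are assumptions, since the talk does not say how the SLA defines either.

```python
def allowed_downtime_minutes(availability_pct, period_days):
    """Minutes of downtime permitted by an availability target over a window."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# The same "three nines" permits very different outages depending on the
# measurement window, which is part of its operational definition.
per_month = allowed_downtime_minutes(99.9, period_days=30)    # about 43 minutes
per_year = allowed_downtime_minutes(99.9, period_days=365)    # about 8.8 hours
```

An outage of a few hours can satisfy a yearly three-nines target while breaking a monthly one, so the number alone says little until the window and the counting rules are agreed.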
If this hurts your head, read Mark Burgess's book, a physicist's view of what we do in DevOps and infrastructure. It is a tough book. If you know what a Planck length is, you will be okay; if not, you will have a lot of search or ChatGPT windows open.
Deming said that because there is no true value, to make sense of a measurement you need criteria, a test, and a decision. Criteria asks why we are asking, what is good or bad, whether there is a standard, and whether teams or the industry agree. For lead time, Accelerate says commit to deploy, but which commit? The merge commit? The first commit? Continuous delivery is different from continuous deployment. Do you use feature flags or dark launches? I do not care what you call the measurement. I care whether Team A, Team B, Team C, different divisions, and the industry are using the same operational definition. The decision is what we are going to do, why we counted, and why we cared.
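The "which commit?" question can be shown with one change measured three ways. This is an illustrative sketch only: the timestamps, event names, and the three candidate definitions are hypothetical and are not taken from Accelerate or from the talk.

```python
from datetime import datetime

# Hypothetical timeline for a single change.
change = {
    "first_commit": datetime(2024, 3, 4, 9, 0),
    "merge_commit": datetime(2024, 3, 6, 15, 30),
    "deployed": datetime(2024, 3, 7, 10, 0),    # running in production, flag off
    "released": datetime(2024, 3, 11, 10, 0),   # feature flag turned on
}

# Each entry is one candidate operational definition: (start event, end event).
DEFINITIONS = {
    "first commit to deploy": ("first_commit", "deployed"),
    "merge commit to deploy": ("merge_commit", "deployed"),
    "merge commit to release": ("merge_commit", "released"),
}


def lead_time(change, start, end):
    """Elapsed time between two named events of the same change."""
    return change[end] - change[start]


lead_times = {
    name: lead_time(change, start, end)
    for name, (start, end) in DEFINITIONS.items()
}
for name, value in lead_times.items():
    print(f"{name}: {value}")
```

The same change yields a "lead time" of under a day or of several days depending on the definition, which is why teams comparing numbers need to agree on the start and end events first.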
Donald Wheeler, a premier expert on statistical process control, extends this. He says criteria is consistency. We do not look at one value and say, "it was four" or "it was seventy-eight" and make a decision. Five minutes later it may be twenty. You need analytical statistics. The test is statistical process control, and the decision uses heuristics and control limits such as upper and lower control limits. When I was at Chef, I heard Werner Vogels at an Amazon meeting say they measure everything but care about one variable, order rate, and have years of heuristics for what order rate means. They do not look at one number; they use statistical data over time. A control chart gives boundaries. Sigmas are standard deviations. Means are dangerous; standard deviations are better than means; all statistics are dangerous.
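As a rough sketch of what control limits look like in practice, the code below computes limits for an individuals (XmR) chart, the chart type Wheeler is known for advocating. The talk does not say which chart type was used, so that choice is an assumption, and the baseline numbers are invented. The constant 2.66 is the standard scaling factor for the average moving range of consecutive points.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, center, upper) limits for an individuals chart:
    mean of the values +/- 2.66 * average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def is_signal(baseline, new_value):
    """True if a new observation falls outside limits set by the baseline."""
    lower, _, upper = xmr_limits(baseline)
    return new_value < lower or new_value > upper


baseline = [10, 12, 10, 12, 10]        # hypothetical baseline observations
lower, center, upper = xmr_limits(baseline)
routine = is_signal(baseline, 12)      # inside the limits: routine variation
exceptional = is_signal(baseline, 20)  # outside the limits: worth a question
```

For count data the computed lower limit can fall below zero, in which case only the upper limit is informative.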
I said DevSecOps in the title, so I have to talk about risk. Here is a control chart I did with anonymized real data. If you want to give me anonymized data to use these tools, come to me; we can tokenize it. This chart shows container scan vulnerability failures by week. At week nine it dips below. Do we fire somebody because they submitted a container image that did not scan well? No, that is not the point. Going back to Wheeler, at week seventeen there is an interesting trend. Measuring is not about a single data point, what Deming would call an enumerated value. It is about analytical value. A control chart is not designed to answer the question; it helps the subject-matter expert see patterns that let them ask better questions. In this case, a new team had been added and was not using the sanctioned hub repository for images, so container scan failures increased for weeks. The pattern was decoupled from the single data point.
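The difference between one surprising point and a sustained pattern can be sketched with a common run rule: eight consecutive points on the same side of the central line. The rule choice is an assumption, since the talk does not say how the week-seventeen trend was spotted. The weekly counts below are invented to mimic the shape of the story and are not the speaker's data.

```python
from statistics import mean


def first_sustained_shift(values, center, run_length=8):
    """Return the 0-based index where the first run of `run_length`
    consecutive values above `center` begins, or None if there is none."""
    run = 0
    for i, value in enumerate(values):
        run = run + 1 if value > center else 0
        if run == run_length:
            return i - run_length + 1
    return None


# Hypothetical weekly container scan failures: 16 baseline weeks,
# then 8 weeks after a new team starts pulling unsanctioned images.
baseline = [5, 7, 4, 6, 5, 6, 4, 7, 2, 5, 6, 4, 7, 5, 6, 5]
after_new_team = [8, 9, 8, 10, 9, 11, 10, 9]
weekly_failures = baseline + after_new_team

center = mean(baseline)
shift_index = first_sustained_shift(weekly_failures, center)
shift_week = None if shift_index is None else shift_index + 1  # weeks start at 1
```

In this made-up series the single low value in week nine triggers nothing, while the run that starts in week seventeen is flagged. The rule does not explain the shift; it only tells the subject-matter expert where to start asking.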
Industries that create toasters, automobiles, and nuclear power plants have a hundred years of intellectual property around these patterns that we do not use. Risk adds complexity on top of an industry that does not take operational definitions seriously. FDA 21 CFR Part 11 has more advanced information about risk, digital signature, and digital signing than much of what I have read in banking. Those documents talk about specification, verification, and validation. Specification maps to criteria. Verification maps to test. Validation maps to decision. That is the operational-definition paradigm, and it also maps to Plan-Do-Study-Act. We cannot look at a value and make a determination; we have to look at the next iteration over an hour, day, week, or month.
Working with Bill Bensing has pushed me to look at the Secure Controls Framework, and it is fantastic. I started playing Deming with it and made a Pareto chart. SCF has 1,169 risk controls across 285 frameworks. That is your life. Instead of only looking framework by framework -- FedRAMP, PCI DSS, SOC 2, DORA -- you can look from the bottom up at controls. Bill suggests there is a cyclomatic complexity opportunity for risk controls. If one control spans five frameworks, with different language and definitions, and you have five products such as Snyk, Sigstore, or Checkmarx, each adds complexity. One risk control might have a cyclomatic complexity of ten. Looking bottom-up, the top controls in a Pareto analysis can show which controls carry the most weight. One compliance control appears in around 72 or 73% of frameworks. If you focus on controls with high weight, they likely exist in most frameworks and may let you reduce the cyclomatic complexity of risk controls and collapse redundant vendor tooling.
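A bottom-up Pareto view of controls can be sketched as follows. Here "weight" is taken to mean the share of frameworks that include a control, which is one reading of the description above. The framework names, control names, and memberships are all invented and are not SCF data.

```python
from collections import Counter

# Hypothetical mapping of frameworks to the controls they require.
frameworks = {
    "framework_a": {"access-review", "encryption-at-rest", "change-approval"},
    "framework_b": {"access-review", "change-approval"},
    "framework_c": {"access-review", "vulnerability-scan"},
    "framework_d": {"access-review", "encryption-at-rest"},
}


def control_weights(frameworks):
    """Return (control, share of frameworks containing it), sorted with the
    heaviest controls first and ties broken alphabetically."""
    counts = Counter(
        control for controls in frameworks.values() for control in controls
    )
    total = len(frameworks)
    weighted = [(control, n / total) for control, n in counts.items()]
    return sorted(weighted, key=lambda item: (-item[1], item[0]))


pareto = control_weights(frameworks)
for control, weight in pareto:
    print(f"{control}: {weight:.0%}")
```

Sorting by weight puts the controls shared by most frameworks at the top, which is where a single agreed operational definition pays off across the most audits.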
SCF is a brilliant way to contextualize risk controls. ComplianceForge and SCF have done excellent work. They describe standards, guidelines, procedures, and controls, and I can map those to operational definitions: standards and guidelines are criteria, procedures are tests, control objectives and policy are decisions. Bill makes me smarter on this.
For a concrete example, look at NIST zero trust architecture and service mesh. I am a fan of Istio and service mesh, but it scares me that we have moved layer-three networking activity, which is hard for an adversary to compromise in a large bank switch configuration, into Kubernetes and service mesh configuration. Adversaries do not look for the hard ways in; they look for the easy ways. Many teams run Kubernetes without Envoy access logs enabled. Someone can change configuration and reroute traffic to another domain through traffic shaping, and if Envoy access logs are off, you do not know what happened. Depending on the application, they can move traffic to a proxy domain and use your data.
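As a hedged illustration of turning "are Envoy access logs on?" into a test, the sketch below checks an already-parsed Istio mesh configuration for the mesh-wide `accessLogFile` setting, where an empty value means access logging is disabled. It is not the prototype described in this talk, it assumes the YAML has already been loaded into a dictionary, and it ignores other ways of enabling logs such as Istio's Telemetry resources.

```python
def access_logging_enabled(mesh_config):
    """True if the parsed mesh config sets a non-empty accessLogFile."""
    return bool(mesh_config.get("accessLogFile"))


def lint_mesh_config(mesh_config):
    """Return a list of findings; an empty list means the check passed."""
    findings = []
    if not access_logging_enabled(mesh_config):
        findings.append("mesh-wide Envoy access logging is disabled")
    return findings


# Hypothetical parsed configurations.
logging_on = {"accessLogFile": "/dev/stdout"}
logging_off = {"accessLogFile": ""}

findings_on = lint_mesh_config(logging_on)
findings_off = lint_mesh_config(logging_off)
```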
We prototyped a YAML DSL around governance engineering, a kind of lint for security controls. We wanted to check traffic-control configuration and what percentage used certain checks. The design-and-test loop runs the thing and says whether it passed. Another adversarial tactic is to pick a very high API version hoping it will be seen as the latest version. This could affect Istio service mesh configuration: someone bumps a version, someone falsely uses that version, and now you do not know what is defining your service mesh traffic routing.
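One way to lint for the API version tactic is an allowlist check over parsed resources, reported as a pass rate. This is a sketch under assumptions, not the YAML DSL from the prototype: the allowlist contents, the resource examples, and the function names are hypothetical, and a real policy would pin whatever versions the organization has actually approved.

```python
# Assumed allowlist of approved API versions for mesh traffic resources.
ALLOWED_API_VERSIONS = frozenset({
    "networking.istio.io/v1",
    "networking.istio.io/v1beta1",
    "networking.istio.io/v1alpha3",
})


def lint_api_versions(resources, allowed=ALLOWED_API_VERSIONS):
    """Return one finding per resource whose apiVersion is not approved."""
    findings = []
    for resource in resources:
        version = resource.get("apiVersion")
        if version not in allowed:
            kind = resource.get("kind", "<unknown kind>")
            name = resource.get("metadata", {}).get("name", "<unnamed>")
            findings.append(f"{kind}/{name}: unexpected apiVersion {version!r}")
    return findings


def pass_rate(resources, findings):
    """Share of resources with no finding (1.0 when there are no resources)."""
    if not resources:
        return 1.0
    return 1 - len(findings) / len(resources)


# Hypothetical parsed manifests; the second uses an unapproved version.
resources = [
    {"apiVersion": "networking.istio.io/v1", "kind": "VirtualService",
     "metadata": {"name": "checkout"}},
    {"apiVersion": "networking.istio.io/v9", "kind": "VirtualService",
     "metadata": {"name": "checkout-shadow"}},
]

findings = lint_api_versions(resources)
rate = pass_rate(resources, findings)
```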
Most of you got a free copy of Investments Unlimited yesterday, and we signed it. I also talked about SLSA. We have done an analysis of SLSA, which defines four categories and eighteen controls. For the regulated-space work with large banks, there are probably fifty-one meaningful controls you should be thinking about beyond SLSA. If you want to know more about my Deming work, go to profound-deming.com or find John Willis on LinkedIn. Thank you.