Taking Ops & Infrastructure From Iterative To Functional, Just Like Dev
Cornelia Davis is Sr. Director of Technology at Pivotal, where she helps customers develop and execute on their cloud platform strategies.
Responsible for guiding clients on the technical elements of broader transformation, she helps development and operations teams and IT executives understand and adopt new systems, platforms, and processes as the central part of delivering greater value to their customers.
Fundamentally, Cornelia helps enterprises shift from a mindset where IT is a necessary evil, to becoming software-driven businesses with IT at the core.
Finally, through these deep engagements with clients, Cornelia also brings valuable insight back into Pivotal, driving advancements in software and services offerings.
When not doing those things you can find her on the yoga mat or in the kitchen.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
Okay. To introduce the next speaker, Cornelia Davis, I'm going to do something a little bit strange. I'm going to talk about me.
So I've been studying high-performing technology organizations since 1999. That started when I was the CTO and technical founder of a company called Tripwire in the information security space. And throughout that entire time, I've really identified myself as an ops person, maybe a security person. And this is a little bit peculiar because I was actually formally trained as a developer. I got my master's degree in computer science with a focus on compiler design and high-speed networking.
But I've always loved the world of operations. In fact, one of the main motivations for The Phoenix Project was to show the heaps of injustice that were always dealt to operations, and to show how wrong that was.
But something's actually changed. Around two years ago, I really started self-identifying as a developer, and I think it's because of one thing. It's because of learning a computer language called Clojure. It's a Lisp-based language. It's a functional programming language, very much in the category of Haskell or F#. And core to it is this concept of immutability, and it's taught me to be a much better programmer. What my experience has been is that almost 90% of the errors I would make have just disappeared.
And so, by the way, here's a request for help. If any of you also like Clojure, ping me in Slack. I want to talk to you. All right, this is great.
So Cornelia Davis, she was actually studying to get her PhD in language design. She's a part of the DevOps Enterprise Programming Committee. She's been doing this for three years.
And one year ago, about five miles from here, we were talking after last year's conference about concepts like immutability, composability, the need for declarative models versus imperative models, and just talking about how that could actually change how we think about infrastructure, just like how these languages like Haskell and Clojure change how we think about developing applications.
So I'm very excited about this talk because I don't think you've ever seen a talk like this before. I know I certainly haven't. So we asked her to teach us how important concepts from functional programming have changed development. It's widely considered to be probably the biggest conceptual breakthrough since object-oriented programming. And then, furthermore, to show how this could predict how we should be thinking about infrastructure and operations.
Yesterday, many of you were dazzled by Richard Cook's concept of above the line and below the line. I think many of you will not think about incidents in the same way again. My fondest hope is that after seeing this presentation, you'll have an aha moment like I had two years ago and not be able to think about programming or infrastructure in the same way.
So, Cornelia Davis.
Cornelia Davis
Thank you.
Gene already gave a little bit of my intro, but I'm going to tell you a little bit more because I think it's going to shed some light on the topic at hand.
So I am also trained as a computer scientist. Went to university, did my undergraduate and my master's at Cal State University, Northridge, which is in the greater Los Angeles area. It's an area where there's lots of industry, and that university in particular was really focused on making sure that I was ready to go out and be productive in industry.
So I learned lots of different programming languages. Pretty much all of them were imperative. And I went out and I worked for a couple of years, and then I said, "Ah, screw this work thing." And I went back to graduate school, and I went to Indiana University, which is in southern Indiana. It's a small college town out in the middle of nowhere that really doesn't have any industry around it. And I pretty much studied theoretical computer science.
So I studied programming language design and theory of computing, and that was where I went from being an imperative programmer to being a functional programmer. And I'll tell you more about those stories as we go along.
Today I work for Pivotal. I work in technical product strategy. I've worked on products like Cloud Foundry and Cloud Cache, and these days I'm working on a Kubernetes-based solution.
And then the other thing that I'm up to these days is that I am writing a book. It's an architecture book called Cloud Native. It has programming examples in it. I'm actually writing them in Java, not functional languages. Forgive me. But it's an architecture book with some working examples, but it's fundamentally about net new modern application architectures.
Now, given that I work for Pivotal, I am exactly the opposite of what Gene just described about himself, which is that I have always, in my entire career, self-identified as a developer. But because I work for Pivotal, I now self-identify as an ops person as much as a dev. And so you'll see as we talk through some of this material how I've bridged that gap by taking what I'm familiar with, which is the development side, and applying that thinking to ops.
All right, so let's start by cutting some code.
So the example that I want to use is this. My son, who just graduated in math and computer science, just about a month ago was working on a programming assignment in his genomics class. So they were doing genome sequencing. So they were taking sequences of data and they were splitting them up into subsegments, and if the subsegments got too small, then the algorithms didn't work very well.
So what he was doing was writing an algorithm to merge shorter segments together. So what you see here on the screen are the various segment sizes. So the first segment was from zero through five, non-inclusive. The next one was five through seven, and so on. So if the segments are too short, let's say five or less, we want to merge them into the segment that's immediately adjacent to them. So that's the code that we're going to write. We want to merge shorter segments together.
And so in this particular case, if we set the threshold to, say, six, we end up with an output like this. So we had four segments, now we have two segments, and they're all a little bit bigger.
All right. So I want to show you two different ways, and I coded both of these up. The first one is in Python, and this is imperative. It's iterative, and what you can see in there is probably very familiar constructs. I've got variables like `i` and list length, and I am looping over my list.
Now, instead of looking through the code, let's look at a visual representation of how this algorithm works. So there's our input list, there's my `i` and my list length, and I'm looping until I hit the list length.
So I'm going to take a look at the first segment and say, "Is it too short?" Yep, it's too short. So I'm going to take the start of that first segment and stick it in the start of the next segment. I'm going to merge it into the side. And then, of course, I have to throw that piece away.
And so that leaves me with this. Now I check the first segment, zero through 17. Ah, awesome. Long enough. So I move to the next one and I find another one that's too short, and I go through the same process again. And so now I end up with this list, and I check 17 through 33. It is long enough and I move over one more, and now I hit the list length and I'm done.
Now, what are some of the hallmarks of iterative programming, of imperative programming? They tend to be very sequential, step by step, one at a time, and we'll see where that comes out, even in the way that we do ops and infrastructure today.
They have variables, and most importantly, those variables are side-effected. We have something and we change its state, and you'll see where that comes out in just a little bit.
Now that results in code that is often very difficult to parallelize, and you'll see why that's important as we go along as well. And there's always hairy edge cases. Turns out that the code that I just showed you has a bug in it. It worked out fine because the last segment was long enough. If that last segment had been too short, I would have crashed. So it's hard to come up with those edge cases.
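The iterative approach walked through above can be sketched in Python roughly as follows. This is a minimal sketch and not the code from the slide: the function name, the `[start, end]` list representation, and the sample boundaries are assumptions chosen to match the walkthrough, and a segment counts as too short when its length is below the threshold. The inner `else` branch guards the edge case just mentioned, where a too-short last segment has no right-hand neighbor; merging it into the previous segment is one possible choice, not necessarily the speaker's.

```python
def merge_short_segments(segments, threshold):
    """Merge too-short segments into a neighbor, mutating `segments` in place.

    `segments` is a list of [start, end] pairs (end is non-inclusive).
    """
    i = 0
    while i < len(segments):
        start, end = segments[i]
        too_short = (end - start) < threshold
        if too_short and len(segments) > 1:
            if i + 1 < len(segments):
                # Side effect: push this segment's start into the next segment.
                segments[i + 1][0] = start
            else:
                # Assumed handling of the edge case: a short last segment
                # is absorbed by the previous one instead of crashing.
                segments[i - 1][1] = end
            # Throw the short piece away; the merged segment is now at index i,
            # so i is not advanced and that segment gets re-checked.
            del segments[i]
        else:
            i += 1
    return segments


# Hypothetical input shaped like the walkthrough (exact slide values unknown).
example = [[0, 5], [5, 17], [17, 20], [20, 33]]
merged = merge_short_segments(example, threshold=6)
```

With the assumed input, `merged` is `[[0, 17], [17, 33]]`, and `example` itself has been changed, which is exactly the side-effected state the talk is warning about.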
All right, so let's look at a different way of coding this now. This is Scheme. So any Schemers out there? A few, not very many. So this is Scheme. This is the language that I worked in for the three years that I was at Indiana University. It's also, incidentally, the first language that we teach our incoming freshmen at Indiana University. I'll talk more about that later.
So what you can see here is it's fundamentally a recursive program, and so let's take a look at what that algorithm looks like. Starting with the same list, here's essentially the way it goes. What I'm going to do is I'm going to break the problem down. I'm going to take the car off the front, which is the first element, and then I'm going to apply that algorithm to the rest of the list, and I'm going to get out an answer.
Now, obviously, when I apply it to the second thing, there's a process that goes on. But from a programming perspective, I'm just saying, "You know what? The algorithm is sound. I'm just going to use the algorithm to solve the problem instead of controlling every step myself." I'm going to give a little control over to the program itself.
And now I'm going to take the car, the head of the list, and I'm going to merge it together with what I got back as a result. And then I'm going to merge that with the rest of the result.
So arguably, this is a lot simpler if you think about it from a conceptual standpoint. Now, realistically, it's really hard to think recursively. I'll say more about that in a little bit.
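The recursive shape described above (take the head, solve the rest, combine the two) can be sketched as follows. The talk's version is in Scheme; this is a Python rendering under assumptions: the function names, the tuple representation, and the sample boundaries are hypothetical, and a too-short final segment is simply left as is in this sketch. Python also does not eliminate tail calls, so very long inputs would hit its recursion limit.

```python
def combine(head, merged_rest, threshold):
    """Combine one segment with the already-merged rest of the list."""
    start, end = head
    if (end - start) >= threshold or not merged_rest:
        # Long enough (or nothing to merge into): keep it in front.
        return (head,) + merged_rest
    # Too short: build a NEW first segment that absorbs the head.
    _, next_end = merged_rest[0]
    return ((start, next_end),) + merged_rest[1:]


def merge_segments(segments, threshold):
    """Return a new tuple of segments; the input is never modified."""
    if not segments:                      # base case
        return ()
    head, rest = segments[0], segments[1:]
    # The "leap of faith": trust the function to solve the smaller problem.
    return combine(head, merge_segments(rest, threshold), threshold)


# Hypothetical input shaped like the walkthrough (exact slide values unknown).
original = ((0, 5), (5, 17), (17, 20), (20, 33))
result = merge_segments(original, threshold=6)
```

With the assumed input, `result` is `((0, 17), (17, 33))` and `original` is untouched, because every step builds a new tuple instead of changing an existing one.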
So what are some of the hallmarks of functional programming? Well, they're declarative. So if we go back and look at the code, you'll see that I didn't necessarily say step by step by step. What I said is my result is a combination of the first part and the second part. I'm just combining things together. I didn't iterate all the way down to the leaf nodes. I just did a combination.
So it's declarative. There are no side effects in there at all, and I'll show you a picture later to prove that. And it's properly tail recursive, which I won't go into the details of. We can chat about it later over lunch if you want. And again, I'll explain it a little bit more later.
There's this notion of continuations. Continuations basically capture all the execution that remains, everything that you're still going to do, as a first-class entity. So continuations are super important in functional languages, and you'll see where they come out.
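One way to make the idea of "all the execution that remains" concrete is continuation-passing style, where the rest of the computation is handed around as an ordinary function. The sketch below is an illustration only: Python has no first-class continuations of the kind Scheme provides, and the function name and the summing example are hypothetical.

```python
def sum_cps(items, k):
    """Sum `items`, passing the result to the continuation `k`.

    `k` is a function representing everything that still has to happen
    once the sum of `items` is known.
    """
    if not items:
        return k(0)
    head, rest = items[0], items[1:]
    # The new continuation remembers "add head, then carry on with k".
    return sum_cps(rest, lambda rest_total: k(head + rest_total))


# The identity function is the simplest continuation: "nothing left to do".
total = sum_cps((1, 2, 3), lambda value: value)
```

Each lambda closes over values that never change, so the chain of pending work is itself an immutable structure.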
So it turns out that these types of programs are easier to program for parallelization than anything with side effects, and it's because of the immutability. There's far fewer edge cases, and because there's no mutated state, I can actually prove the correctness of these algorithms. So I can prove that they're always going to work. It's not just, "Hey, I've tested it along five different examples, and therefore I have a pretty good idea it's going to work."
So fewer hairy edge cases. And then Gene talked about this a little bit at the beginning. I know somebody who took a 3,000-line program in C, rewrote it the first time, and cut the number of lines of code in half by moving to TypeScript and React, and then cut that down to 500 lines in Clojure. And that's, of course, our own Gene Kim, who said that was one of the most rewarding things and one of the hardest things that he's ever had to learn, but now he writes code that doesn't collapse in on itself.
So the theme right now, and the takeaway from all of that, is that the programming primitives that you're using have a huge impact on the maintainability and stability and quality of your applications. That makes a big difference.
By corollary, the platform primitives, the primitives that you're using to manage your infrastructure, and the actual application runtime primitives make a huge impact on the maintainability, reliability, and scalability of your applications running in production.
So let's talk about that. The question then is: what does iterative look like in the ops sense, and what would the functional alternative look like? And that's what we're going to explore for the rest of this talk.
So one of the first things that I want to do, just so that you're looking out for this in the rest of the presentation, is, again, I've got an imperative program on the left and a functional program implementation of the same algorithm on the right. And I won't go through it in detail. You can see the basics. It's a simpler program, so it fits on the screen.
But here's the thing. It's a base case, and then I have the general case, which, again, it's declarative, and there's a bit of a leap of faith there. So look for that. Look for where you're not going to control every single step yourself, but you're going to actually build a system that you can trust to give you the right answer. That theme is going to shine through in all of the examples that we look at. So let something else do some of the work for you.
Now, I want to not just talk about these patterns in the abstract. I want to talk about them in the frame of reference of the problems that are still plaguing us in ops today. And so we'll go through all four of these, and we'll look at some patterns that can be applied.
So the first one is snowflakes. So we'll start out with this.
Who's this? Well, this is the ops person who says, "Your code is not working."
And here is the developer. And what does the developer say?
"It works on my machine. It works on my machine."
Now, why does that happen? Well, that happens because we have snowflakes. We have one build that we put into dev, and then part of our entire pipeline, we rebuild those artifacts sometimes. So we have different artifacts that we're deploying into these different environments. And it's not only that, it's that the different environments, well, they're different, too. So the boxes are all shaded differently.
So instead of that, instead of mutating state throughout this whole process, what we want to do is we want to depend on immutability.
So I'm going to start by building my immutable artifact. And of course, I'm going to use the Docker logo, because that's pretty much one of the mainstays of the way that we achieve immutability. I build some consistency across my environments. Now, the environments, of course, aren't identical, but we have to focus very much on the immutable parts of that. And then we carry that immutable artifact through this series of environments so that we can make it out into prod.
So that's the first concept: the single deployable artifact. It bundles all the dependencies, so you don't have to do rebuilds and create new dependencies or draw from dependencies that are outside of the container, or we minimize that at least. And it has to be properly parameterized.
Now, there are two warnings that I want to give you. Of course, the common theme here from imperative to functional is immutability. But here's the first warning.
So I'll tell you a little story. I was working with a customer recently, helping them really look at some of their containerized workloads and helping them get them running on an engine, on an orchestration platform. And what they conceded to me was that they were building Docker images, but they were actually bundling the properties. They were bundling the configurations for each environment in the Dockerfile itself. So they didn't know how to create the abstractions in the right way.
So just because you're using Docker doesn't mean that you're following an immutability pattern. So you have to be very deliberate about that.
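A minimal sketch of what "properly parameterized" can look like from the application's side: the same code, and so the same image, reads its environment-specific values at startup instead of having them baked in. The variable names `DATABASE_URL` and `LOG_LEVEL` and the `Settings` class are hypothetical, and reading from environment variables is one common approach, not necessarily the one the speaker had in mind.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: settings cannot be mutated after load
class Settings:
    database_url: str
    log_level: str


def load_settings(env=os.environ):
    """Build settings from the environment the artifact is running in."""
    return Settings(
        database_url=env["DATABASE_URL"],          # required, differs per environment
        log_level=env.get("LOG_LEVEL", "INFO"),    # optional, with a default
    )


# The same artifact, parameterized differently in two environments.
dev = load_settings({"DATABASE_URL": "postgres://dev-db/app", "LOG_LEVEL": "DEBUG"})
prod = load_settings({"DATABASE_URL": "postgres://prod-db/app"})
```

Nothing about dev or prod lives in the artifact itself; only the injected values differ.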
There's another warning that I want to just point out here, and we'll get to that actually in the next section, and it has to do with security.
So what we have here now is we have our immutable artifact that is running in production. Are you going to let this character decide what's going to be in that immutable artifact? You're going to let your developer run this in production? Not so sure about that.
So let's break that down a little bit and take a look at what actually is inside of that container. There's a number of different layers, as most of you probably know. There's an OS image, then there's the runtime, like the JDK, and then there's the application code itself.
Well, there's a couple of different ways that you can realize this. Let's take a look at both of those. The first one is where all of those pieces are created and provided by the application team. So the developer is, in fact, deciding on OS, they're deciding on the JDK version, all of the other dependencies in there. So that is one option, and it's not the one that I recommend.
Another option is we take those same layers and we distribute the responsibility for those different layers to perhaps another organization. That organization I like to refer to as the platform team. So this is the team that's responsible for enterprise standards. So they get to decide what goes into that. And they say, "Uh-uh-uh, no, we're not going to let just anything drop in there because I am responsible for making sure that enterprise standards are applied."
And one of my other customers uses the term, and I picked up this term from them, which is the trusted container pipeline.
Now, the cool thing about the trusted container pipeline... Well, I'll get back to that in just a minute. So I have a warning on this.
So the trusted container pipeline is, in fact, something that you can put together. It's essentially the Jenkins build for your containers, and it has all of the enterprise standard stuff that's getting injected into that container.
Injection. Hmm. There's an interesting concept. We'll see that more again in just a moment.
Now, once you have that single deployable artifact, there's a couple of things that you have to be very aware of. You're going to be deploying that into prod, so what are some of the warnings that you have to be careful of?
Well, there could be, even with those standards, there could be vulnerabilities that are living inside of that container. They're just, we don't know about them yet. So a new CVE comes out, and now we know about those vulnerabilities. The other thing that we have to guard ourselves from is we have to make sure that we keep the bad folks out. Because the bad folks might come in and they might try to tamper with that container and inject some of their own negative stuff into it.
So you do have to address that, and there are ways to address that. You need to secure those assets in the same way that you secure your source code. So you can use things like digital signing to make sure that somebody can't tamper with that image once it's in the artifact repository. And when other vulnerabilities are found, you can quarantine that image to make sure that it doesn't, in fact, get deployed into production.
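The tamper check described above can be illustrated with a small sketch. Note the substitution: this uses a keyed HMAC from Python's standard library as a stand-in, whereas real image signing relies on asymmetric signatures and dedicated tooling. The function names and the sample bytes are hypothetical.

```python
import hashlib
import hmac


def sign_artifact(artifact_bytes, key):
    """Compute a keyed digest when the artifact is published."""
    return hmac.new(key, artifact_bytes, hashlib.sha256).hexdigest()


def is_untampered(artifact_bytes, key, recorded_signature):
    """Before deploying, recompute the digest and compare in constant time."""
    expected = sign_artifact(artifact_bytes, key)
    return hmac.compare_digest(expected, recorded_signature)


key = b"pipeline-secret"                   # assumed to be held by the pipeline only
signature = sign_artifact(b"image-layers-v1", key)

ok = is_untampered(b"image-layers-v1", key, signature)
tampered = is_untampered(b"image-layers-v1+malware", key, signature)
```

A deploy step would refuse any artifact for which the check fails, which is the quarantine idea in miniature.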
So the net takeaway from all of this is that you're going to be managing those artifacts the same way that you manage the artifacts that are your source code. And some of the software development life cycle approaches that you're using are going to be equally applied.
Now, I admit that that's not necessarily a functional thing, but it's so key that I wanted to make sure that we covered it here as well. That applies for imperative or functional.
All right. So here again, the common theme is pipelines and the software development and delivery life cycle. That is common across.
All right. So I said security and compliance. So let's talk about compliance now.
Here I've got a bunch of workloads that are running across my infrastructure. I've got four different hosts there and a whole bunch of workloads that are running out there. And I've got my compliance officer who is really interested in making sure that they understand what is happening. Not the build pipelines. All of that was good. We trust that. But what is actually happening with these running applications in production? So they need some line of visibility across those.
Well, what's one of the ways that they can do that? Well, what we want is we want to address these cross-cutting concerns. It doesn't matter whether it's app one or app two, my mobile banking app or my web app. It doesn't matter. I need to be able to get visibility across all of those apps, and we've heard from people this morning that have things like 6,000 apps running in production. How do I apply cross-cutting concerns across all of those?
Well, what we can do is injection. Now, injection is something that is a common pattern. It's existed not only in functional languages but in imperative languages, in things like the Spring Framework. The Spring Framework became popular because you did dependency injection. So we have all of these programming patterns that we've developed over the years to allow us to do these things at the code level.
What we want to do now is take those same learnings and those same patterns and apply them to ops.
So here what we want to do is inject, if you will, aspects into each one of these running environments. What that requires, of course, is an abstraction that allows these containers to run side by side. We sometimes call those sidecars. That's why platforms like Kubernetes use a pod as the core unit of deployment instead of a container: so that you can do things like aspect injection into the pod, and the platform team gets to decide what's running alongside my applications to make sure that I'm compliant.
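The injection idea can be sketched as a pure function that the platform team applies to every workload description before it runs. The dictionary shape below is a simplified, hypothetical stand-in and is not the Kubernetes API; the names `inject_sidecar` and `audit-agent` are also assumptions.

```python
def inject_sidecar(pod, sidecar):
    """Return a new pod description with `sidecar` running alongside the app.

    The input is not modified, and injecting twice has no further effect.
    """
    existing_names = {container["name"] for container in pod["containers"]}
    if sidecar["name"] in existing_names:
        return pod
    return {**pod, "containers": [*pod["containers"], sidecar]}


app_pod = {"name": "mobile-banking", "containers": [{"name": "app"}]}
audit_sidecar = {"name": "audit-agent"}

compliant_pod = inject_sidecar(app_pod, audit_sidecar)
```

The application team never mentions the audit agent; the cross-cutting concern is composed in from outside, much as dependency injection does at the code level.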
So this type of a pattern of injection doesn't just exist to attach sidecars to applications to give us visibility and operability. It also applies to the infrastructure level. So that same compliance officer, or maybe a different one, needs to have line of visibility into the hosts that are running, the infrastructure that's running. And what we need there is another way of injecting that same thing and a way of injecting a cross-cutting concern across those. And we can do that as well.
So these are the primitives that you need in your platform to be able to achieve these patterns.
So with cross-cutting concerns, it's all about more than one container, aspect-oriented programming. And again, you want to apply it to both applications and infrastructure. The common theme there that comes from the programming paradigm over into the operational paradigm is one of injection and composition. Again, one that we understand very well in programming and that we want to apply now in ops increasingly and have the primitives to do so.
All right, so I'm going to move on to the final topic, which is around resilience.
I'm going to start with imperative deployments. Just like we started with imperative programming, I'm going to start with imperative deployments.
So what do imperative deployments look like? Well, they tend to be pretty workflow-oriented. So the first step is that I need to provision my machine. And what do I need to do to be able to do that? Well, I need to file a ticket. And then I need to install the operating system and middleware, and that probably follows from that ticket, and there's a script, probably an imperative script, that is run to lay down all the bits onto that machine.
Then I need to install the app, which I probably do by filing a ticket. And then I need to configure the firewall. But what do I need to do to get the firewall configured? Yep, you guessed it, I need to file a ticket.
And this isn't a talk about getting rid of your ticketing system, but I couldn't help but bring that theme in. And so the process goes on. And at some point, we declare victory. We're done.
Here's my warning to you on that. If you ever catch yourself thinking, "All right, after I'm done with all of this, I am done," I want my voice to come into your head and say, "Oh, but Cornelia says we're never done." Because we're never done. And you know that because you've fought a lot of fires. You're never done.
So let's look at the functional alternative, functional deployments. Over here, we have our hosts, and we want to do deployments into that environment.
How do we do that? Well, we're going to do it declaratively. I just want to say, here is what my app looks like. I am going to run three different microservices. I want three instances, five instances, and two instances of those. And I'm just going to hand that over to a system. I'm going to declare what I want.
It's just like making that recursive call in a functional system. Here, just give me the answer, and then I'll combine it with something else. Hand over the control to a system, a sound system, and a provably correct system. And so that does it. It just makes it so.
Cross-cutting concerns, something for compliance. I need to audit. I'm going to say, "Go ahead and deploy this," and I just declare, here's something that I need across my infrastructure. And sure enough, it deploys that on all hosts. It's all declarative.
So it's certainly, by looking at it even conceptually, a lot easier. And how conceptually easy something is is absolutely important and has a huge impact. In fact, this morning, Gene started off the day by reviewing some of the stuff from the State of DevOps Report from last year that talked about some of these psychological factors, right? Makes a difference.
Now, in addition to that, the declarative model is not just about the initial deployment. It's also about resilience, keeping things running.
Remember? At that point, you might have thought, "Oh, I'm done." But you know what? You're not done because this host right here just went down. So I went from a done state to a not-done state. And so do I have to do anything? Do I have to go run a script? Nope, you don't have to run a script because the system knows what your declared state is. This is what I want. And the system and that control loop will in fact make sure that the system comes back into a state that's in alignment with what you asked for.
Okay? So it's all about that.
Now, it turns out that there's another really interesting thing here, in that different control loops are designed to deal with different types. So that microservices deployment that you see on the top is dealt with in a different way than the audit use case on the bottom. Notice that when that host went away, I didn't create a new instance of the audit, because it already existed on all of the hosts. So I've got different control loops for different types.
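The two kinds of control loop described above can be sketched as two small reconcile functions, each comparing declared state with observed state. This is a simplified illustration under assumptions: the function names, host names, and data shapes are hypothetical, and a real platform would run such a comparison continuously and then act on the result.

```python
def reconcile_replicas(desired_count, running_on, live_hosts):
    """Replica-style loop: keep `desired_count` instances alive somewhere.

    Returns how many new instances need to be started.
    """
    alive = [host for host in running_on if host in live_hosts]
    return max(desired_count - len(alive), 0)


def reconcile_per_host(running_on, live_hosts):
    """Per-host loop: exactly one instance on every live host.

    Returns the live hosts that still need an instance.
    """
    return sorted(set(live_hosts) - set(running_on))


# Observed state after one of four hosts has gone down.
live_hosts = {"host-1", "host-2", "host-3"}

# A microservice declared with three instances had one on the failed host.
to_start = reconcile_replicas(3, ["host-1", "host-2", "host-4"], live_hosts)

# The audit agent was already on every host, so nothing new is needed.
missing_audit = reconcile_per_host(
    ["host-1", "host-2", "host-3", "host-4"], live_hosts
)
```

Here `to_start` is 1 and `missing_audit` is empty: the same failure produces different corrective actions because the two declared types mean different things.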
Now, if we go back to this point here about the functional systems, it goes back to that notion of declarative and, in this case, strong typing. Let the system do something for you.
Now, there's one final thing that I want to share with you, and this is the part that's going to surprise Gene because this wasn't in the slides yesterday when we were reviewing them. But I think he's going to really like it because it's something we talked about a year ago.
We are now, almost all of us, dealing with distributed systems. I started my career doing embedded systems, not distributed at all. I had a single processor. Everything ran on that single processor. It was even single-threaded. So we're now, come 30 years later, all distributed systems programmers.
Now, if we think of a distributed system from an imperative perspective, here's the way that my application runs. I've got my web app, and when the user accesses that web app, I'm going to make a call to a microservice. That microservice is in turn going to call another microservice and get back a response, and call another microservice and get back a response, and finally, I get back the response at the far end.
Now, you'll notice here as those numbers start popping up, it's a very iterative, it's a very imperative style. One step at a time. One, two, three, four, five, six.
What I want to put to you is that that programming paradigm is turning around, and we're actually turning it on its head and moving toward event-driven systems.
Now, the event-driven system says, "Hey, if something happens on one of these downstream microservices, let's propagate those events throughout the network of applications that make up my entire digital property."
What's interesting about that is you'll notice that when I label these, I've got a bunch of number ones. I don't have one through six. I have a bunch of number ones. What that means is that each one of those can be done independently. It's an independent control point.
Now, there's a great deal of power in that. All sorts of benefits from an agile perspective and so on. But right now, it still looks a little bit tightly coupled.
Now, in order to take this event-driven paradigm all the way to its conclusion, what we need is an event store. So that instead of those things being tightly coupled to one another, we have an event store that those events propagate through. That's an important part.
Now, where does this show up? How does this look from an imperative and functional perspective? Let's take a look.
Ah, there's my independent control loops again. Let's take a look. So here's the event store. How does that show up in this functional programming paradigm?
Well, this is the call stack for the recursive calls that I made in my functional program. Each entry on this call stack is immutable, and the output, the state, the final state of my system, is going to be derived from the application of a series of continuations that are living on the call stack. An immutable call stack.
Immutability is incredibly powerful.
Well, how does that relate to the event store? Well, if you take a look at some of the stuff that's going on out in the event paradigm, you'll see that that's exactly the concept. Kafka doesn't allow you to change anything once it sits on the event store. It's totally immutable.
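The link between an immutable log and derived state can be sketched in a few lines. This is an in-memory illustration only and not Kafka's API: the `Event` type, the `append` and `balance` functions, and the account example are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)          # frozen: an event cannot be changed once created
class Event:
    account: str
    amount: int


def append(log, event):
    """Return a new log with `event` at the end; the old log is untouched."""
    return log + (event,)


def balance(log, account):
    """Derive current state by folding over every event in order."""
    return reduce(
        lambda total, e: total + e.amount if e.account == account else total,
        log,
        0,
    )


log = ()
log = append(log, Event("alice", 100))
log = append(log, Event("bob", 50))
log = append(log, Event("alice", -30))

alice_balance = balance(log, "alice")
```

The state is never stored and updated; it is recomputed from the sequence of immutable entries, in the same way the talk describes the final answer being derived from the entries on the call stack.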
So the final takeaway on this example is that you've got this common theme—
Offstage: Got to wrap up in 30 seconds.
Okay, 30 seconds. This common theme of immutability and continuations.
All right, so here's a summary of the key patterns. The slides will be up there.
The last thing that I want to give you in my last 18 seconds is one final beware. And that is that I mentioned that we taught Scheme as a first language at Indiana University. I could always tell who was an imperative thinker because they had programmed before.
This here is Scheme, but with side effects. It is not a good idea. Look at the name of my function: `do-not-merge-segments-this-way`, please.
And then finally, what do I need help with? As Gene said, it's hard to make the transition from imperative thinking to functional. Developers are at least starting to get it. How do we do that for operations folks? I don't know. Let's figure it out together.
Thank you.