Taking Ops & Infrastructure From Iterative to Functional
Cornelia Davis is Sr. Director of Technology at Pivotal, where she helps customers develop and execute on their cloud platform strategies. Responsible for guiding clients on the technical elements of broader transformation, she helps development and operations teams and IT executives understand and adopt new systems, platforms and processes as a central part of delivering greater value to their customers. Fundamentally, Cornelia helps enterprises shift from a mindset where IT is a necessary evil to becoming software-driven businesses with IT at the core. Finally, through these deep engagements with clients, Cornelia also brings valuable insight back into Pivotal, driving advancements in software and services offerings.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
So to introduce the next speaker, Cornelia Davis, I'm going to do something a little strange because I'm going to talk about me. I've been studying high-performing technology organizations since 1999. Even though I was in the information security and compliance space, I was at Tripwire for 13 years, I actually identified not as a security person. And despite the fact that I was formally trained as a developer, I got my graduate degree in compiler design, I actually self-identified as an ops person.
I think that was where the action was at. Who is saving the customer and the company from all the sins of development? Operations. Where is all the action happening? Operations. Where are all the saves happening? Operations.
But something has changed. About two years ago, I stopped self-identifying as an operations person, which, by the way, I had done for almost 20 years. I now self-identify as a developer. I think it's because of a computer language I've learned called Clojure. It's an immutable functional programming language. It's a Lisp language.
What I found was that by learning this -- and I'll have to say, this was the hardest thing I've ever learned in my career. I probably spent 60 hours reading before I could write one line of code. It was that hard, and yet I found it to be the most rewarding thing I've ever learned. I think one of the manifestations of that is that 90% of the mistakes that I had made in my career just disappeared. They were made impossible because of immutability, because of functional programming, functional purity.
Here's how it's impacted my daily work. I used to say that on a good week, I would spend half the time hanging out with the best in the game -- that's people like you in this crowd, this community -- and then half the time writing. And that's changed. On a good week, I'd like to spend half the time hanging out with the best in the game, half the time writing, and then 20% coding. It's really brought the joy of coding back to me.
If any of you have used Clojure or functional programming, ping me on Slack. I would love to talk with you. In fact, it's because of this that we had to move Scott Havens' talk to today, and I'll explain why. Let's talk about why Cornelia Davis is going to be talking to you today. She almost got her PhD in language design. She's been a part of the program committee for three years.
One year ago in London, we were talking about something, and we started talking about how we both loved functional programming and Lisp. We talked about concepts like immutability, composability, declarative modes, and how it applies not just for us for development, but also infrastructure as well. Because if infrastructure is code, that means infrastructure can be made radically safer in the same ways that it has drastically changed the development community.
I'm so excited about this talk because I'm pretty sure you've never seen a talk like this before. I asked her, "Can you teach us about how important functional programming is to development?" It's widely considered to be, in a just world, the next breakthrough since object-oriented programming. I also asked her to infer how those lessons learned in the development community will be incorporated, and are being incorporated, into infrastructure and operations.
Cornelia and I have been working on this presentation for almost a year, and I'm so excited about what she has to share. So with that, Cornelia Davis.
Cornelia Davis
Good morning, everyone. Thanks so much for that, Gene. It really has been a pleasure for the last year or so to be noodling on these topics, and I hope you find this useful. It's a chance for me to be a little bit of a geek, not that I don't do that on a regular basis, but let's go ahead and get started.
To tell you a little bit more about me, Gene already alluded to this. I did my bachelor's degree in the greater Los Angeles area at a school that was really focused on getting people ready for industry, very pragmatic. I learned a lot of different programming languages, and we'll talk a little bit about what those programming languages looked like in just a moment. That's where I started, at Cal State Northridge.
Then I worked for a number of years and said, "Ah, screw this work thing, I'm going to go back to school and get a PhD." I left Indiana before I got my PhD, so I'm ABD, all but dissertation. But Indiana University was very, very different in that it was not in the greater LA area. It was not in Silicon Valley. It was in a small town in southern Indiana. It's very much an academic, research-oriented university, and there I studied theory of computing and programming languages, so really the mathematical underpinnings of it.
Today, I work at Pivotal. I work really on product strategy. I work on emerging technologies, and I have the great opportunity of working with a lot of our clients to take emerging technologies and figure out how those emerging technologies can actually bring value to the businesses.
I have been a developer by background pretty much my whole career, and when I came to Pivotal and started working on platforms, I realized that they weren't just about development, they were also about ops. So I actually self-identify as a developer, but consider myself almost as much an ops person now as I do dev.
The other thing is that, because I have so much free time on my hands, a couple of years ago I decided to write a book, and I've been working with Manning on writing a book called "Cloud Native." It's an architecture book, but with lots of code samples. The code samples are in Java, an imperative programming language, leading into the things that we're going to talk about.
Let's start with a little bit of a history of computing, how it started. In the beginning, we had very simple programs. We had single-threaded programs. In fact, I started my career in aerospace, where I worked on embedded systems, on missile systems. We didn't have multiple processors. We didn't have distributed systems. It really was a single-threaded system, and I programmed against that. Then we ended up with multi-threaded systems, and then that's expanded into multiprocessor systems. Even our phones probably have multicores on them. Certainly, our laptops and our workstations have multicore. In the last 15 years or so, things have gotten highly distributed. It's not just that we have multiple things going on on a single computer. Our software, our digital offerings are spread out across the internet, across lots and lots of different nodes.
If we take a look at things from a programming languages perspective, I actually did some assembly language programming, not only in school, but a little bit at work. I also programmed in Fortran, even did COBOL. Yes, I am that old. Then we started moving into higher-level languages like C, C++, Java, and more recently, Golang. What's started happening in the industry is I'm starting to see people pick up other languages like Scala and Clojure and Kotlin and F#. My customers are telling me that their developers are programming in these languages.
If I reflect on that, it's a very interesting thing. If we look at those earlier languages, these are all what I would call imperative languages, and I'll explain that more in just a moment if that's not familiar to you. These other languages that we see at the bottom here are functional programming languages. That's what I want to study in the next 25 minutes or so: the difference between imperative and functional, what are the benefits of the latter over the former, and then, most interestingly, how do we apply this to ops?
Let me explain imperative and functional programming with a concrete example. I am the luckiest mom on the face of the planet in that my son decided to go into technology as well, and he just graduated with a degree in math and computer science. Earlier this year, he was in a class on genetics, genomics, so he was doing machine learning and those types of things. One of the problems that he had to solve was they were doing gene sequencing, and they had to break down these very long sequences of characters into different segments, and then they would process the segments independently and then deduce some results from that. So genome sequencing, if you will.
When you broke up that segment into lots of little small pieces, there was this balance. Sometimes the pieces were too small, and that would skew the results. After running the first algorithm and breaking the sequence down into smaller segments, he went through a process of actually merging the segments that were too small back into some of the larger segments. What we see here are a segment of size five, another one that's about size 12, another one that's size three, and another one that's size 13. What we want to do is get rid of the five and the three and merge them together. Ultimately, we want to merge those short segments together, and the output would be something like this.
Let's see how we would write this little simple program in a couple of different styles. This first example is Python, and this is in the imperative style. By imperative, what I mean is that you as the developer are controlling every single step in the process. You are figuring out the sequence of events that you're going to program the system to do.
What you can see here is we've got variables, we've got side effects. I'm changing the values in those variables. Without stepping through the code in detail, roughly the algorithm looks like this. I've got a counter. I'm taking a look at the value of the nodes. I'm checking to see if it's too small. If it's too small, then I write over some of the values, and then I get rid of a node, shrink things together. You can see on the right-hand side, the list length is changing in size. Eventually, at the end, my counter goes to the end, and I'm finished, and I get the right answer here.
The thing that I'll tell you is that that program, that short program that we saw in the previous slide, actually has a bug in it. That's, in fact, one of the things that's a hallmark of this type of programming language. It's sequential. You control every step. We have variables with side effects. Because of those things, one of the things that happens is that we end up with hairy edge cases. That program that I showed you worked out only because my last segment was big enough. If the last segment had been small, I would have thrown an error. Those are the types of hairy edge cases that are hard to find. It's also difficult to parallelize, which if you remember the picture that I started with, everything is in parallel now. Distributed systems are in parallel.
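The slide itself is not reproduced in this transcript. As a minimal sketch of the imperative style described here, and not the code from the talk, the following assumes each segment is represented only by its length, that a segment shorter than a hypothetical `MIN_SIZE` of 6 is folded into its right-hand neighbor, and that a short final segment is folded into its left-hand neighbor. The function name and the threshold are illustrative.

```python
MIN_SIZE = 6  # hypothetical threshold; the talk does not state one


def merge_small_imperative(sizes, min_size=MIN_SIZE):
    """Merge too-small segments into a neighbor, mutating `sizes` in place."""
    i = 0  # the counter that walks the list
    while i < len(sizes):
        if sizes[i] >= min_size or len(sizes) == 1:
            # Big enough (or nothing left to merge with): move on.
            i += 1
        elif i + 1 < len(sizes):
            # Too small: overwrite the neighbor's value, then drop this node.
            # The counter stays put so the merged segment is re-checked.
            sizes[i + 1] += sizes[i]
            del sizes[i]
        else:
            # Edge case: the last segment is too small and has no right-hand
            # neighbor, so fold it into the segment on its left instead.
            sizes[i - 1] += sizes[i]
            del sizes[i]
    return sizes


print(merge_small_imperative([5, 12, 3, 13]))  # [17, 16]
print(merge_small_imperative([5, 12, 3]))      # [20]
```

The final branch is the kind of hairy edge case mentioned above: without it, `sizes[i + 1]` would raise an `IndexError` whenever the last segment is too small.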
An alternative is written here in Scheme. That was the language that I did everything in when I was at Indiana University. This is a functional language. Without going into the details, let's step through what that algorithm looks like. We start with the same input, and basically, the way it works is that I declare different conditions and different outcomes based on those conditions.
The way that I solve the problem here is I break it down into two smaller sub-problems. I take the first part, I solve the problem for the rest of the list, I get that answer, and then I merge together the first part with the answer that I got for the rest. That's functional. The hallmarks of functional languages are that they tend to be declarative. The way we program is declarative. That is, I'm not specifying every step; I'm declaring my intent. We'll see where that comes back in just a moment.
There's no side effects. You'll notice that there was never an equal statement in there that assigned a new value to some memory location. And it's recursive. It's not looping, and there's a big difference between recursion and looping, and we'll see what that looks like. The outcome of this type of programming is that it tends to be easier to program in a way that allows for parallelization. There's fewer edge cases, and here's an important point: we can have systems do more for us. We can actually prove certain things about the correctness of our systems, and we can have inference happen on our behalf.
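The Scheme code from the slide is not in the transcript. A rough sketch of the same recursive shape, written in Python rather than Scheme, could look like the following. It assumes segments are represented by their lengths and uses a hypothetical threshold `MIN_SIZE` of 6; the names `merge_small` and `combine` are illustrative. There is no assignment to existing state: every call returns a new tuple.

```python
MIN_SIZE = 6  # hypothetical threshold; the talk does not state one


def combine(first, solved_rest, min_size):
    """Put the first segment together with the already-solved rest."""
    head, tail = solved_rest[0], solved_rest[1:]
    if first < min_size or head < min_size:
        # One of the two neighbors is too small: merge them into one segment.
        return (first + head,) + tail
    return (first,) + solved_rest


def merge_small(sizes, min_size=MIN_SIZE):
    """Return a new tuple of segment sizes with small segments merged."""
    if len(sizes) <= 1:
        # Base case: nothing to merge with.
        return tuple(sizes)
    # Assume the rest of the list is solved correctly, then combine.
    solved_rest = merge_small(sizes[1:], min_size)
    return combine(sizes[0], solved_rest, min_size)


print(merge_small((5, 12, 3, 13)))  # (17, 16)
print(merge_small((5, 12, 3)))      # (20,)
```

Because this sketch combines from the right, it may group several adjacent small segments differently than a left-to-right loop would. The short final segment needs no special branch here: it falls out of the base case and the `combine` rule.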
Our very own Gene Kim described how he went through this transition himself. He started programming, and he has a particular program that he shared with me that started out when he first wrote it years ago in Objective-C as 3,000 lines. Then when he rewrote it in TypeScript and React, it reduced to 1,500 lines, and now he's got it running in 500 lines of Clojure. So the point is, what is the model that we use to reason with? How can we solve these increasingly complex problems using a model of thinking? That's what I want to talk about.
If we look at the difference between imperative programming languages and functional programming languages, there's really four categories that I would list out. On the imperative side we have iterative, where I specify every step, and we go from that to declarative. On imperative, I go from mutable state, side-effecting state, over into immutability. In imperative, I'm looping, in which I'm changing that mutable state, and on functional, I do recursion. Finally, maybe it's not so much purely functional, but on the imperative programming languages, we tend to compile libraries in. In fact, in some of those imperative languages, we've moved into a much more modern way of doing things, like in the Spring framework, and we saw Mik, who did some of his work in AOP, aspect-oriented programming. It's not purely functional, but it is really relevant when we move over to the functional side.
As badass as she is, I am suggesting that this 82-year-old software developer, yes, I found her online, is she not amazing? Thank you. She should not necessarily be programming every single element of her software anymore. Let's let the software do more of the work.
Here's the thing. We cannot reason about every detail in our heads anymore. So we need a model for thinking that also allows the computers to do some of our work for us.
With that, let's turn over to operations, or as I'm going to call it, systems programming. With systems programming, in the past, we've had distributed systems. We've already had distributed systems for some time. Certainly, they started getting huge when the internet came about. The languages that we've been using to program these systems -- and yes, I mean program, so operations is about programming systems -- are things like Bash, and Puppet, and Chef, and Ansible, and Salt. All of these are imperative in style. They're very sequential.
As these systems get more complex and more broadly distributed, if we think about the systems programming, what are some of the more modern tools that we can use? At Pivotal, I work on our Cloud Foundry platform, CF PAS, and there's an underlying system underneath the covers called BOSH; these are a different style. Kubernetes is something that has now really drawn attention to this new style, and we'll see what this style looks like in a moment, but it really is more functional in nature.
If we look at imperative versus functional for systems programming, on the imperative side, we have scripted deployments, very imperative. On the functional side, we have declarative deployments. SSH, we replace that with immutable infrastructure. Long-running contexts that stick around for a long time that we make changes to are replaced with ephemeral containers. Finally, middleware that has all of this stuff bundled in it is replaced with this notion of sidecars. I want to study each of these four things in a little bit more detail. Again, the point here is that we cannot reason about all of the details of our complex systems anymore. We have to have software assist us with that. We need a different model for thinking and designing these systems.
Let's start with declarative deployments. Starting from here, we have a whole bunch of nodes out here that we're going to get some of our deployments onto. The way that we do that now with things like Kubernetes and Cloud Foundry is we declare what our application topology's going to look like. I have, for example, here, a declarative manifest that says, "Hey, I'm going to have three instances of one microservice, five instances of another microservice, and two instances of a third microservice."
I tell the system, "This is what I want." I don't say, "Deploy the first one on this node and deploy the second one on this node and deploy the third one on this node." Why do I have to decide that? Why can't I just say, "Here's what I want," and some intelligent system makes it so? They evenly distribute across my availability zones. They know what those are. Let the system do that for me. And then if I've got another workload that's around compliance or auditing, can't I just say, "Hey, I've got this compliance need, and I need that on every single one of my nodes. Please put it out there." That's what I mean by declarative deployment. Instead of me deciding where I'm going to deploy things, let a system do that analysis.
The way that this works is in Kubernetes, there's a constant control loop. I joke around that Kubernetes is fundamentally a whole bunch of infinite loops. You declare what you want, and there's a whole bunch of intelligent controllers that are constantly comparing what you want to what you got. That's the way that works. What that means is if I were to lose a node, for example, one of my worker nodes goes away, that intelligent control loop will recognize that there's a difference between what I have asked for and what the actual state of the world is, and it will repair that.
That's pretty significant. That's the difference from imperative programming, where I have to go decide, "Oh, what nodes did I lose? What workloads did I lose? Now I'm going to go deploy this here and deploy this there." No. My model is, here's what I want. Let the software do that work for me.
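The compare-and-repair idea can be sketched in a few lines. This is a toy model and not the Kubernetes API or scheduler: the service names, node names, and the `reconcile` function are all hypothetical, placement is a simple least-loaded choice, and at least one node is assumed to survive. A real controller would run something like this repeatedly in a loop.

```python
from collections import Counter

# Desired state: how many instances of each service should exist.
DESIRED = {"service-a": 3, "service-b": 5, "service-c": 2}


def reconcile(desired, running, nodes):
    """Compare what we want to what we have and return repaired placements.

    `running` is a list of (service, node) pairs; `nodes` lists live nodes.
    """
    kept, have = [], Counter()
    for service, node in running:
        # Keep an instance only if its node is alive and it is still wanted.
        if node in nodes and have[service] < desired.get(service, 0):
            kept.append((service, node))
            have[service] += 1

    load = Counter(node for _, node in kept)
    placements = list(kept)
    for service, want in sorted(desired.items()):
        for _ in range(want - have[service]):
            # The system, not the operator, picks the least-loaded node.
            target = min(nodes, key=lambda n: (load[n], n))
            placements.append((service, target))
            load[target] += 1
    return placements


state = reconcile(DESIRED, [], ["n1", "n2", "n3"])   # initial deployment
state = reconcile(DESIRED, state, ["n1", "n3"])      # node n2 was lost
print(Counter(service for service, _ in state) == Counter(DESIRED))  # True
```

The operator only ever edits `DESIRED`; deciding which node gets which instance, and repairing after a lost node, is left to the function.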
Of course, with Kubernetes, as I just said, there's a whole bunch of infinite loops, a whole bunch of different controllers. If I take that back to the functional programming model, here's a very simple functional program that just sums up the numbers up to some value N. The way that we think about it from a functional perspective is there's a base case. I declare my base case, and then I just say, well, if I'm not in my base case, I'm going to assume that I can get the right answer for the rest of the problem, and then my job is just to put together the start of the problem with the answer from the rest of the problem. If you remember from your computer science education when you took any kind of theory class, this is the inductive proof. There's a little bit of a leap of faith. I'm going to assume that the answer comes back right from the rest of this, and I'm just going to declare how to combine those answers. Pretty cool stuff.
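The slide's code is not in the transcript, but the shape described is small enough to sketch. The function name `sum_to` is illustrative, and the input is assumed to be a non-negative integer.

```python
def sum_to(n):
    """Sum the integers from 0 up to n, written as base case plus inductive step."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        # Base case: the sum of nothing is 0.
        return 0
    # Inductive step: trust the answer for the rest, then combine.
    return n + sum_to(n - 1)


print(sum_to(5))  # 15
```

The correctness argument mirrors the code: the base case is right by inspection, and if `sum_to(n - 1)` is right, then adding `n` makes `sum_to(n)` right. In Python specifically, very large `n` would hit the interpreter's recursion limit, which is a property of the runtime rather than of the technique.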
Now let's move on to the second one, which is immutable infrastructure. Sure, immutable infrastructure is, we all know this, pipelines, not humans. But if we don't change the model that we're thinking in, challenges remain. That simple pipeline that I just showed in the previous slide kind of took the general approach that we've always done: provision a system, configure it, deploy my application, configure it. It's a very imperative way of thinking.
Some of the problems that remain with this type of approach, of just automating the way that we've been doing things in the past, is that it can be slow. Provisioning even a virtual machine, even with that automation, can take five, six, seven minutes. If you are doing that in an SDLC, if your developers are doing it as a part of the development cycle, waiting five or six minutes to see what the output is from some small change in code is just plain old too long. Or if I am running in production and I've got some type of an issue, I'm not going to redeploy. I'm going to SSH into that box and make a quick change because every moment of downtime is a problem. So we end up with snowflakes. Another thing is that dependencies are very, very complex, and those complex dependencies can also result in snowflakes.
If we go back to this picture of the pipeline, which just automated an imperative style, we end up with these challenges that we saw. So let's try to challenge that thinking. Let's look at the first part of this pipeline and change it from that whole provisioning and then doing some configuration. Let's create a platform, a self-service platform, that's going to allow me to just consume when I need to deploy applications. That's key number one. We start to build some of our pipelining into the platform itself. Then the other thing is that if we make that platform a container-based platform, containers don't provision in five or six minutes, they provision in milliseconds. So in my development life cycle, or if I've got a problem in production that I need to alleviate, I can simply deploy a new instance, and I can do that in under a second. That first problem of the slowness has really been helped. We're talking here about a container-based platform.
Now this pipeline is oversimplified. I say deploy and configure. Really what you're doing is a whole bunch of deploy, configure. We've got very complex systems. Doing all of that series of deployment and configurations is really this configuration nightmare that I talked about. Coordinating all of those pieces together to create my application deployment can be a challenge.
What I want to do there is take that complex multi-node deployment topology and create a container, or maybe a series of containers, that are now described in a Helm chart. For those of you who don't know what Helm charts are, a Helm chart is a way of describing a topology of containers that all work together as a cohesive whole. I want to pipeline those things, but instead of pipelining those things on deployment into my production system, I want to pipeline those things into immutable artifacts that are going to be deployed very quickly at the end.
You'll notice that the final deployment pipeline got super simple, and that's because I've effectively shifted left some of the things that used to be in the deployment life cycle back into the development life cycle.
What does all of this have to do with imperative versus functional programming? If we go back to this imperative model, what you see here at the bottom is the state. Now watch it. As my algorithm is moving, my state is changing. I've got my state store. It's mutable state, so it's constantly changing and everything is getting updated. Now how do I know if those state changes are in fact correct? I have to worry about the correctness of my code, and we already established that sequential, iterative code with mutable state is hard to reason about and hard to get right. Lots of hairy edge cases. On top of that, the complexity in our systems is that there are lots of other influences: some that can influence my program, which is going to influence the output, and others that can influence the state directly.
On the other hand, on functional systems, the way that this works, and this is the difference between looping and recursion, is that I have my call stack. When I do recursion, each one of my recursive calls isn't replacing mutable state, it's actually creating a new frame that captures the information that is required to generate the state. That call stack captures those series of events. The frames on the call stack are immutable. They cannot be changed. In order for me to get my output, I simply play through the call stack and I get my final state. Now if for some reason I lose my final state, no problem. I can just replay my call stack. That's really interesting.
If we take a look at that, now your containers, those containers that I am building by shifting left, are a little bit like my immutable stack frames. And the controllers that I talked about earlier are a little bit like my interpreter. You see the parallels. Really pretty cool.
We're seeing this pattern in another place, and that is with event logs. Everybody's heard about Kafka, right? I was at Kafka Summit last week, and probably the most common use case that I saw talked about at the Kafka Summit was this. I've got a legacy store that is this monolithic thing. I want to build all these net new applications, but the cadence of evolution on the legacy store is so slow that I cannot get what I need in these microservices that I'm building over here. The way that they're doing things there is they're taking change data capture from the legacy store, putting it in the event store. The events in the event store are immutable and sequenced, just like the call stack, immutable and sequenced. That allows us then to take that information and hydrate a whole bunch of independent stores for my microservices. Pretty cool stuff.
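The replay idea is the same whether the immutable, sequenced records are stack frames or events in a log. A minimal sketch, using plain Python rather than any Kafka client, and with hypothetical event fields and keys, might look like this: state is never edited in place, it is derived by folding over the log, so a lost store can be rebuilt at any time.

```python
from functools import reduce
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One immutable change record; a value of None means 'delete this key'."""
    key: str
    value: Optional[int]


def apply_event(state, event):
    """Return a new state with one event applied; the old state is untouched."""
    new_state = dict(state)
    if event.value is None:
        new_state.pop(event.key, None)
    else:
        new_state[event.key] = event.value
    return new_state


def replay(log):
    """Rebuild (hydrate) a store by playing the whole log from the start."""
    return reduce(apply_event, log, {})


# A tuple, so the sequence itself cannot be modified after the fact.
LOG = (
    Event("cust-1", 10),
    Event("cust-2", 7),
    Event("cust-1", 12),
    Event("cust-2", None),
)

print(replay(LOG))      # {'cust-1': 12}
print(replay(LOG[:2]))  # {'cust-1': 10, 'cust-2': 7}
```

Each microservice could run its own `replay` over the same log to hydrate an independent store, and replaying a prefix of the log recovers any earlier state.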
Let's move on to the third thing, and I'm going to speed up a little bit here, and that is, let's talk a little bit more about those ephemeral containers. If we go back to what we've been looking at before, before I had long-running state, and on the other hand, I had these many disposable stack frames. The stack frames don't write over each other, and so therefore, as I play through them, I can dispose of them, or like in the Kafka case, I don't dispose of them. I keep those things around.
Now let's go back to our picture here of our host, which is, let's say, a VM, and the container, which is the little dotted-line box there. In the container, I have my root file system, runtime layer, application layer, and so on. I'm going to paint a scenario here. A bad guy comes along and injects some malware into that system. We know that these types of vulnerabilities have caused all sorts of breaches. Now, the bad guy goes away. It's easier to detect a breach when somebody's in there. But the way malware works is they come in, they inject the thing, and then they go away.
If I've got ephemeral containers and I've got a system, and I'm thinking in a model around ephemeral containers, I'm designing my stuff so that I can, on a periodic basis, just simply get rid of that container and rehydrate it with an immutable new instance of that container. Remember, I've created those immutable resources in my pipelines, and I can just deploy them in milliseconds. I can even do this at the host level. What if I replace my hosts on a regular basis and can get rid of that malware?
What I'm suggesting here is that you can repave your environments, and this is increasingly a popular thing that folks are doing. I can repave the entire environment because I'm thinking about immutable, ephemeral containers. You can do that very often. I have one customer who repaves his entire environment; that is, every single instance of every application gets refreshed about every three days. He does it twice a week, and he wants to move to doing it daily. So when I say often, I'm talking several times a week.
Let's move on to the last thing, which is sidecars. Before we talk about the details of sidecars, I want to talk about personas for a moment. We've got individuals who are building applications that are going to be used by your customers, so the end-user applications. They're responsible for rapid iteration and getting the value out to customers. Now, the platform team is providing them the platform that's going to allow them for that rapid iteration. That platform team, however, is responsible for things like security and compliance, making sure that the company stays out of negative headlines. Let's keep those two personas in mind as we look through these pictures.
Here I have my workloads running in production, and I have my auditor or my compliance officer who wants to keep an eye on those types of things. How will they do this? Again, a system like Kubernetes did something very, very smart. Their smallest unit of deployment is not a container. It's something that they call a pod, and a pod can have multiple containers. What this allows you to do is it allows the app team that's doing the rapid iteration to provide a container with the app. It allows the person who's responsible for compliance to provide a container, a sidecar container, that sits right next to it that is concerned with the security and compliance. Those things work together.
What that allows us to do then is it allows us to take this picture and attach those sidecars, it's a cross-cutting concern, to every single component in there. Kubernetes was smart. They did that even more. There's another concept. This auditor is also interested in what's going on not only in the containers, but on the machines themselves. Kubernetes also has a notion of a DaemonSet, and the DaemonSet allows me to inject something else. I'm starting to get the hook. I'm almost done, I promise, Gene. It allows me to deploy those. On this picture, it allows me as an auditor to make sure that I've got a workload running on each VM.
It's about cross-cutting concerns, injecting those cross-cutting concerns. What we have here is we can inject those cross-cutting concerns both into the platform and into the pipelines that are creating the containers that we deploy.
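The separation between the two personas can be sketched with plain data. This is a simplified model and not the Kubernetes API: a pod is just a dict holding a list of containers, and the hypothetical `with_sidecar` function lets a platform team attach a compliance container to any app pod without the app team touching their own container. All names and image strings are made up for illustration.

```python
def with_sidecar(pod, sidecar):
    """Return a new pod description with the sidecar container added.

    The input pod is not modified. Injecting the same sidecar twice is a no-op.
    """
    if any(c["name"] == sidecar["name"] for c in pod["containers"]):
        return pod
    return {**pod, "containers": pod["containers"] + [sidecar]}


# Provided by the app team: just the application container.
app_pod = {
    "name": "orders",
    "containers": [{"name": "app", "image": "example/orders:1.0"}],
}

# Provided by the platform team: the cross-cutting compliance concern.
audit_sidecar = {"name": "audit-agent", "image": "example/audit:1.0"}

audited_pod = with_sidecar(app_pod, audit_sidecar)
print([c["name"] for c in audited_pod["containers"]])  # ['app', 'audit-agent']
```

Applying `with_sidecar` to every pod on its way through the pipeline or the platform is one way to picture how a cross-cutting concern gets injected everywhere without each app team doing it by hand.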
Just to quickly sum it up, if you remember what we had here around systems programming, the parallels are there. We have declarative, that was on the programming languages side, immutability, recursive, and we've got aspect-oriented programming. Thank you. We cannot -- give me just 30 seconds, 30 more seconds here. We've got functional systems programming.
I don't want to leave without saying, and I'll skip through this slide, it's all about the toe turn, I do want to say encourage your developers to look at functional languages. Encourage your platform teams to look at the functional approaches. Finally, what I need help with, and I'll skip through the warning. The warning goes back to what Topo said yesterday: don't use old patterns and try to automate those. Are you doing this? I would like to hear stories about who's doing this and also how we can achieve the transition from imperative thinking to functional. Thank you. Thanks for the extra few minutes, Gene.