Functional Programming for (Dev)Operations and Infrastructure


Las Vegas 2019


Functional programming had mostly been relegated to academic endeavors until recently. What's changed is that our apps are now distributed systems and are simply too complex for us to reason about without help. Programming intent and ceding some of the control to an interpreter makes creating solutions to very hard problems doable.

We've seen an increase in the uptake of functional programming for application development using languages such as Kotlin and Clojure, and now we are seeing some key tenets of functional programming seeping into operations.

Through a series of demonstrations, in this talk I'll draw some parallels between functional programming and tools that are increasingly used on the operations side of the house. If part of DevOps is about building empathy between developers and operators (and it is), then shared core tenets help.


Full transcript


Cornelia Davis

My name is Cornelia Davis. I work for Pivotal, VP of technology there, which basically means that I get to work on emerging technology and help our customers and our own business tie that to business value.

So that's what I love to do, is play with new, shiny objects. I worked in our PaaS product, our platform-as-a-service product, Cloud Foundry, for quite a number of years, the first three or four years. Still work on that, but for the last two and a half years, the shiny object I've been focusing on is Kubernetes, and I do love it. We'll talk more about that.

I'm a computer scientist by background. The reason that I go all the way back to Cal State Northridge and Indiana University is that those two different experiences really shaped who I am as a technologist. As an undergraduate, I went to a greater LA-based university, where I learned lots of pragmatic things, like software engineering and programming languages like C and Fortran and COBOL. Yes, I am that old. Then I went and I started a PhD, never finished it, but started a PhD in the middle of nowhere in Indiana in a small university town where I did research, and I did theory of computing and programming languages research. And there I became a functional programmer and really came to understand the mathematical foundations of functional programming, and I'll talk more about why that comes back and why that's important now.

I already told you what I do at Pivotal. I came there as an application developer, and then I very quickly learned that while we talk about PaaS being about developer productivity, platforms are at least as much around operations as they are developer productivity.

And earlier this year, at the beginning of June, I published my book, which, yes, that's the T-shirt that I'm wearing. And I'm going to do a little bit of crowing and jump out of my slides for a moment and come over here to "The Unicorn Project" offer. You heard Gene talk about it this morning, the 11 books. It's on there. I could not be more proud to be part of the 11 books. And Gene actually wrote the foreword to my book, and he has just been a super supporter. So, thank you, Gene.

All right. So let me jump then in. Sorry, should have skipped through that slide. Let's see. One more click. All right. With the Mac, if I go one too far, then I can't go back.

So I want to talk a little bit, very quickly, about the evolution of computing. When I started my career, I started at Hughes Aircraft, working on embedded systems. I wrote software for a single-threaded, single-processor machine that was going to be embedded on a missile system. I was doing image processing there. And so that was a relatively simple model.

Then we started evolving as an industry into having multiple threads. And then we started creating multiple processors that were creating multiple threads. Now, as that evolution went on, we continued to evolve, and now we've got not only a whole bunch of different processes that might be running on a single machine, which allowed us to make assumptions and take advantage of shared components on that machine, but now we're in this highly distributed environment. So I like to say that we're all distributed systems programmers now.

Thirty years ago, when I was in school, there were a few courses on distributed programming, distributed systems, but not that many. It was pretty niche, and not a lot of people were doing it.

Now, in terms of the languages, the languages have evolved as those physical architectures have evolved. Yes, I did a little bit of assembly language programming, and once in my 30-year career, I actually found a bug in a compiler where I went down to the assembly level and found it in a machine instruction that was incorrect. Got that fixed in the compiler. So we did assembly language when it was pretty simple, but I'm sure everyone would agree that there's no way that we can do assembly language programming for distributed systems. Simply too hard.

So we've had all these different languages that have evolved as the hardware has evolved to give us a better model in which we can think through the programming. Now, all of those languages that you see at the top are what I call imperative languages. So they tend to be very control-oriented, and I'll talk a little bit more about what that means.

I mentioned that when I went to Indiana University, that was where I discovered functional programming, and the interesting thing is that when I left Indiana and I went back to industry, I pretty much left functional programming behind, except for hobbies. But in the last five or so years, maybe a few more than that, we've started to see these languages start to get more popular and to actually get used in industry. Languages like Clojure and Scala and Kotlin and F# are being used.

Now, I just finished reading "The Unicorn Project." I did get an advance copy. And for those of you who haven't read it yet, when you read it, you'll see, and you've heard Gene talk for a few years about how in love he has fallen with functional programming. That carries through the entire book. It's really, really very cool. So he actually goes through programming examples in the book where he has the protagonist of the story, Maxine Chambers, convert some code from imperative into functional and gain resilience through that, get rid of some of the edge cases.

So if we look at some of those hallmarks very quickly, the hallmarks of imperative programs are that they're very control-oriented. One of the most important ones I'm going to talk about today is that they allow side effects. They allow you to take a variable, set the value, and then they allow you to change that value anytime you want. Now, most of us are so familiar with that, you might be thinking, "Well, yeah. Of course, that's the way it works." It doesn't have to work that way.

If we start moving over to a different model, which is functional programming, one of the biggest hallmarks of functional programming is no side effects. Variables are fine. You can set a variable to a value and then use that variable, but you can never change the value once it's set.

Whoa. For those of you who have never worked in an environment like that, you might be thinking, "How do I ever program?" I'll give you an example in just a moment. There's some really great things that come around that: it allows us to do things like prove correctness of an algorithm, it eliminates hairy edge cases, and so on.
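To make the contrast concrete, here is a minimal sketch in Python. It is an illustration added to the transcript, not code from the talk, and every name in it (`total_imperative`, `total_functional`, `with_discount`, the sample prices) is hypothetical. Python does not enforce immutability, so the functional versions simply follow the discipline of never changing a value once it is set.

```python
from functools import reduce

def total_imperative(prices):
    # Imperative style: `total` is set once and then changed on every pass.
    total = 0
    for price in prices:
        total = total + price
    return total

def total_functional(prices):
    # Functional style: no name is rebound; each step produces a new value.
    return reduce(lambda acc, price: acc + price, prices, 0)

def with_discount(prices, percent):
    # Returns a new tuple; the caller's tuple is left untouched.
    return tuple(p * (100 - percent) // 100 for p in prices)

prices = (30, 20, 50)  # a tuple cannot be modified in place
discounted = with_discount(prices, 10)
```

Both totals come out the same. The difference is that in the functional versions nothing the caller holds can change underneath it, which is the property that makes the code easier to reason about.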

So what I'm really getting at here is that I'm challenging us to look at the models that we use to think through some of these hairy problems that we're solving today. Now, I won't go through this in detail. I'll just build out the slide, and I'll leave it to your pleasure, because I do want to make sure that I've got plenty of time to go through the main content, including my demo, which may fail spectacularly. We'll see.

But these are some of the kind of comparisons between imperative programming and functional programming. And my whole point here is that while she is really badass, this 80-year-old programmer, she doesn't necessarily need to program every single control point of the applications.

So what I want to do is talk about having machines do more of the reasoning for us in certain scenarios. So let me give you a very concrete example here. Here's an example where I have a list, and I want to keep only the things that are less than 30 and more than 20. And this is the way that I can program it really, really simply. And then what I want to do is I want to get the second element out of the resulting set. And so the way that this would work and the way that I think it through mentally is that I do the first filter, I do the second filter, and then I pick out the value.

If I take that exact same code and I have hallmarks like immutability, then the compiler can actually do some optimizations for me. I don't have to program differently. The compiler is making the decision to do this instead: Is the first element less than 30? Yep. Is it more than 20? Yep. There's my first answer. Is the second element less than 30? Yep. Greater than 20? Yep. And now I get my answer without processing the entire list.
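The element-by-element evaluation described here can be sketched with Python generators. This is an illustration only, not the code from the slide: Python does not perform this rewrite on its own, so the generators stand in for the laziness that a functional compiler or runtime could provide. The function name `second_match`, the sample list, and the `examined` list (which exists only to show how much of the input was touched) are assumptions.

```python
from itertools import islice

def second_match(values):
    examined = []  # instrumentation only: which elements were inspected

    def watch(items):
        for item in items:
            examined.append(item)
            yield item

    # Neither filter runs yet; each is a lazy pipeline stage.
    below_30 = (v for v in watch(values) if v < 30)
    above_20 = (v for v in below_30 if v > 20)
    # islice pulls only as many elements as it takes to reach the second match.
    result = next(islice(above_20, 1, 2), None)
    return result, examined

result, examined = second_match([25, 22, 40, 10, 28, 21])
```

With this sample list, `result` is `22` and `examined` is `[25, 22]`: the remaining four elements are never looked at. This is only safe because the filters have no side effects, so skipping the rest of the list cannot change the answer.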

So those types of optimizations are possible if you follow certain mathematical principles, mathematical principles that exist in functional programming.

Now, I said that I'm always thinking about business value, not just shiny tech. Sure, that was some great code, but what's the business value in this? Well, for that small list, not that big of a difference. But if I can have my computer, my processor, my compiler, apply optimizations like this, I might bring my product to market faster because I can process my big data faster.

We heard Gene this morning talk about an example that he cited in "The Unicorn Project" book, where if something took 48 hours, they knew they were doomed. If you can shrink that time, you can shrink the time to market, and you can actually win out in the marketplace. That's real business value tied to some pretty deep tech.

Okay, so how does all of this relate to ops and infrastructure? Well, if we look at the way that we're programming systems, and by programming systems, I mean let's take an example, like an application, and deploy it. That deployment, we all know that we're using infrastructure as code. We all know we're using code to automate things. That's the kind of programming I'm talking about, is the programming to deploy and keep our systems running.

Now, in the past, we've used languages like Bash and Puppet and Chef and Salt and Ansible, which all tend to be more used, even if you can do it differently, used in an imperative style. But as these systems get more complex, we have a need for some new programming models. And the new programming model that I mentioned that I've been spending a lot of time with is Kubernetes.

And this programming model, which is much more functional-like, exists in a number of other places, like in the Cloud Foundry platform, but most popularly recently in Kubernetes, and I'm actually going to show you some live demo with that.

So going back to that slide that I showed you earlier between the imperative versus functional programming, there's a number of different examples left to right that kind of parallel what I showed when I talked about programming applications. Today, I'm going to focus on two of those, declarative deployments and immutable infrastructure. And that's what I want to jump into more in detail now.

So let's talk about declarative deployments to start. I want to use a very concrete example, and this is an example that runs through my entire book. And by the way, the book is written for application developers and architects, so it's really about the patterns that you need to understand, implement, and sometimes you don't implement them yourself, you just leverage these patterns to create more resilient software that runs well in the constantly changing and highly distributed environment that exists in the cloud.

So here's an example. I've got three microservices. Two of them connect to a back-end database. They're doing simple things: connections are who the users are and who follows whom. Posts are their tweets or their posts. Those are stored in the cookbook database because I love to cook. And then on the left-hand side, we have an aggregator that brings those things together.

And of course, there's multiple instances of all of these. So for example, there's seven instances of the connection service, four of the post service, and five of the connections post service, the aggregator. So if I've got this topology and it comes time now for me to deploy this, how do I decide how I'm going to deploy this across, let's say, failure domains?

Is this something that I, as a human being, should be worrying about? For the last several decades, the answer has been yes. We've gone through and said, "Oh, well, I'm going to go ahead and distribute this across availability zones," if you were even using availability zones, because a lot of people still don't even do that.

So how do I do that distribution? I'm going to suggest that we humans shouldn't be doing that at all. We should not be making those decisions. So let's let a machine do that for us. And in fact, this is the place where I'm going to do my first demo. Now, this is just a placeholder to remind me to go over and do the demo. You'll see in a moment this running live.
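As a rough sketch of what "let a machine do that" can mean, the snippet below takes the instance counts from the example topology and spreads them round-robin across failure domains. It is not how the Kubernetes scheduler is implemented; the function `spread`, the service names, and the zone names are all hypothetical and chosen only to illustrate that the human declares counts and the machine decides placement.

```python
from collections import Counter
from itertools import cycle

def spread(desired, zones):
    """Assign each instance of each service to a zone, round-robin."""
    placement = {}
    for service, count in desired.items():
        zone_cycle = cycle(zones)
        placement[service] = [next(zone_cycle) for _ in range(count)]
    return placement

# What the human declares: how many instances of each service should exist.
desired = {"connections": 7, "posts": 4, "connections-posts": 5}

# What the machine decides: where each instance goes.
placement = spread(desired, ["zone-a", "zone-b"])
per_zone = {service: Counter(zones) for service, zones in placement.items()}
```

For every service the per-zone counts differ by at most one, so losing a single zone never takes out all instances of a service.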

So I'm going to need my glasses here to get started. I'm going to jump over. And the first thing that I'm going to do, I'm going to spend a little bit of time. Oh, okay. That was my glasses. It is sharp. I'm going to spend a little bit of time just showing you the structure of my application. So the first thing that I want to do is show you what the application is structured like.

So what I have here are deployment manifests for Kubernetes. What you see on the backing services, and I've already deployed those for speed, is a MySQL database. Remember, that was on the far right-hand side, a token store, which was tied to the aggregator, and another Spring Cloud Services component there. Those are already deployed.

And then in those YAML files on the top are the three microservices, and those are pulling containers from my Docker Hub and deploying those as a set of applications. So let me show you what that looks like. So currently here is my console.

So I have deployed what you can see here, and I'll read some of these to you, is I have... Oh, that's Memcached, which is something else. This is that Spring Cloud Service that I talked about. Down here, I have MySQL and the Redis services. So those are both deployed.

And these two icons here... Let's see if I can increase. No, it doesn't really increase too much. These two bubbles that you see in the middle, those are the actual Kubernetes nodes. So that's where workloads are running. Okay?

So what I want to do first is I want to go ahead and deploy my workload. Now, I'm going to deploy my workload in a very interesting way, and I'll explain this in just a moment. I'm just going to get it started. I am using an open-source project called Flux that comes from the CNCF. What Flux does is it allows you to do deployments via GitHub.

So I'm not going to actually deploy anything directly into this Kubernetes cluster. I am going to declare what I want deployed, commit that into GitHub, and then the system will take care of doing the deployment for me. You see what I mean? I don't do that. I don't make those decisions, which means that I am out of the business of making errors as well.
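The "declare it, commit it, let the system converge" idea can be sketched as a small reconciliation function. This is a simplified illustration, not Flux's or Kubernetes's actual code: `reconcile` and `apply_actions` are hypothetical names, and state is reduced to a replica count per service. The desired state plays the role of what is committed to the repository; the actual state is what the cluster currently runs.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`."""
    actions = []
    for service in sorted(set(desired) | set(actual)):
        want = desired.get(service, 0)
        have = actual.get(service, 0)
        if want > have:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions

def apply_actions(actual, actions):
    # Build a new state rather than mutating the old one.
    new_state = dict(actual)
    for verb, service, n in actions:
        delta = n if verb == "start" else -n
        new_state[service] = new_state.get(service, 0) + delta
    return {s: c for s, c in new_state.items() if c > 0}

desired = {"connections": 7, "posts": 4, "connections-posts": 5}
actual = {"connections": 7, "posts": 1}

actions = reconcile(desired, actual)
converged = apply_actions(actual, actions)
```

Running `reconcile` again on the converged state returns an empty list, which is the point: the operator states the outcome, and the agent repeats the comparison until there is nothing left to do.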

So I'm going to go ahead and create that. Then I am going to do a `kubectl apply`, which is going to actually deploy that agent, that Flux agent, into my deployment. And now I need to do one other thing for this agent, which is going to be checking the GitHub repository and orchestrating the deployment for me: I run `fluxctl identity`. I need to give it privileges to actually do writes back into my GitHub repository because not only is it going to look at what I've done, it's also going to record the things that it does in the GitHub repository for me.

So I'm going to take that and I'm going to come over here and add it into my GitHub repository as a new key, and I'm going to give it write access. And the big test, do I remember my password? Yes. And what should be happening now is it should go ahead and do the deployment.

Now, I may have to give it a little bit of a bump. Yes, I did the key. There we go. So you can see it on the right-hand side. Let me go over to my thing here. What you can see here is all of those instances are starting to pop up. All of the instances have been created.

Now, there's also something called a service, which is a load balancer across those services. And so what we see here is we see the post service. I'll move it over here on the right-hand side. What we see here is the connection service. Actually, I'll put the connections up top, just like the diagram that we saw below. I'm going to move that there, and the aggregator.

So let's see if things are starting up. The containers are creating, and hopefully they start running here very quickly. I'm going to do a quick check. Okay. So those are doing downloads. So I am now beholden to the wireless speeds. It's downloading my containers, and my demo just failed spectacularly.

I'm going to very quickly, and I promise I will just go back to the slides because I don't want to waste your time, but I am going to do a very quick check. Yeah. Okay. I think I ran out of memory, is what I ran out of. So I apologize for that. My demo's going to fail spectacularly, as I had half-expected it to.

But part of the reason that I wanted to show you this was... Part of the reason I captured this is that what you'll notice here is that this is the topology that should have been deployed. And as it started deploying, as I said, I ran out of memory, and actually, my Kubernetes cluster crashed. The reason for that is that I'm actually running it all on my laptop, and I wasn't able to get it up and running in the cloud because of some very specific technology that I'm using.

So I apologize for that. But what has happened here, and this is what the deployment should have shown, is here's the aggregator. And you'll notice that across the different machines, I've got the different machines here. It has evenly distributed those nodes across the different machines. Okay? So it's made that decision for me.

Now, let me go ahead and go back into the mode here and talk about another thing. I'm going to come back to some more business value. Okay, so now I've done this deployment, and now we've had a spectacular failure. One of those availability zones has gone away.

Now, does that mean that I should be paged, I as an operator should be paged immediately? Is that something that I need to be woken up for in the middle of the night? Maybe, but maybe not. Maybe because there's still additional nodes that are running and it's going to recover that system, maybe I just come in the next morning and I get notifications that something's happened, but the system has repaired itself.
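The "maybe, but maybe not" decision can be expressed as a sketch: after a zone disappears, reschedule the lost instances onto the surviving zones, and only flag a service for paging if nothing of it was left running. This is an assumption-laden illustration, not Kubernetes behavior as implemented; `recover`, the zone names, and the paging rule are hypothetical.

```python
def recover(placement, failed_zone, surviving_zones):
    """Reschedule instances lost with `failed_zone`; list services that had a full outage."""
    new_placement = {}
    outages = []
    for service, zones in placement.items():
        survivors = [z for z in zones if z != failed_zone]
        lost = len(zones) - len(survivors)
        if not survivors:
            outages.append(service)  # nothing was left serving traffic: worth a page
        for _ in range(lost):
            # Put each replacement on the surviving zone with the fewest instances.
            target = min(surviving_zones, key=survivors.count)
            survivors = survivors + [target]
        new_placement[service] = survivors
    return new_placement, outages

placement = {
    "connections": ["zone-a", "zone-b", "zone-c", "zone-a"],
    "posts": ["zone-b", "zone-c"],
}
healed, outages = recover(placement, "zone-a", ["zone-b", "zone-c"])
```

In this example every service still had instances in a surviving zone, so `outages` is empty and the event can wait for a morning notification rather than a page.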

Now, what's the value in that? Well, resilient systems, which means happy customers, which when you read "The Unicorn Project," ideal number five that you saw up on the screen today was customer focus. So happier customers. Another one, less stressed staff. Ideal number two is about joy, focus, and flow. It's about employee and developer satisfaction. And we know that those things tie to business outcomes.

All right. So I want to now turn my focus to immutable infrastructure and talk about immutability a little bit more. Now, if my demo hadn't failed so spectacularly, I would've been able to show you another demo. What I just showed you was ops for the application. Right? I was deploying my application and doing operations around that.

What I want to show you, and again, I won't be able to demo it live, but I'll show you what I would've done in the demo to show you that you can do that for your infrastructure as well.

So what we had running was we had a Kubernetes cluster with two worker nodes. That's where the workloads are going to be distributed across those two worker nodes. Now, when those worker nodes get deployed, when the content on those gets deployed, I have all of my application code and my runtime dependencies running on an operating system inside of a container. That container is that little dotted line box there.

Now, if something goes wrong or if I need to update something, I can just do this. I can throw out the container, create a new one. If I need to make a change to something, some configuration value in there, do I SSH into the container and make that change? Absolutely not. I'm going to throw out a container and create a new one with that new configuration or that new source code, and I can keep doing that. And there are so many values, ramifications, business values that come from being able to do that simple thing of throwing out a container and creating a new one.

One of those is when a bad actor comes in and installs malware on the system, and then the bad actor disappears. Now, while the bad actor is in your network, they're easier to detect, but malware can sit on a system for four months and be collecting information, looking for additional ways to exploit things.

And so how do we combat this malware? Well, we can try to get better at detecting it, but honestly, that's pretty hard. So there's another approach that we've started to see folks using in the industry, and that is they proactively throw out the containers. They proactively say, "You know what? On a regular basis, I'm just going to throw away this container." If there was no malware in there, there's still no malware. But if there was malware, then it only lived in there for a week, three days, maybe only a day.

And we can do this at the host level as well, at the infrastructure level as well. If there was malware, it's now gone. And so what I'm suggesting that you do here is that you repave the environment. Don't wait for the potholes to get really bad. Just go ahead and repave very often.

We have a customer at Pivotal who does repaving every three days, and I can tell you who it is because he speaks about it publicly. It's Wells Fargo in the financial services space. Cleans up the containers and the hosts they're running on, throws them away and recreates them every three days with zero downtime. And he's not satisfied. He's working on doing it on a daily basis.
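A proactive repave policy boils down to an age check. The sketch below is a hypothetical illustration of that rule, not any platform's API: `due_for_repave`, the container names, and the timestamps are made up, and the three-day threshold mirrors the cadence mentioned above.

```python
from datetime import datetime, timedelta

def due_for_repave(created_at, now, max_age):
    """Return the names of containers whose age has reached `max_age`."""
    return sorted(name for name, born in created_at.items() if now - born >= max_age)

now = datetime(2019, 1, 10, 12, 0)  # arbitrary reference time for the example
created_at = {
    "posts-1": now - timedelta(days=4),
    "posts-2": now - timedelta(days=1),
    "connections-1": now - timedelta(days=3),
}
stale = due_for_repave(created_at, now, max_age=timedelta(days=3))
```

Note that the rule never asks whether a container looks compromised. Anything older than the threshold is replaced, which caps how long undetected malware could survive.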

So what's the mental model for humans to do this? Well, how about we just do a repave? And so in the last couple of minutes, I'm going to show you what that looks like to do a repave on this.

So the demo that I wanted to show you, and I seriously doubt that anything's come back. Nope, nothing's come back. It's still pretty hosed, but let me go ahead and show you over here that what I was going to do in the demo was here's the manifest for my Kubernetes cluster. You can see that I have one master node running here at IP address 0.2. I have a worker running at 0.3, and I have another worker running at 0.4.

What I'm going to do, or what I was going to do, was I was going to take this next worker and I was going to add a new worker to it. Like I was going to throw out a worker and start a new one. So I can do this, which is I'm going to uncomment that.

And then over here, I was going to do a `git diff` to just show you that I added this new machine. Okay, so the new machine was there. And then I was going to do a `git add`, `git commit`. And I'm going to put in a comment which says repave now. And then I was going to do a `git push`.

And that was going to be all that it took because again, I had that GitOps agent, that CNCF Flux project, watching the GitHub repository, not for my application code, but for what my infrastructure topology looks like. And that was going to spin up the new node, and then we were going to do the repave, and it was going to move workloads onto that new node as we threw them out.
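The order of operations in that repave (add the new node, move workloads, discard the old node) can be sketched as a pure function over a cluster description. This is a simplification under stated assumptions: `repave` and the worker and workload names are hypothetical, and in a real cluster the platform would reschedule the workloads rather than copy a list.

```python
def repave(nodes, old_node, new_node):
    """Replace `old_node` with `new_node`, moving workloads so none are dropped."""
    if old_node not in nodes:
        raise KeyError(old_node)
    # Step 1: add the fresh node before removing anything, so capacity never dips.
    with_new = {**nodes, new_node: []}
    # Step 2: move every workload off the old node onto the new one.
    moved = {**with_new, new_node: list(nodes[old_node]), old_node: []}
    # Step 3: discard the now-empty old node.
    return {name: pods for name, pods in moved.items() if name != old_node}

cluster = {"worker-1": ["posts", "connections"], "worker-2": ["connections-posts"]}
repaved = repave(cluster, old_node="worker-1", new_node="worker-3")
```

The original `cluster` value is never modified. The function returns a new description of the topology, which mirrors the immutable style of the talk: the old node is not patched, it is replaced.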

Okay? So you see how... Oh, and let me show you one last thing before I get back to my last couple of slides in the last minute that I have left, and that is I want to show you the log. So everything that I have done and the things that my agents, that the computer's done for me, are recorded in this log.

So we're leveraging a component like Git to not only be the place that I put source for both my immutable infrastructure and my applications, but also to keep track of the things that these automated agents are doing for me. I have it in a single log, everything that's happened to my infrastructure. That's pretty darn cool.

All right. So let me just close up, and it just went to zero, but let me just very quickly close on a couple of notes. Gene is always asking what we still need help with, and so I do want to spend just the last 30 seconds on this slide.

This first comment is a reference to how difficult it can be for us to change our mindset from imperative thinking to functional thinking, to allowing the computer, to letting go of needing to control every single detail and programming in a different model. And that, I refer to that as learning the toe turn in snowboarding. Those of you who have learned how to snowboard, you know what I mean.

But more interesting is that that programming model that I'm referring to is a native thing in Kubernetes. If you think that Kubernetes is all about container scheduling, that's only the first use case that has become popular with Kubernetes. Kubernetes actually has this deep programming model that I'm referring to here that allows you to do things in this eventually consistent and more functional style of programming your infrastructure.

So that's someplace where we need help as an industry, is to really understand that programming model and leverage it. And then finally, it gets very complex in managing temporal dependencies. This is an area where I'm spending a lot of time thinking about those things, and if anybody is thinking about those things, please let me know. I'd love to chat with you. And with that, I thank you, and I apologize for going a minute over, and I apologize for the spectacular fail of my live demo. But thank you so much.