Functional Programming for (Dev)Operations and Infrastructure
Functional programming had mostly been relegated to academic endeavors until recently. What’s changed is that our apps are now distributed systems and are simply too complex for us to reason about without help. Programming intent, and ceding some of the control to an interpreter, makes solutions to very hard problems achievable.
We’ve seen an increase in the uptake of functional programming for application development using languages such as Kotlin and Clojure, and now we are seeing some key tenets of functional programming seeping into operations. Through a series of demonstrations, in this talk I’ll draw some parallels between functional programming and tools that are increasingly used on the operations side of the house. If part of DevOps is about building empathy between developers and operators (and it is), then shared core tenets help.
Cornelia is Sr. Director of Technology at Pivotal, where she helps customers develop and execute on their cloud platform strategies. Responsible for guiding clients on the technical elements of broader transformation, she helps development and operations teams and IT executives understand and adopt new systems, platforms, and processes as the central part of delivering greater value to their customers. Fundamentally, Cornelia helps enterprises shift from a mindset where IT is a necessary evil to one where they are software-driven businesses with IT at the core. Finally, through these deep engagements with clients, Cornelia also brings valuable insight back into Pivotal, driving advancements in its software and services offerings.
When not doing those things you can find her on the yoga mat or in the kitchen.
Full transcript
Cornelia Davis
Good afternoon, everyone. Thank you all for being here in the afternoon of the third day; your tenacity is impressive. I will introduce myself in a moment, but first I have a question for you. I admit that I am going to be talking a little bit about Kubernetes today, and I will tell you how that came about. How many folks here are working at an organization where you have a plan to do something with Kubernetes? Just about everyone. How many of you know what the business value is that you are going to get out of your Kubernetes? Fewer of you. That is exactly what I want to talk about today.
First, a little background. I am a trained computer scientist. I studied computer science at university, did my undergraduate work at a practically oriented university in the greater LA area, then worked for a few years and went back to graduate school. I started a PhD and never finished; I am ABD, All But Dissertation. I went to Indiana University, a research-oriented school, and studied the theory of computing and programming languages. That comes into play when I talk about imperative languages and functional languages.
Now I work for Pivotal doing technical product strategy. I spent my first three years or so at Pivotal, since the spinoff, working on Pivotal Cloud Foundry, our PaaS product. I spent about a year on cloud caching, thinking about caching in a new, cloud-native way. For the last two years I have been working on PKS, our Kubernetes offering, so I have been in the Kubernetes space for about two years. About a month ago my book came out in print. It took three years of nights, weekends, and vacations, and next week I am going on my first vacation in three years without a book to write.
When I say I work on product strategy, I mean that I am a technologist. For most of my career I have worked in emerging tech. I am a change junkie and always need to play with the latest, greatest, newest stuff. Emerging tech for its own propeller-head sake is fun, but what really gives me a charge is taking emerging technology like Kubernetes, this beautiful shiny object, and figuring out how we are actually going to generate business value from it. That is what I want to do in this talk.
I am going to talk about imperative versus functional, so I will start with a quick summary. Early in my career I worked on embedded systems, algorithms, and programs that operated single-threaded on a single processor. Then the industry moved to multithreading, then multiple cores or multiple computers. Today we have cloud-native systems: highly distributed programs running in an environment that is constantly changing. That is the essence of cloud-native.
From a programming languages perspective, when we were working on single-processor, single-threaded systems, we could use assembly language, and sometimes we did that to eke out the most performance. Then we used languages like Fortran, C, C++, Java, and now Golang. As the complexity of the computer architecture increased, we were able to do more in our programs by using higher-level languages.
When I went to Indiana University, I programmed in Scheme, a functional language. Then I went back to industry and pretty much left functional programming behind. But these days we are seeing industry use languages like Clojure, Scala, Kotlin, and F#. I believe the reason is the highly distributed nature of our systems, and the fact that we operate in environments where lots of failures occur and we have to adapt to that. That has led to the popularity of these languages.
Imperative programming languages are sequential. Their hallmarks are that the programmer controls every step and uses variables. Variables themselves are not the problem; the problem is side-effecting those variables, that is, mutability. Mutation creates hairy edge cases. Imperative programs are also difficult to parallelize, because as soon as you have shared mutable state, you have to decide who may change it and when.
Functional programs, by contrast, are declarative. I state the relationships between things and start to cede some control to the programming language itself. I say, programming language and compiler, you figure out the optimal execution of this algorithm. There are no side effects, and we tend to express iteration recursively instead of with control loops. The result is programs that are easier to parallelize, have far fewer edge cases, and can be proven correct. That last property is important because it is what allows the language and its runtime to optimize execution without changing the result.
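A minimal Python sketch of the contrast drawn above, using a hypothetical sum-of-squares task that is not from the talk. The first version mutates an accumulator inside a loop, the second defines the result recursively without mutating anything, and the third simply declares what to compute and leaves the iteration to the runtime. CPython does not optimize recursion, so the recursive form is only meant for small inputs.

```python
from typing import List

def sum_of_squares_imperative(xs: List[int]) -> int:
    """Imperative: the programmer drives every step and mutates state."""
    total = 0              # mutable accumulator
    for x in xs:
        total += x * x     # side effect on `total` at each step
    return total

def sum_of_squares_recursive(xs: List[int]) -> int:
    """Functional: no mutation; the result is defined in terms of a smaller input."""
    if not xs:
        return 0
    return xs[0] * xs[0] + sum_of_squares_recursive(xs[1:])

def sum_of_squares_declarative(xs: List[int]) -> int:
    """Declarative: say what to compute and let the runtime decide how to iterate."""
    return sum(x * x for x in xs)

print(sum_of_squares_imperative([1, 2, 3]))  # 14
print(sum_of_squares_recursive([1, 2, 3]))   # 14
```

Only the first version has a variable whose value changes over time, which is exactly the shared state that makes parallel execution hard to reason about.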
The real punchline is this: do you want to use an imperative model or a functional model to think through your problem? I am asserting that in today's world, where we have far more distributed systems in a complex, constantly changing environment, the functional model is a better model for taking on the cognitive load and solving those problems.
To make that concrete, imagine a program that starts with a list of six elements. First I apply a filter that keeps only the elements less than 30, which removes 40. Then I filter that list to keep only the elements greater than 20, which removes 1 and 5. Then I look up the second element of the result and get the value 27. That is very controlled; I controlled every step.
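The staged program described above can be sketched in Python as follows. The six input values are hypothetical, chosen only to be consistent with the narration (the first filter removes 40, the second removes 1 and 5, and the answer is 27); the talk does not list the actual values. Each stage builds a complete intermediate list before the next stage starts.

```python
from typing import List

# Hypothetical input, consistent with the talk's description.
DATA: List[int] = [22, 27, 1, 40, 5, 29]

def staged_second_match(data: List[int]) -> int:
    """Run each step over the whole list before starting the next one."""
    below_30 = [x for x in data if x < 30]      # stage 1: removes 40
    above_20 = [x for x in below_30 if x > 20]  # stage 2: removes 1 and 5
    return above_20[1]                          # stage 3: second element

print(staged_second_match(DATA))  # 27
```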
Now think about what the computer could do to optimize this. Instead of processing the whole list in stages, it can take the first element and ask: is it less than 30? Yes. Is it also greater than 20? Yes. That becomes the first element of the output array. Then it can take the next element, ask the same questions, and there is the answer. That is far more efficient, and I never had to process the rest of the array. The optimization is safe only because of properties like immutability. With six elements it may not matter, but if I am doing human genome sequencing and looking for the first place I have a match, I might save a boatload of processing time. The business value is that I can process my big data set faster and be first to market.
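One way to get the element-at-a-time behavior described above in Python is lazy evaluation with generators, which fuses the two filters so that each element is tested against both before the next one is read. This is laziness the programmer asks for explicitly, not an optimization the Python compiler performs on its own. The input is the same kind of hypothetical six-element list consistent with the talk's description, and the `examined` list exists only to show how much of the input is actually read.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical input, consistent with the talk's description.
DATA: List[int] = [22, 27, 1, 40, 5, 29]

def lazy_second_match(data: Iterable[int], examined: List[int]) -> int:
    """Return the second element that is > 20 and < 30, reading input lazily."""
    def source() -> Iterator[int]:
        for x in data:
            examined.append(x)  # record every input element actually pulled
            yield x

    below_30 = (x for x in source() if x < 30)
    above_20 = (x for x in below_30 if x > 20)
    # islice(it, 1, 2) skips the first match and yields the second;
    # nothing after that is ever pulled from the pipeline.
    return next(islice(above_20, 1, 2))

seen: List[int] = []
print(lazy_second_match(DATA, seen))  # 27
print(seen)                           # [22, 27]
```

Only two of the six inputs are read. On six elements that hardly matters, but on a very large sequence searched for an early match, the remainder is never touched.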
How does that relate to ops and infrastructure? Last year at this forum I said that in systems programming, meaning the deployment of applications, we use code. The code we have used in the past includes Bash, Puppet, Chef, and Ansible, tools that tend to be more imperative. As systems have become more complex, we have seen platforms like Kubernetes, Cloud Foundry, and BOSH take a much more functional approach. Last year I compared scripted deployments with declarative deployments, SSH-ing into a box with immutable infrastructure, long-running contexts with ephemeral containers, and middleware with things like sidecars. Today I will focus on declarative deployments and immutable infrastructure, show demos of how some of this works in Kubernetes, and tie it back to business value.
For declarative deployments, I will start with an example that runs through my book. It is a simple deployment with a couple of microservices: a connection service, a post service, and an aggregator called connections-posts. The connection service knows who follows whom; the post service lists blog posts; the aggregator lets me say, I am Cornelia Davis, who do I follow, and I want to see all the posts from the people I follow. It has databases: a token store for logins and MySQL for blog posts and users. The left side is request-response; the others could be event-driven, but that is not today's talk.
For cloud-native software, we need multiple instances of these services: multiple instances of connection service, post service, and aggregator service. In this example I have seven, four, and five instances. I will spare you the YAML, but there is a declaration of what the topology looks like: I want five instances of this, seven of this, and four of this. It is just declarative. I will show what happens when I tell Kubernetes this is the topology I want.
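A minimal sketch of what declaring the topology buys: the desired state is plain data, and the steps needed to reach it are derived by comparing it with what is actually running. The instance counts follow the example above (seven connection, four post, and five aggregator instances, in the order listed); the service identifiers, the dictionary format, and the `plan` function are hypothetical stand-ins for the Kubernetes YAML and controllers, not their actual API.

```python
from typing import Dict

# Declared topology: instance counts per service. Counts follow the talk's
# example; the identifiers and this format are illustrative only.
DESIRED: Dict[str, int] = {"connections": 7, "posts": 4, "connections-posts": 5}

def plan(desired: Dict[str, int], actual: Dict[str, int]) -> Dict[str, int]:
    """Return the change in instance count needed per service.

    Positive means start that many instances, negative means stop them.
    Services already at their declared count are omitted.
    """
    services = set(desired) | set(actual)
    return {
        s: desired.get(s, 0) - actual.get(s, 0)
        for s in sorted(services)
        if desired.get(s, 0) != actual.get(s, 0)
    }

print(plan(DESIRED, {}))
# {'connections': 7, 'connections-posts': 5, 'posts': 4}
print(plan(DESIRED, {"connections": 5, "posts": 4, "connections-posts": 6}))
# {'connections': 2, 'connections-posts': -1}
```

The operator only edits `DESIRED`; calling `plan` again after any disruption yields the next corrective step, which is the property a reconciliation loop relies on.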
The Kubernetes cluster is distributed over three availability zones, or failure domains. The question is how to distribute the workloads across them. Are humans the best people to make those decisions? I say no. Let's look at what the machine can do for us.
In the live demo, I have three machines representing three availability zones. The services include MySQL, Redis as a token store, and a Spring Cloud configuration server. My application is not deployed yet. When I deploy the apps, the deployment tells Kubernetes, here are the workloads I want to deploy. The little blue things are containers being spawned. The orange dots represent the services, which are effectively load balancers: one in front of all of the connections-posts instances, one in front of all of the connection instances, and one in front of all of the post instances.
Kubernetes does a pretty good job of evenly distributing those workloads across the machines. That is not something I had to worry about, and it should not be my concern anymore; I should not have to decide where to place those workloads. Of the seven instances, two go to one server, three to another, and two to the third, spread across failure domains. The same happens with the other services.
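Kubernetes makes this placement decision itself. As a rough illustration only, the greedy rule below places each new instance in the zone currently running the fewest instances of that service. It is a simplification, not the actual Kubernetes scheduling algorithm, and the zone names are hypothetical. For seven instances across three zones it yields a three-two-two split, the same distribution described above up to zone labels.

```python
from typing import Dict, List

def spread(instances: int, zones: List[str]) -> Dict[str, int]:
    """Greedily place each instance in the zone currently running the fewest.

    A simplified stand-in for spreading across failure domains, not the
    real Kubernetes scheduler.
    """
    counts = {z: 0 for z in zones}
    for _ in range(instances):
        # min() returns the first zone with the lowest count, so ties
        # go to the earliest zone in the list.
        target = min(zones, key=lambda z: counts[z])
        counts[target] += 1
    return counts

ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
print(spread(7, ZONES))  # {'zone-a': 3, 'zone-b': 2, 'zone-c': 2}
```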
Next I show what happens if an availability zone goes down. I go into the GCP console, into Compute Engine, and carefully kill one of the nodes. Because it is part of an instance group, I delete the instance. That availability zone instance is going away completely.
Kubernetes will reallocate those workloads to the remaining nodes. In the dashboard, the server that is going away still exists briefly, so workloads are still attached to it. But Kubernetes recognizes that it cannot reach those instances because they are starting to shut down. The load balancers update automatically; I do not have to go change load balancer settings somewhere. That is something infrastructure and ops should not have to worry about anymore. As soon as the server goes away, the workloads are reallocated to the remaining machines, and the load balancers follow. The dashboard is an open source project called Cockpit that visualizes what is running in Kubernetes.
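A hypothetical Python model of the recovery shown in the demo. When a node disappears, nothing is migrated: each instance that lived on it is dropped, and a brand-new instance of the same service is started on the least-loaded surviving node. This simplifies what Kubernetes controllers and the scheduler do and is not their implementation; the node names, `start`, and `handle_node_loss` are illustrative.

```python
import uuid
from collections import Counter
from typing import Dict, List, Tuple

Instance = Tuple[str, str]  # (service name, unique instance id)

def start(service: str) -> Instance:
    """Start a brand-new instance of a service (hypothetical helper)."""
    return (service, uuid.uuid4().hex)

def handle_node_loss(placement: Dict[str, List[Instance]],
                     lost: str) -> Dict[str, List[Instance]]:
    """Replace every instance that was on `lost` with a new one elsewhere."""
    survivors = {n: list(p) for n, p in placement.items() if n != lost}
    if not survivors:
        raise RuntimeError("no surviving nodes to schedule onto")
    for service, _old_id in placement.get(lost, []):
        # The old instance is discarded; a fresh one starts on the
        # surviving node with the fewest instances.
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(start(service))
    return survivors

def per_service(placement: Dict[str, List[Instance]]) -> Counter:
    """Count running instances per service across all nodes."""
    return Counter(s for pods in placement.values() for s, _ in pods)

before = {
    "node-a": [start("connections"), start("connections"), start("posts")],
    "node-b": [start("connections"), start("posts")],
    "node-c": [start("connections"), start("connections-posts")],
}
after = handle_node_loss(before, "node-c")
print(sorted(after))                               # ['node-a', 'node-b']
print(per_service(after) == per_service(before))   # True
```

The per-service counts are restored, but the instance ids on the lost node never reappear, which is the "thrown away and replaced, not moved" behavior described in the talk.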
When that availability zone goes down, do humans get paged? Maybe, maybe not. If the workloads are still operating fine and recover, maybe it is enough for the administrator to come in the morning and see a notification that something went wrong overnight. It should not go unheeded, because it might be a symptom of an underlying problem, but does it mean we need to get woken up in the middle of the night? Probably not.
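One way to encode the judgment above is a policy that pages only when self-healing has not restored the declared instance counts, and otherwise leaves a notification for the morning. This is a hypothetical Python sketch, not a feature of Kubernetes or any particular alerting tool; `ServiceStatus` and `triage` are illustrative names, and a real policy would also wait out a grace period so replacements have time to start.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServiceStatus:
    name: str
    desired: int  # instances declared for the service
    ready: int    # instances currently passing health checks

def triage(statuses: List[ServiceStatus], node_lost: bool) -> str:
    """Hypothetical policy: wake someone only if self-healing fell short."""
    degraded = [s.name for s in statuses if s.ready < s.desired]
    if degraded:
        return "page: " + ", ".join(degraded)
    if node_lost:
        # Recovered on its own: still worth investigating, but in the morning.
        return "notify"
    return "ok"

healed = [ServiceStatus("connections", 7, 7), ServiceStatus("posts", 4, 4)]
print(triage(healed, node_lost=True))  # notify
stuck = [ServiceStatus("connections", 7, 5), ServiceStatus("posts", 4, 4)]
print(triage(stuck, node_lost=True))   # page: connections
```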
The business value is that if our systems are resilient and applications remain up, customers are happy. There is business value attached to happy customers. There is also employee engagement and employee happiness. If you do not get woken up at 2:00 in the morning, you are going to be a more engaged employee, and research shows that engaged employees result in a better bottom line.
The second topic is immutable infrastructure. Containers have made this far easier and more efficient. The container is the thing inside the dotted-line box, and because containers are treated as immutable, anything I change inside one is expected to be lost when it is replaced. I can have local state, but only for the duration of a single incoming request.
In the availability-zone demo, the containers did not get moved. Each old container was thrown away and a new one was put in its place on a different host; every time I threw one away, a replacement came up. I can recreate containers that quickly. Resilience is one use case for recreating a container or creating a new one.
Another use case is malware. Many of the breaches we hear about, such as Equifax, involve unpatched known vulnerabilities or malware that makes it into a system and goes undetected for months. I have been in the industry for 30 years, and I remember when keeping a system up for 187 days was something to brag about, like a days-since-last-incident counter. But if the system has been running for 187 days, that is 187 days that malware might have been sitting there gathering information.
One of the best ways to combat malware is to throw things away and get new ones. Bad actors can plant malware inside a container and on the host and then leave, because an active intruder is easier to detect than dormant malware. Instead of holding things stable for 187 days, I can throw out the container and get a new one. If it did not have malware, it still does not; if it did, the malware is gone. We can do that at the container level and at the host level. At Pivotal, we call that repaving: repaving constantly, before the roads are completely trashed. I suggest repaving very often, say several times per week. One Pivotal customer repaves every three days, so containers never live longer than three days in their environment, and he wants to do it every single day.
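A minimal sketch of the age-based policy described above, assuming each instance's start time is known: anything that has lived at least the maximum age is selected for replacement, oldest first. The three-day limit mirrors the customer example; the instance ids, timestamps, and `due_for_repave` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Three days mirrors the customer example; the value is a policy knob.
MAX_AGE = timedelta(days=3)

def due_for_repave(started_at: Dict[str, datetime], now: datetime,
                   max_age: timedelta = MAX_AGE) -> List[str]:
    """Return ids of instances that have lived at least `max_age`, oldest first."""
    expired = [i for i, t in started_at.items() if now - t >= max_age]
    return sorted(expired, key=lambda i: started_at[i])

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
started = {
    "posts-1": now - timedelta(days=4),
    "posts-2": now - timedelta(hours=12),
    "connections-1": now - timedelta(days=3, hours=1),
}
print(due_for_repave(started, now))  # ['posts-1', 'connections-1']
```

Tightening `max_age` toward one day, as the customer wants, only changes the argument; the replacement mechanism stays the same.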
For the final demo, I add a node back to the Kubernetes cluster so that there is again one node per availability zone. It takes less time to add the node than to take it away. Before the demo completes, we can already talk about business value: a stronger security posture, customer data that stays safer, no breaches, no negative headlines, and no damage to the brand. These technical elements give us a leg up in realizing business value.
In closing, a snowboarding story. I was a skier for 25 years. When my son was four, I wanted to be the cool mom, so I switched to snowboarding because that is what he was doing. I sprained both wrists and my shoulder before I got it. If you are a snowboarder, heel turns are easy. Toe turns, on the front edge of the board, are really hard, and they took me a long time. But I can still tell you exactly which run I was on and exactly which turn I was making, right around a group of trees, when I finally got that toe turn; after that, I could toe turn. That is what it means to move from imperative thinking to functional thinking. It takes work. It takes spraining your wrists and your shoulder. But once you get it, it is super fun and super valuable.
For the last demo, the cluster is stable and there is a new machine, but no workloads are assigned to it yet. I show a repave. The repave is simply taking one container at a time and saying delete this container, delete this container, delete this container. It does not do exactly what I expected: none of the containers go to the new server, perhaps because the server is not quite up yet. When I practiced earlier, the instances were supposed to get redistributed onto the new node. Nevertheless, you can see the repave happening: I delete an instance, it is recreated, and the new instances are tied back into the load balancer. Whatever malware was there before is no longer there.
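The repave in the demo can be sketched as a loop that replaces one instance at a time and halts if a replacement does not come up. This is a hypothetical Python model of the procedure, not a Kubernetes API; `start_replacement` and `is_ready` stand in for whatever actually creates instances and checks their health, and the instance ids are illustrative.

```python
from itertools import count
from typing import Callable, List

def rolling_repave(instances: List[str],
                   start_replacement: Callable[[], str],
                   is_ready: Callable[[str], bool]) -> List[str]:
    """Repave every instance, one at a time.

    For each old instance: delete it, start a replacement (in Kubernetes,
    the controller does this to restore the declared count), and check that
    the replacement is ready before moving on, so at most one is missing.
    """
    current = list(instances)
    for old in list(instances):
        current.remove(old)                 # "delete this container"
        new = start_replacement()           # fresh instance, no inherited state
        if not is_ready(new):
            raise RuntimeError(f"replacement for {old} is not ready; halting repave")
        current.append(new)                 # rejoins the load balancer pool
    return current

ids = count()
result = rolling_repave(["posts-1", "posts-2", "posts-3"],
                        start_replacement=lambda: f"posts-new-{next(ids)}",
                        is_ready=lambda _: True)
print(result)  # ['posts-new-0', 'posts-new-1', 'posts-new-2']
```

Deleting first matches the demo. A variant that starts the replacement before deleting keeps full capacity throughout, at the cost of briefly running one extra instance.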
That wraps up the talk. It is really about tying the technology back to business value; that is where things change for our organizations. I suggest that functional languages give us a new mental model for reasoning about these systems. Thank you for your attention. I will stick around, and if you have questions, I would love to chat.