Where's the Value? What does Kubernetes have to do with DevOps?

Las Vegas 2023

Various, sometimes conflicting, almost always confusing definitions of DevOps aside, we all agree that the aims are to minimize (or eliminate!!) friction from the process of getting ideas implemented & pushed out to users, and to support needed software upgrade cycles, all while keeping everything stable & secure. While DevOps is not about a specific tool, contemporary tools are needed, and I argue that Kubernetes has done more to enable DevOps benefits than any tech before it.

In this tech-for-executives session I will explain Kubernetes from the perspective of the DevOps outcomes it supports. Container orchestration is not the value, but the reduced application downtime it enables is. Declarative configuration is not the headline; the way it lets you drastically minimize the impact of malware attacks is. I’ll cover the key elements of Kubernetes and tie each to a set of values that your organization is surely aiming for. You will come away from this session understanding the technology in a manner that will allow you to support and guide your organization in building a Kubernetes-based offering that delivers on your DevOps agenda.

Full transcript

Cornelia Davis

Okay. I am going to start right on time because, for those of you who might know me, I tend to always have more content than I have time for. So I'm going to get going right away.

I'm going to start with this, and I'm going to read this word chart to you out loud. So, of course, like many other people, I asked ChatGPT about Kubernetes. This is what ChatGPT had to say. Bear with me. I'll read it very quickly.

Kubernetes, often abbreviated K8s, and I will use that acronym throughout, is an open source container orchestration platform designed to automate the deployment, scaling, and management of containerized applications. Originally developed by Google, now in the CNCF, it provides a powerful and flexible framework for managing containerized workloads, making it easier to deploy and manage applications at scale.

Ooh, now we're maybe getting somewhere. There's a little bit of a DevOps element there: in a highly available and resilient manner.

Now, I asked a follow-on question around declarative configuration, because this is an architecture talk. By the way, if some of you were among the people who clapped for more engineering talks at the DevOps Enterprise Summit, you've come to the right place, because we are going to build the architecture diagrams.

Please, if you're an executive who's like, "Oh, wait, that's not what I signed up for," hang on, because they are architecture diagrams for you. So do stick with us.

I followed on with a question around declarative configuration, because it didn't mention anything in the summary before, and that's one of the primary architectural tenets. You'll see.

So I asked it, and it said declarative configuration is a fundamental concept in infrastructure as code and configuration management practices. All true.

Now, I also asked ChatGPT about Kubernetes, its main principles, and about DevOps, and asked it to correlate those. It kind of did an alright job. It came up with a list of some of the fundamentals of Kubernetes, which included declarative configuration. It came up with some of the fundamentals of DevOps in terms of automation, rapid releases, and all of those types of things. But it didn't do a good job bringing those together.

That's what I'm hoping to do in this talk today. So ChatGPT certainly leaves an awful lot as an exercise to the reader as we go along. That's what we're going to talk about: where's the value?

I also need to give you a little bit of a backstory. Earlier this year, I participated in another event that Gene puts on, the DevOps Enterprise Forum. There were about 50 of us, and we were learning from each other. There was this running joke throughout the three days where every time somebody was up in front of the room and was talking about a particularly hard problem that we didn't know quite how to solve, some snarky person in the audience would yell out, "Kubernetes!" as the solution.

So we in the room were collectively rolling our eyes at Kubernetes, like, "Kubernetes is not the solution." And I, being a Kubernetes person (I'll tell you a little bit more about my background in a moment), was like, "But wait, there really is a connection." That's where this talk came from.

My hope is that I will show you that connection as we go through the slides.

A little bit more about me. I'm a computer scientist by background. I did a degree in computer science a long time ago, 30 years ago. I wasn't an operations person until about 10 years ago, when I started working at Pivotal. I was part of the Pivotal spinoff, and I worked on Cloud Foundry.

Cloud Foundry was a developer platform, so I thought, "This is totally my wheelhouse." Then I realized, and this was the early days of DevOps, that there's a huge operational element to development as well. Now I consider myself as much an operations person as a developer person.

The rest of it you can see there. I've been doing web architectures and cloud native. The other thing that I'll mention is that I did enough cloud native where I wrote a book. It's a book. It has code samples and all that stuff. It's architectural patterns that make your software run fine in a constantly changing environment.

So what's my target here? I'm not going to belabor a DevOps definition, but in the very simplest terms, these are the constituents, and this is the scenario that I'm trying to address with any of the solutions that I'm working on.

I want to enable application teams, happy-go-lucky application teams, to be able to just crank out their stuff, be able to release frequently, and be able to operate their applications. Notice I say application teams. I'm not prescribing that developers are necessarily doing their operations, but I am prescribing that the application team is doing the operations for their applications. There are a number of different models that can make that work.

So when I say developer, please hear me say also application team.

We want to allow them to really, really be efficient and crank. But at the same time, I also want to support these other individuals who are responsible for things like security and compliance, for resilience, making sure the infrastructure stays stable, and cost optimization, so that costs don't go crazy and out of control.

We often refer to those people these days as the platform team. What I'm going to talk about is: what's the platform that assists in that? It acts as a bridge across those things.

I'm going to post the slides so you can read all of these things, but these are the categories as I go through those values. Where's the value? Well, the value is in security. It's in resilience. It's in fiscal responsibility. And it also is benefiting the application teams.

Those are the categories of values that I'm going to go through as I take you on this architectural journey. I'm not going to give you a list ahead of time. We'll summarize it at the end.

What's the first value? Well, there's value in keeping your software that's running in production free from vulnerabilities. We've seen lots of talks about that. In fact, we just saw a fantastic keynote from the I Am The Cavalry guy, who was talking about ransomware and vulnerabilities and those types of things.

So there's absolute customer value, like lives of patients and hospitals, and business value in making sure that your software is free from vulnerabilities.

Let's look at how Kubernetes specifically supports this.

I'm going to start with containers. Of course, containers have a number of different layers. I like to categorize it into three different things: the root file system, application dependencies, and then finally the application itself. Of course, containers have many, many more layers.

The cool thing is, I think everybody knows that there are great tools out there to scan those different layers for vulnerabilities. That's the first thing. And that, of course, results in a software bill of materials. But then you have to make sure you take that software bill of materials and vet it against the list of CVEs that are out there. So that's still work that you need to do.
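As a rough illustration of that remaining work, here is a minimal Python sketch of vetting a software bill of materials against a list of known advisories. The component names, versions, and the advisory ID are all made up for the example, and real scanners match on version ranges and far richer metadata than an exact name and version pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    layer: str  # "root-fs", "dependency", or "application"


# Hypothetical advisory data: (package name, affected version) -> advisory ID.
KNOWN_VULNERABLE = {
    ("examplelib", "1.0.2"): "EXAMPLE-ADVISORY-0001",
}


def vet_sbom(sbom, advisories):
    """Return (component, advisory_id) pairs for SBOM entries with a known advisory."""
    findings = []
    for component in sbom:
        advisory = advisories.get((component.name, component.version))
        if advisory is not None:
            findings.append((component, advisory))
    return findings


# One entry per layer category mentioned in the talk.
sbom = [
    Component("base-os", "12.4", "root-fs"),
    Component("examplelib", "1.0.2", "dependency"),
    Component("orders-service", "3.1.0", "application"),
]

findings = vet_sbom(sbom, KNOWN_VULNERABLE)
for component, advisory in findings:
    print(f"{component.layer}: {component.name} {component.version} -> {advisory}")
```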

That's a theme I'll come back to over and over again. Sure, Kubernetes does container orchestration, but let's tie this picture more into the Kubernetes architecture in just a moment.

So, of course, we can verify those. We've got these container images, and they're going to sit in a registry. The first thing is that you can point your scanning tools against that registry. But hang on a second. Sure, it's interesting for me to know what the vulnerabilities are of containers that I've built, but I'm really even more interested in knowing about the vulnerabilities in the things that I'm running. And I'm not running anything yet in this picture.

The other thing that we need to be cognizant of is that we need to make sure that we have secure build pipelines so that we know that we're hopefully not sending things into that registry that have known vulnerabilities. We need to have some security there and make sure that other bad actors aren't getting into that supply chain.

Now here's where I'm going to introduce Kubernetes. Kubernetes, of course, is going to run all these containerized workloads. But how do they get there? That's where I want to start talking about things.

Of course, there's an API to send things into Kubernetes. What do you send into Kubernetes? Well, you send in a description of the things that you want running and the context that you want them running in. I'm going to describe my deployment. Those deployments have pointers to the things that are sitting in the registry.
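To make "a description of the things that you want running" concrete, the sketch below builds the general shape of a Deployment as a Python dictionary. The field names follow the Kubernetes `apps/v1` Deployment resource; the application name, registry host, and image tag are placeholders invented for this example, and the helper function is hypothetical.

```python
import json


def deployment_manifest(name, image, replicas):
    """Build a Deployment description: what should run, and how many copies."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # The image field is the pointer into the registry.
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Placeholder values, not taken from the talk.
manifest = deployment_manifest(
    name="orders",
    image="registry.example.com/shop/orders:1.4.2",
    replicas=3,
)
print(json.dumps(manifest, indent=2))
```

Note that nothing in the description says how to start the containers. It only states the outcome that is wanted.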

Now, starting to get into the Kubernetes architecture here, there is a state store in Kubernetes, etcd. And there's also, inside of Kubernetes, the implementation of these APIs. The way that APIs are implemented in Kubernetes is they're implemented as eventually consistent engines. They're reconciliation loops that are constantly looking at what we're trying to do and what is actually happening in there.

So I've got an implementation of this API. The API says, "Here's what I want running," and then I have an implementation of that API that is going to make it so. I'll talk more about that in just a moment.

Notice that that API implementation is running inside of the cluster. It's running inside of Kubernetes. I almost said Cloud Foundry. Yes, I spent a lot of time at Cloud Foundry. It's running inside of Kubernetes.

What is it doing? Well, it's pulling from the state store and pulling from the registry, and then it's getting those things running. There is a security benefit right there, because if you set up your Kubernetes environment so that nothing can get into the Kubernetes environment except for the things that the Kubernetes environment is pulling in, now you don't have to worry about bad actors coming from the outside and pushing things into the Kubernetes cluster.

So you see how that pull model is helping me address this concern around vulnerabilities. By the way, it also allows you to now point that scanning software to the actual running containers in the environment, because it can see what's running in the state store.

Now, after I talk about each need, I'm going to have a slide in there that gives me some concrete guidance. It talks in the upper left-hand corner about what it is about Kubernetes that's making this value available to you, and in the lower right-hand corner, some of the things that you need to do. I'll leave that up there. That's kind of the legend for the rest of the talk.

Now I'm already almost halfway into my talk, and I'm going to talk about six of these different things. I'm going to speed up as we go along, because you'll see that some of the architectural elements that I'm talking about serve multiple business cases. We'll be able to speed up as we go once I've described those architectural elements to you the first time.

So that first architectural element is about pulling things into the environment.

Alright, resilience. Ensuring uptime. What does Kubernetes do uniquely to help us in this space of ensuring uptime?

There are a couple of things that I want to point out here. These are either statements made by other industry experts or things that have come out in analyses and reports. I'm going to read this quote to you from Charity Majors. If you don't know who she is, please look her up. She is amazing.

In a recent blog post, and the link is there, she says, "In theory, all software is debuggable." However, there are lots of things that can chip away at that bodacious goal and make your software less than mathematically, less than 100%, debuggable.

What I'm saying there is bugs sometimes happen, and they're going to crash your software. It turns out Kubernetes can help with that. I'll talk about that more in just a moment.

Then from a study here, the Enterprise Management Associates reported that 60% of availability and performance errors are the result of misconfigurations. So we're releasing a new version or we're changing some setting somewhere, and all of a sudden things go belly up.

How does Kubernetes help in those two particular cases? What's the magic?

Well, the magic is, I already alluded to it, this declarative configuration. What is declarative configuration? It's back to that picture. I declare what I want, and then there's an engine that says, "Make it so."

Yes, I'm a Trek fan.

But beyond make it so is, "and keep it so." When Jean-Luc Picard says, "Make it so," the whole staff of the Starship Enterprise starts working away to make it so and keep it so. That is exactly what's happening in Kubernetes.

So we had this picture before, and remember that little deployment loop. The reason I have it in a circle is because it's constantly checking. I'm going to fade away some of the details that aren't relevant in this part, and I'm going to show you the details that are important.

The way that that controller, and we call those things controllers or reconciliation loops, works is it's constantly looking at the desired state that's in the state store, and it's constantly watching what's going on inside the cluster. Then it's reconciling those, and it's bringing them in alignment.
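The loop described here can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how Kubernetes controllers are actually written: the `Cluster` class, the pod names, and the single "replica count" notion of desired state are all invented for illustration.

```python
import itertools


class Cluster:
    """Toy stand-in for the observed state of a cluster."""

    def __init__(self):
        self.pods = set()
        self._ids = itertools.count(1)

    def start_pod(self):
        name = f"pod-{next(self._ids)}"
        self.pods.add(name)
        return name

    def stop_pod(self, name):
        self.pods.discard(name)


def reconcile(desired_replicas, cluster):
    """One pass of the loop: compare desired with observed and close the gap.

    Returns the number of changes made, so 0 means the states already agree.
    """
    changes = 0
    while len(cluster.pods) < desired_replicas:
        cluster.start_pod()
        changes += 1
    while len(cluster.pods) > desired_replicas:
        cluster.stop_pod(sorted(cluster.pods)[-1])
        changes += 1
    return changes


cluster = Cluster()
reconcile(3, cluster)            # "make it so"

crashed = sorted(cluster.pods)[0]
cluster.stop_pod(crashed)        # a bug crashes one instance

reconcile(3, cluster)            # "keep it so": the next pass replaces it
print(sorted(cluster.pods))
```

The point of the design is that the crash needs no special handling. The same comparison that performed the initial rollout also performs the recovery.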

I'm spending a little bit of time talking about this because sometimes you hear people say, "Oh, declarative management," and maybe you didn't understand: why is declarative management so important?

It is a foundational element of the Kubernetes architecture, of eventual consistency. I'll tell you that more than 10 years ago, before I worked in Cloud Foundry, Cloud Foundry had this pattern as well. I worked in the EMC CTO office, and maybe two years before the Pivotal spinoff, this concept of eventual consistency started being something that we talked about. I worked in the architecture and emerging technology group. It was nascent. It was very unusual.

This is something that Kubernetes has made ubiquitous. Now, if you're not running on Kubernetes, you're still grappling with this. Automation is one of the foundational things for DevOps. But if your automation isn't eventually consistent, if it's, "Hey, I'm going to make a change, and then some automation is going to do something in a runtime environment, and then I'll be done and wash my hands of it," then you're actually not as resilient as you could be, because there's always something that's going to change.

So that's the first element. If you've got software that crashes, it can bring that application back online automatically. That's one element.

Now the other element that I talked about that affects uptime is configuration changes. The cool thing is that those declarative configurations are not step-by-step instructions. They say, "This is the state that I want," so I can check them into Git. I can have them version controlled so that if a bad configuration change does happen, I can revert back to the stuff that I knew was already working.
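The reason a revert is cheap is that each version is a complete statement of desired state, so going back means re-declaring an earlier version rather than undoing steps. The Python sketch below models that idea with a hypothetical `ConfigRepo` class standing in for Git; the settings and values are invented for the example.

```python
class ConfigRepo:
    """A tiny stand-in for a Git repository holding declared desired state."""

    def __init__(self):
        self._commits = []

    def commit(self, desired_state):
        """Record a full copy of the desired state and return its revision number."""
        self._commits.append(dict(desired_state))
        return len(self._commits) - 1

    def show(self, revision):
        return dict(self._commits[revision])

    def head(self):
        return self.show(len(self._commits) - 1)

    def revert_to(self, revision):
        """Record a new commit whose content equals an earlier revision."""
        return self.commit(self.show(revision))


repo = ConfigRepo()
good = repo.commit({"replicas": 3, "timeout_seconds": 30})
bad = repo.commit({"replicas": 3, "timeout_seconds": 0})  # the misconfiguration

# Roll back by declaring the known-good state again; history is preserved.
restored = repo.revert_to(good)
print(repo.head())
```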

That's another huge value of this architecture of Kubernetes.

Again, here, summary of the Kubernetes mechanisms and some of the other patterns that you need to implement to make this happen. So you don't get this all for free. These are architectural patterns that make these capabilities available to you. You sometimes still have to do work to make them happen.

Alright, now let's talk about infrastructure sprawl. This has, of course, fiscal impacts. There are fiscal benefits to the business. This is also the first time that I'm going to talk a little bit more directly about application team operations.

This comes from the HashiCorp 2023 State of the Cloud. A couple of quotes there: almost all respondents in their survey reported avoidable cloud spend for a variety of different reasons. The two top reasons were that we overprovision our resources, and the second thing was that we have idle and unused resources sitting around.

Now, those aren't one and the same. You'll see as we go along how that drills down.

This is just a chart from another report that I found, which shows it visually: you know, Black Friday. So here we go. We have to have major capacity. I've been doing this long enough that I remember when we didn't have the cloud, and we had large data centers that were hugely overprovisioned for 11 months of the year, only to be right-sized for the 12th month. That overprovisioning is still happening.

The other thing that happens is that developers, or application teams, tend to hang onto their environments, especially people like me who have been around for a while. It used to be hard to get those environments, so we would hang onto them, and then we would forget about them, and you'd have a whole bunch of underutilized resources.

So here in, let's say, a dev/test scenario, we have this: okay, I'm going to have my test environment, I'm going to load my test data, run my automated test. By the way, that load test data thing can be really expensive.

But then here's the key. Notice that at the end, we want those life cycles to include discarding the environment at the end. Now how can I get away with that? We'll see in the architecture diagram.

So remember we were here. We were at this particular diagram. One of the things that Kubernetes has baked in is it has the notion of autoscaling. So in addition to that little deployment controller, there's a controller that's all about scaling.

Now, you as an application team don't have to decide ahead of time what your scale needs are. What you say is, you put some boundaries in place, and you say, "I am going to potentially need capacity up to this level, but I want you, Kubernetes, to right-size it for me."

So part of your declaration in this particular case isn't saying, "I need 25 instances of this service." It is saying, "You know what? Leverage your knowledge about the CPU utilization, the bandwidth, latency, and all of that to make a decision on my behalf on those scaling requirements." That's what the autoscaler does.
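For a sense of how such a decision can be made, the Kubernetes documentation for the Horizontal Pod Autoscaler describes a ratio-based rule: scale the current replica count by the ratio of the observed metric to the target metric, round up, and stay within the declared bounds. The Python sketch below is a simplified version of that rule. It leaves out the tolerance band, stabilization windows, and handling of missing metrics, and the numbers are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Ratio-based scaling decision, clamped to the declared boundaries."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Metric here is average CPU utilization in percent, with a target of 60.
busy = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=25)
quiet = desired_replicas(4, 15, 60, min_replicas=2, max_replicas=25)
peak = desired_replicas(20, 120, 60, min_replicas=2, max_replicas=25)
print(busy, quiet, peak)
```

The application team declares only the target and the boundaries. The replica count at any moment is an outcome, not an input.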

The other thing is, not only does Kubernetes allow you to declaratively model your applications, but it allows you to declaratively model everything: your datasets, your network connectivity. So when you need to provision that new environment for your next test run, it is just a matter of saying, "Hey, provision me one of these things. Make it so," and it'll get stood up for you.

That's all available because of declarative configuration and a very rich model of resources that are modeled in Kubernetes.
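The "provision, use, discard" life cycle for a test environment maps naturally onto a scoped resource. The Python sketch below uses a context manager to guarantee the discard step, even when the tests fail. The environment registry and the `run_tests` function are placeholders; in practice the provisioning step would apply a declared environment to a cluster.

```python
from contextlib import contextmanager

# Toy registry of environments that currently exist (and cost money).
active_environments = set()


@contextmanager
def ephemeral_environment(name):
    """Provision an environment, hand it to the caller, and always discard it."""
    active_environments.add(name)          # stand-in for "make it so"
    try:
        yield name
    finally:
        active_environments.discard(name)  # runs even if the tests raise


def run_tests(environment):
    # Placeholder for loading test data and running the automated tests.
    return f"tests passed in {environment}"


with ephemeral_environment("test-run-42") as env:
    result = run_tests(env)

print(result, "| still running:", sorted(active_environments))
```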

There again are the callouts.

Alright, one of my favorites is combating malware. What does this look like? Here I've got my running environment, I've got my container running in Kubernetes, and some bad actor comes along and drops some malware in there and then disappears. Because once they've disappeared, it's harder to recognize that there's malware on the system.

We've heard about these breaches, right? We've heard about the breach at Home Depot, where the malware came in through a network where the heating and cooling system was on the same network as their point-of-sale systems. Okay, bad idea.

So the malware got in there. Even worse was the fact that the malware sat there for nine months collecting consumer data. So that was a bigger problem.

How does Kubernetes help with this? Well, remember what Kubernetes gives you: you've got all of the definitions, you've got all the container images in the registry, and you've got your desired state sitting in the state store of Kubernetes. And I've got that controller, right?

Do I need to write scripts to take care of malware? No. The only script you need is one that says, "You know what? On a periodic basis, just throw that container out." If it had malware on it, yay. It doesn't anymore. If it didn't have malware on it, no harm, no foul. It still doesn't have malware on it.

That deployment controller is going to automatically resurrect it for me.
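A repave can be sketched as two small steps: discard anything older than a chosen age, then let the controller restore the declared count from the clean image. The Python below is a toy model with invented names; the file paths, the 72-hour age, and the `restore` function standing in for the deployment controller are all assumptions for illustration.

```python
from dataclasses import dataclass

# What the container image in the registry contains (hypothetical).
IMAGE_FILES = frozenset({"/app/server"})


@dataclass
class Container:
    started_at: int  # hours on some shared clock
    files: set       # image contents plus anything written at runtime


def start_from_image(now):
    return Container(started_at=now, files=set(IMAGE_FILES))


def discard_old(containers, now, max_age):
    """The repave script: throw out every container at or past max_age."""
    return [c for c in containers if now - c.started_at < max_age]


def restore(containers, desired, now):
    """Stand-in for the deployment controller bringing the count back up."""
    missing = desired - len(containers)
    return containers + [start_from_image(now) for _ in range(missing)]


containers = [start_from_image(now=0) for _ in range(3)]
containers[1].files.add("/tmp/dropped-malware")  # a bad actor gets in

now = 72
containers = discard_old(containers, now, max_age=72)
containers = restore(containers, desired=3, now=now)
print(all(c.files == IMAGE_FILES for c in containers))
```

The repave step never inspects a container for malware. It relies entirely on the replacement being built from a known image.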

All of those capabilities are the foundational capabilities in Kubernetes that allow you to set up systems to deliver these business values: getting rid of that malware, getting rid of that ransomware, again, like we heard about this morning.

I'm suggesting that you and your organizations set up these repaves and do it all the time. One of my customers with Cloud Foundry was a guy named Lance. He was working, he still works, at Wells Fargo, and I can talk about it publicly because he has talked about it a lot himself. He used to repave his entire environment every three days, and his goal was to get it to the point where he was doing it every 24 hours. Huge value.

Again, here it's super simple. Declarative configuration enables this, and you should absolutely implement repaves.

Now another one: operating systems. This picture was a little oversimplified because the operating system is actually sitting in the Kubernetes environment itself.

So what does Kubernetes offer here? Kubernetes has a very extensible API, and one of the most mature extensions to Kubernetes is also sitting in the CNCF, and it's called Cluster API.

What Cluster API allows you to do is declare what you want your Kubernetes environment to look like, the Kubernetes itself. It has a controller, and what does that controller do? It's looking at the desired state, so what's the state of my clusters? It's looking at that desired state in the state store. It's watching what the cluster itself has in it, and it's reconciling those.

By the way, that cluster has defined inside of it the operating system version that's running in that cluster. So if you need to, you update the operating system version in that declaration, and the controller will roll the update out to the cluster.

The cool thing about that is, if we go back to malware, remember that picture: malware doesn't just show up in containers. It also shows up in the operating system itself. So of course, as you can imagine, the way this works is we're going to throw away entire Kubernetes nodes, and Kubernetes itself will recover itself free from malware. Pretty damn cool stuff.
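The same reconcile idea applies one level down, to the nodes themselves. The Python sketch below replaces nodes one at a time until every node matches the declared operating system version. It is only a model of the pattern: the dictionaries, node names, and version strings are invented, and none of this is the Cluster API object model.

```python
import itertools

_node_ids = itertools.count(1)


def new_node(os_version):
    """Create a fresh node from a clean machine image (toy representation)."""
    return {"name": f"node-{next(_node_ids)}", "os": os_version}


def reconcile_nodes(desired_os, nodes):
    """Replace at most one out-of-date node per pass.

    Returns True if a node was replaced, False if everything already matches.
    """
    for index, node in enumerate(nodes):
        if node["os"] != desired_os:
            nodes[index] = new_node(desired_os)  # throw the old node away
            return True
    return False


nodes = [new_node("os-1.0") for _ in range(3)]

# The only change a person makes is to the declared version.
desired_os = "os-1.1"

passes = 0
while reconcile_nodes(desired_os, nodes):
    passes += 1

print(passes, [node["name"] for node in nodes])
```

Replacing one node per pass keeps most of the capacity available during the rollout, and because old nodes are discarded rather than patched, anything that was planted on them goes with them.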

Now I'm going to quickly go through this because I'm at time, and I just would like one minute to go over the summary slide.

There's a handful of slides here that are talking about policy management. Really what we're saying here is your policies that you want, you also declare that configuration. You've got controllers that are implementing those policies in your environment, and there are things like OPA that integrate into Kubernetes directly.
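As a small illustration of policy expressed as declared configuration, the Python sketch below keeps the policy as plain data and checks a workload description against it. This is a stand-in for the idea only: OPA policies are written in its own language, Rego, and the policy keys, registry host, and image names here are invented for the example.

```python
# The policy is data that can be declared and version controlled (hypothetical keys).
POLICY = {
    "allowed_registries": ["registry.example.com/"],
    "forbid_latest_tag": True,
}


def violations(manifest, policy):
    """Return a list of human-readable policy violations for a workload."""
    problems = []
    for container in manifest["spec"]["template"]["spec"]["containers"]:
        image = container["image"]
        if not any(image.startswith(prefix) for prefix in policy["allowed_registries"]):
            problems.append(f"{container['name']}: {image} is not from an allowed registry")
        if policy["forbid_latest_tag"] and image.endswith(":latest"):
            problems.append(f"{container['name']}: {image} uses the mutable 'latest' tag")
    return problems


def manifest_for(name, image):
    """Build the minimal part of a Deployment that the check looks at."""
    return {"spec": {"template": {"spec": {"containers": [{"name": name, "image": image}]}}}}


ok = violations(manifest_for("orders", "registry.example.com/shop/orders:1.4.2"), POLICY)
rejected = violations(manifest_for("tool", "docker.io/someone/tool:latest"), POLICY)

print(ok)
for problem in rejected:
    print(problem)
```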

These are all of the different business functions, and you can see how they're categorized. They all have some impact on the application. Now notice this is mostly infrastructure, which is going to take me to my last point. By the way, this is the final architectural diagram. So if a few executives in here think you can't do architectural diagrams, you just did. This is it.

But what I need help with is the most important part of my talk. You'll notice that most of what I talked about so far was where Kubernetes was helping the platform engineer, and inadvertently and tangentially helping the application teams on their way.

I want to do more to directly support this application team. So if you have successes, failures, ideas, asks, help me. I'm in a role now where I'm building that application layer. I'm building that set of services. Help me build my backlog. I want to hear from you what you need on top of Kubernetes for your application teams, because I want to build that.

So with that, I thank you very much. I appreciate, and I apologize that I went over by a couple of minutes, but I appreciate your attention. I'll be around the rest of the day, so I look forward to chatting with you. Thanks.