Next Generation Infrastructure for Managers

Las Vegas 2018

John Willis

Vice President of DevOps and Digital Practices · SJ Technologies

John Willis

Next generation infrastructure for managers.

So, I've been involved in this conference from day one. I think Gene asked me maybe four years ago to do "Docker for Managers," right? It was about the time where everybody was incredibly confused about Docker, particularly managers. And "manager" has a lot of variation in what it means, but at the time there was really a lot of confusion. That's fine; everybody is kind of okay on the whole Docker thing.

Then at London the conversation came up. Gene kept hearing about service mesh and stuff like that, and he said, "John, do you want to do kind of something similar?" I did it in London, and this is an updated version, so we'll go through it. It's kind of a catchall. I've got 30 minutes, it's like 29 minutes now, to give you as much as I can tell you about the broad stroke of what the ecosystem looks like and what most people are thinking are the primitives that they're using.

That's me. If you know who I am, then you know all this. If you don't know who I am, I've done tons of startups, probably a lot of failed startups, but a couple of successes in the last five or six years. One is a company I sold to Dell, and the other was a company that I created with a couple of friends called SocketPlane. We sold that to Docker, and I was at Docker for about two years almost, I guess. Then I left, and now I'm with a company called SJ Technologies, doing transformational consulting, which has been a lot of fun. I wrote the Handbook and all sorts. I'm on the selection committee and whatnot.

I went back and I've actually authored, if you saw Snover's thing, he was IBM, there used to be a really cool thing called Redbooks. Back in the day, most of you probably never even heard of them, but back in the day it was a big deal. I'd forgotten that I'd written seven Redbooks, so I've written 10 books. This is the one I did about a year ago. I think we're giving this out today or tomorrow maybe. It's audio only, and me and Gene did it. It's very geeky, 11 hours. I would say one Audible credit to beg you to do it, but you're going to get a free one tomorrow afternoon.

I am working on two more projects, and this is the last shameless plug. I'm actually working with Shannon Lietz and James Wickett, who is speaking tomorrow. Do not miss that presentation. Anytime you get to see Shannon Lietz, you go to that presentation. She's incredible. I don't know when it's going to be out, hopefully in the next year. And then this is something even my close friends don't even know about, but at KubeCon I'm doing something with Alan Shimel from devops.com. We're calling it "Digital Anarchist." Think Netflix for geeks. We'll see. I don't know. We may burn and crash all over the place, but it could be very interesting. So, KubeCon.

All right. Spoiler alert, right? Okay, you came here to learn about next generation platform. It's containers and Kubernetes. We're done. I'll see you later, right? Let's go get a beer, right?

But the thing is, it gets really interesting from here. One of the things, I'm not very organized as a person, but when I see things that are disorganized in the way we talk about things, it drives me nuts. First of all, let's talk about Kubernetes for a second. I actually stole this from a Google presentation. Kubernetes is a container management system, and they said, "Oh, no, it's not. Kubernetes is a container management platform. Oh, it's actually not." The truth is, Kubernetes is really this kind of cyber-incredible-scale event loop, and we'll talk about it at the end. In general, Kubernetes is really just an application on this thing. We'll talk a little more about that. If you know what it really is and where the future might be going, it helps you a little bit. That might sound a little obscure right now, but hopefully near the end it'll make more sense.

Here's the thing. Four years ago when I did "Docker for Managers," everybody was, "Clear. Good. We get it, thanks." Right now, it is the Wild Wild West. It's a mess. It is incredibly messy. And not even the technology side of it. I go into clients and I say, "What type of container are you using?" Notice I don't ask "what Docker." They go, "Docker, idiot." They don't know me. They're like, "Docker, you idiot." I'm like, "Okay, what Docker?" They'll say, "The open source Docker." I'm like, "Well, respectfully, there's no such thing as an open source Docker anymore." "Huh?" "It's Moby." Okay? We'll talk about that later.

Then you get to, "Okay, now can I ask you again what container system you're using?" "Oh, yeah, I just called our guy. It's basically Docker Community Edition." Okay. That's the free one. Is that how you're going to run your bank strategy? Probably not. "What are you going to do?" "I don't know. We're going to wait to see what Google does." It's a mess. And then if you're OpenShift, I don't know if you are right now, but you will be on CRI-O. Again, if these things don't make any sense to you, I'm going to cover them all.

The second question I ask is, "What container orchestration are you using?" "Kubernetes, you idiot." And I'm like, "Well, okay, hold on. What's Kubernetes?" There are a whole load of Kubernetes distributions, and it matters which one you pick and where you're going. What drives me insane is when I see a presentation, or somebody just talks, about Docker as if it's a Frisbee or a Coke. In some sense that's okay, and four years ago that was really the only way you described the system. But today I think we have to be a little bit better in having a conversation.

What I'm going to do in this presentation is cover four categories. I want to talk a little about the foundations that are in place, basically OCI and the CNCF. Then we'll talk about the container ecosystem, and I'll try to put some names on it where we just don't say the word Docker. We actually start decoupling what it really means to run containers in a container ecosystem. Then I asked this in London: how many people know or have heard about the service mesh as it applies to Kubernetes? Okay. There were three, and I don't count John. You don't count, buddy. He counts a lot. Watch his presentation Wednesday. But it's a very low number, right? This is really important stuff for you to know about if you think you're going down this path of next generation, which today looks like it's Kubernetes. The distro, not sure. Then if you don't know about that, wait till you hear about the thing that really is what all the cool kids are working on right now, which is called Kubernetes API extensibility. I'll tell you, it's very nascent, but it could be some of the most important stuff we should be paying attention to right now. It could be. I'm going to give you a little bit of flavor of what that is and why it exists.

What's out of scope, quickly: no introduction to containers, no introduction to Kubernetes. If you came here for that, there's just a gazillion of those online. I can't talk about storage, network, and ecosystem systems. I've got 22 minutes left. It's not that they're not really important discussions; it's just going to be a miracle if I can finish this thing on time covering the other topics. The only thing I'll say about serverless is you've got Lambda, and you've got this new thing called Knative, although that's another nascent thing. I get the sense from some of the people I talk to, who I think are way smarter than me, that what we have as serverless right now will be completely different in maybe three or five years. That doesn't mean don't go out and use Lambda for a digital property or something like that. But anyway, I'm already talking about it. No, we're not going to talk about it.

So the foundations: there are really two that you need to know about. The first is the OCI, and really the OCI is primarily about the standards for container technology. Probably the most important one there is runC. For the most part, that has become the standard. If you think of what happened, Linux containers had been in the kernel for quite a while. In fact, it was a collaboration: IBM, Google, and I forget who else. But it was really hard to get it in a way that was usable unless you were a geeky developer shop like Heroku or somebody like that, and that was Docker's brilliance. What they did is they put an abstraction around LXC, and then they kept building that abstraction. They had something called libcontainer. At some point they felt this was very commodity, and the industry was starting to split up on containers, so they donated what's called runC. For the most part people run runC, and the ownership of that is part of the OCI. In our industry, if we don't argue, what are we doing, right? We love to argue, especially when we try to spec things in foundations. One of the big arguments, about the image specification and all that, is actually starting to get some grounding, though there are still some downstream discussions between Red Hat and Docker about images. But ultimately everybody agreed that the foundation of images is going to work a certain way.

The other project--oh, shoot, I'm missing a slide. Anyway, that's okay. I thought I updated it, but it's not really an important slide. This is CNCF. The CNCF is very important here. They're the primary stewards of Kubernetes. They are Linux Foundation based, which is good. I try to compare this to the mess of OpenStack, right? The tragedy, the Titanic called OpenStack, right? Sorry if you're running OpenStack right now. Is this happening all over again? For the most part, I don't think it is, because the CNCF, first, is Linux Foundation based, and the people that are running it have their heads together. They're doing some things like patent control, and if you join, there are certain rules you have to play by. So again, follow this. A lot of activity, very interesting stuff, a lot of training. You go to Kubernetes, and there are actually some interesting projects you should know about. I'm not saying the other ones are not important, but containerd, and we'll talk about containerd and why it exists. Oh, Envoy, yes, of course, Envoy. We'll talk heavily about Envoy. But the one I won't spend a whole lot of time on is Jaeger. Then there's OpenZipkin. This is distributed tracing. If you go in greenfield, get your teeth into distributed tracing. I've got a couple of clients that have built head-to-toe composable data center infrastructures, and everything is running Jaeger. The value you can get out of distributed tracing is off the charts. And a ton of other projects. That's the foundations. Again, we're running through it quick.
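To make the distributed tracing idea concrete, here is a minimal Python sketch. It is not from the talk and it is not the Jaeger or OpenZipkin API; the `Span` and `Tracer` classes and the service names are hypothetical. It only illustrates the core mechanism such tools rely on: every hop of one request carries the same trace ID and records which span called it.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str


@dataclass
class Tracer:
    """Toy in-memory collector; a real tracer ships spans to a backend."""
    spans: List[Span] = field(default_factory=list)

    def start_span(self, service, operation, parent=None):
        # A child inherits the trace ID; a root span mints a new one.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        parent_id = parent.span_id if parent else None
        span = Span(trace_id, uuid.uuid4().hex[:16], parent_id, service, operation)
        self.spans.append(span)
        return span


def checkout(tracer):
    """Hypothetical request that fans out across four services."""
    root = tracer.start_span("frontend", "POST /checkout")
    cart = tracer.start_span("cart", "get_cart", parent=root)
    tracer.start_span("db", "SELECT cart_items", parent=cart)
    tracer.start_span("payments", "charge", parent=root)
    return root
```

Because every span records its parent, a backend can rebuild the full call tree for a single request, which is where the operational value comes from.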

So the container ecosystem. This is the thing where I ask people what they're running, and they say, "Docker." Then we have this kind of decoupling of a conversation. If we want to decouple the conversation, we really should be talking about container runtimes, engines, and orchestration. If we want to get it right, not everything is Docker, and you may actually not be running Docker.

First, the runtimes. Basically, I told you it is runC. That was what was donated. I think most of the different distributions of the engines and whatnot are running runC. runLXD is interesting. It's actually from Alibaba, and what they did is abstracted runC and runV, which is actually a KVM implementation. The other thing they're trying to solve, which is interesting, has been a thorn between Red Hat and Docker forever. The idea originally of a container was it's a single-process or single-PID mindset. You literally start the container, and the first process that's running is the application. Then those knuckleheads, right--I'm sorry, wait a minute--wanted to put systemd in there. Okay, I'm just kidding. I don't know what's best for your source. But it has been a little bit of how do you mitigate, should containers have process control and stuff like that? One of the things that the Alibaba people did is they put a nice lightweight thing that gives you best of both worlds, and I think that's interesting.

Then container engines. If I said it's all Kubernetes, then any discussion about running containers outside of Kubernetes is a moot point, right? If you believe that, then the Kubernetes interface for running containers is called the Container Runtime Interface, CRI. Under that, basically, the flavors of engines start with containerd. Docker at some point wound up contributing not only runC, but they contributed their engine, which is their daemon and all this really good stuff. That became something contributed to the CNCF. Today Docker and Google GKE both run on containerd. But Red Hat has gone down another path where they've created something called CRI-O, which is their own implementation. If you have OpenShift, you're most likely going down the CRI-O path, if you're not on it already. That's a divergence. At the end of the day, they all run containers and follow the container spec, and they run runC. It's not the end of the world, except there is a little bit of divergence here.
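The layering described above (orchestrator, then an interface, then an engine, then a runtime) can be sketched in a few lines of Python. This is an illustration only: the real CRI is a gRPC API with many more calls, and the class and method names below (`ContainerEngine`, `start_container`, `kubelet_start`) are hypothetical simplifications.

```python
from abc import ABC, abstractmethod


class OciRuntime:
    """Stand-in for a low-level OCI runtime such as runC."""

    def run(self, image: str) -> str:
        return f"runc started {image}"


class ContainerEngine(ABC):
    """Hypothetical, heavily simplified stand-in for the CRI contract."""

    def __init__(self, runtime: OciRuntime):
        self.runtime = runtime

    @abstractmethod
    def start_container(self, image: str) -> str:
        ...


class Containerd(ContainerEngine):
    def start_container(self, image: str) -> str:
        return f"containerd -> {self.runtime.run(image)}"


class CriO(ContainerEngine):
    def start_container(self, image: str) -> str:
        return f"cri-o -> {self.runtime.run(image)}"


def kubelet_start(engine: ContainerEngine, image: str) -> str:
    # The orchestrator talks only to the interface, never to one engine.
    return engine.start_container(image)
```

Swapping `Containerd` for `CriO` changes nothing for the caller, and both end in the same runtime, which is why the engine choice is a divergence but not the end of the world.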

When we talk about Docker, what Docker really has is the open source upstream, something called Moby. What they did is basically they wanted to protect the brand. It was somewhat like the Fedora for Red Hat and stuff like that. They decided they were not going to rip out all this open source contribution. They basically took what was GitHub docker/docker and called it GitHub moby/moby. Is anybody from Docker in the room? Good, I can really talk bad now. It was a terrible idea. It was a terrible idea. But they did it anyway, and then they took the brand to proprietary. Basically you have non-open-source Engine Enterprise and Engine Community. It is the upstream, but I just did a check. I've been checking, is this thing working? The activity on Moby is not--it's existent because it is the upstream for the Docker stuff, but the point is the only contributors, as far as I could tell, are the Docker maintainers that work for Docker. In the past, it was a gazillion people contributing to Docker. What they absolutely did is close the gate. They put the Suez Canal in there. Again, if we think about community and build and how we're doing things, it was a terrible idea.

The cloud ones, just going through these quick so you know what you don't know. There are the as-a-service-based container offerings. Amazon has ECS, Azure has ACS, and Google has GKE. What's interesting about Google is you get Kubernetes and the container thing together. They were doing that long before some of you were born. By the time they were ready to make a service, it was like, "We don't need a container service. It's really just Kubernetes running these container things." Azure has Azure Stack, and if you followed the announcements at Google Next, Google talked about the things they're going to start putting on-prem. In general, there's this migratory path to being able to run both of those as-a-services, to a certain extent, or some variant of them, on-prem. We'll see where Amazon ends up in that war.

So orchestration, I said, is Kubernetes. Docker originally had something called Swarm, another terrible idea, where they literally decided at one point they were going to combine their engine and their orchestrator together, and then less than a year later decided to go with Kubernetes anyway. Actually what you do get in the Docker enterprise solution is Swarm and Kubernetes together, but there's really no story for Swarm in my opinion. Although it was a great product, it just got swallowed up. Then you have Mesos, Mesosphere. There's a whole bunch, and I'll show you some of those later when I go to some of the CNCF projects. It's funny: in the Mesos era, a lot of the vendors out there, Docker, Mesosphere, they all were holding out like, "No, this Kubernetes thing's not real." But then one day, "Okay, we're all going to support Kubernetes." I think Mesos might have been one of the last ones because, apologies to anybody at Mesosphere if I'm using a little bit of literary license, they were arguing that, "Yeah, that's all great, but we're the only ones who do stateful clustering containers." Well, by the way, that's probably the primary problem that service mesh solves today, and certainly the API gateway stuff. I'll talk about that. So that's like, okay, we've lost.

The only thing I do want to mention, this was not on my list even at the beginning of the year: HashiCorp is just that kid. I met him like 10, eight, nine years ago, Mitchell Hashimoto. Nomad is an interesting story. First off, it's incredibly easy to implement, and it can run your containers just fine. It's very lightweight. If you're doing ephemeral batch and you want to build something that's going to go away, do you really want to build a very complex Kubernetes cluster? That's one. Recently, actually it's kind of already public, one of the guys I know that's building Samsung's version of Siri, the whole data center is built on Nomad, Vault, and Terraform. It's completely composable, and he cycles search by the hour. He could build a data center, like a Siri-like data center, in 30 minutes. So I'm like, okay, better start paying more attention to this thing called Nomad. Because the beauty is, Terraform's a great product, Vault's a great product, and if you want to run those two really well, guess who's the really cool solution? It's Nomad. In general, I will stick with Kubernetes and containers; that is where all the mindshare is right now. I just think you might want to put Nomad on your radar.

There are a lot of ways too. Those are a couple of distributions. I'll list a whole bunch more in a minute. There's a classic Kelsey Hightower blog article called "Kubernetes the Hard Way." If you want to really learn how Kubernetes works, it's a GitHub project. Some people, when I ask, "What Kubernetes distribution?" tell me, "The hard way. We get it from GitHub, we manage it, we build it." There are all sorts of risk-reward discussions we can have about distributions of Kubernetes: some add a lot of enterprise stuff, but some of it is actually really not open, and then there are others where you're going to have to do a lot of work yourself, but you have a complete path to openness. There's a little bit of mitigation there. Here's the list: there are like 42 certified Kubernetes distributions from the CNCF perspective. You can see Canonical, CIS; I mean, everybody's got one. Pizza Hut, I think, has one. That was a joke. Come on. Docker, Google. Heptio is interesting. They are some of the original developers of Kubernetes. They call it kind of an un-distribution. It's an interesting thing to look at. Of course, you've got Mesosphere, Red Hat. These are the ones that I've kind of played with; not that they're the best on the list, but they're ones I've played with. Then there's orchestration as a service, where people run Kubernetes as a service. You've got Amazon EKS, Azure, and GKE on Google. Again, we're just kind of giving a landscape survey.

So let's talk about service mesh, which I think six people answered that they'd heard of, so I've got to go probably a little quicker now. The service mesh has a broader definition, but in the context today, as we talk about Kubernetes, we talk about it as an infrastructure layer for service-to-service communication. It gives us the ability to have lightweight proxies through deployment. Ultimately what it is, it's basically a proxy, and I'll show you a little more detail here. The idea is, if you're going to run these (they're called pods, if you don't know; you run containers in a pod, and pods ultimately run in clusters), what you do is put another container in there that is designed conceptually as part of something called the service mesh. From there, it sees all ingress and egress for that pod. In it, there are rules and constructs that allow you to do observability, monitoring, traffic control, load balancing, service discovery, resilience, and we'll see some of this here in a minute.

It's based on a very software-defined architecture. If you know anything about software-defined networking, that's layer three. It has a control plane and data plane. Data plane is the packets. Control plane is how you abstract the intelligence out of those packets. This is layer seven. This is proxy-based, but it's got the same architecture at layer seven. The data plane basically is really the proxy itself, and the control plane is all the policy and the metadata to make it work.
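The split between control plane and data plane can be sketched as follows. This is a toy model, not Envoy or Istio code: `ControlPlane`, `SidecarProxy`, and the weight-based routing rule are hypothetical names chosen for illustration, and routing is made deterministic (by request count) so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ControlPlane:
    """Holds policy and metadata only; it never touches request traffic."""
    routes: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def set_weights(self, service: str, weights: Dict[str, int]) -> None:
        if sum(weights.values()) != 100:
            raise ValueError("weights must sum to 100")
        self.routes[service] = dict(weights)


@dataclass
class SidecarProxy:
    """Data plane: sits next to the app and forwards every request."""
    control_plane: ControlPlane
    seen: int = 0

    def route(self, service: str) -> str:
        # Look up the current policy on each request, so a policy change
        # takes effect without redeploying the application.
        weights = self.control_plane.routes[service]
        slot = self.seen % 100
        self.seen += 1
        upper = 0
        for version, weight in weights.items():
            upper += weight
            if slot < upper:
                return version
        raise RuntimeError("unreachable when weights sum to 100")
```

With weights of `{"v1": 90, "v2": 10}`, ten of every hundred requests go to the new version. Changing the weights in the control plane shifts traffic without touching the application or the proxy code, which is the point of separating the two planes.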

This is where we introduce the word Istio. How many people have heard of Istio? Oh, so a lot more. That's interesting. Maybe I'm asking the question the wrong way. Still probably less than a third of the room, if that. Here's the deal. The way this has played out is there's an open source technology that was developed by Lyft. They had a problem with scale, and they decided that NGINX couldn't solve their problem, and it wasn't built in a way they needed to run containers and Kubernetes and all that stuff. So they wrote their own kind of proxy called Envoy. Today, basically, the data plane is this thing called Envoy. In fact, at KubeCon they actually have an Envoy Day. It is really where all the work is going on: service discovery, load balancing, TLS termination. It has built-in circuit breaker patterns. It has deployment strategies like staged rollouts, and you can augment it with canarying and stuff like that. It also does fault injection, and it's got chaos, kind of Chaos Monkey-ish things in it.
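For readers who have not met the circuit breaker pattern mentioned above, here is a minimal Python sketch of the general idea. It is not Envoy's implementation or configuration; the `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a failing upstream."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; upstream presumed unhealthy")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The benefit of having this in the proxy rather than in each application is that every service gets the same fail-fast behavior without writing it into its own code.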

On the right side, you have the control plane, which really is just all the policy. At the end of the day, Envoy was this thing written by Lyft. They didn't really create a GUI or a way to manage it. Google basically defined this thing called Istio, and think of it in a sense as configuration management for Envoy. Although NGINX now is trying to create, or already has their own version of, a data-plane proxy. But right now Envoy is taking all the oxygen. All right, so we just finished the first three subjects. Again, we're racing through a lot of stuff.

The last piece is where all the cool kids are hanging out today. Sometimes the cool kids are hanging out someplace where you're like, "Who cares?" and sometimes they're hanging out in a place where we might actually want to care. I'll go through this reasonably quick. It's very complex, it's very nascent. It's so complicated that only a really small set of people can actually do this right now. But it might have incredible impact if the theory is that Kubernetes might be the 10-year winner from a platform structure. There are a lot of people that believe that. Now, there's a big "if" there. We might be running all our clusters on it for the next 10 years; some people suggest Kubernetes could be that abstraction of the kernel that just becomes part of the way we work. In a world where everything changes in three years, that does sound kind of silly, but I'm buying into it for now.

If that's the case, then getting a jump start, or at least understanding where this stuff is going, is worth it. By the way, the people who are going to give you the best intel on this are your vendors, because they're all doing this right now. Any of your vendors that are playing around with Kubernetes, you could pull them in to help you understand more how this works, because they know and they're learning how to do it. Joseph Jacks, Gene actually put a thing on him this morning. He's doing an open source fund, but he's been heavily into Kubernetes. In a quote from the beginning of this year, he said, "All complex software delivered as a service or behind a firewall should be implemented as a set of Kubernetes API extensions and controllers. Radical efficiencies will abound." So his notion is, if you're SAP or you're Workday or whatever, and the world is going to be clusters on Kubernetes, then this is an event loop that you can sit on. Maybe you should.

If you're doing greenfield development right now on some new project and you think you're going to be on Kubernetes, this would be a nice time to do a little bit of investment on it. I'm not telling you it works for you or it doesn't work for you. I am telling you that I have enough confidence that if you think Kubernetes is going to be in your ballpark as something that's going to be around for the next--and I think when we say next generation infrastructure right now, most people would tell you it's containers and Kubernetes--then the question is, should you get into this game way before anybody else is? Maybe not be an expert, maybe not re-architect stuff, but literally just start some POCs and projects to do some research on it.

The Kubernetes API, basically--I'm going to skip this slide because I want to go to this. Basically, the documentation is confusing as all get out. There are what you can call custom resources and aggregators, but custom resources are really the most important thing. Let me see something, make sure I... I must have scooted around with my slides too much because I wanted to... yeah, I'm missing a very important slide. Darn it. Oh, there it is. Custom resources and controllers. The aggregator is interesting, but right now this is probably the most interesting. You might hear them called CRDs, custom resource definitions. Basically, when you want to create your own--I know I'm sounding confusing here. If you think about it, Kubernetes has some core resources, like nodes and replica sets and services and pods. Really all you're doing is defining your own resource controller, which is the execution logic for what you want to do, and then the resource definition. A really simple example might be you want a stateful MySQL database in a cluster. You might want to create that as a custom resource. In fact, Oracle has already done this and has a sample.
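As a rough picture of what a resource definition looks like, the Python sketch below builds a CustomResourceDefinition manifest as a plain dictionary, plus one instance of the new resource. The field layout follows the `apiextensions.k8s.io/v1` shape, which may differ from the API version in use at the time of the talk. The group `example.com`, the kind `MySQLCluster`, and the `replicas` field are hypothetical and are not taken from Oracle's sample.

```python
import json


def build_crd(group, kind, plural, version="v1alpha1"):
    """Return a CustomResourceDefinition manifest as a plain dict."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # The CRD name must be <plural>.<group>.
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": version,
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        "properties": {
                            "replicas": {"type": "integer", "minimum": 1},
                        },
                    }},
                }},
            }],
        },
    }


def build_instance(group, version, kind, name, replicas):
    """Return one object of the custom kind, as a user would submit it."""
    return {
        "apiVersion": f"{group}/{version}",
        "kind": kind,
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }


if __name__ == "__main__":
    crd = build_crd("example.com", "MySQLCluster", "mysqlclusters")
    print(json.dumps(crd, indent=2))
```

The definition only teaches the API server a new noun. Nothing happens to a `MySQLCluster` object until a controller watches for it and acts, which is the other half of the pattern.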

Here's why. Remember I told you earlier that Kubernetes is this thing that actually is much bigger than Kubernetes? It's basically an API that sits on a control loop that sees basically all ingress and egress, inbound and outbound traffic, for everything that happens on a cluster in Kubernetes at a millisecond level, and it's Google saying it scales at Google scale. I mean, for anything I'm going to run on there, I could get an API and sit on an event loop and be able to do anything I want from an operations and observability standpoint. I hope some of you get the picture. Go back to Joseph Jacks: radical efficiencies will abound. I agree with him on this.
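The event loop idea can be reduced to a small reconcile function: compare what was asked for with what is running, and act on the difference. The Python sketch below is a toy model under stated assumptions. `FakeCluster` is an in-memory stand-in, not a Kubernetes client, and the names `reconcile` and `run_controller` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FakeCluster:
    """In-memory stand-in for cluster state; not a Kubernetes client."""
    desired: Dict[str, int] = field(default_factory=dict)  # what users asked for
    running: Dict[str, int] = field(default_factory=dict)  # what exists now
    log: List[str] = field(default_factory=list)


def reconcile(cluster: FakeCluster, name: str) -> bool:
    """Drive actual state toward desired state; return True if it acted."""
    want = cluster.desired.get(name, 0)
    have = cluster.running.get(name, 0)
    if have < want:
        cluster.log.append(f"{name}: start {want - have}")
    elif have > want:
        cluster.log.append(f"{name}: stop {have - want}")
    cluster.running[name] = want
    return have != want


def run_controller(cluster: FakeCluster, events: Iterable[str]) -> None:
    # Each event names a resource that changed; the controller reconciles
    # that resource and does nothing if it is already in the desired state.
    for name in events:
        reconcile(cluster, name)
```

A custom controller is this same loop with your own logic in place of the start and stop lines, for example handling failover or backups for a stateful database.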

I will say it's very nascent. It changes very quickly. The documentation changes quite quickly. Because what you have then is you have to create a custom resource. I talked about that. These create the custom rules, basically, in the events and what you want to do. Do you want to monitor? Do you want to change a cluster? Do you want to scale? Do you want to autoscale, change pods? Basically, one of the primary benefits of doing this is giving you the ability to do stateful applications. Here are some of the examples that are out there, but there's a lot more. Also, if you want to see all the community activity going on here, it's just insane. You can list all the people that are actually pretty active in this custom resource thing.

The last thing I want to say, which I'm going to steal about 30 seconds, is this is a movie from 1966. It's actually French, but there are subtitles, called "King of Hearts." Basically, it's the metaphor: the inmates are running the asylum. So now I've told you Kubernetes and all that stuff is the shit. I'll tell you right now, I'm scared to death that the people who are making all these decisions for our industry might be the inmates in the asylum. They're young kids that are brilliant, that are moving incredibly fast. Things are changing really fast. So I don't know what the right answer is, but we need to put the temperature gauge on and try to figure out how do we run a bank when every three months there's a bunch of smart kids adding all these new extensions. Anyway, thank you so much.