Docker for Managers
Containers are all the buzz today. In this session we will look at the history of containerization and explain why there is so much discussion and activity around this model that has been around for years.
We will look at the extended ecosystem of containers, which will include Docker as well as some other interesting alternatives. We will also demystify the subject matter of container orchestration, looking at popular models such as Docker’s Swarm, Google’s Kubernetes and Mesosphere.
Finally we will expand on some popular use cases and discuss the future of this phenomenon.
No prior experience is necessary to attend this session.
Full transcript
John Willis
All right. So before we begin, we have to perform a ritual. We must all say, "Docker, Docker, Docker." Everybody ready?
I'm not kidding now. All right. One, two, three.
Docker, Docker, Docker. Okay, now we can begin.
All right, so my name is John Willis. I've recently been moved into what's called Director of Ecosystem Development, which is really just a fancy way of saying we're in BD. For those who don't know me, I go by a Twitter handle called @botchagalupe. I'm not as smart as Jez Humble, or I would have come up with a way to game you into becoming a follower. I'm going to figure that out next time. So that was pretty clever of Jez this morning.
But anyway, 35 years in IT. I've kind of been through it all. I say I'm an IT ops junkie because I love IT operations and infrastructure. I was fortunate to start off on mainframes at Exxon. I'm skipping a whole bunch of stuff.
And then I was an early cloud evangelist for the first private cloud, which was at Canonical. It was called Ubuntu Enterprise Cloud, based on Eucalyptus. I was an evangelist for that. And then I was early in at Chef, helped build their customer-facing business. And up to that point, I was about 10 failed startups over 30 years. My wife still stuck with me. Good thing she did, because the last two years, the startup gods have been very nice to me. I sold a company to Dell called Enstratius, and seven months ago, I sold a company called SocketPlane to Docker. So now I work at Docker.
I left that in. It was my 30th DevOpsDays. This is my second DevOps Enterprise Summit, so maybe in 30 years from now, I'll have 30 on that. But DevOps Cafe, we interview amazing people. Jeffrey Snover in the front row, I've interviewed him if you want to hear an amazing... But we have a lot of other interesting people.
And I am one of the organizers, so if you got rejected, blame Damon.
So, DevOps Cafe, and I am one of the co-authors of this handbook that someday I think actually will get released. I'm not sure. We've been working on it for four years, so I don't believe Gene. That will be out in, I guess, March now. Good.
All right. So what I thought I'd do is, the idea of this presentation was to not sell you Docker. I was hoping, when I talked to Gene about this, is last year at the conference, one of our goals was to get more... A lot of people said at the conference last year, "I wish my manager would've came." And I think one of our goals this year was to try to make sure that kind of happened. In our postmortem, we'll find out if that happened.
And I was being presumptive, thinking maybe there'll be more managers, and in between all this head-bursting stuff that they might be hearing, they might say, "Oh, yeah, I've been hearing about this crazy Docker thing. Let me try to find out why everybody's screaming and hollering about it." So this is, I think if I'm successful, why is everybody crazy about Docker all of a sudden?
And so it starts with some real basic stuff about different types of virtualization. Most of you probably know type one, which is really there is no operating system. We'll come back to why that might be important later for a lot of different things. Then type two is where the hypervisor exists on top of an OS, right? So your usual suspects there. I think KVM is probably important from an infrastructure. Most people who run OpenStack private, they run KVM.
A lot of talk about Amazon hiring all the KVM talent. I was just over in China last week, and the Alibaba folks, AliCloud, they're building a serious cloud over there, folks. All KVM-based, so something going on there.
And then the real juice behind all this really is, when we talk about Docker, we'll talk about what some people refer to as OS-level virtualization, where it is a process. So there is not just an OS like type two, but the virtualization is actually a process. It's not a virtual machine, if you will.
A lot of confusion in how we distinguish what compute means today. Not easy. I started doing this, I got myself in big trouble, right? This list started with like four, and then during lunch, I talked to some more people, and I added like five more, and then I was talking to Jeffrey, I added a couple more. And so the accuracy is about as accurate as Wikipedia. Some of them I actually experienced, some of them I didn't. So if you have a beef with some of the dates, take it up with Wikipedia.
But I will tell you that I actually started in this business working on an IBM 370. So the first form of probably documented virtualization was by IBM, the IBM 360. A lot of mass accounting machines. The 370 was really the most interesting one. This was where most people say hardware virtualization, right? IBM had to figure out how to emulate other operating systems, or actually other hardware systems. So they actually created this virtualization model of emulation for these old accounting systems.
370 got really interesting because it actually started introducing this kind of logical partitioning model, and we'll come back to that.
I think most people probably know about chroot. It's interesting when you think about the history. chroot was really quite simple: the ability to create a process and change the root file system. All right, so it happened somewhere in '79, '82. There's some interesting research on why it happened, what it was used for. Okay, some people found some nice vectors.
I think what would be really interesting is to get a graph of how all these things interconnect, which I didn't have time to do.
VMware. I remember teaching a Tivoli class around 1998, actually, somewhere where I was teaching Tivoli, a class, and this one student showed me his laptop and showed me this thing where he could run multiple operating systems on his laptop, and I was like, "That's going to change training forever." And VMware. It was VMware.
And then you had BSD Jails. BSD Jails was kind of your first kind of real OS-level virtualization, if you will. It kind of combined a few things.
The only reason I added type one, type two, and OS-level in the timeline, hoping this would be interesting, because when I did research, you could easily find a timeline for regular virtualization. You can now find lots of timelines for OS-level virtualization. I didn't really find anyone that combined a lot of these things. So I think there's an interesting intersection of how things came out and grew. What happened when.
I think Xen, of course, over in Cambridge. What happened over there? Ian Pratt and Simon Crosby now have an interesting startup called Bromium, which actually, this is why I run out of time, but that might be something you want to look at.
But Solaris Zones. I did some mainframe stuff, but the mainframe virtualization was so kind of deeply embedded that you didn't see it from anybody who interacted with it. It was there. It was like oxygen.
But what was amazing for Zones, for me, actually, my first Zone was interesting because I actually ran a monitoring system on a bunch of AIX machines, and I went over to, and it seemed like in London is where everybody was buying these large Solaris boxes, and we used to do little dump maps of the amount of processes that were on a machine. And a really active AIX machine, maybe you'd get 100, and you'd kind of monitor. And we threw this monitor on one of these Sun 10Ks, and I think that it was like 15,000 processes. I'm like, "What in the shit type operating system is this?" And then I come to learn that there's a model of a global zone and local zone, and they'd built a very interesting kind of, off of the FreeBSD Jails model, of delivering OS-level virtualization.
Again, being able to treat virtualization as sort of a Linux process. There's some beauty in that, and we'll get there.
OpenVZ. Anybody see Jody Mulkey's presentation yesterday? Yes? No? I love Jody. Actually, I met Jody because I was running around talking about private clouds, and he said, "Can you please come into my shop at Shopzilla and evaluate our cloud?" And they'd completely built it on OpenVZ, and it was an amazing cloud at the time. It was way better than Eucalyptus at the time.
Of course, Amazon Web Services comes out. I put in Btrfs, AUFS, these union file systems, because these are interesting stories in the timeline when it comes to Docker. Because a lot of this really is just kind of leading up to how we get to Docker. Some of the virtualization has nothing to do, but the OS-level stuff does. So we'll come back to that.
Namespaces are important. At the time, I don't know as much Linux history as I should. I get the feeling that namespaces were one of these things that's just waiting for something magical to happen. The ability that you could isolate network or user or PIDs and things like that. And again, not to sound like a Docker fanboy, which I actually am, but I'm going to try not to sound like it. The namespaces became an amazing opportunity for Docker, along with, actually, the union file system.
So people say, "Well, haven't people been doing Linux since, let's see there, 2000?" Or let's go all the way, move up to 2007, where Google got pretty interested in this, I guess, at least from what Wikipedia says. But everybody says Google's been doing containers for about 10 years. In fact, about a year ago, I saw a presentation from Google. I'm talking really fast because I only have 30 minutes. I heard a presentation from Google. They said that they run 2.2 billion containers a week. 2.2 billion containers a week.
And then you say, "Well, that's just Google, John." And then I ran into the guys from Yelp a few weeks later. They run 15 containers a second. So this is web scale for everybody, folks.
So cgroups. cgroups allowed people to take a set of processes and put resource constraints on them. Your classic resources. All right, that gets interesting.
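As a rough illustration of what putting resource constraints on a set of processes can look like, here is a minimal Python sketch. It assumes the cgroup v2 file interface (`memory.max`, `cpu.max`, `pids.max`, `cgroup.procs`), which is newer than the cgroups on this timeline. The group name `demo`, the PID, and the helper `plan_cgroup_writes` are hypothetical, and the function only computes which files would be written; it does not touch the system.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # usual cgroup v2 mount point (assumption)


def plan_cgroup_writes(name, pids, memory_bytes, cpu_fraction, max_pids):
    """Return (file, value) pairs that would constrain `pids` in one group.

    Nothing is written here; applying the plan needs root on a cgroup v2 host.
    """
    group = CGROUP_ROOT / name
    period_us = 100_000  # CPU accounting period in microseconds
    quota_us = int(cpu_fraction * period_us)
    plan = [
        (group / "memory.max", str(memory_bytes)),       # memory ceiling
        (group / "cpu.max", f"{quota_us} {period_us}"),  # CPU time per period
        (group / "pids.max", str(max_pids)),             # cap on process count
    ]
    # A process joins the group when its PID is written to cgroup.procs.
    plan += [(group / "cgroup.procs", str(pid)) for pid in pids]
    return plan


if __name__ == "__main__":
    # Hypothetical values: 256 MiB of memory, half a CPU, at most 64 processes.
    for path, value in plan_cgroup_writes("demo", [1234], 256 * 1024**2, 0.5, 64):
        print(f"{path} <- {value}")
```

The point of the sketch is that the constraint is attached to the group, and processes are simply enrolled in it, which is the property container runtimes build on.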
KVM, interesting. IBM finally wakes up and figures out how to put that stuff they had in like 1970 into AIX. That's kind of fun, I guess, for IBM.
Just talking to Jeffrey before we got started, I think there was some interesting research. Actually, you can do a lot of fun research on early Microsoft stories, even CAP theorem. Again, here I go to right field, but if you actually wanted to go deep dive on what everybody talks about, CAP theorem and distributed things, like all this fun distributed database stuff. Actually, some of the original research was done at Microsoft, which is kind of interesting. But anyway, so I digress.
Except Hyper-V in 2008. And then Linux containers. All right, so now the story gets really interesting. And it sounds like IBM and Google have this passion: let's get this thing in the kernel.
Parallels actually along the way have done some really cool stuff with OS-level virtualization. They're called Odin now. They've renamed themselves.
And if you see there, there's a lot of things that are sitting around, and then the two questions that I get about Docker, one is always either, A, "But John, we've been doing this forever." And I used to get that with cloud, too. Like, "Oh, I've been doing cloud for 40 years." Well, did you work on an IBM mainframe 370, buddy? No. All right, shut up. Because I did.
But I don't do that with containers to people, and I don't say shut up to people too often either, but why? How come in 2008... Well, it's 2015. If my math's right, it's seven years. Why all of a sudden? And arguably, it was around in FreeBSD in 2000. And I did run into people who said, "Oh, yeah." In fact, when I was first selling Chef, I remember a really large company told us, "We can't run Chef until you support BSD Jails." I'm like, "I have no idea what you're talking about, bud."
But the question is why? And then I think I'm going to get into that.
So Docker starts out as a competitor to Heroku and Engine Yard. That's a race to the bottom. There's no money in public PaaS at that time. The founder, brilliantly, Solomon, realizes they've got some really incredible IP. Why don't we open source it?
And oh, by the way, the dirty little secret at the time was all the PaaSes were using containers. So the rest of us mopes, we didn't know that, right? And in fact, the only reason I know, a friend of mine, Adrian Cole, told me, "Hey, you need to talk to these guys." And I'm like, "Why? Public PaaSes are about as boring as it gets in, basically, 2012." And he's, "Oh no, no. It's way more interesting because it's actually what CloudBees and Heroku, and they've all built this really cool infrastructure based on containers."
And so I'll tell you more of that. And then we introduced Docker in 2013, and then CoreOS introduced rkt. There's a lot of variants. There's a longer list here.
But then, so okay, let's forget about virtualization. I would say for me, I'm bought into OS-level virtualization. Why? Well, it provisions in milliseconds, right? So I'll go into some of the more poignant use cases in a little bit.
But in general, just OS-level virtualization, whether it's Linux containers or it's Docker or it's rkt and any other variant, you provision in milliseconds. Depending on what other types of tools are built around it, it could be 100 milliseconds, it could be 400 milliseconds, right? But still, 400 milliseconds, even with a bunch of things around it like network configuration and namespaces and all that, it's pretty damn fast. Because I will tell you, on a great day, your best virtualization is going to be a minute, minute and a half, maybe two minutes, right?
Most people will say it's wire speed. You could argue, is a process wire speed? I don't know. But if you think a process is wire speed, then OS-level virtualization is wire speed.
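To get a feel for the "a container is a process" argument, a small sketch like the following times how long it takes to start and reap a trivial child process. This is only an analogy under stated assumptions: the child is a Python interpreter, not a container, and the helper name `time_process_start` is made up for illustration. Results vary by machine, so no figure is claimed here.

```python
import subprocess
import sys
import time


def time_process_start(runs=5):
    """Return the fastest observed wall-clock time, in seconds, to start
    and reap a trivial child process."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # The child does nothing, so this measures launch and teardown only.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print(f"fastest process start: {time_process_start() * 1000:.1f} ms")
```

Compare whatever number this prints with the minute or two quoted above for booting a virtual machine.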
Is everybody with me when I'm talking about a compute model that... If you listened to Spear, raise your hand and say, "I don't get it." If you don't get it now, it's fundamental to ask that question, what I'm talking about here. I'm talking about running a virtual instance on a process that looks like a real VM. And VMs actually look like real hardware, right?
So if you don't get that, obviously come to me later, but now's a good time to shout out. Hopefully, I made the point.
So they're VMware-like. That's the thing about what really gets cool about when you start running Docker anytime, or container. It's like when you actually, depending on the flags you set, you actually go in a shell and you're like, "Wow, this is really cool." In Docker, you say, "docker run," and you set up a flag and you're in this shell, and it's like, this is another compute. But am I still in that old compute? Did I SSH in? What happened? And Bryan Cantrill, the author of DTrace, says, "Containers are like up is down, left is right." I tend to agree.
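The "docker run, set a flag, and you're in a shell" experience can be sketched as below. The `--interactive`, `--tty`, and `--rm` flags are standard `docker run` options; the `ubuntu` image, the `bash` shell, and the helper name `docker_shell_command` are assumptions chosen for illustration. The sketch only builds and prints the command rather than launching anything.

```python
import shlex


def docker_shell_command(image="ubuntu", shell="bash"):
    """Build the argv for an interactive, throwaway container shell."""
    return [
        "docker", "run",
        "--interactive",  # keep STDIN open
        "--tty",          # allocate a terminal
        "--rm",           # remove the container when the shell exits
        image, shell,
    ]


if __name__ == "__main__":
    # Printing only; pass the list to subprocess.run() on a machine with
    # Docker installed to actually land in the container's shell.
    print(shlex.join(docker_shell_command()))
```

Inside that shell the hostname, process list, and file system belong to the container, which is where the "am I still in that old compute?" feeling comes from.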
They're lightweight, and I'll show you some visuals of that in a little bit. You get this opportunity because they are process-oriented. The cleverness of this is we're sharing a kernel with the host operating system. So we can create a virtual compute model, or a compute resource, where we don't have to replicate the operating system for every one. That adds speed, it adds memory density. It gives just all these things. And because the brilliance of getting LXC into the kernel just made it really interesting.
I have 15 on that thing, but so I'm going to cut...
Paper? Yeah. Well, that... All right. Yeah. All right. Well, we're going to have to negotiate then because I've been watching. And I got 15 on mine, too. So two versus one.
Anyway, the interesting part about this picture here is you're looking at type one, type two. Both of those, you're going to replicate the OS, right? So you start seeing this kind of incredible density opportunity.
This is actually from an IBM white paper, or a paper that was done. Really good stuff in there. You could just spend a lot of time just reading the details of this paper. But the point here is you're running processes and you're sharing the kernel, and you have your speed, your density.
Early on, I would say you could get a 10-to-one density. I've had people tell me they get 30, 40% density buyback from going from virtual to containers. Large telcos, really. So that's quite interesting.
So then, okay, all that sounds great, John. That sounds like we've got a whole bunch of cool stuff going on here with Linux containers. Why Docker?
Well, Docker, you get the Linux container isolation, you get the lightweight, you get speed. I will say where that ends then is prior to Docker, to put Linux containers on multiple platforms was very hard. I tried. If you want to put it on Amazon versus this, versus that, you had to worry about the kernel. There was a whole lot of work to do. And even when you were done, there was a whole bunch of missing things that unless you knew, you had to add like a Linux bridge, and you needed to go ahead and set up some file system and some bind mounting to the host. It was just a tremendous amount of stuff.
When Docker came out, and I first used Docker basically about two and a half years ago, the README says apt-get install two programs and says docker run, and you are off to the races. And that changed a lot because it was hard. It was hard just to get Linux containers to run on one platform. But then I need to replicate it on, say, Amazon, or then on VirtualBox on my machine. Literally that model works consistently across basically any supported platform.
But here's where it gets really crazy. Remember I listed Btrfs or AUFS? One of the brilliant things that the Docker founders did is not only did they add namespaces and cgroups and isolation and network connectivity, like, hey, if I'm going to really use this, I'm actually going to need a bridge to talk to. I'm going to need IP addresses. I'm going to need all that stuff. They did all that.
But there's two things I think make Docker very adopted. One is speed, but you get that from Linux containers. But the second part is they added a union file system. And the union file system created this... I think it was unintentional. I don't think they really saw the behavior pattern that they were going to create. And the behavior pattern was just like what Chef and Puppet created with sharing cookbooks. This was that on steroids.
Because now you actually created binary artifacts and were layering by architecture. So if I wanted to build something on a Canonical-based Debian image, there was an image out there I could pull, I could build my copy of it, and then I could then say, "You know what? I'm going to add my stuff onto this, and I'm going to distribute to the 15 groups in my organization," and they could add the layering on theirs.
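The layering idea can be modeled in a few lines of Python. This is a toy sketch, not how AUFS, Btrfs, or Docker's storage drivers actually work: layers are plain dictionaries mapping paths to contents, and the file names, contents, and the helper `stack_layers` are invented for illustration. It shows only the lookup rule, where an upper layer shadows a lower one while the lower one stays untouched and shareable.

```python
from collections import ChainMap


def stack_layers(*layers):
    """Union view over layers given bottom-to-top; upper layers shadow lower ones."""
    # ChainMap searches its first mapping first, so the top layer goes in front.
    return ChainMap(*reversed(layers))


base = {"/etc/os-release": "debian", "/bin/sh": "shell"}          # shared base image
team = {"/opt/app/config": "team defaults"}                       # one group's additions
mine = {"/opt/app/config": "my override", "/opt/app/run": "app"}  # my layer on top

image = stack_layers(base, team, mine)

if __name__ == "__main__":
    for path in sorted(image):
        print(path, "->", image[path])
```

Because `base` is never modified, every group can stack its own layers on the same copy, which is the sharing behavior described above.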
So the sharing of these binary artifacts has now become what you call Docker Hub. And we see, very much like in the early days of Chef, the thing I loved about Chef in the early days and still today is if I needed to install Hadoop or Nginx or this, I literally went to the repository and I pulled it down, I ran it, and I had things installed. And now, I would say that this is even better because it's an immutable model. It's now installed. It's installed bit for bit. So when I run it over here and I run it over here, I am getting the same binary bits that I get and the same thing.
We've got thousands. In fact, I'll show you a metric later. In October, we just hit our billionth download. So in two and a half years, we get our billionth download of images from Docker Hub. Taking nothing away from Facebook, and they're awesome. Their IT infrastructure is as amazing as you get. They took five years to get to a billion users. Not the same thing, but a billion downloads. So there you go right there.
The numbers are off the chart, but I think the container download in October, we hit our billionth download. I don't even think that's as amazing as... This is a hockey stick story. In January of this year, we had 100 million downloads of the Docker engine. January of this year. At DockerCon, we had 500 million downloads. That's a hockey stick, folks. I don't even know what it is now, of the engine.
There's some footprint stuff. This will go up. We're going to have to negotiate our time, so I'm going to skip this slide. And that should have been '16. Sorry. Crossed it out tight. That's what you get for writing a deck on a plane ride coming back from Singapore.
In 2016, this gets really interesting now. Microsoft has basically created a Docker API-compatible container implementation that will be shipped in Windows Server 2016. And it is completely API-compatible. That means our Docker client will be able to sit here and manage both Linux and Windows containers. It means that Windows will now be able to participate in a workflow very similar to most of the amazing stories we've heard over the last couple of days.
I think this is a linchpin for phenomenal growth and opportunity in Windows. And there's something called Windows Server Container, and then you can run containers under the Hyper-V architecture. That's going to get really exciting.
A couple more things. There are some open initiatives, something called the OCI, the Open Container Initiative. You can see some of the people that are involved with it. IBM, Oracle, just about anybody you can think of. Of course, us.
And so we have a thing called runC, which is our open source contribution to the standard of how to run containers. We also have, obviously, Docker Engine.
The other big thing is... So I've told you a little bit about the history, where we got to, why Docker has some uniqueness to it. And then we talked a little bit about some standards that you should understand. It's nice. Docker's not going to own the world by themselves. We're playing nice with everybody.
And then we have the Cloud Native Computing Foundation, which really, in general, is going to address the hardest problem we have right now, which is orchestration and orchestration of containers. Not really a fully solved problem.
We have a solution called Docker Swarm. That's what we do. It is an orchestration tool. It allows you to start thinking about how do you run containers across multiple hosts, how do you create connectivity through IP addresses and networking and all the other things that would require multi-host connectivity.
Mesosphere are pretty smart dudes and women over there that are doing some amazing stuff. Kubernetes is what all the cool kids are using today. Kubernetes comes from Google. Pretty amazing stuff out of there. The theory goes that Google's been doing this for 10 years, and now they're starting to open source some of their stuff. Kubernetes is a good example.
I list this Nomad, which is interesting. How many people have heard of HashiCorp? A couple. How many people have heard of Vagrant? Well, HashiCorp are the people that make Vagrant. They make a lot of other stuff. They just announced this... They're like this, "We're going to give you something free, cheap, and easy." I call them the Atlassian of DevOps, really. And I'm sure the Atlassian ones would get furious next day with that word DevOps too.
But they're creating little things that are easy to implement that people love, and they're hitting. Every one they offer is a hit. Vagrant's a hit. Consul's a hit. You just go down the list and now they've kind of jumped into the orchestration game. And most people would laugh and say, "Oh, come on kids, you're not going to be able to play with Kubernetes." But they haven't lost the battle yet. So I would take a hard look at Nomad.
And then I told you I was acquired by Docker. What we were doing is SDN. Actually, our Open vSwitch integration with Docker, SDN stuff for Docker. Our vision was that containers were not going to scale. If you start putting 10,000 containers on a host, that might go with a really short time-to-live compute model. Networking is going to... Linux bridge ain't going to work.
So we started down that path. When we got acquired, we took on a much bigger project. We built something called libnetwork, which is an open source project by itself, part of the Docker thing, that is the foundation for scalable network with containers. It's an open source decoupled... You don't even have to use it with Docker. It works perfectly well with Docker. It has a plug-in architecture. Companies right now like Cisco, VMware with NSX, Juniper, Arista, are all working on plug-ins in this pluggable architecture. You should take a look at it. It's an open source project, a lot of stuff.
As I wind down, tomorrow, I'm doing a presentation with Josh Corman. All right, so we've got into the Docker's good, everybody loves Docker. I think you get the speed and all that. Why would I care?
And tomorrow, I'll spend a little more time on this with Josh, where I'm going to talk about what I call immutable delivery. But what you find is, and this is an excellent book written about Toyota supply chain, actually pure Deming stuff. And we talked about the velocity. You get this incredible speed.
So developers can sit on their laptop in a VirtualBox environment and build a service stack of five or six machines, little compute instances. Four or five of them might be owned by other people. That's okay, microservices. And then I've got mine. I do my testing, I find out something's wrong. I need to rebuild the environment.
Damon Edwards and myself do the podcast. We get this common thing that once they move to Docker, the developers get frustrated when they can't converge and build a new service stack in less than two seconds. They get mad. Right? And think about how you'd have to do that with infrastructure as code and virtualization, and it's a cup of coffee, it's a phone home, it's a maybe go out to Starbucks, right? It's a context switch. So the velocity.
I love the variation. I'm not going to have enough time to tell you this story. I'll have a little more time tomorrow. But the variation is, to me, the beauty of the immutability is that at scale, if I think I'm going to get to this Yelp status of 15 containers a second, that means I'm going to have a lot of containers. And as I deliver, how I build things is going to matter.
There's a famous paper in 2002 that said order matters, and what it really means is variation is going to matter. And if I build my infrastructure incrementally at every stage of the pipeline... And again, this isn't a knock on infrastructure as code. It's just a point I want to make. And that is, if I build my infrastructure through an infrastructure as code model on my desktop and I build it up, and then I move it into CI and I build it through infrastructure as code, and then I go to production infrastructure, there's a form of entropy there. If I switch platforms along the way, if I go from AMI to VMDK to bare metal, there's a form of entropy.
I will tell you that the people who are doing immutable delivery models, like Gilt, and Yelp, and a whole bunch of others right now, are creating immutable binaries on the laptop. If they go green through the CI process, they're immutable. If they hit production, they're the same exact bits.
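One way to picture "the same exact bits" is a content digest that is recorded at build time and checked at every later stage. The sketch below is an assumption-laden illustration, not any team's actual pipeline: the stage names, the stand-in byte string, and the helpers `digest` and `promote` are all hypothetical.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Content digest: identical bytes always give the identical string."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def promote(artifact: bytes, expected: str, stage: str) -> bytes:
    """Pass the artifact to the next stage only if its bits are unchanged."""
    actual = digest(artifact)
    if actual != expected:
        raise ValueError(f"{stage}: artifact changed ({actual} != {expected})")
    return artifact


if __name__ == "__main__":
    built = b"middleware + application"  # stand-in for an image built on a laptop
    pinned = digest(built)               # recorded once, at build time
    for stage in ("laptop", "ci", "production"):
        built = promote(built, pinned, stage)
        print(stage, "ok", pinned)
```

Nothing is rebuilt between stages, so there is no step at which the entropy described above can creep in; a rebuilt or altered artifact would fail the check.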
And one last quote I think I'll make before I end here is, I said Java lied. Java said build once, run anywhere. What they didn't tell you is you had to worry about the runtime environment. You had to worry about the framework of where it was delivered.
This is not a lie. Your kernel, your middleware, and your application are all going to be bundled and can be tested with their interdependencies on your laptop. There could be some latency variation, but there's other ways to figure that out. When you throw that into a CI situation, it's the same immutable binaries, and if you put it in production, it's going to be the same. At scale, that will make a difference.
This paper, which is a mathematical piece that will prove it, will tell you there are some gaps. Not everything's perfect. It's still early. Networking still needs a lot of work to go. We need to skill up on this way of delivery and models.
And my name is John.Willis@docker.com and I'm done.