San Francisco 2014

DevOps Road Blocks

John Willis

I've got to tell my Gene Kim story really quick.

About four and a half years ago, we ran the first DevOpsDays here in the Bay Area, down in Mountain View. It was the first one in the U.S. I was at the original one in Ghent, and the Ghent event is actually next week.

So I'm sitting on a panel, and one of the jokers, one of my friends, makes fun of how old I am. Gene, who I didn't know was Gene at the time, says, "Oh, you don't look that old." I'm like, "Well, thank you, buddy."

I get off the panel and Damon Edwards, who you saw yesterday, comes up to me. "You know who that was?" I'm like, "Panel guy number three?" And he's like, "No, that's Gene Kim."

"Like the Visible Ops dude? Holy shit, I've got to get his autograph." Sorry, I'll try not to curse, but...

Good luck, John. Yeah, good luck. Thank you there. You guys know me. The right-hand side over there: troublemakers.

Anyway, DevOps blind spots.

I've got one quick plug. I just did a startup. I'll talk about it for just a second later, but it's SDN for Docker. That sounds obnoxious, but if you actually want to learn about it, give me a holler. Botchagalupe is the best place to get ahold of me.

I was on kind of the organizing committee for DevOpsDays, so we're good to go here.

I'm going to give a quick run-through of my life. I've actually spent five decades in this crazy muck we call IT. In high school, we had a mainframe, an IBM mainframe. Imagine that. It was 1978, and I actually hoofed tapes in high school.

How many people have heard of the product called TMON/MVS or TMON/CICS? One. Just two. That's it? Really? Three. Okay. Yeah. Showing your age. Thank God more people aren't raising their hands.

I was actually one of the three authors of that product. I've done a lot of stuff. I spent a lot of years doing Tivoli stuff. I wrote seven Redbooks for IBM on Tivoli. I was also one of the first cloud evangelists, under Simon Wardley. How many people know Simon? If you don't know Simon, you should. He's awesome.

I got to be the ninth person in at Chef. Hopefully this isn't boring you. I spent a fair amount of time at Opscode building a customer-facing business. I have the pleasure of working with Damon Edwards on DevOps Cafe. I also worked with him for a little bit at DTO Solutions, a startup, and on DevOpsDays.

My last gig was at a cloud management platform company called Enstratius, and we sold it to Dell. My current startup is called SocketPlane, and I will say there are two myths that are baloney: one, that you have to live in the Valley to do a startup, and two, that you can't be over 30. I'm 55. This is my 10th startup.

Okay, let's get into the story here.

How many people know, I can't really see, how many people know who this guy used to work for? One. Knight Capital, right?

It's a cliché DevOps story now, but I figured, you know what? If I'm talking about blind spots, let's talk about the worst freaking IT blind spot I've ever heard of.

Knight Capital was a high-frequency trading (HFT) firm, doing what they call latency arbitrage. There's a great book on this called Flash Boys. They were basically a $1.5 billion corporation, with 17% market share on the NASDAQ, NYSE, and NYSE MKT.

Guess what happened one day? They were distributing a new program to eight servers. Guess what? They weren't using Chef or Puppet or anything, I'm pretty sure. Nobody's talking about it, right? Everybody wants to know what the real story is.

But they couldn't have been using Chef or Puppet, because guess what the technician did. Anybody? Forgot to deploy to the eighth server.

If you understand what high-frequency trading is, it's like fishing. Say you try to buy 100,000 shares of Microsoft, and that order gets spread out to something like five different exchanges. If it gets to the first one four milliseconds ahead, that sends a signal to the next one. So they do this fishing: they algorithmically start buying to catch the pattern.

This test program, which was basically disabled on the eighth server, got activated, because guess what else they did? They repurposed a flag. All of a sudden, this buying algorithm fires off. The end result is they lose roughly $440 million in about 45 minutes, and within a day the firm is effectively finished.
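The talk doesn't describe Knight's actual tooling. What follows is only a minimal Python sketch of the kind of post-deploy check that configuration management makes routine. It compares the build each server is actually running, and it refuses to flip a feature flag until the whole fleet agrees. The host names, build strings, and both function names are hypothetical.

```python
from collections import Counter

def find_drift(deployed: dict[str, str]) -> list[str]:
    """Return the hosts whose deployed build differs from the fleet majority."""
    if not deployed:
        return []
    majority, _ = Counter(deployed.values()).most_common(1)[0]
    return sorted(host for host, build in deployed.items() if build != majority)

def safe_to_flip_flag(deployed: dict[str, str]) -> bool:
    """Allow a (re)purposed flag to be switched on only when every host runs the same build."""
    return len(set(deployed.values())) == 1

# Hypothetical fleet: seven servers got the new build, the eighth was missed.
fleet = {f"srv{i}": "build-new" for i in range(1, 8)}
fleet["srv8"] = "build-old"

print(find_drift(fleet))         # ['srv8']
print(safe_to_flip_flag(fleet))  # False
```

The majority vote in `find_drift` is a simplification. A real tool would compare each host against the declared desired state rather than against its neighbors.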

Blind spots, of any flavor, can be catastrophic and deadly. The good news is that the blind spots I'm going to talk about today are hopefully not that bad. I just think the Knight Capital story is a really good hygiene story for our industry.

I happen to know the gentleman after me is going to talk about software-defined infrastructure (SDI), or the software-defined data center (SDDC), and I think that's an important DevOps discussion.

I'm going to do a little scorecard. When we think about what we do, there's a holy trinity: compute, storage, and network. Some people call it converged infrastructure.

In general, on compute, we're green. Based on the sessions we've seen over the last day and a half, I think we're doing a pretty good job. At web scale, we're doing a really good job, and it sounds like in the enterprise we're doing a pretty good job, too. We do infrastructure as code, things like that.

Storage, not really my area of expertise. I'm going to give it a yellow, not going to debate it with you. But network, I will debate, is fire-engine red. It is a blind spot.

That's pretty much what I'm going to talk about. I'll give you some data points.

Take a large bank in New York. Ten years ago, if you walked in (and I actually did) and asked what their server-to-sysadmin ratio was, they would say 1 to 100: one sysadmin to 100 servers.

If you asked for their NetOps-to-network-gear ratio (let's just call the gear a switch, since there are a billion ways to describe it), it was also about 1 to 100.

Walk into that shop today and they use an infrastructure-as-code product. They boast that their sysadmin-to-server ratio is 1 to 10,000. Anybody want to guess what their NetOps-to-switch ratio is? Still 1 to 100.

That's a blind spot.

In general, if we're doing compute reasonably well today, we can get compute resources in five minutes: ten minutes if we have to converge them, two minutes if we don't give a rip what's on them.

I just spent the last year going around the globe talking to companies that are trying to do new, greenfield networking. At some of them, the minute you need to do anything complex with the network, it can take up to three months to get it provisioned.

Best-case scenario is a couple of days, unless you've actually optimized for this way of thinking.

There are still large organizations, big institutions, whose way of doing networking today is cutting and pasting spreadsheets and running diffs to tell what their network configs look like across the board.

One thing we've done pretty well as an industry, certainly in compute, and the reason I'd call compute green, is shorten the gap between developers and the IT resources they need.

About 10 years ago, give or take, we went from bare metal to virtualization, and the gap got shorter. Cloud made that gap even shorter. I would argue, and I will argue later, that containers will make it shorter still.

But in general, in the compute resource, we've done a pretty good job as an industry getting the developer closer to the IT resources they need. Anybody disagree with that? No. Okay.

But the thing is, I will argue that in networking we still have the same gap we had 10 years ago. If you look at most network presentations today, they still talk about the network the same old way. Here we are, 15 years later.

So what changed in compute? Why is compute green from my scorecard?

Compute is green because of a few things. The disaggregation of hardware and software was a major contributor. Eight years ago, I got to meet Luke Kanies at Puppet. I begged him for a job for two years, and he wouldn't give it to me. I begged Adam Jacob from Opscode (Chef) once, and he gave me a job.

Eight to 10 years ago, I was trying to prophesy this way of doing infrastructure that a lot of people accept and love today. One thing that happened along the way, and it started well before then but had a major impact on people accepting this model, was the disaggregation of hardware and software.

Back in the day, if you bought IBM, you got AIX and the hardware together. You bought HP, you got HP-UX. You bought Sun, you got Solaris. Then all of a sudden we just got Linux and went out and bought our own hardware. That changed a lot.

Then there's web scale. Back in the day you had Amazon. When I went to Opscode, some of the original guys were ex-Amazon. They had been doing infrastructure a different way, with open source tools like CFEngine. The big guys like the Facebooks and the Amazons (and I don't know what the hell Google did) were running infrastructure as code. As people left those companies, it leaked out, and we started using those technologies.

Public cloud changed the concept of over-provisioning. We talk about elastic computing. Well, the truth is that was overhyped, but it forced the industry to reset its thinking about having to over-provision.

Then Marc Andreessen says software is eating the world. Software has eaten our world. If you've been in ops, software has eaten your world, and if it hasn't, you probably need to find a new job.

So why is this thing about the network all of a sudden a reality?

Well, guess what? If you've been paying attention, there are some really interesting things happening with the disaggregation of hardware and software for switches. There are companies now, and I'll list a few later, that will sell you just software and tell you, "Go buy this particular hardware," hardware with commodity ASICs on it, and bang, you're in business.

It's very similar, and it's happening now. It's happened within the last couple of years.

We're starting to see the large cloud titans release how they do networking. The Googles: Google has released Kubernetes. We're seeing that stuff spread out, very similar to what we saw with web scale and open source for compute.

Private cloud computing has totally disrupted the enterprise from a network perspective. We're spinning on our heads. We thought that we could just get a cloud, plop it in, and everything was going to be hunky-dory. And then the network people came: "Whoa, whoa, whoa, whoa, what's going on over there?"

Again, if we think about software-defined networking, SDI, and SDDC, the software-defined data center, software is also eating every part of our industry.

Here are some other meta points. When people talk about SDN, they'll talk about north-south traffic and east-west traffic. These are general numbers.

In 1990, 95% of the data left the data center, and basically 5% was east-west. By 2010, some people would argue that 75% stays in the data center. In large container-based infrastructures, those numbers are going to be as high as 95% east-west and 5% north-south.

These things are definitely changing, and they become potential blind spots if you don't recognize that it's happening.

Another important point is that what we call the edge has moved. The edge used to be a physical network device: a switch, maybe a top-of-rack switch. But now, with hypervisors and certainly with virtual switching fabric, the edge is a host. It's a bare-metal host.

Here's the thing. Back in the day, it was easier to say the network people did the physical stuff and the ops people did the host. But now, who owns that host? Who owns vports, virtual ports? Who owns logical networks? Who owns iptables and Linux bridging?

It's gray. I'm not going to trash OpenStack; I happen to like OpenStack. But if there are problem areas, you don't have to look far. It's the network.

Of course, everybody take a drink, I said Docker.

Docker's going to make this worse, or better, depending on how you look at it. Now there's no hypervisor. The host is just bare metal, and hundreds become thousands. Hundreds of virtual ports become thousands of virtual ports.

If you're not getting the whole DevOps angle yet, it's an OpEx story. It's a blind spot if you're not paying attention.

Guess what? If you want to see what a compute node looks like on OpenStack, this is from the last release. That's your new edge. Whether it's an OpenStack compute node or a Docker host, it could look like that, depending on how you do your network, how you think about networking, and how you apply software principles to it.

By the way, it's nine hops to get from the VM out to the egress of that box.

So I talked about the disaggregation of hardware and software in network devices. Arista, a very interesting company, is really a software company. They happen to give you hardware, but there's this move toward software, and certainly toward bare-metal, or in this case Linux-based, switches.

If you haven't played with Arista, it's really cool. You go in, run bash, and you're basically at a Bash prompt. A lot of what you work with is basically Python scripts. If you've ever done any network stuff on a Cisco device... Well, Cisco's going there, too.

Cumulus Networks is interesting. They actually are a real disruptor in that they ship you just software and you've got to go buy your own hardware. Sound familiar? Then there's the OCP, Open Compute Project. Facebook is driving a lot of that.

So let's talk about SDN, the elephant in the room. The blind men and the elephant, everybody.

SDN is a buzzword. It's like DevOps. Everybody wants to own it. But let's talk about SDN.

The traditional definition of SDN starts with traditional devices. Basically, you've got a box, not very malleable, with a control plane and a data plane. The data plane is packet in, packet out, at wire speed. The control plane is the distributed brains of that data plane: routing protocols, things like that.

Some people credit Martin Casado of Nicira, which was bought by VMware for $1.26 billion, as the person who invented this. Other people would argue. I don't care.

The idea was to separate the control plane from the data plane so that you can make the data plane programmable, instead of doing all this wizardry with routing protocols.

I say that in order to extend routing protocols, you basically had to be born in 1950. You had to be a computer scientist who wore really thick black glasses, and then you could extend BGP or OSPF.

But Martin said, "Hey, any old fool can do a matching table on a data plane." That's OpenFlow, and there are other variants of it. In general, that's the traditionalist view of SDN.

I think there's a little more to it, and there's a history here. One of the guys who works with me calls it retro SDN. We were doing really cool stuff with routing protocols. Routing protocols are really smart shit. Second time. No more, I promise. And I haven't said any really bad words, either, so I'm okay. We talked about this last night at dinner.

The idea is that you start building a programmable data plane. Not only can I now say, "Packet in, go to port 48. Packet in, go here," but now I can do really clever stuff. Like I can say, "You know all those firewalls and iptables I have spread all over the place? I can put that service profile or policy logic into this data plane," and it becomes a programmable interface through an abstraction.
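Here is a minimal Python sketch that makes "any old fool can do a matching table" concrete: a priority-ordered match-action table in the spirit of OpenFlow. It is not the OpenFlow wire protocol or any controller's API. The field names (`dst_ip`, `dst_port`), the action strings, and the choice to hand table misses to the controller are all assumptions for illustration. The second rule shows the "policy in the data plane" idea: a firewall-style drop expressed as just another table entry.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    priority: int
    match: dict[str, object]  # header field -> required value; absent fields are wildcards
    action: str               # hypothetical action strings, e.g. "output:48" or "drop"

@dataclass
class FlowTable:
    rules: list[Rule] = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)
        # Keep the highest-priority rule first so the first match wins.
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet: dict[str, object]) -> str:
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "controller"  # table miss: hand the packet to the control plane (assumed policy)

table = FlowTable()
# "Packet in, go to port 48."
table.add(Rule(priority=10, match={"dst_ip": "10.0.0.5"}, action="output:48"))
# Firewall-style policy pushed into the same table: block telnet to that host.
table.add(Rule(priority=100, match={"dst_ip": "10.0.0.5", "dst_port": 23}, action="drop"))

print(table.lookup({"dst_ip": "10.0.0.5", "dst_port": 80}))  # output:48
print(table.lookup({"dst_ip": "10.0.0.5", "dst_port": 23}))  # drop
print(table.lookup({"dst_ip": "10.0.0.9", "dst_port": 80}))  # controller
```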

We call it kind of retro SDN (please don't use that as a term) because with pure SDN we basically assumed this was the right way. It solved a big problem, but it threw the baby out with the bathwater, because routing protocols are pretty damn good.

I think there's a new convergence of SDN, of taking a little of the old architecture, some of the new programmable data-path logic, and coming out with something new. Hint: that's something I'm working on.

But think about that programmable data plane. What do we do now? Packet in or frame in, we go out to iptables, we go out to HAProxy, we go out to any function. The idea is that not only can we direct traffic, we can also put application services in there, like firewalls and load balancing.

VMware calls this micro-segmentation: you start consolidating that logic into the data plane in a programmable way. I think that's pretty cool.

All right, moving on. Those are a bunch of blind spots and opportunities, and if you're not paying attention, they're going to crop up in your organization. Hopefully you now have a general idea of what to look for, if you didn't know this already.

There's something else bigger on the horizon, and it's basically what I call consumable, composable infrastructure. Here's the Wikipedia definition of composable infrastructure. I'm not going to read it. It's Wikipedia. Bang.

Here's the best example of consumable, composable infrastructure: Jon Bentley's 1986 challenge to Donald Knuth. Basically, the challenge was to write a program that reads a file of text, determines its most frequently used words, and prints out a sorted list of them along with their frequencies.

Knuth wrote a full-blown program, and then one person, Doug McIlroy, basically took six Unix commands, piped them together, and solved the problem.

The interesting thing was that none of those commands, like uniq or sort, was designed to solve this problem. And once the problem was solved, from the application's standpoint, the whole pipeline could just be thrown away.
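The talk doesn't show the pipeline itself. McIlroy's published version is the six commands in the comment below, where `${1}` stands for the number of words to print. Since the other examples here use Python, this is a minimal sketch that mirrors each command with one small, general-purpose function and then composes them. The function names are illustrative. Ties are broken alphabetically here, which `sort -rn` does not guarantee.

```python
import re
from collections import Counter
from itertools import islice

# McIlroy's pipeline, one Python function per stage:
#   tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q

def split_words(text):       # tr -cs A-Za-z '\n': runs of letters become words
    return re.findall(r"[A-Za-z]+", text)

def lowercase(words):        # tr A-Z a-z
    return (w.lower() for w in words)

def count(words):            # sort | uniq -c
    return Counter(words)

def by_frequency(counts):    # sort -rn (ties broken alphabetically here)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def take(n, items):          # sed ${1}q: stop after n lines
    return list(islice(items, n))

def top_words(text, n):
    return take(n, by_frequency(count(lowercase(split_words(text)))))

print(top_words("The cat and the hat and THE bat", 2))  # [('the', 3), ('and', 2)]
```

Only `top_words`, the throwaway composition, knows anything about word frequencies. The stages are general-purpose, just like the Unix commands.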

If that makes sense, you can now check the box on understanding microservices, because that's the basic idea. Somebody talked yesterday about Conway's Law and programming; microservices are a way to flip the way you think about things.

Anyway, again, we could spend a whole session on this. I can't. I've got eight minutes.

So what's happening? What's driving this?

If we look at the history here, we got bare metal in eight weeks and virtualization in two weeks. These are all generalizations. Infrastructure as a service, two minutes, and that's being real generous. And PaaS, maybe one minute.

But here's the thing. Now we start talking about containers at something like 500 milliseconds. So let's say infrastructure as a service was really eight minutes, because once you Chef it up or Puppet it up, it takes more than two minutes to get a usable virtual instance.

If we think eight weeks to eight minutes was industry-changing, and oh my God, it changed everything, blah cloud, then I will tell you that containers are making the same significant change in the way we think about time.
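Here are the two jumps worked out in a quick Python check, using the talk's own round numbers: eight weeks for bare metal, eight minutes for a usable IaaS instance, and 500 milliseconds for a container. The figures are the speaker's generalizations, not measurements.

```python
SECOND = 1
MINUTE = 60 * SECOND
WEEK = 7 * 24 * 60 * MINUTE

bare_metal = 8 * WEEK    # "bare metal in eight weeks"
iaas = 8 * MINUTE        # "let's say infrastructure as a service was eight minutes"
container = 0.5          # "containers, like 500 milliseconds"

cloud_jump = bare_metal / iaas        # eight weeks -> eight minutes
container_jump = iaas / container     # eight minutes -> half a second

print(f"cloud: {cloud_jump:,.0f}x, containers: {container_jump:,.0f}x")
# cloud: 10,080x, containers: 960x
```

So on these numbers, the container jump is about a thousandfold, against roughly ten thousandfold for cloud.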

Again, there's a lot more to be said about that. If you actually want to see some brilliant presentations, Adrian Cockcroft over there, just Google him if you don't know him. He is what I call the brains of the data center future.

Docker. How many people in here have heard of Docker? Yeah. You're not as bored as I thought.

Docker is a commoditized version of containers. People say, well, Linux containers have been around forever. Sure, just like cloud was around forever before Amazon, right? What these guys did was commoditize containers.

A good friend of mine, Stephen Nelson-Smith, wrote a book, Test-Driven Infrastructure with Chef. He described how to do LXC on Amazon. I thought, oh my God, I'm going to do that. So every night after I put my kids to bed...

By the way, I'm not even a horse in the unicorn horse thing. I'm basically a pony, right? A big pony.

But the thing is, I tried to do that and I gave up. After putting the kids to bed and finishing my day job, it was just too much: four blogs open, this one says to do this, that one says to do that.

I got an early copy of Docker before it went out. In less than eight minutes, from the README through the install, I had docker run going and was running containers.

That by itself was amazing. But what's even more amazing is that they added a copy-on-write file system. So they combined a whole bunch of stuff, not just containers, and created a way to share artifacts: images.
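Docker does this at the file-system level through its storage drivers. As an analogy only, the sketch below models the layering idea with Python's `collections.ChainMap`, where reads fall through a stack of mappings and writes go to the first one. Two containers share the same image layers, which are read-only by convention here, and a write in one container touches neither the other container nor the image. The layer names and contents are hypothetical. A real copy-on-write file system copies a file up on its first write rather than simply shadowing a key.

```python
from collections import ChainMap

# Hypothetical image layers, shared and treated as read-only: file path -> contents.
base_os = {"/etc/os-release": "ubuntu", "/bin/sh": "<shell>"}
app = {"/app/server.py": "v1"}

def start_container(*image_layers):
    """Put a fresh, empty writable layer on top of the shared image layers."""
    return ChainMap({}, *image_layers)

c1 = start_container(app, base_os)
c2 = start_container(app, base_os)

c1["/etc/os-release"] = "patched"  # the write lands only in c1's top layer

print(c1["/etc/os-release"])       # patched
print(c2["/etc/os-release"])       # ubuntu (c2 still sees the shared layer)
print(base_os["/etc/os-release"])  # ubuntu (the image layer is untouched)
```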

You say, well, virtualization does that. Well, not really. On Amazon, you have an Amazon image. On VMware, you have a VMware image. On GCE, Google Compute Engine, you have a GCE image.

These guys made it so that I can run my binary image on my laptop, under something like Vagrant, move it up to Amazon for some intense smoke testing, and then put it in production on vSphere. If all goes well, the binary never changes.

I don't introduce any entropy by converting from one image to another. There's a lot more to that story in terms of how they do artifacts.

Last but certainly not least, they added a Git-like flow, which I think was probably the most brilliant thing they did. You pull. You push. They created an artifact that's universal. Build once, run anywhere. Ha.

And then, by the way, an amazing way, what better way than to use a Git workflow to...

Here, I'm going to kind of skip. I've got five minutes left.

The big thing about containers in general and Docker is that you're not replicating the operating system every time. For those of you who know Solaris Zones, you knew this story already.

Adjustments required.

I think compute is going to change. We're going through a paradigm shift in compute that is going to be insane, and I think it's going to happen faster than the last couple of shifts, even private cloud.

Then there's consumable infrastructure. There's a perfect marriage between containers and microservices, and both of them are realities. Yes, they're buzzwords, but they're real.

So when you start thinking about composable infrastructure, or microservices in a way that you can build these things that are interchangeable, and you marry that with a compute paradigm that basically takes 500 milliseconds to come up and come down...

By the way, if you haven't read anything from Dave McCrory, who's over at Basho and talks about data gravity, then get your bingo cards out. Add on IoT, which is going to create an incredible amount of data. Talk to somebody at Nike and ask them about the FuelBand and where all that data's going.

Put all that together and you've got an unbelievable perfect storm. Data gravity is the idea that we used to move data to the compute, and now we move compute to the data, because guess what? Data's gotten too big to move.

Now you have really composable infrastructure. You can swarm compute around the data, with microservices as your architecture. It's very composable: you can build things quickly and tear them down.

Again, if you're not thinking about this, there are people in your labs who are. I ask every enterprise, "Are you doing any Docker?" Nobody has told me no yet, and I've probably spoken to 50 people at different enterprises. Everybody has Docker in the lab.

So what do we do here?

We have to rethink configuration management. Glenn, who's up next, will hopefully talk a little about this when he talks about SDI. Glenn is awesome.

We are myopic about compute. We need to level up the other parts of the infrastructure: storage and network.

There's a concept of what they call ready-state networks. Most of you are probably already aware of ready-state compute, right? Things like Razor or just PXE booting, like, "Oh, we can electrically turn our computers on."

Well, there are companies now that basically roll in racks, drop in the wires, plug in the power, and the whole thing configures itself. Guess what? They can actually treat racks as ephemeral.

That means the network devices, too. In the early days of DevOps, we had the "Can you throw your server out a sixth-floor window?" test. There are companies now that ask, "Can you throw your switch out a sixth-floor window?" Because it's the same idea.

SDLC for networks. Guess what, folks? There are people keeping network configs in Git, rendering them through ERB templates, feeding them into Jenkins, running test scenarios, and using virtual switches to do some form of TDD.
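The talk describes this pipeline without showing it. Here is a minimal Python sketch, with `string.Template` standing in for ERB (Ruby's templating system). Per-switch variables are rendered through one template, and the result then goes through checks of the kind a Jenkins job could run before anything is pushed to a virtual or physical switch. The switch names, variables, config syntax, and checks are all hypothetical.

```python
import re
from string import Template

# Hypothetical per-switch variables, as they might be kept in a Git repo.
SWITCHES = {
    "leaf1": {"hostname": "leaf1", "mgmt_ip": "10.0.0.11", "vlan": 100},
    "leaf2": {"hostname": "leaf2", "mgmt_ip": "10.0.0.12", "vlan": 100},
}

# Stand-in for an ERB template; the config syntax is illustrative, not a vendor CLI.
TEMPLATE = Template(
    "hostname $hostname\n"
    "interface mgmt0\n"
    "  ip address $mgmt_ip/24\n"
    "vlan $vlan\n"
)

def render(name: str) -> str:
    return TEMPLATE.substitute(SWITCHES[name])

def check(config: str) -> list[str]:
    """Simple checks a CI job could run before a config reaches any switch."""
    errors = []
    if not re.search(r"^hostname \S+$", config, re.M):
        errors.append("missing hostname")
    m = re.search(r"^vlan (\d+)$", config, re.M)
    if m is None:
        errors.append("missing vlan")
    elif not 1 <= int(m.group(1)) <= 4094:  # 0 and 4095 are reserved VLAN IDs
        errors.append(f"vlan {m.group(1)} out of range")
    return errors

if __name__ == "__main__":
    for name in SWITCHES:
        problems = check(render(name))
        print(name, "OK" if not problems else problems)
```

In a fuller pipeline, the rendered configs would also be loaded onto virtual switches for functional tests. That step depends on the switch platform and isn't modeled here.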

I'm going to sound like a real asshole here. Third curse. If you don't know that's happening at your competitors, you probably want to go look.

We ran an event last week called DevOps for Networking.

We need to make this transparent for developers. Again, remember that gap? Network has to be part of that shrinking gap. I want network people to be able to get complex network scenarios without having bottlenecks and roadblocks.

Here are some things. I'm really running low on time.

Gene talked about the cookbook. We talked about embedding ops into dev. Well, you should be embedding NetOps, too. Damon Edwards and those guys are pretty much experts on value stream mapping. If you haven't looked into value stream mapping, think about the pull mentality. Git is a good tool.

Invite your NetOps to hack days. Hey, how about that? And let the network people have fun, too, right?

I go into these places, even enterprises, and there are Nerf pistols and all this stuff. Where are the networkers? Oh, they're down in the dungeon, on the second floor of the basement, right? Let them have fun, too. Invite them.

All right, so Gene asked every speaker to make a plea, right? And I'm going to kill it. Fifty-two seconds. This is awesome.

Basically, my plea is this. I spent the last 10 years trying to be an evangelist in compute, and I've done a pretty good job. I was at Opscode. I then did stuff with DTO. A year ago, I got this religion about network.

So here's what I want from you all, and yes, it's self-serving. I have a company, and this helps me. But I had the same mentality with Opscode, and that evangelism worked out well for people.

I will tell you: go find the network people. Get them as enthused about this as you've been today and over the last few days. It will be a much better story for you. I guarantee it.

Thank you very much.