Enterprise DevOps and Unicorns

San Francisco 2017

Director, Platform Architecture · Pivotal

My DOES 16 keynote last year discussed lessons learned running operations at places like Basecamp, Heroku, GitHub, and DigitalOcean. This year I've spent a lot of time working with Fortune 100 organizations to adapt those lessons to the realities of a large enterprise.

We'll revisit some of the lessons I presented last year along with other experiences from previous roles and explore how they apply in practice to the enterprises I work with today.

Full transcript

Mark Imbriaco

Good afternoon, everybody. I guess I'm going to go ahead and get started because my clock has started, and I have no idea how long this takes, so we'll figure it out together.

I'm Mark Imbriaco, and I'm here today to talk about some lessons learned in the past 25 years working at a bunch of startups and larger companies.

I gave a similar talk last year that was really focused on the past. It was focused on a lot of those startup lessons. And this year, Gene asked me to talk about what I've seen over the past seven months since I started working at Pivotal.

I work at Pivotal today. My title is Director of Platform Architecture, but really my job is the choose-your-own-adventure, go-talk-to-interesting-people job, which is amazing. I spend my time going to talk to Pivotal customers who are doing really amazing work, and who are in the midst of these transformations, like all of you are, and learning what they're doing and what's working and what's not working, and seeing how the things that I thought would map actually map, and seeing how the patterns that I've seen over my career map in their worlds.

And there are some surprises and some things that aren't surprising. So we'll talk about that.

To get started, Gene told me last year, it was really interesting, that I had to really lean into the credentials or nobody would take me seriously. I had to tell people where I worked, and I had to really ham it up. I'm not going to do that, but we will go through my talk from last year, and we'll sort of talk about places I've been.

I'm going to quickly run through the lessons learned, these little soundbites that I gave last year.

Back in the mid-2000s, I worked for a little company called 37signals. 37signals is where Ruby on Rails was created, and it had a big impact on my career. I was the first ops hire there. I was the seventh or eighth employee. I usually say seventh, but I was thinking about it the other day, and I'm not sure. Number seven or eight, first ops hire. I ran the operations team and built the ops team there.

The things that stick out in my mind from 37signals are this idea of making tiny decisions. The smaller units of decision making you can apply, the safer you are, the more likely you are to be able to roll back.

And there's the idea of fighting hero culture, right? Work sustainably. Have balance.

Then I left 37signals in 2010 to go work for a company called Heroku. I suspect most of you know who Heroku is. Heroku is a platform as a service. I joined Heroku. I was the 20th employee. We had two people doing operations for a platform serving 60,000 apps. Eighteen months later, we had about 10 people doing ops for a million and a half apps. So just a little bit of growth. It was super exciting.

I ran operations at Heroku, and there are a bunch of things we learned there. There's this idea that you hear a lot in sports and in the military: you play the way that you practice, or you fight the way that you practice. So practice makes perfect. We made deliberate practice a part of our operational culture.

And this idea of don't make me think. At 3:00 a.m. when I need to solve a problem, give me as much information as you can up front so I don't have to think about things that aren't relevant to what I'm trying to solve for.

And create a safe environment to learn. We've heard John talk about blameless postmortems, and that's really what that's about.

And then take that a step further, and don't be afraid to share the results of those internal postmortems publicly and be very open about what went wrong. And when you have a 67-hour outage of your public platform as a service because Amazon has an EBS problem, talk about that publicly in great detail, because the customers have a right to know.

Then I left Heroku and I went to a company. I'd never done e-commerce. I thought it'd be interesting to do some e-commerce work. So I looked around and I said, "You know, a billion dollars a year in transaction volume seems like a good place to start." So I went to LivingSocial and ran operations at LivingSocial for a while.

This empathy-as-a-core-value thing is one that really sticks with me from that time. It's interesting. It has nothing to do with customers. It has everything to do with the internal relationship that existed between development and ops. The relationship was, to put it charitably, very poor when I joined.

The problem we had to solve here is that there was just zero empathy from development to ops and vice versa. And we've all seen it a million times, right? "Those idiot developers won't quit writing bad code and waking me up in the middle of the night," and, "Those guys in ops just won't let me get done what I need to get done. They're getting in my way all the time."

And we had to get past that, and we had to have adult conversations and talk about problems. The relationship was dramatically better in a very short period of time just by listening to one another.

Then I had the chance to go work at GitHub, and this was a dream. I had known the founders of GitHub for a long time, and I'd really been focused on this idea of enabling developers and enabling teams. It's why I went to Heroku in the first place. So when the chance came to get back to that, to go back to what I was doing at Heroku, but at GitHub, I jumped at it.

The things that stick with me from GitHub are this idea of collaborating by default. The work that we do day to day should be visible to the rest of the team. We should collaborate around that work all the time instead of making it an explicit action. And that was very much the culture we had at GitHub, and it's tremendously powerful.

And this idea that ops tools don't have to be ugly, that applying a little bit of design to these tools can make a dramatic difference. And not just ugly, but also usable, right? They don't have to be unusable. I made a joke last year about how Bootstrap turned into millions of ops tools that are now driven by Bootstrap just because it gives you this ability to make them not look like complete garbage.

But we went a step further and enlisted designers to come and help us build ops tools so they were usable and beautiful.

And build this culture where you value shipping, where the act of delivering new value to your customers is something that the entire company can celebrate together. And we absolutely were focused on that at GitHub.

Then I went to a company called DigitalOcean when it was time to move on from GitHub, and DigitalOcean gave me the same feels that I had from Heroku and from GitHub. DigitalOcean is a public cloud provider that's very focused on the developer experience and making a simple experience for those developers, giving them the features they need, but not more than that, and letting you spin up a virtual machine with five decisions instead of 40.

So that was incredibly powerful. Do the simplest thing that could work.

And close the feedback loop. The close-the-feedback-loop one, I told a great story, I think. It's not about customers here. This was about development and operations, again, where you think you're talking to one another, but you're really not.

We had a case where people on my SRE team were being woken up every night for an issue. They would wake up, they would Band-Aid the problem, and they would get the developer the next morning to do some more work on that. It went on and on and on for three or four days, and then we had a weekly meeting and I said, "Guys, what's going on? Why aren't we fixing this?"

"What do you mean? What problem?"

"What do you mean, what problem? You just fixed it this morning for us. How do you not know you're waking us up every night? You know this."

But they didn't. So just being very deliberate about the way that you communicate and very deliberate about how you give feedback and making sure that the loop is closed, even when you think it is. Be deliberate. Make sure that they're hearing what you're saying.

So we've got all these lessons, right? I've spent 20 years at this point doing things, and last year when I talked, I was running a company called Operable.

So you won't believe that my clicker stopped working. Yeah, there it goes. You won't believe what happened next. This is my clickbait intermediate slide.

I was running a company called Operable. Operable was a startup founded in 2015 when I left DigitalOcean, and it was really driven by this desire to take those lessons from GitHub: collaborate in the open, work together around the work you do every day, remember what you did, and provide an environment that you can learn from. That was really the focus of Operable.

We raised a bunch of money from venture capitalists, and we built this ChatOps platform. And then a couple of years later, this next thing happened. We closed the doors because we failed.

I say failed very deliberately because, first of all, we failed. We didn't achieve the mission we set out to achieve. We weren't successful as a business, so objectively we failed.

It's interesting that that term makes people who aren't me very uncomfortable, because they look at me and they're like, "Mark, you didn't fail. You did some things." I'm like, "No, no, no. We failed. We didn't do what we set out to do." That's the definition of failure, in fact, and I'm okay with that. It's not a value judgment on me as a person. It's a descriptor of what actually happened.

And I think being comfortable with this failure is super important, so I keep using the word because it's the right word, even though it makes people uncomfortable for some reason.

But Operable is interesting because I learned a ton from Operable. From a personal point of view, it was very hard, probably the hardest two years I've ever spent doing anything. But I wouldn't change it.

What's interesting is I've spent a ton of time retrospectively looking at Operable and the things that we could have done better or the things we could have done differently, not because I think that we made bad decisions along the way, because I've looked at it pretty hard and I don't think we did. I don't think objectively we made egregiously bad decisions with the information we had at the time.

But there are certainly things we could have done better, and I want to understand those. So I spent a lot of time looking at that. As I was preparing this talk, I was really struck by some of the lessons that I told people to be aware of last year that I was aware of and that I did anyway.

This next slide, really, I tweeted this earlier because I love this: "Those who do not learn history are doomed to repeat it at their next conference presentation."

So just knowing about an issue and being aware of a problem and being aware of a behavior or capability that you should focus on, knowing is only half the battle. If you forget the red lasers and the blue lasers, you're in trouble.

So let's talk about some of the things that I skipped or that I missed at Operable.

We'll start at the beginning: make tiny decisions. No, we were pretty bad there. It's interesting, right? You're building a company and you're building this product, and you can see this long time horizon. Like, this is where I want to be in X number of years. This is the vision. I've got this grand vision in my head. I convince people to give me millions of dollars. I convince some of my good friends to be angel investors, and I've got this huge vision, and it makes it really hard to focus on the decision I need to make for tomorrow.

And especially if that decision is in conflict with where I want to be in five years. It's tough. Even when you're aware of it, it's tough to get through that. So I definitely made that mistake and ended up kind of paralyzed with larger decisions that I didn't need to make right now.

And hero culture, it goes without saying, anybody who does a startup completely screws this up. I completely screwed it up and hit the breaking wall and was literally standing in the dark in my bedroom at 3:00 a.m., tears streaming down my face, having a nervous breakdown before Monitorama last year because our demo just wasn't ready.

You let this happen to yourself even though you know you shouldn't. Even though you tell your employees not to do it, you do it to yourself anyway.

So again, it's like, okay, great. I know these things, and I knew them then, and it's like, well, that's your opinion, man. I can do this if I want. I'm an adult. I can hurt myself. Whatever.

And then there are some other things that stand out, like do the simplest thing that could work. This is very much like that tiny-decision one. When you move away from those tiny decisions, you sacrifice the simplest thing that could work as well, because you're making these big decisions and you're like, "Okay, well, I know that I'm building a platform. I'm not just building a chatbot, and I know that down the road it needs to do these things, so I better..."

And you spin your wheels doing a lot of things that you don't need right now. And that means that you don't deliver features that your users can give you feedback on.

You know this. You know this when it's happening, and you can see it happening, and you rationalize it because, "It's so complex. I can't do the simpler thing. This is so complex. I've got this picture in my head. If you could only see this picture, you would understand why I have to do this."

And it comes back to, I think, the biggest thing I learned at Operable. This next slide is one of the things that really sticks with me. I spent two years doing product, and the number one thing I learned is to always ask this question over and over again until it makes people angry, and then ask it some more.

This question solves conflict incredibly often because when two people are in conflict, or two groups, or your security group and your development group, or your audit group and your ops group, it's often because they don't actually understand the problem the other is trying to solve. And they haven't had a conversation about what problem they're trying to solve.

Instead, security has said, "You have to do this." And ops has said, "That slows us way down. We don't want to do that." And security says, "Well, you have to." And ops says, "Well, we don't want to." And you go back and forth like children.

Unless somebody says, "Okay, security team, what problem are we actually trying to solve with this? You told me to do that. I don't want to do that. That's really slow. What can we do to solve that problem in another way?"

And when you build this context, this awareness of what the other people are trying to solve, what your colleagues are trying to solve for, you can break through a lot of things.

And even when there's no conflict, when you find yourself down in the weeds, bike shedding or yak shaving and building something that doesn't matter right now, being able to come back and say, "What problem am I trying to solve? I just spent three days on this thing, and when I started on it, I thought it was going to take me two hours. How did I get here? What problem am I trying to solve? And am I solving the right problem right now?"

So I'm two days in. I expected this to take two hours. What problem was I trying to solve, and is it the right problem? And should I just throw away what I'm doing because it's not important enough for two days, even though I've already spent two days on it? I think it's going to take me another day. Should I keep going? Well, no. I thought it was worth two hours to begin with. I need to throw it away.

We all understand sunk cost, but it's easy to understand it, and it's hard to internalize it. And it's hard to admit at times that you're solving problems that don't matter or that don't matter enough.

This was another one of the things that I really learned at Operable the hard way. This is probably why we did not continue to exist as a business. It's because we were solving problems that were real for real people who did real work like I was used to, but we weren't solving problems people were willing to give us money to solve.

And that's critical, right? We were solving real, legitimate problems, but they didn't matter enough in the right context. So asking this question is just as important as asking what problem are we trying to solve: understanding the relative value of that work compared to what else you could be doing.

This one, "You don't get to choose whether to incur sunk costs. You get to choose whether or not you keep funding it." I freely admit that I stole the outline of that from John Allspaw with his incident one, because it's so good. And I spent many hours with John on his talk before he gave it, so I think it's fair.

This next slide I stole the title for from Nicole Forsgren: "Linear causality is for chumps." Her line is a little different. Her line is, "Maturity models are for chumps," which I also agree with.

So linear causality, this idea that cause and effect are a thing and that this event chains to this event, chains to this event, chains to this event, and I have to do them in this order if I want to get to point B.

One of the things I discovered about myself that was incredibly powerful was that when I'm under stress for a long period of time, like, say I'm doing a startup and I'm eight months in and I haven't built anything useful yet, I tend to focus on problem-solving in a linear fashion because it's easy and it's comfortable, and it's also completely wrong.

It's, here's the end game that I want to get to. And if I imagine that there are sort of three big milestones along the way, if I think linearly, I'm always going to start on one side and move to the first milestone, and then move to the next, and move to the next.

And if those aren't milestones, if those are instead sets of features, I can completely miss the fact that if I were a little more creative, I could have skipped that first set of features altogether because that's really secondary to what I'm trying to solve.

And if we get in this linear thinking, it's a huge risk. So whenever I find myself thinking, "Okay, well, I have to do that before I do that," I always want to stop and take a step back and say, "Can I skip any of those steps? And why do I want to spend time on something I don't care about if it's before the thing that I do care about? Do I really have to do that?"

And linear thinking, to me, is almost always an indicator that I have not applied enough imagination to the problem, that there are probably creative ways for me to solve the problem I care about without doing the things I don't.

And that's what it's all about.

So now we have kind of traversed my ancient history and the sort of recent history, and now we're in the present. I work at Pivotal, and I get to talk to really interesting people.

And this section was really hard for me. If you look at the slides after the talk, there's a bunch of sort of bonus material after the end slide. First of all, because I don't know how long this is going to take, so I wanted to give myself some safety net because I'm an ops person. And second of all, because I couldn't make up my mind which things I wanted to talk about.

There's so much interesting stuff happening, and there are so many interesting observations. Every time I meet with somebody like John Allspaw, or Richard Cook, or Tony Hansmann, my colleague at Pivotal, or anyone in the industry, I'm like, "You won't believe what I just saw. This thing is amazing. These guys are doing great work."

Or I'm tweeting about something. Like last week I was at the Home Depot in Atlanta, and I'm sitting here watching this team, and they're doing something that to me, from a technical point of view, isn't super exciting. It's not that compelling. They are deploying from a repo to production with no manual intervention along the way.

And my friends in technology companies might look at that and go, "Well, that's not very exciting. Who cares?" And I'm like, "Wait, wait. You're missing the entire point. You missed the 18 months between the beginning of that and the end of that, where they had to navigate the political minefields, and they had to ask the security team what problem are you trying to solve and solve it, and where they had to bring together a bunch of people to build these tools to solve it."

And that's super exciting. So I have a hard time figuring out what I want to talk about.

But we'll start here: this TIMTOWTDI. Anybody who's done Perl, especially if you did Perl a long time ago, knows this: the idea that there's more than one way to do it.

And this really strikes me because you see a lot in the DevOps transformation world, people sort of behaving as if there's one magical path to do the DevOps, and it's just not true. It's so contextual.

Anybody who sells you cargo-culted, cookie-cutter consulting that says, "This is our method, and this is how we do it," kick them out of your building because they can't help you. If they're not willing to invest to build the context in how your organization works, where you are, they can't provide value. Context is incredibly important here.

I feel this because before these enterprises, my job was to go to places like GitHub and LivingSocial and Heroku that have had massive growth and have big problems. And the first thing I did every time I went to one of those places wasn't make changes. It was sit for three months and figure out what the hell is going on. Like, what is the lay of the land? What's the context?

And these are massively simpler organizations than the smallest organization I talk to today. Massively simpler.

So cargo-culting software is dangerous. Cargo-culting org structure from a blog post is reckless. Don't do it. Just don't.

You need to think very carefully about the changes you're making. You need to make small changes. You need to iterate. You need to be very deliberate, and you need to not assume that something that worked for someone else will work for you.

Advice is great. Hearing the stories from people that have done similar work is fantastic and powerful and important. But copying what they did without a deep understanding of the nuance around it is dangerous.

So this next slide is kind of for me because, well, I'm here to amuse myself as much as you.

So this is the Bene Gesserit fear... what is it?

Litany Against Fear. Yeah, the Litany Against Fear. That's it. Thank you. Thank you. I'm killing myself.

"I must not fear. Fear is the mind-killer. Fear is the little death that brings total obliteration. I will face my fear. I will permit it to pass over and through me," blah, blah, blah.

And this one, I have to admit, I picked the slide after this, and then I said, "I've got to put this quote in because it just fits perfectly." Because fear is not the mind-killer. Change approval is.

And when you think about it, they're kind of the same thing, right? Fear, change approval, they're really the same thing. Depends on how you look at it, but they're pretty much the same thing.

So change approval is, without question, everywhere I go, the number one thing that will screw up your transformation process, period.

And it's hard and scary, and you have to talk to a lot of people, and you have to ask what problem are you trying to solve, and you have to convince them that your creative solution to that problem is good enough. And then you have to talk to the audit people, and then you have to talk to the compliance people. And it's super painful to navigate, and it's also super important.

So we have a tendency, me included, to want to push this off, to want to get some momentum going. But we should fight against that. We need to start work on this at the beginning because it will be a problem.

This is not one of those make-little-changes things. No. Just start on this at the beginning, because the last thing you want to do is spend a year making progress, and you're like, "This is great. We can go from development, from a new idea to a staging environment in an hour," and it takes six months to get it pushed out because we can't get through the change approval processes.

Talk about a way to kill your transformation efforts.

So you've got to address this head on. You have to have adult conversations with people whose goals conflict with yours. And you have to understand that your organization is setting you up for failure in many cases because the incentives that people are optimizing their workflow against are different.

This is the historic problem we've talked about for years. Dev is optimized for delivering features. Ops is optimized for availability. Security is optimized for no breaches. And the mitigations that each of them applies to solve its problem are completely at odds with one another.

You have to understand that from the beginning. You don't have to love it, but you have to understand it, and you have to understand it deeply enough that you can convince them that, okay, I get it. You're worried about this because it impacts you directly if we get it wrong. But we have to solve this problem.

I'm here to tell you that I'm going to be delivering software faster, and you know how the Ops folks have been really angry for the last year because they've gotten all this pressure to do this DevOps thing? Well, that's going to be you in six months when I'm delivering software so fast that compliance can't keep up.

So I'm trying to help. Let's get over this before you become the long pole in the tent. And let's work together to solve it in a way that everybody can live with, and in fact, in a way that will put you in a good position for the rest of your career because you're going to have to keep doing it this way.

So having these conversations with people where you can identify what drives them, what they're motivated by, what they're compensated for, how they're recognized, and really internalize how that's different from the way that you are, and build solutions that solve everybody's needs, is critical.

And this one's probably the hardest one. It's the scariest one, and most people wait until last, and that's not the right thing to do.

So another thing that really struck me, and I was surprised by this. I'm not surprised by the idea. I'm surprised that these enterprises are willing to do it. It's opt-in versus all-in. It's letting developers choose their own tools and not dictate that you must go down this path.

I expected them to say, "Okay, well, the DevOps team built this CI pipeline, and you have to use it, period. That's it." And okay, that's great. It's probably better than what you had before, but it's not ideal, and the feedback loops there are not great, and it's a little bit far away from the developers, and there are some problems there.

The platonic ideal here is people building the tools or selecting the tools they need and optimizing them for their use cases. There are downsides there too, though, because now you've got these development teams all maintaining their own stuff.

The places where we've seen the most success and the most rapid growth and the most buy-in from teams are the ones that have allowed teams to select their own tools and build a movement around those, and to customize those to fit their specific needs, instead of forcing them down the shared community path or the shared enterprise path.

Not having a captive audience also means that that enterprise shared services team doesn't get to say, "You have to use this tool." They have to fight for your business. They have to run their part of the organization as a product, which is the right thing to do.

In fact, it's the only way they're going to work. Those shared services teams are in trouble in a lot of organizations because they refuse to recognize this and continue to try to force people into their tools instead of treating it like a product function, talking to their customers about their needs, building products they want, and iterating.

Instead, they try to take requirements and solve it in their own way, divorced from the people that will be using it, and force that on everyone. And that's always a recipe for disaster. If not disaster, it's at least a recipe for mistrust and reducing the speed that you could have had.

And related to this is the idea of building communities of practice. This is one of the things...

Cool, I've only got one more slide. I'm good. Yes.

Building communities of practice. I saw this at the Home Depot, and it was amazing. This team that stood on stage and...

So I went to this internal developer conference thing, sort of a one-day event that they do, I think, quarterly or something. Maybe it's monthly, and it's an afternoon for a couple of hours. Anyway, they had their continuous delivery group giving a couple of talks and a panel conversation about this.

What's interesting about their continuous delivery group is that it's not in the org chart. It's a developer from that product team and a developer from that product team and one from over there and one from their platform team. And they coalesced around these shared ideas, and they run their continuous delivery pipeline toolset as an open source project internally. Exactly like you would see a public open source project run, just not public. So it's open source internally, and it's fantastic.

They've coalesced around these needs that they each have, and they have such deep context about how their organization as a whole works that they can be very good contributors to one another.

And this means that when you are ready to take this work that they've done and turn it into a product within your shared services or your platform group, you have a much more engaged community around it. And in fact, you have developers who are engaged in it who want to keep working on it and who are going to give you the best feedback that you could possibly get.

What kills products is not getting feedback from customers. And now you've got an engaged audience that wants to tell you what they need and wants to work with you on it.

So allowing these communities of practice to form around ideas and to iterate on these ideas before you try to turn it into an official, blessed enterprise product means that you get to skip a lot of the missteps that you would have made.

If you're a shared services team, this should be a no-brainer because your budget's not paying for that initial development. It should be a no-brainer. You wait for them to get something right, and then you help. You enable your teams, which is what you should be doing in the first place. So incredibly powerful.

Last one, and I have a minute and 15 seconds, so I think I'm good.

The last one is you're all underestimating yourselves. And I underestimated you, too. I've been really surprised, pleasantly, by how forward-thinking the companies I've talked to are and how much they've been able to break through linear thinking.

It's interesting, right? You sort of expect, okay, well, the enterprises are five years behind the startups and these tech companies, so what we were doing five years ago, they're going to do it. No. What we were doing two years ago, they skipped those first three. They're way smarter.

They skipped those things because they figured out that they weren't relevant, and they were able to bypass entire eras of technical debt, or somebody called it heritage software on Twitter. I like that term. Really fun.

And they're able to bypass this, and they're able to do things I didn't think they could do, like build and run. You build it, you run it. Platonic ideal of operations for applications, in my view. But I didn't think enterprises would be willing to do it because it's also the most expensive way to run applications.

But I'm seeing it over and over again in more places than I expect, that people are doing the right things and not the easy thing.

So don't underestimate yourself. Don't think that just because you're a big company, you can't do whatever you want. And then when you do it, tell me about it so I can amplify your voice, so I can tweet about what you're doing.

And if you're the engineer who did that continuous delivery demo on stage, you get to participate in a conversation and then have your SVP call you out on Twitter for being amazing. And you deserve it because you're doing great work.

So that's all I got. Thank you very much.