Bill and Aimee's Excellent DevOps Journey

San Francisco 2014

Bill Donaldson

Senior Principal Engineer · The MITRE Corp

Aimee Bechtle

Principal Software Engineer · The MITRE Corp

In their presentation, Bill Donaldson and Aimee Bechtle will share a summary of their tumultuous journey that has resulted in 75% of their corporate applications utilizing Continuous Integration with automated deployments, a 70% reduction in labor, and a 288% reduction of cycle time. They will support these numbers with charts depicting the deployment volumes over the course of a 1.5-year adoption. They will share how, by selecting the right applications, approach, and people, and using creative ways to advertise success, the new capabilities were embraced and adopted in their organization. The presentation will conclude with how Aimee has transformed her team to support Continuous Integration with auto deploy. Additionally, she will share why the 25% remain in the manual system and how she’s pursuing Continuous Delivery.

In 2011, Bill Donaldson and Aimee Bechtle were leading development and deployment operations teams in The MITRE Corporation’s corporate IT department. MITRE is a not-for-profit corporation whose mission is to provide the US Government world-class Systems Engineering. MITRE’s IT was running well with a mature and stable release process. Bill’s team had successfully adopted agile development practices and was adept at producing high-quality software code quickly. However, the successful adoption of agile introduced a problem between the developers and the deployment operations team led by Aimee. Developers had to “hurry up and wait” on operations’ 24-hour SLA to build and deploy their apps. The SLA frequently expired due to multiple handoffs and human errors. Tired of the bottlenecks and lengthy cycle times, one day Bill said to Aimee, “24 hours is 23.5 hours too long.” This simple requirement sparked the vision for a transformation of MITRE’s software delivery process.

The deployment operations team was employing a manual, mature, and repeatable process that had been in place for over 10 years. In fact, in 2012 the Release & Deployment Management process scored the highest among all the processes in an ITIL process evaluation. So why would Aimee be motivated to change? Because MITRE’s unique mission and culture value innovation, active cost management, and establishing MITRE as a showcase to their Government customers.

Bill from Dev and Aimee from Ops partnered to successfully deliver an enterprise continuous integration and automated deployment capability. Aimee took Bill’s requirements and declared them as acceptance criteria. Together they formed an influential, cross-functional team that was critical to building momentum and adoption within the organization. And together they experienced the pitfalls and challenges of implementing change, working against the resisters and laggards. At times project goals were in question, but through determination, a focus on the singular goal, and expert help, they turned the project around. On June 6, 2012, upon a source code repository commit, MITRE’s first application was automatically built and deployed, cracking the doors to a new era in software delivery. Now, 65 software applications later, MITRE’s IT is poised to expand into the future with a sustained focus on DevOps and a Continuous Delivery architecture.

Full transcript

The complete talk, organized by section.

Aimee Bechtle

Hello, I'm Aimee Bechtle, and I'm a principal software engineer with the MITRE Corporation.

Bill Donaldson

And I'm Bill Donaldson. I'm a senior principal engineer. I'm not quite sure what that means, but I'm a project leader on a bunch of large Air Force and Army ERP programs now.

So, welcome to our excellent DevOps adventure. Some of you may get that reference.

It's really cool being among the clan that really get DevOps. It's exciting to be talking to the same people that speak the same language, because when we go home, sometimes people don't get what we're talking about.

Our journey first: I first started working with Aimee 10 years ago. I hired Aimee as a performance test engineer manager. She was eight months pregnant and ready to go out on maternity leave. As you can imagine, my wife's going, "Why are you hiring someone who's pregnant and going to go out for many months?" But it was well worth it.

You'll notice sort of a family theme here, because this pregnancy thing reoccurs. I had nothing to do with it. I had nothing to do with it. This is our DevOps van that takes us through our journey.

So, a quick overview of what MITRE is. We're 7,500 employees spread around the world. The British Empire used to say the sun never sets on the British Empire. That's no longer true. But for the MITRE Corporation, it is still evolving around the world.

We're a not-for-profit corporation, and we do work for the DoD, the FAA, and the IRS. As you heard other people talk about risk averse, we're not risk averse. We have an allergy to error. So, as you can imagine working with these customers, when we have our data, we don't want to have anything happen. As part of that, we've even instituted SOX compliance, even though we're a not-for-profit. We're not a public company. We don't have to do that. But that's sort of the culture that we live in.

Our DevOps journey started about five years ago. I was the head of the IT development group for our custom web apps. We had about 50 developers working for me and about 100 applications: web, Java, PHP, some .NET, and Oracle.

Aimee Bechtle

Five years ago, I had transitioned from the performance test team, and I took a new position leading the application release and deployment. I crossed over from dev to ops. Shocking.

In that new role, I was leading four and a half staff who were performing about 2,600 manual deployments to our test and production environments, supporting about 300 apps. That ranged from the custom web apps that Bill built to the COTS products, desktop, you name it, we were pushing it out.

In this new role, he was now my customer, and I was pregnant yet again with my fourth child. So he was going to have to wait another maternity leave to work with me, but that was a small challenge. It's a very small challenge compared to the journey that we were about to start, which is the journey we want to take you on.

In this presentation, we want you to take away two things: first, the emotional roller coaster that we went on, and second, what we did after the system was up and running to get people into the system. Hopefully you can take that back to your organizations if you need to.

And so we packed up the van, and we went on our road trip.

This is the process that I inherited when I first took over the team. I'm not going to get into the details, but I think a picture says a thousand words. It was an obstacle course. And even though it was an obstacle course, it was a very well-known, well-understood, and practiced course that the development community knew how to navigate through.

In an ITIL assessment, of the eight processes that were assessed, we scored a three out of five. It was the highest of all the processes because it was repeatable.

That sounds great, but it still had a ton of problems, right? The problems, the three Ps, I call them.

It was paperwork intensive. The developers had to write tons of instructions for my team to deploy the apps. One of our applications required a 43-page CM form to roll things into production.

It had a lot of people. On a deployment, we had to hand off to the sysadmins, the DBAs. If you came to my team and there was only one step, my team had a 24-hour SLA. But as soon as it left my team to be handed off to somebody else, that SLA went out the door. It didn't matter, and people could wait a week, two weeks, for their application to be in the environment.

To do a production rollout, we would take seven people, four hours, and we'd do that Wednesday night. We'd start at 8:00, and we were lucky if we finished by midnight.

The last P: processing. We were running the builds in each environment, getting different results each time. None of the environments matched. This all probably sounds really familiar because I think everybody has the same story.

We were doing this process for the following technologies. We were using the typical tools. We had build scripts. Some people did Subversion for source control, and Java and Oracle was a common platform, and Windows and Linux.

We had an average cycle time, and again, this is across 300 apps, so we had a really wide variance. It was about 15 days from the time that they said the code was ready until it was in the user's hand, but it ranged from a minimum of 30 minutes. And that 30 minutes, trust me, is just a simple SQL script being run, and that's probably under an emergency situation. And that happened maybe once a week.

Then we saw up to 486 days for larger, more complex apps. And, as I mentioned, it was about 2,600 deploys to just production and test.

Bill Donaldson

So, while Aimee was out on maternity leave, one of the key things I wanted to do was change the culture. I wanted to make it an exciting place for people to work in. The big thing I saw that changed people's life was freeing them up.

What we did was we started doing all the typical continuous integration things. We had test-driven development, all the things that everybody else has talked about.

Aimee Bechtle

And so he's hit my service, which, boom, he was developing really fast, and then he had to wait to get his applications in. He's thinking, "Oh, they're slowing me down."

Meanwhile, I'm thinking, "I just got this level-three ITIL process, and I have a mature team, and I'm going to have my fourth child, and this is going to be great work-life balance. This is going to be easy, right?"

No. Boy, was I wrong, because he had different ideas.

Bill Donaldson

As Aimee came back, she met with us and I told her all the great things we were doing. We were super excited about doing all the continuous integration work, and we were rocking and rolling. I was trying to get her on board because we knew the next step was to get her to bring the release management stuff up on plane.

I told her in sort of probably not the best way. I said, "Hey, your service sucks. Its 24-hour SLA, that's about 23 and a half hours too long."

Aimee Bechtle

So I fly up to Boston from DC, I meet with him, and he's saying these things to me: continuous integration, Jenkins, automation. And I don't know if you remember Charlie Brown episodes talking to his teacher: "Wah, wah, wah, wah, wah." I'm like, "What? But we're so good."

I hear what he's saying, and when you're a team leading staff and you hear automation, fear kind of sets in. You're a little worried about getting jobs replaced and your own service. So I'm hearing him, I am concerned, I'm fearful, but yet I am intrigued.

Bill Donaldson

So, Aimee started talking about making processes, changing the tools that were already existing, streamlining reporting, and I'm thinking, "Eh, no, she really doesn't get it." We've talked about that, as other people have talked about it.

We sat down with her and really went through what is Jenkins and stuff like that. It took a long time to get the folks that were doing the manual stuff into what the new techniques and tools were.

We also brought Chris Sterling in, and he said, "Hey, this is the land of unicorns, but you can do this. We're mere mortals. We're not horses. We're just mere mortals."

So, a lot of work with showing her how we could get it done.

Aimee Bechtle

So I went back up to Boston. I met with Chris Sterling. I heard, talked to the developers. But the most important thing was I got my own education.

I started attending all the webinars that came in my mailbox every five minutes, and reading white papers. I researched the tools and technology, and I bought Jez Humble's Continuous Delivery book. I read it front to back. It's now my Bible at work.

I started to get it, and it wasn't so much anymore that I was fearful about what was happening to my team. I started to realize this is something my customer needed, this is something that IT needed. I changed my perspective, and I called him up and I said, "Okay, let's do this."

Bill Donaldson

And the chasm... Make sure everybody got the animation at the bottom of it. It started to close between us, so we thought.

So, they were still in the waterfall methodology, and we said, "Hey, Agile's a better way." The first thing we did, besides teaching about Agile, was define some acceptance criteria.

With that, this is literally what we had. It fit on a single PowerPoint slide, really simple. The whole thing was all the things you typically see: single command, build, deploy, all of this stuff.

The key thing was that we wanted this to be enabled to anybody. The ops folks said, "Oh, no, it can only be for certain developers." We're like, "No, this is going to be any tester, any developer."

The other thing was we wanted deploys to production without handoffs. The CM team had to be the guardian, fine, but no handoffs after that.

Then, we had been doing Sonar and code quality, so we said, "Before it releases, we want a check, before it goes to production, that you've met certain quality standards."

Aimee Bechtle

And I agreed with his acceptance criteria. I had some of my own. I'm responsible for SOX audits. We'd still do the SOX compliance and working with the external auditors on meeting all the IT controls. So I wanted to make sure there was auditability and traceability.

Because we are the guardians and maintain the integrity of the production environment, restoring back to that known good state was really important, and it had to be able to roll back easily and readily and rapidly.

I agreed to these, and this became the first iteration of our backlog. I went into our Agile management tool, and I put these in as stories. And we started to form our launch and get ready to kick this whole thing off.

Before I could do that, I had to hire a build and release engineer. I opened up a req, and I brought in somebody who used to work at Fidelity.

We then put together a team. We were really careful to choose people that were influential and well-respected by their peers to be on our team. And it was cross-functional. We got everybody together. We went over the mission, the vision, the goals. Everybody nodded their head, "Yay, yay, yay." We got everybody on board, and we picked our app.

The first app that we picked was simple. It was not something that had multi-platform and went everywhere and had a lot of modules or moving parts. It was a simple Java app with an Oracle database backend that could help us get the toolchain in place. That's what we wanted. We wanted to get the technology and the process in place. We would worry about evolving to other platforms later.

Everybody said, "Yay, let's go." So we got everybody on board, and we went.

Bill Donaldson

And we went. And we went.

We had monthly sprints and product reviews at the end, and the feedback from me and the rest of the development team was, "This isn't going to work." It was ops thinking. It wasn't enabling the developer. It was going to take months and months for every application to get into this toolchain.

We said, "Hey, it's not working." We had this conversation repeatedly, and eventually I just had a heart-to-heart with Aimee, and I said, "Aimee, this isn't working. You need to understand this is broken."

Aimee Bechtle

Gene used the term this morning, "stuck in the mud." So I guess our van was stuck in the mud.

Somewhere along the way, I had gotten lost. It was taking a really long time. I knew it wasn't working. We weren't getting what we needed to in place.

There were a couple of reasons why. I was trying to be cost conscious, and I did not get training or help from the outside. I figured I could use my staff and the knowledge that I had hired, and get him to learn the tool. He could figure it out on his own. And so that added a lot of time.

So when Bill called, it hurt. It did. At this point, I was at a crossroads. Do I drop the project and say, "I failed, let somebody else take it over," or do I be the person that I know he hired me to be, that stubborn, focused person who loves a challenge?

And that's what I did. I didn't want to let him down. This is family. This is my work family. This is a guy who stuck with me through two pregnancies, and I relocated to LA for three years, and he supported me in all of that.

I decided to do it, but I said, "I need help." And that was hard for me, to say, "I need help." I'm not that kind of person.

So at this point we got a course correction, and there's somebody in the room here who helped us do it. Eric Minick is here.

We bought a tool, and we didn't get training, so the first thing I did was, "Let's get training." We had the vendor. Eric came in for a couple days and taught everybody who was going to work on the team and touch the tool how to use the tool and what it does.

Then we brought in expertise. I actually kept Eric on for another week, and he reviewed our implementation and looked at what we were trying to do and did a gap analysis and recommendations on how we could get out of this hole and met with us weekly afterwards.

Then Bill, on the developer side, also brought in professional services to help with the build and work with those guys.

The most important thing we did was we freed up resources. This cannot be a hobby project. If you want it to be successful, it's got to be at least, we said, an 80% commitment from the staff.

So he made sure his staff on the development side was available, and I made sure the ops staff were available, and everybody focused and got it done. And we worked against a new plan. We regrouped the backlog, reevaluated our tasks and schedule, and made it happen.

So 13 months later, after that kickoff, 13 long months, we implemented what we call the automated build and deploy solution. It's affectionately coined AB&D, and AB&D is as easy as one, two, three.

Developer commits code, it runs a build, deposits the build, and then we run the deploy workflow to deploy it.
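The three steps described here can be sketched roughly as follows. This is an illustrative Python sketch only, not MITRE's actual AB&D implementation: the names `run_build`, `ArtifactStore`, `run_deploy_workflow`, and `on_commit`, the in-memory store, and the `test` environment label are all hypothetical, and the idea that the stored artifact is deployed without being rebuilt is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    app: str
    revision: str


@dataclass
class ArtifactStore:
    """Hypothetical in-memory stand-in for a build artifact repository."""
    artifacts: list = field(default_factory=list)

    def deposit(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)


def run_build(app: str, revision: str) -> Artifact:
    # Step 1: a commit triggers a build of that exact revision.
    return Artifact(app=app, revision=revision)


def run_deploy_workflow(artifact: Artifact, environment: str) -> str:
    # Step 3: deploy the stored artifact (assumed here: no rebuild per environment).
    return f"{artifact.app}@{artifact.revision} -> {environment}"


def on_commit(app: str, revision: str, store: ArtifactStore,
              environment: str = "test") -> str:
    artifact = run_build(app, revision)
    store.deposit(artifact)  # Step 2: deposit the build.
    return run_deploy_workflow(artifact, environment)


store = ArtifactStore()
result = on_commit("sample-app", "r1024", store)
print(result)
```

Keeping the deposit step separate from the deploy step is what would let the same build be promoted from test to production, which speaks to the earlier complaint that builds run in each environment gave different results.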

And it was a success.

We celebrated our success. We do what everybody does to make everybody happy. We fed people.

Everybody was excited it was working. But again, that was one app. Now we had to climb the mountain. The van had to drive up the mountain.

How do we scale? I was just a bottleneck for so many years. I didn't want to be another bottleneck. I don't want people to have to wait a month to come into this system. I was very mindful of that.

We wanted to be a value-added service. We wanted to get people on board. So then now we started the marketing and adoption phase.

Bill Donaldson

The key thing that we did was we identified all the applications that could be enabled for continuous delivery. What we did was we made it very visible. We put it out in the hallway where the developers come in and out every day.

We identified and put them on stickies, and we identified which ones were enabled, which ones hadn't been, and then we had a work-in-progress queue. If you notice, it's much smaller because we didn't want a lot of work in progress. This way, it kept things moving, and if something stayed up there for too long, everybody could see it, and we could all pitch in to make it go.

Aimee Bechtle

We wanted to work with the developers and engage them in a way that they were accustomed to. So we developed sample stories and sprint plans for them to pop into their backlog so they can get started, and put them in their release planning.

Bill Donaldson

After we had the stories, we also knew that some of the people weren't really up on plane for what it was to get this stuff into the tool. So we created sample scripts for them. We created checklists that they would do to make sure that they could actually work it and get it going there, and we created a SharePoint site that they could go pull all this information from themselves.

We hand-held. You've got to handhold, because everybody's coming at different levels.

Then we started marketing. We got everything, infrastructure in place. We were allowing them to go on fast. But what was going to motivate them? So what we did was we created more visible metrics.

Again, simply in the hallway by the developers, but also by the senior management. What we tracked was: how many deployments to production? Because that's the key thing. Not did you get to deploy somewhere or any integration, it was to production.

What we did was we set a target of doing, in a quarter, 100% of what we had done manually in one year. So we were looking for a 400% increase of our deployments.
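The arithmetic behind that target can be checked with a small sketch. The annual volume of 1,000 used below is a made-up number for illustration, not a figure from the talk, and `quarterly_target` is a hypothetical helper.

```python
QUARTERS_PER_YEAR = 4


def quarterly_target(annual_manual_deploys: int) -> dict:
    """Compare a normal quarter with a quarter that matches a full year's volume."""
    baseline_per_quarter = annual_manual_deploys / QUARTERS_PER_YEAR
    target_per_quarter = annual_manual_deploys  # a year's worth in one quarter
    multiple = target_per_quarter / baseline_per_quarter
    return {
        "baseline_per_quarter": baseline_per_quarter,
        "target_per_quarter": target_per_quarter,
        "percent_of_baseline": multiple * 100,
        "percent_increase": (multiple - 1) * 100,
    }


target = quarterly_target(1000)  # 1000 is an illustrative volume only
print(target)
```

Whatever the annual volume, the target rate works out to four times the baseline rate. That is 400% of the baseline, or a 300% increase over it if "increase" is read strictly.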

The other thing is every Wednesday afternoon, we had a snack fest. You bring food, developers show up, and we had the core team come up and say, "Hey, which application are you going to put into the framework this week? Let me know because I'm going to attend your sprint planning sessions," so that we provided more guidance and help to get them there.

Aimee Bechtle

Nothing was better than the water-cooler talk. The guys that came in on Thursday mornings after a Wednesday night deployment and like, "I only worked 15 minutes, and I got to watch the Bruins play." Or the developer that on Tuesday morning, the three changes that he developed the day before were already in use by the user.

So they all started talking and bending people's ear. Then I had people start calling me because they knew, "Oh, I don't have to work for four hours on a Wednesday night if I do this?"

I also trained my staff that when they received the manual instruction sets, what to look for as a candidate for this application. Then we started a list. Who are we going to go talk to and who are we going to go demo?

Then we added formal metrics to our CIO and work package reviews with senior management.

Bill Donaldson

I love hearing those stories about how you're able to get a release out very quickly.

So that was great, and we started to finally see the end of, after all the potholes, the speed bumps, flat tires, we started to see results.

We do have 68 applications in the system. For different reasons, we still have people who don't want to come on. We have people who are still afraid to export from head out of SVN. There's late adopters.

We support multiple platforms. We can turn platforms around really quickly. Our engineer was really great at designing kind of a snap-on way of getting technology in.

We've dropped from 15 days to nine days. It's a 40% improvement. It's still not where we want to be. I'll get into that in a second. But from 30 minutes down to two minutes, and a lot of our apps do go through in two minutes, that's like a 98.5% improvement on deployment, or just cycle time from the time code was ready and it's in production.

Seventy-five percent of deployments are automated. Our throughput's up about 38%. We have a ratio of three to one of deploys to test to prod. It's about what we're doing. So a lot of that throughput is to test.
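The cycle-time percentages quoted above can be reproduced with a short calculation. This is a sketch using only the before and after figures given in the talk; `percent_reduction` is a hypothetical helper name.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from before to after as a percentage of before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


average_cycle = percent_reduction(15, 9)  # average cycle time, in days
minimum_cycle = percent_reduction(30, 2)  # fastest cycle time, in minutes

print(f"average cycle time reduction: {average_cycle:.1f}%")
print(f"minimum cycle time reduction: {minimum_cycle:.1f}%")
```

With this formula, 15 days to nine days gives the 40% quoted in the talk, while 30 minutes to two minutes gives roughly 93%. The 98.5% mentioned on stage may rest on a different baseline or may simply be an off-the-cuff approximation.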

Aimee Bechtle

I didn't lay anybody off. They got to move on to higher-value, higher-skilled work, and their professional careers have taken off, and they love it.

We applied and got the recognition for InformationWeek's Top 500 Business Technology Innovators last year, profiling this project on our application. All of this is great. We're really proud.

It is nothing compared to the quality of life that we all experience of no more long weekends or long Wednesday nights, or spending the next day at work trying to figure out how to revert back an app.

Bill Donaldson

I think that's key. Changing people's lives, right? Giving them time back. That's why a company like us, a not-for-profit corporate IT government, quasi-government organization, would do this: because we wanted to give life back to our staff.

So now I'm looking to the future, and we have the basic continuous integration and automated deploy, but I want to do full-blown continuous delivery.

Even though we can do automated builds and deploys, our infrastructure is still manually configured and provisioned, and they still have snowflake servers. And though we have automated performance testing, we don't have automated functional regression testing. We've really struggled, a lot of stop and starts on that.

Aimee Bechtle

Wow. So we're having a family reunion. We're getting back together, and I'm not replacing, I guess, but we have two new team members.

We have an ops automation engineer now on the team who's gone to Chef training, because I'm not going to repeat the same mistakes. I've had a consultant since June working with us, and we are getting everybody the training, and we're going to approach this the right way.

And we have a test automation engineer. They're saying, "Let's do it again. Let's evolve."

Bill Donaldson

And off our van goes.

I think the keys to this are anybody can do it, right? If a not-for-profit, government-facing thing, where everybody's very risk averse, can do it, anybody can.

The key is knowing where you want to go, right? You have to know where you want to go and why, and you have to keep that in your forefront. Aimee talked about getting stuck and should she quit, but I think it's knowing where you want to go.

And again, patience. Not everybody's going to be in the same plane. The developers got really, really frustrated because the ops folks weren't on board, and they wanted to take their own approach. So you just have to make sure that you get people through this change management.

Education is key. And then, obviously, using the Agile stuff that everybody knows about.

So what do we need help with? Probably everybody's familiar with: we need to change skill sets and mindsets to support infrastructure as code and test automation. The infrastructure as code and the idea of dropping down the number of sysadmins is very scary.

We need to break up the silos and figure out how to move from a development team to a cross-functional delivery team. I asked a sysadmin, I said, "How would you feel if you got a storyboard card from one of the development teams?" And, "Oh, no, it's got to come through the ticketing system." So getting through things like that.

The other is we haven't solved the large ERP systems, how to get those into DevOps. That's a huge challenge, trying to solve that in the DoD. I see a lot of head nods. So if people have ideas, or where your challenges are, maybe we could partner as a community and move this thing forward, because it is a big need.

With that, I want to thank everybody for their time. If you have any questions, we have about five minutes left. So we have a little time for some questions.

Aimee Bechtle

Okay. And we put the lessons learned as bumper stickers that we collected along the way. Like when you see cars, you don't have to meet anybody anymore. You just read the back of their car, you'll know who they are.

Q&A

Q: Real quick, I come from a nonprofit as well. One of the things we're struggling with is do we need to make any structural changes to support some of the DevOps activities? It sounds like, based on your slide before this, one of the things you're struggling with is what do you do in that space? So did you do all of this without any structural changes?

A: Yeah. So we did the Agile thing: start from where you are and start iterating with what you can control. Making changes to the organization is hard, and you have to understand why. If you do it prematurely and it doesn't work, then what?

Q: Yeah.

A: So no, you can get this going with the organization just as it is.

Q: Just promoting collaboration.

A: Absolutely.

A: Yeah. Yep. You've got to find the change agents.

A: Well, and I would say a key thing for us was hiring a build and release engineer with an eye for how to automate.

A: Yeah. Question?

Q: The criteria you used to identify when your applications were ready to be put into the training, is that shareable?

A: Yeah. Gene, when we submitted this, he was so impressed. He goes, "You got this thing out into the public." So it will take a little bit of hoops, but yeah, absolutely. Send us an email, and we'll get it to you.

Q: Great. Thank you.

A: Other questions?

Q: So one of the things, obviously, is cost. I've heard some people talk about DevOps initially can appear more expensive, especially if you're trying to eliminate the specialists and go with generalists, and you go to engineers and things like that. Did you guys experience that? Obviously, the return is well worth it, but initially selling it. It's like, "Hey, I need this release engineer," or whatever.

A: So I have to be honest, I did have somebody leave my team, and there was an opening, and I struck while the iron was hot. And I reengineered that position. I actually called Bill up and said, "Help me write the req."

A: That's right.

A: And he helped me write the req.

Q: Got it. So you used existing funding to kind of do a switch?

A: Yeah. Change of direction.

A: And then for the tool funding, we just started off with a small number of users. And then when we proved the ROI and the senior management saw what it did, then they started throwing money at it.

Q: Got it. Great.

A: Other questions?

Q: At one point you talked about the metrics, and you said the formal CIO metrics, but then you quickly changed. Can you share what those metrics were? Because that helped with the convincing part.

A: Yeah. I think that some of the ones that I was focused on were the time to get things into production, so showing the overall cycle time. That was a compelling thing because they could sit there and say, "Oh, we could do that." But also along with those, we did get the number of deployments in there.

A: Yeah. That's what this graph is, actually: the blue line is the increase in automated deployments, and the red line is the decrease in manual. We crossed over 50% about a year into it, and we showed that one. I also showed a decreasing FTE line as well.

A: But also what really sold them was the stories about getting something into production in five minutes. The stories, like upstairs, you hear about Moby Dick. These allegories. If you can get these stories going, the water-cooler stories, those are the ones that sell also. They love the metrics, and if you can get a dashboard, it sells because they love pretty pictures.

Q: Okay. So it was the metrics you were using anyways. It wasn't something special you did.

A: Oh, no. We added these metrics into it.

A: Yeah. We added them.

A: Yeah. These were new metrics.

Q: Okay.

A: We didn't have them before.

A: So I think we're at time.

A: Yeah. All right. All right. Thank you.

A: Thank you.