Three Steps to Change: Lessons from Battling Bureaucracy
Paula Thrasher has spent the last 15 years trying to implement Agile culture in the federal government. Her DevOps journey began after a career switch from application developer to IT Director, when she started trying to bridge the worlds of Development and Operations.
Having worked with 16 different federal agencies and components, she has seen both success and failure in IT transformation at scale. This talk is not about the successes – it’s about different ways to fail.
Cultural change is hard – and real transformation requires changing how organizations work and collaborate. When transformation does succeed, it shares common themes that make it possible.
Full transcript
The complete talk, organized by section.
Paula Thrasher
Hi. Good morning. Thank you. Thank you, Gene, for having me.
So I'm going to talk a little bit about my journey, sort of my experience, a little bit about who I am and kind of how I got to be Paula, I guess.
I started my career actually at a help desk for an IT startup, and I eventually worked my way into software, and I did that for a long time. And then in 2010, I took a random detour, and I became the IT director of a small-business federal contractor. It was about 500 people at the time, which apparently some people don't call small, but in the federal government, that's called small.
And we had outgrown our IT department at the time, and I was sort of tasked with trying to build it. And that was my first DevOps moment, was actually realizing, from having been a developer for so long, now that I was the IT operations person supporting the developers, I realized, like, "Oh, I know why they didn't want me to touch their servers. I'm a hacker."
So I kind of appreciated the other side, and it kind of coincided with, I think, some of the DevOps movement.
After I left that role, I went on to two other agency transformations where I helped agencies get to that kind of DevOps place in the intersection between sort of dev and ops. And I'm going to talk a little bit about some of that.
Right now, I'm actually an application delivery lead with CSC. And I actually have a program that I work with for a federal agency. I've got about 300 employees. I have 28 applications. I have 24 development teams. I have an operations division, a security division, and all the things that come with running a 300-person program.
And I also have four other agencies that I'm sort of a peer mentor in the applications group. We work horizontally, and I work with my counterparts in those organizations to sort of help them on their journey as well. So I kind of have my role as an advisor and then my role, my day job.
So I describe what I've spent the last 15 years doing in the federal government as the following problem, right?
You can't change the money, because it comes from Congress with really specific strings attached. Or maybe it's a fee-funded agency, but even then, it comes from the business with strings attached. You can't usually change the contract. It takes years to get a contract in the government. And once it's written, I mean, that's in stone, right?
And you can't really change the people. Okay, so they like to change the contractors. That's why they hire us, so that they can fire us. But even when we do that in the federal government, what ends up happening is that you fire XYZ company, and then the new company comes in, and then they hire all the old people that used to be at that company. So it turns out that you're not actually in a situation where you can do what you do in a lot of, well, I mean, I think you do in a commercial environment, which is you just fire everybody, and then you get some DevOps people. I don't know.
So knowing that, it kind of presents an interesting challenge to transformation because it's this huge bureaucracy with inertia, and all the things that you would think you would use to change the organization are just not available to you, right?
So one of the things I kind of stumbled across is I realized that I wasn't really a software person. I was a change person.
And borrowing from other disciplines, John Kotter is kind of like the godfather of change. And even if you haven't read John Kotter, I'm sure you know these words, right? There's a Harvard Business Review sort of compendium of his stuff. He's written numerous books on this topic.
He talks about establishing a sense of urgency, right, and creating a guiding coalition. And for the record, this is great stuff. This is totally, totally true.
But as I've studied this and have lived this, I've sort of simplified the, like, okay, I've screwed this up a bunch, and I think I've got down what the key parts are, right?
The original title of this talk was going to be about failure, and I started taking all the times I had failed trying to transform the federal government, because we're still struggling, and I came up with 36 different things that just didn't go well. And I thought, let's not talk about failure. Let's talk about what it looks like when it works.
So this is my simplified, I'll save you the effort of reading the book, the top three, which is: it really comes down to three things, right?
First, you have to want to change. Then you have to know what it is you're going to go change. Then you make the change. And then there's this whole thing about like, and then it's part of the culture, and da, da, da, da. But basically, if you do those three things, you kind of get to the part with the culture.
And I think fundamentally it's kind of about this growth mindset versus a fixed mindset. That what you're trying to do to get a real change program in place is to get people to that growth mindset, that they can change and that they have an opportunity to continuously improve.
And that's kind of the opposite of why people go work for the government. Both, I'm not going to slag on the government, it's just as bad on the contractor side, trust me, where you join for stability, right? So you sort of career-selected for a role where you don't have to change much, right? There's a lot of people that just don't want to be changing all the time. There is sort of a fatigue.
So getting them to the growth mindset that you need to have continuous improvement is kind of the journey you have to go on to get to the part where you actually get to make the change and do all the awesome things.
So the first one is want to change, which, I mean, this sounds super obvious. I'm sure there's some fancier way to know how to make people to change, but I've pretty much figured out that the most effective one is to have a crisis.
And I'm glad 18F came and talked about, folks talked a little bit about Healthcare.gov. That's obviously one of the more recent and very public crises.
I have my own story. So the year was 2012, and it was just after I'd come off being this IT director, and I was like, DevOps, right? And so we're coming into this agency, and we're here to transform them.
And it was day two of the contract, and I'm brand new. And I walk in, and I say, "We are going to have a change control board meeting."
And this agency, I can't name names because bureaucracy, and to protect the innocent I will not disclose. But this is an agency that we were working for their headquarters, and a large part of what we were supporting for them was they ran this website that was actually one of the top five websites in the federal government, and on any given day, one of the top 100 websites on the whole internet. Millions of hits a day. Very, very public.
And they were coming out with their very first mobile app, and they were really excited about it. So excited, they had put a press release out before the change control board meeting to approve the application.
So I get to this meeting, and I basically witnessed this battle, because we hired the previous people, right? So I witnessed this battle between the developers and the ops people.
And the developers are saying, "Yep, we've tested it. It can totally support two million users. We're completely confident it can do that."
And the ops people said, "Well, we just got this thing yesterday, and there's no way it can support that on the hardware you're running it on. The absolute max has got to be, like, 400,000. No way. Absolutely can't go any higher than that."
And they bickered back and forth, and it was pretty clear that even though I had no idea who was right, this was not a good scene, right? But it's day two, and the press release is out. So it's going, right?
And so it did, and we made the front page of TechCrunch and Reddit and all these great little internet sites, and it was awesome. And millions of people signed up. And when about the 450,000th person signed up, the whole site came down. Whole thing just totally crashed.
And then we were also on the front page, people slagging on, "What's wrong with the government? Stupid."
So not my good day three, right? But it was great because I thought, "This is good. Okay, this is like a disaster, and now they will listen to me when I'm like, 'I got a better way.'" Right? So I'll get back to that part.
So you don't always get the crisis you want. And plus, actually, sometimes you get fired when you have a crisis. So the other option is that you can actually make one.
And I've actually coached other teams that if you're having a hard time motivating another part of the organization to join your sort of DevOps revolution, you just kind of have to make it hurt.
So this is my current program, actually. And this is the story of us before we were agile and after. And before we were agile, we had 32 applications, and we deployed once a year. A few applications deployed maybe quarterly, but basically once a year, right? And ops team supporting us.
And then as we made the transformation and more and more applications transformed to agile releases, over time, we've hit 290 as of May. I think we're on pace for about 350. Actually, may even go a little higher than that for this year. So you can see exponential growth.
Awesome to be a developer. My project, they're loving it. Kind of sucks to be the ops team, right? Because they have the same number of people supporting 220% more deployments, right? And it's not any more automated than it was when we started.
So originally, I think they weren't necessarily interested in talking to us about how we could help automate. But the more we started throwing at them, the more interested they were in hearing our side of the conversation in how we could help them, and vice versa. I think it's forcing our teams to think about the fact that they're doing this to their counterparts.
So then the third sort of is the, I think, nobody knows you can do this. You could let a crisis happen.
So I guarantee you somewhere in your organization, there is a crisis waiting to happen. You just have to find it.
And when I was IT director, so the year was 2010, and I'd just taken on the role of IT director for this small business, and we'd really clearly outgrown IT, and the help desk in particular had no ticketing system. And, I mean, I started my career at a help desk. I'm not an ops expert. I wasn't really feeling like I knew how to do this job, but I did know one thing: you need a ticket system, right?
And I couldn't convince any of the leadership that this was important, and I couldn't figure out why. And then I realized. The two folks that were on the support team had come up with this VIP help desk system. They had a separate phone number. It forwarded to their cell phones, and whenever a VP or the CEO or somebody called, it went straight to their cell phones, and they immediately answered it. And they solved it.
And I thought, "Oh, I get it. No wonder they don't realize they have a problem, because they've never felt the problem."
So I was really fortunate. And I went to the CEO of the company and said, "Did you know they're doing this?"
And he was like, "No, I didn't. What?"
He'd never asked them to do this. They just did it.
And so I said, "Well, do I have your permission to just stop doing it?"
He goes, "Of course. Sure, sure."
All right. So I told the team, "Just stop."
I think they, at that point, were like, "Oh my God, who put her in charge?" But they did it, to their credit. They put in regular tickets. And the first day that the CEO had an email outage...
So by the way, our email server had an uptime of 80%. So it went out a lot.
First time the email server had an outage and the CEO didn't have email for a day, I went back with my proposal for a help desk system, and it worked.
So while that's not necessarily a DevOps story, I definitely think if you're trying to motivate somebody to make a change, you have to kind of give them a reason to rethink their ability to change, that growth mentality. You can always make a crisis. So if one doesn't happen to you, I guarantee you can just go find one. Right?
Okay, so it turns out that that's kind of not enough. And I think that that's something that I learned along the way, is that it's not enough to just have this cataclysmic event that makes you want to do this. You actually have to organizationally know what you're going to change.
And I think you have to really help. This is where I think the DevOps piece is different than just the traditional John Kotter, where you just stand up in front of a crowd of people and tell them these things, and you're all going to go home and implement. There is kind of something special to DevOps, that there's a certain kind of way of thinking that you have to help people through. And if you can get to that, you can help them know what to change.
And I really think it's powerful visuals and measurement, and I think those two things go together. And ultimately, that gets you to kind of finding a big picture.
So why I'm so convinced that the visual works so much better than just the talking is a little bit about this story.
So on my current program, this was in April of this year. I know some folks listened to Damon Edwards give a talk yesterday, and he talked about this value stream mapping a little bit, and CSC obviously has done a lot of this, Simon Wardley and others. But this is super powerful.
And what we did was, this team had actually already been doing a transformation. They'd already been part of an agile pilot. Well, they'd already been doing agile for two years, but they actually had been specifically working on acceptance test-driven development and some other improvements. And in December of that year, they took on this testing improvement, and between December and April, they had gotten their cycle time between when they started a story and when it was ready to go from about 13 days down to six.
But they were really blocked. And then the retrospectives were getting really nasty. It was everybody else's fault, right? Why can't we go any faster?
And we did this over the course of about two days, where we got everybody in the room. We got the product owner. We got the government project manager. We got the person that's sort of the business analyst domain expert. This guy has the system in his head. He probably knows it better than anyone else. We got the lead engineer. We got some of the senior technical people, the testers, the independent tester, which is a different contractor. We got the government release managers who supported us. We got the folks in the deployment, which was also another contractor on that side, over the phone because they're actually based in Texas, in the room. And the folks on our team that answer the 1-800-HELP desk when the thing breaks.
Everybody in the room, and we used this model to actually model a particular story. And actually, I'm really glad about the previous talk being about mainframes. This particular story is about the integration that this application has to the mainframe.
And just a little context, this agency has a very large constituent-facing element to what it does as a mission, and this system runs one of the most important services the agency provides. It's really part of the meat and potatoes of this agency. If this system doesn't work, a really core part of the agency mission is not going to work. It's not operational. It's very high profile. But it's entirely internal. This is how the agency addresses that. And there's a public-facing component that goes with it, but that's not this team. And there's a back end, which is sort of like the mothership in the mainframe that kind of keeps the system of record.
And so this particular story is actually about a workaround, actually, that we have modernized the system to be web-based. And when it sends transactions to the mainframe, sometimes users were worried it wasn't what they thought it was going to be when it sent back. So this was to address the fact that they were concerned there were some data quality issues. It was to give them a screen to let them see what it was going to send to the mainframe. That's what this story was about.
And so the first part is how that story came to be in the release. They talked about it, they looked through the ticket logs, a little Pareto chart. This is causing a high number of tickets. Here's what the fix would cost. Is it worth it? Da, da, da. Right? All that kind of stuff happens. It makes it into a release plan. Sprint happens, it makes it into a sprint. Team does the work, acceptance test-driven development, all sorts of fun agile stuff.
And this is the development part, and this is everything else it takes to deploy that story.
And this was an eye-opener, for me too, actually, that the main reason this team couldn't go faster was the only improvement we were making was to the agile development team and not to the operations side.
And of course, so there are some boundaries here, and there was a lot of waste going on. There was a lot of repeating. We had automated everything in dev, and the second we pushed that thing into test, it was manual for no reason other than we just weren't running any of our automation in the test environment.
When we handed it over to another contractor, they repeated a build and everything else that we had just done, and then they threw away the thing that they just built and used the thing that we gave them instead. Not sure why they did that.
And the other issue was that all told, every time we would go into our staging environment, which was, because it's a mainframe, that's the only place we have access to the emulator, and it's also the only place that we have access to certain integrations with other applications. So getting to staging was a really key part of being able to trust and verify our stories.
When we got to that environment, there was an issue. It was like a 508 issue. I don't know if that actually makes any sense. It's an accessibility rule in the federal government. So we had to go all the way back to the beginning and repeat just for this one thing that we couldn't catch until we get to staging.
And all told, that process took them... It basically calculated that in the last couple of sprints, it wasn't possible for them to do it in anything under 48 hours. And lots of waste going on here. And it was a huge bottleneck because if it takes 48 hours to do something, you're not going to do it very much, right?
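The arithmetic behind a value stream map like this one can be sketched in a few lines of Python. This is a minimal sketch, not the team's actual tooling: the step names are paraphrased from the talk, and every duration is invented so that the total adds up to the 48 hours quoted above. `Step`, `summarize`, and `bottleneck` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # hands-on time spent on the step
    wait_hours: float  # time the story sits in a queue before the step starts


def summarize(steps):
    """Total lead time, split into hands-on work and waiting."""
    work = sum(s.work_hours for s in steps)
    wait = sum(s.wait_hours for s in steps)
    lead = work + wait
    return {
        "lead_hours": lead,
        "work_hours": work,
        "wait_hours": wait,
        # Share of the lead time in which someone is actually working.
        "flow_efficiency": work / lead if lead else 0.0,
    }


def bottleneck(steps):
    """The step that contributes the most elapsed time."""
    return max(steps, key=lambda s: s.work_hours + s.wait_hours)


# Hypothetical durations, chosen only so the total is 48 hours.
steps = [
    Step("build and unit test in dev", 0.5, 0.0),
    Step("manual test in test environment", 6.0, 4.0),
    Step("rebuild by deployment contractor", 3.0, 8.0),
    Step("deploy to staging", 2.0, 16.0),
    Step("verify against mainframe emulator", 4.0, 4.5),
]

summary = summarize(steps)
print(f"lead time: {summary['lead_hours']} h, "
      f"flow efficiency: {summary['flow_efficiency']:.0%}")
print("largest contributor:", bottleneck(steps).name)
```

Splitting each step into hands-on time and waiting time is what makes the waste visible: in this made-up example most of the 48 hours is queueing, not work.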
So visualizing this was really helpful because the best conversation that happened in the room when we had this one was when the release team and the ops, the team that puts it in production, were pushing back at my developers, being like, "The problem is your build. It's your release process, because when you release this, you have to stop the services in a certain order or else the whole thing breaks, and it's too fragile."
And the developer goes, "Fine. I know what your problem is."
So the conversations that they were having, and even the conversations about this repeat process, once they sort of talked it through, they agreed that it was kind of like, "Why are we doing that? That doesn't make a lot of sense."
And that was a really powerful conversation for them to have because collectively they were figuring out how to solve this problem.
Because before, I don't want to imply that the teams weren't trying to solve this problem before, because they were. Operations team was totally trying to optimize their stuff, and my development team was totally trying to optimize their stuff, but they weren't talking to each other. They weren't having the conversation.
And when they were talking to each other, they would get on conference calls and yell at each other. They weren't actually having a constructive conversation about how to work together.
And so that's why I think a visual is really powerful, because in the world we work in, it's very abstract. And if you don't have something visual, I can make you want to change all you want, but when it really comes to doing it, we've got to actually talk about how we're actually working and how it's actually going so that we can make some constructive progress.
So the other thing I'll just mention briefly is that numbers kind of work too. And this is actually same agency but a different project. This group implemented automation as well, and it's resulted in them delivering 160% more features after they've automated some things, and that's helped them get advocates.
So I think a lesson that I've learned as well in terms of use the right tool for the job is that there are some organizations that need to see numbers, and there's some organizations that need to see pictures, and most need to see both.
And back to the one about the first agency that sort of blew up, a really bad public relations nightmare. A mistake I made at that agency, in terms of my transformation, was I didn't take a baseline. And when we improved things later on and we took their website to Amazon and we did it with one-week deployments and full automation, I wanted to just be like, "See, we can do it."
And they were like, "Oh, we were awesome before."
And I thought, "Are you kidding me?"
But they didn't know it, and I couldn't prove it anymore because I hadn't taken that baseline. So if you're starting on this journey, it seems like you don't need to worry about the numbers till later, but there's some value in taking a baseline for how you are, because it lets you do this. It lets you show how much you're improving when you actually do succeed.
So I thank this program for doing a good baseline because it shows how much you can actually get out of these improvements. So you've got to pick your tool. Numbers are really powerful, too.
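One way to make a baseline concrete is to record the number before the change and compute the difference afterwards. The sketch below is only an assumption about how that might look, using the cycle-time figures mentioned earlier in the talk (about 13 days down to 6); `Baseline` and `percent_change` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    metric: str
    value: float  # measured before the change starts
    unit: str


def percent_change(baseline: Baseline, current: float) -> float:
    """Change relative to the baseline, in percent (negative means lower)."""
    if baseline.value == 0:
        raise ValueError("baseline must be non-zero to compute a percentage")
    return (current - baseline.value) / baseline.value * 100.0


# Figures quoted in the talk: cycle time went from about 13 days to 6.
cycle_time = Baseline("story cycle time", 13.0, "days")
change = percent_change(cycle_time, 6.0)
print(f"{cycle_time.metric}: {change:+.1f}% versus baseline")
# prints "story cycle time: -53.8% versus baseline"
```

Without the stored `Baseline`, there is nothing to divide by, which is the situation described at the first agency.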
And I guess I fundamentally... Oops, backwards. I fundamentally kind of think that you have to get to ground truth.
If you have one of these, this is a status report. It's got green and yellow and red. Raise your hand up if you have a status report that looks like... Oh, lots of you. Kill it.
That's not truth. I had people on my status report telling me that their releases were green, and then I would look at the actual release, and they were having all sorts of issues, and I thought, "Well, that's not green."
So for all the people that have brought to this conference actual dashboards, which I can't show you because it's a classified da, da, da. But I have them, and that's what my team briefs me on every week now. They do not give me a status report. I made them stop. I had to have a whole two-hour meeting with my team to convince them that they could stop giving me that. But I made them stop, and I'm just looking at the dashboard.
And you really have to get to that because everyone has a different view of the world. And I think if you're an ops person, you see certain things because you're used to being ops. And if you're a developer, you see certain things because you're used to being a developer.
And having lived with a foot in both worlds, I can appreciate that the view of the world is different. And when you get to that visual, that's how you get to ground truth. You have to actually get to the actual work and the actual systems, because that's actual truth. And it will be much easier to solve the challenges of talking between worlds if you're talking about the truth than if you're talking about green, yellow, red.
So then this gets to be the fun part, right? So now we know what we need to do. So now we need to make it happen.
So build a good team. I just kind of found one. I picked on a team that was a really good team. It was about 20 folks. As a team, they're very cohesive. They have very constructive arguments. They actually are two separate teams, and they kind of plan together. And that really helped because I was trying to sort of break through for the organization, and I needed a good team that would kind of stick through the mud, because this is going to get kind of hard.
And the next part is important, too, because I think in the government, we want to write a policy, and here's the process for doing agile. And I have, even internally in CSC, we've found that as we're trying to transform ourselves as a company, the first people that want to approach me are our process group.
"I would like you to write how we do this DevOps process."
No.
And we're trying to change the conversation to be: it's not about writing a process document. It's about having a culture. We need to help people, empower people to lead. It's about servant leadership.
And I think actually most of it then is going around the organization. And your job is not to do the change for the team. I didn't actually make any of those changes that the team did. I didn't do a single one. I tried to go around helping the organization move blockers so that they could do that change. That's all I did. I just tried to help them.
And I'm not the one changing. And I think that's important, is you actually have to let the people doing the work figure out the change and let them drive what change they want to do next, if you give them the right context, right?
And you need to manage your change program intentionally. Give it the respect of a real project.
So just again, for those that saw Damon Edwards' talk, he talks about the J curve, right? So on the left is what's called a Virginia Satir curve. Virginia Satir was a psychologist, I think it was about the '70s, and she wrote about this process of change: you have the status quo, and then you introduce this change element, and then chaos happens, and then you sort of incorporate it into how you work, and then at the end, you have this new status quo and all is better.
And I think that's kind of a myth of change, that we're just going to bring a change and it's going to go straight to awesome. But actually what happens when you ask someone to change is that the first thing they will do is get worse. Right? And that is true for an individual, and it is true for a team.
So one of the things we've tried to do is keep the changes small so that it's not chaos. But the other thing I've tried to do is let the team make the change.
So the story on the right is that team that I showed you the value stream map for. This is an abstraction of their velocity over that time period. So this is kind of two changes in one, so it's actually sort of two J curves.
The first one was this acceptance test-driven development. It was the agile improvement that we made with a team. And the very first mark is the first sprint that they did that. And their previous baseline was 40, 50 stories, I think. No, actually it was 42 stories. In the first sprint, they did 42 stories.
And I went to the retrospectives and I was like, "I don't think you guys changed."
And then they, like, fessed up. "Yeah, we tried it and we kind of like, yeah, we backed away kind of scared."
So then they kind of committed that wasn't good enough. So the next sprint, and I give credit actually to their product owner and their project manager who were kind of like, "Do it," and sort of supporting them. They were like, "All right, next sprint, we're going to knuckle down and we're going to do it."
And they did. And their velocity cut in half. It was the first thing that happened. But that was okay because we kind of gave them permission to do that. We told them that was coming, and we let them do that.
And then they got a little better. They kind of figured out some things they needed to do a little differently as a team to kind of make it work. And then they were like, "All right, woo-hoo, we're better."
And then right then is kind of when we started introducing... Right when they were sort of on the uptick of that change is when, so the April mark, actually between March and April, that's the sprint where we reintroduced the DevOps piece. So then kind of a little bit of a dip as well, because now we're kind of in this new transformation piece there, kind of dropped a little.
And actually I don't know where they are, but they're somewhere in the upper 50s, 60s for velocity, although they've kind of restructured their points. But the point is that they're doing way better now. Significantly faster.
But a key to making that happen was actually allowing them to go slower. And that's really hard, back to the you can't change contracts, because a lot of contracts have performance incentives in them that say you have to have this, or you have to keep getting better. You can't ever go slower.
And if you're always forcing somebody to go faster, faster, faster, faster, faster, they will never change, because you can't change if you don't have a chance to just slow it down and figure it out.
If you think to anything you've learned in your life, there's that point where you're practicing and you're not really good at it, right? So you have to have that period or you're not going to actually make the change.
So I think as a leader, that was probably the most important thing that we were able to do for that team was, and it kind of helped that it was Christmas and nothing happens in the federal government in December. So we were kind of like, "Just do it. No one's looking."
So that's, I think, a really important thing to help. And lots of little things, right? We didn't drag this out too much.
So the other piece is, I say, use lean to go lean, right? So we really created a continuous improvement project. We took that improvement and it has its own Kanban. We have our improvement kata, and the team came up with their own metrics for how we were going to improve it.
And then those baby steps are on the board, and we actually sit every day and the team figures out what they're going to do and how they're going to do it.
And I think a key lesson is, don't confuse the Rebel Alliance with the Imperial Senate. The people that are doing this change program aren't there because of their title. They're there because they want to be part of this change program, and they're doing it every day.
Don't feel like you have to pick the people that are actually doing the heavy lifting of change because of their job title. Pick them because they want to be there, because that makes all the difference. Those are the people that are going to really carry this through the organization. You need to pick them well.
And improvements are pretty dramatic. That team that I talked about that couldn't get it under 48 hours, on average, took 12, now runs in 12 minutes, 30 seconds. And it's totally automated.
And the negotiation we were able to make to get this working with the operations team that helped them out in terms of automating this as a baby step, because I think we've got more ways to go, was that that team actually has servers that they patch, a whole bunch of servers in a big wave, and they just wanted to be able to schedule the automation so it didn't collide with one of those.
And we said, "Sure, we can do that."
So that's what we did. We implemented a little gate that schedules when it runs. But now something that used to take them 12 hours takes 12 minutes. And when it takes 12 minutes, you do it a lot more.
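A gate of this kind can be quite small. The following is a minimal sketch under stated assumptions, not the team's implementation: it assumes patch windows are known in advance as (start, end) pairs, and it delays a run until it no longer collides with any of them. `overlaps`, `next_allowed_start`, and the example window are hypothetical; only the run length of 12 minutes 30 seconds comes from the talk.

```python
from datetime import datetime, timedelta

# Duration of the automated run, as quoted in the talk.
RUN_LENGTH = timedelta(minutes=12, seconds=30)


def overlaps(start, end, window):
    """True if the interval [start, end) intersects the patch window."""
    w_start, w_end = window
    return start < w_end and w_start < end


def next_allowed_start(requested, patch_windows, run_length=RUN_LENGTH):
    """Earliest start at or after `requested` that avoids every patch window."""
    start = requested
    # Visit windows in chronological order, pushing the run past each one it hits.
    for window in sorted(patch_windows):
        if overlaps(start, start + run_length, window):
            start = window[1]
    return start


# Hypothetical patch window: 02:00 to 04:00 on an arbitrary date.
windows = [(datetime(2015, 1, 1, 2, 0), datetime(2015, 1, 1, 4, 0))]

# A run requested at 01:55 would still be going at 02:00, so it is delayed.
print(next_allowed_start(datetime(2015, 1, 1, 1, 55), windows))
# A run requested at 01:00 finishes well before the window and starts at once.
print(next_allowed_start(datetime(2015, 1, 1, 1, 0), windows))
```

The design choice here is to delay a colliding run, not reject it, so the developers still get their deployment and the operations team keeps its patch wave undisturbed.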
So this team now, they used to go to staging once any given sprint. They now go every single story, which means they go six times a week. So that's huge for them. And they're still going, right? There's still lots of improvements to be made. Oops.
And so to kind of summarize, I sort of say that if you're going to bring this to your own organization, the kind of three things of how you go is you build that change team, and you come up with that improvement backlog.
And you try to disrupt the status quo, maybe through a crisis. Maybe you have some other way to do that. And you create that desire to change.
And then you measure your baseline, where you are. You visualize the actual work. You identify the waste and the things that you're going to fix and make changes on. You get that change team. You change the bureaucracy, and then you measure the improvement that you're going to make.
And so my five takeaways are: get that Rebel Alliance in place. Find your Rebel Alliance and your change program. Want to change. Visualize the change. Make the change. And then repeat.
And that's how you kind of eventually get to the organizational transformation.
And my last point is what I still don't know how to do. I'm really challenged still to figure out how we scale this fast enough. So I kind of empathize with our HP counterparts. We are, as CSC, splitting the company this week between commercial and public sector. And then the public sector, which is where I work, will be merging with SRA, which is actually my former company. So that's really great.
But in the meantime, I've got 20,000 people to figure out how to change. And trying to figure out how we scale, not the practice, but the actual change agents, is a project that we're definitely piloting. I'll let you know if it works.
But if anyone has any ideas on that, that's what I could use help with.
Thanks.
Thanks a lot.