How DevOps Can Fix Federal Government IT
The Federal Government spends more than $80 billion each year on information technology. As the fiasco with healthcare.gov demonstrates, the results are not always good. Government IT programs are expensive and monolithic, and the lead time from a “mission need” to a deployed capability is often measured in years (in one of our agency’s programs, about 12 years!). IT systems are often difficult to use, and the US government’s online service offerings to citizens are far from meeting the expectations of a public that is used to Google, Facebook, and Twitter.
The US government has only recently begun to adopt agile approaches, and only in a few agencies. But the results have been encouraging, and show that it is possible for the bureaucracy to be agile. DevOps, however, is a game changer. At USCIS we have moved to a continuous integration, continuous delivery approach, and have begun experimenting with a DevOps model tailored to the needs of the government.
By combining DevOps with some ideas taken from the Lean Startup movement, I believe we can cause a radical change in how the government does IT. We can dramatically reduce lead times and costs, improve the usability of systems, provide more transparency, create citizen-centric online services, and – importantly – significantly improve the government’s security posture.
Full transcript
The complete talk — auto-generated from the talk's captions.
I'm going to try to convince you all that absolutely crazy is better. Gene, I noticed something about your speakers. I don't know if this has anything to do with how you recruited them, but most of the people speaking today have mentioned bureaucracy, and mentioned it in a way as if it was a negative thing. So, I thought I would talk a little bit about that, and maybe talk a little bit about something else that's come up a lot, cultural change.
And I'd like to question a few assumptions, maybe, around cultural change. So, I joined the federal government for the first time just over four years ago from the Bay Area startup community. I knew it was going to be somewhat different. For one thing, people called me sir, and even when they weren't being sarcastic.
So, they called me sir. For another thing, I was going from an environment where I was managing about 30 people here, to managing about 2,000 people at USCIS, from a budget of about $3 million a year to about $500 million a year. So, it was clearly going to be a big change for me. USCIS, if you're not familiar with it, I bet a lot of you are, we are responsible for legal immigration to the country.
So, if you're confused about the different immigration arms that are part of DHS now, I can clarify it. We're not the mean and nasty people who deport people. That's ICE. We're not the grumpy people who sit at the airports and look at your passports.
That's CBP. We're the ones with the long backlogs. So, there's our specialty. We're the ones you apply to.
If you want a green card, you apply to us. If you want an H-1B, you apply to us. If you want to become a citizen, you apply to us. We receive about seven million applications a year.
And we process them as quickly as we can, basically, is what it comes down to. So, joining the government like that, realizing it was going to be different, the idea in my head, or the thought in my head was: Can we have a lean bureaucracy? Can we make bureaucracy lean? Can we make it agile?
Is it possible? Because the government is bureaucratic. That's the way it is. And it's an interesting question, when you think about it.
Everybody talks about bloat in the bureaucracy, and waste, and so on. Can we actually take it and figure out a way to make it lean? So, there's my question for you. A couple of incidents stand out from my first few weeks in the government.
Or at least, I think I remember these things happening. The first one, I'm in a conference room with a few people on my staff, and we're talking about some important user need that has just come up. And they've shown me a Gantt chart for what's going to be involved in doing it, and the Gantt chart says that it's going to take about a year to do. And I looked at them, scratched my head a little bit.
I said, at least this is the way I remember it, "This is just a few changes to a webpage. How can it possibly take a year to do?" It's probably a little bit more than that, but that's what it seemed like. And they seemed to get really nervous, and they chatted amongst themselves, and then they said, "Okay, I think we can do it in eight months." "Eight months? How could it possibly take eight months to do this?
I could do it myself in just a couple of minutes." "Yeah. We could do it ourselves in a couple of minutes too, but we can't do anything in less than eight months." They actually meant that literally. We cannot get a release out in less than eight months. And so I said, "How is that possible?
Why can't you get a release out in less than eight months?" "The SDLC." And I thought, "That's a little strange." I thought I knew what an SDLC is. A software development life cycle. I'd never heard it used as a reason why you can't do things. So, eight months to get a release out, and I thought, "I'm the CIO.
I'm going to take charge of this." I said, "Well then, let's change the SDLC." I don't know if any of you have ever tried learning a foreign language. Maybe you've experienced this. You're learning a language, and you're sitting there talking, and blah blah blah blah blah. And all of a sudden, everybody gets quiet and starts to look embarrassed, and you realize, "Oh, I probably said something wrong there." "I meant to say I'm getting a haircut, and I said everybody's dying and in the hospital," or something like that.
So, it was like that. "We can't change the SDLC." "Why can't you change the SDLC?" I've been a troublemaker since I joined the government. Why not? Let me see if this works.
"Why can't we change the SDLC?" "MD 102, sir." MD 102. MD 102, that's Management Directive number 102. We're part of the Department of Homeland Security, that's our parent agency, and they have a policy called MD 102, and it has our software development life cycle. It's there.
In fact, I never travel without it. This is actually MD 102 right here. That's double-sided, by the way. If anybody wants to look at it later on, I've got it here.
They had sent me a copy of MD 102 before I started work, and said, "You should probably read this." And I had glanced through it, and I realized immediately it had nothing to do with the way we actually develop software, so I didn't bother reading it. In retrospect, that might've been a little bit of a mistake. Second incident that I remember, I'm in a conference room. It's a big conference room.
It's filled with people. Most government meetings have a lot of people. It's kind of a rule. And we're discussing a legacy system that nobody is using anymore, and it's costing us a lot of money to maintain.
And it was called RNAX. With a name like RNAX, you know it's a legacy system, right? It's one of those things. So nobody's using it.
I'm the new CIO, and I ask a lot of questions. I just make sure that nobody is using RNAX anymore and that it's costing us money to maintain. And then I say, "All right, let's decommission it." And it's that silence again, just like that. I know I said something wrong, right?
So little voice from the back of the room, "Uh, sir?" "Yes, Tino, what is it?" "Sir, you don't have the authority to do that." And I thought, "What? I'm sir." "Don't you know who I am?" "So all right, Tino, why don't I have the authority to do that?" "MD 102." So I figured, okay, it's time. I have to read the thing. I didn't just read it.
If you look at this thing, actually, it's highlighted, it is underlined, it's got little comments in the margins. I studied this thing. I studied every word, and I became the greatest expert in the government on MD 102. You can quiz me if you want.
And it's a beautiful document. It's absolutely brilliant. If you wanted to instruct a large group of people that they must use the waterfall approach, you couldn't possibly write it better. I mean, it's gorgeous.
It says when you're developing software, you have to divide everything into nine phases. You have to have a solution engineering phase, and a requirements phase, and a design phase, and a development... You know. And there's a gate review in between each of those phases.
There were 11 gate reviews altogether. My favorite was the test readiness review, which is the review to make sure that you've absolutely finished development completely before you can start testing. So there's this gorgeous waterfall document that is official policy. It says that we have to write for each release somewhere between 90 and 110 documents, depending on the kind of release it is.
It's a functional requirements document or whatever, an integrated logistical support plan. Big mountain of documentation. I actually showed Gene this the first time he visited my office. Here's the documentation that we've just spent two years preparing without actually touching a keyboard.
It's not done yet. So clearly, MD 102 was the enemy. I had to fight MD 102. So I took it on.
I, first of all, brought in a whole bunch of agile coaches. I made up my own policy. It was called MD 001 or something. It was my first policy.
And it said, "From now on, we're agile." Figured try it, see what happens. You won't believe how well it actually worked. That's the thing. So MD 001 says we're going to be agile.
It defined agility in a very reasonable way. It had eight core practices, and it said from now on, all of our development within USCIS is going to use these eight practices. Things like individually testable requirements, and a few other things that wouldn't really surprise you. So I rolled that out within USCIS, and I somehow got myself...
Everybody heard that there was this thing called agile methodology and wanted it somehow for DHS. So I got myself appointed to a group that was going to rewrite and improve MD 102. Boy, did they choose the right person. So agile coaches everywhere.
We have about 75 legacy systems. Within a year, we had done 100 some odd agile releases. We were starting to move. We had great product ownership from the business side.
Everything was going fine. And I was working on rewriting MD 102, the big enemy, and suddenly I had this weird epiphany. I started to think about MD 102 in a very different way. What happened was I sat down with the people who had written it, and I discussed some of the changes that I wanted to make.
They didn't like them, of course. But they told me a little bit about the context and what they were thinking when they wrote this thing. So what you have to imagine is it's 2002, 2003. The Department of Homeland Security is being stood up, and it's being stood up in the wake of 9/11.
And it's being stood up by putting together 22 different components, basically gluing them together in one big agency. It includes USCIS, and ICE, and CBP, and TSA, and FEMA, and the Secret Service, and Coast Guard, and all this other stuff. And because of the way it's stood up, it's overseen by 104 different congressional committees. No committee in Congress wanted to give up its control over whatever it was overseeing.
So when you put all this together, you've got 104 committees. And it's total chaos, right? It's a merger of 22 companies suddenly, boom. Total chaos, and there's this group of people that is being held accountable for overseeing all of the spending of the organization, especially the IT spending, which is a lot of money, and there are projects that aren't working out, and there's total chaos, and nobody understands what money is being spent on what. And this poor group of people is accountable for this, and they're good civil servants, public servants.
They want to do the right thing, and they're in this sort of nightmare situation. So they turned to what they knew. They didn't know much about IT in particular. A lot of them had come from the Department of Defense, so they went back to the way they used to do things, and they wrote what I really think is a brilliant document if you're committed to that way of doing things.
And it was the best they knew. It was the best they could come up with. So I started to read MD 102 again as I was studying it, and reading between the lines, it was this tremendously human document all of a sudden. It's a little strange to say that when you're talking about test readiness reviews and stuff, but in the words of this document, if you read it the right way, you could see the fears of these people who are writing the document, the hopes, the dreams, the integrity, the commitment, because they really are trying to do the right thing, and they're in an untenable situation.
So what I realized after talking to them, first of all, what seems like outright horrible bureaucracy, the faceless bureaucracy, that's not what it is, actually. It's human beings in a government organization trying to do the right things. Don't ever let anybody tell you it's a faceless bureaucracy. It's real people.
I know them. So what I realized is that in order to bring agile practice into the government, what we have to do is somehow meet the needs that are expressed in MD 102. It's not the right way to do things, but it's there because of these needs, these oversight bodies, the accountability to the public, and so on. And so the challenge became, can I create an agile practice that I can map to those same needs?
Or if you think about it, the way we handle agile requirements is we try not to have requirements. We try to hear a business need and figure out the best solution to meet that need. So here, underlying the culture of the government, underlying MD 102, there are real needs that are special in the government. Can we find a better solution to meeting those needs?
That's what the problem became for me in trying to figure out a lean bureaucracy. The government is low trust, necessarily. If you think about how our government is structured, from the very first moment, it was really a low trust government. We have a system of checks and balances.
We have three different branches of government. The idea is you can't trust the president, so Congress has checks and balances on the president. You can't trust Congress, so the president can veto things, the judiciary. The three are intended to control each other because of a lack of trust.
We have freedom of the press. Why? One of the good reasons for that is that we want somebody looking at the government and finding the problems. We don't really trust the government.
We want to hear about the problems. You might have noticed that the press rarely writes articles about how wonderfully the government is doing. I'm hoping someday to be on Jon Stewart, by the way, being made fun of. That's one of my ambitions.
So can we take this necessarily low trust environment where we have congressional oversight looking over our shoulders all the time. The press is looking at what we do. All of you, the public, are looking at what we do critically. We have rules that we have to follow, compliance, all sorts of low trust sorts of things, and can we find an agile or a lean practice that will satisfy all those things?
It turns out, DevOps, it seems like it was created to solve this need. So where we were using a Scrum-based process before, we're now moving towards more of a DevOps-based approach, and I'll give you some of the reasons why. Let me tell you a little bit about what we've done so far, though, at USCIS. So now everything is agile because of MD 001.
But we have started tooling up. We have three different big projects going with continuous delivery pipelines. The same stack everybody else mentioned today, basically. It's Jenkins and Git and Chef and Gradle, and we have some wonderful production monitoring from New Relic and I don't know what else.
The usual sort of thing. And we have developers working in Java, Spring, Hibernate sort of stack, and some working in Ruby on Rails. We have automated testing, continuous integration, and continuous delivery into the Amazon cloud. It's all set up and waiting for the first deployment, basically, which is going to happen in a few weeks.
Why is it going to wait a few more weeks? There's an election coming. We're going to wait until after the election. That will be release one.
I actually am very excited to say we also have a chaos monkey based on the Netflix model. So the chaos monkey, if you're not familiar, it's a script that causes havoc. Basically, it tries to screw things up in production. So if someday you are reading "The Washington Post" and you see on the front page that the chaos monkey has shut down the Department of Homeland Security, take it as a success.
Right? It will mean that our DevOps practice is out there. Big success. So, I'm going to try to run through a few of the reasons just to show you the logic of why a DevOps practice can help meet these underlying government needs and how I think about these things.
So one thing about the federal government is we have very strict procurement laws. So what might be good strategy for a company is something that we legally have to do. And it's very common for us to have to change contractors over time as a different contractor wins contracts, and we have to try to create a level playing field where different contractors can propose on things. So, one of the ways that in the past we've dealt with that is we've created a lot of documentation.
Right? Another contractor is going to have to take over for you, so we need you to document absolutely everything you're doing so the new contractor can get started right away. Well, with a DevOps practice, with a continuous delivery practice really, everything's scripted. There's no documentation to be written for the next contractor.
The day they start, they can deploy to production. The test suite, the regression test suite lets them start working right away, refactor, make changes, mess things up. It doesn't matter, they're going to learn from it right away. So, to deal with the fact that we have lots of contractors coming in and out, a good, well-thought-out DevOps practice helps support it.
There's one for you. Metrics. The government loves metrics, and there's a good reason for that. We have to be able to prove that every decision we made was made objectively.
The idea of bureaucracy, basically, is we follow rules. It's not arbitrary. It's not based on whim. We make our decisions objectively.
Great. I can instrument my pipeline. I can instrument production. I can create lots of data that we can use for decision making.
If we are looking at our cycle time and we make a decision to try an experiment to reduce our cycle time and it goes from six days to five days, somebody asks me, "Why did you make that decision?" We were looking at cycle time. We figured this might help improve cycle time. We tested it out. We measured it.
We can show that it really worked. Great. No arbitrariness there. It's no whimsy or anything like that.
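As a rough illustration of the kind of measurement described here, the sketch below computes average cycle time before and after an experiment. The records, dates, and function names are hypothetical and are not taken from the USCIS pipeline; a real pipeline would pull these timestamps from its build and deployment tooling.

```python
from datetime import datetime
from statistics import mean


def cycle_time_days(started, deployed):
    """Days between work starting and the change reaching production."""
    return (deployed - started).total_seconds() / 86400


def average_cycle_time(changes):
    """Mean cycle time in days over a list of (started, deployed) pairs."""
    return mean(cycle_time_days(s, d) for s, d in changes)


# Hypothetical records: (work started, deployed to production).
before = [
    (datetime(2014, 9, 1), datetime(2014, 9, 7)),
    (datetime(2014, 9, 3), datetime(2014, 9, 9)),
]
after = [
    (datetime(2014, 10, 1), datetime(2014, 10, 6)),
    (datetime(2014, 10, 2), datetime(2014, 10, 7)),
]

baseline = average_cycle_time(before)
trial = average_cycle_time(after)
improved = trial < baseline
print(f"baseline={baseline:.1f} days, trial={trial:.1f} days, improved={improved}")
```

The point of keeping the calculation this plain is that the answer to "why did you make that decision?" becomes a number anyone can recompute from the same records.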
Compliance, another area that's been mentioned. I have a lot of thoughts on compliance. Obviously, it's a big part of what we do. One of the things we try to do is if there is something we have to comply with, let's turn it into an automated test, sort of test-driven development.
So FISMA compliance, the Federal Information Security Management Act security compliance, if we have automated test suites that check for FISMA compliance, then compliance becomes a matter of doing exactly what the tests are testing for. Similar situation with Section 508 of the Rehabilitation Act. This is to provide accessibility for people with disabilities. In the past, we would develop a system.
The gatekeepers would come in and review it and say, "Oh, no, it's not compliant with Section 508. Go back to the drawing board." Now we are in the process of putting in place an automated test suite for Section 508 compliance. The developers can keep testing our code against that. The subtlety here is I want the system to be in a deployable state at all times. We can do that because it's test-driven development for the compliance.
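To make the idea of compliance as an automated test concrete, here is a minimal sketch of one accessibility rule written as a check that can run on every build. It is an assumption-laden toy: it covers only a single rule (every image carries an `alt` attribute), the class and function names are hypothetical, and a real Section 508 suite would cover far more than this.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the parser position of every <img> tag with no alt attribute."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # An empty alt="" is allowed here (decorative images);
        # only a missing attribute is flagged.
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())


def find_missing_alt(html):
    """Return a list of positions for images that lack an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.violations


# Hypothetical page fragments used as test inputs.
compliant = '<p><img src="seal.png" alt="Agency seal"></p>'
noncompliant = '<p><img src="seal.png"></p>'

print(find_missing_alt(compliant))
print(len(find_missing_alt(noncompliant)))
```

Run as part of the build, a check like this fails the moment a non-compliant page is committed, instead of at a gate review at the end.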
Our other quality control rules, the QA rules that we follow, those can be implemented as rules in Sonar, for example. We can do static code analysis. We do static code analysis for security as well. So by taking the compliance issues and incorporating them into automated tests, we know our system is always deployable.
We're not going to find out at the end that it's out of compliance with something, and it's a well-defined problem. As long as it meets those tests, it's compliant by definition. Couple of other examples. Security is an interesting area right there.
I think DevOps gives us the potential to really take FISMA compliance to the next level. We can improve the government security posture considerably, I think. One change that was already going on in the government before we started to introduce DevOps is that we were doing continuous monitoring of production to find vulnerabilities, and we have now a process where we don't even just check compliance of a system once when we release it and then every two or three years. We have ongoing authorization, ongoing monitoring in production to look for vulnerabilities.
With a DevOps approach, this is just feedback from production. Right? We instrument production for feedback with continuous monitoring tools along with our performance tools. We can have constant feedback to the developers on how secure the system is or what vulnerabilities are found.
If there's a problem, we can patch it. Push the button, deploy a new version. In fact, in the cloud, we can tear down the old version that might have been compromised and spin up a new version. We can do static code analysis so that the developers are starting to produce code that already meets the security requirements that we've set up. It's easy enough to do static code analysis to check for the top 10 OWASP security issues, and have the developers run the static code analysis against their code constantly, find the issues in their code, repair them right away, and over time, I hope, this is future, the developers get used to not creating those problems.
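The sketch below shows, in miniature, what a static code analysis rule looks like: it inspects source code without running it and reports risky constructs. It is a toy stand-in, not the tooling mentioned in the talk; the rule set (direct calls to `eval` and `exec`, a common route to code injection) and the function name are assumptions chosen for illustration.

```python
import ast

# Illustrative rule set only; not a list defined by OWASP.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source):
    """Return sorted (line, name) pairs for direct calls to names in RISKY_CALLS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Hypothetical code under review: user input passed straight to eval.
sample = "x = input()\nresult = eval(x)\nprint(result)\n"
print(find_risky_calls(sample))
```

Because the check reads the code rather than executing it, developers can run it constantly on their own machines and see the finding before the change ever reaches the pipeline.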
We can push the compliance issues right to the very beginning of the development pipeline. Risk management. The last thing any government bureaucrat wants is to be on the front page of "The Washington Post" with an unflattering article. Just let them try to find a problem here.
We're doing tiny experiments. Our risks are all very tiny, and they're built right into our process, and we deal with any issues as they come up. Last thing I'll mention, reducing waste. Everybody in the government loves reducing waste.
It's the biggest, everybody can agree. Democrats, Republicans, waste is bad. Bureaucrats, everybody says waste is bad. So to be able to go out with a lean approach and say, "Here's what we're doing about waste.
Here's how we're taking it point by point," very easy sell in the government. So to bring it all together, the point that I'm making here is that cultural change, the government has a particular culture for particular reasons. It is a bureaucracy because that's the nature of it. It's something that's run by rules rather than arbitrariness.
That's really the original definition of bureaucracy. Government has special needs, and we can be agile about how we bring agility into the government. We can find ways to meet those government needs using, just as a side effect of the way that we want to do things, agilely. And that is a cultural change.
People start to do things differently, but it's a cultural change that doesn't rip out the foundations, essentially, of the culture, what it's really trying to accomplish, and the reasons why it's there in the first place. So don't ever let anybody tell you that the government is a faceless bureaucracy. It is a bunch of people trying very hard to do the right thing and to meet the needs that are imposed by all of us really, as the public. And they are trying to find solutions to the situation they find themselves in, which is that they're overseen, they have people looking over their shoulders at all times.
They have to comply with laws that are designed not based on business sense, but on policy, social policy, fairness to all of the contractors who want to bid for government business and so on. And even though it's deliberately a low trust environment, that's the way it's built, the best way to solve its problems is still an agile way. And a DevOps process, in particular, maps very nicely into all of the government's special needs. So, think about us that way when you see Chaos Monkey wreaking havoc.
Think about what kind of a success that is. Because what it means is that we've brought the practices that we want into the government, and at the same time, met all of its needs. Thank you all.