DevOps at Kaiser Permanente: Overcoming 75 Years of Inertia in Healthcare
In this talk we will convey how the ongoing DevOps transformation at Kaiser Permanente became a reality. KP is rich with history and known for driving innovation, dating all the way back to shipbuilding for World War II.
We will highlight the challenges of implementing transformative change in a compliance-driven industry and risk-averse organization, providing real-world examples of how the team overcame significant resistance at all levels within the organization. The focus will be on how critical a combined top-down and bottom-up approach is to the success of a major transformation.
Full transcript
Alice Raia
Thanks, everyone, for coming in. I know I'm between you and cocktails, so we're going to make this as interesting as possible.
My name is Alice Raia. I'm a vice president for Digital Presence Technologies at Kaiser Permanente. I'm going to talk a little bit about who we are and bring you along on our journey of DevOps that we started about 18 months ago.
Really quickly, can everybody see me? I'm not a big fan of standing up here. We all good? You can see me, you can hear me? Okay, great.
Let's talk a little bit about who Kaiser Permanente is, for those of you who may not be locals. We are a very large integrated healthcare system. Here are some of our fantastic statistics. We pride ourselves on treating our members not only with healthcare, but with wellness care as well.
We are over 200,000 employees. We have over 21,000 physicians servicing our 11.8 million members. We expect to cross 12 million next year. We do a lot of volunteering. We are a not-for-profit organization, so we are very mission-driven.
But what I'd like you to take away from here is we have a lot of things we're very proud of in our quality and the type of care that we give to our members. But please keep in mind, we are 208,000 people. So the story you're going to hear about how we entered into our decision to go with a DevOps practice, and some of the challenges we have hit at this enormous scale, I'm hoping will resonate with some of you who work with some larger companies.
So that's who Kaiser Permanente is.
This is the most frightening thing, seeing your face this big on any kind of presentation. But who am I and what am I responsible for?
I lead all of the digital delivery at Kaiser. For us, that means digital engagement with our members, our consumers who may be thinking about becoming members of Kaiser Permanente, and on the other side of those experiences, our physicians, who also participate in that digital engagement, as well as our workforce.
I've got delivery for kp.org, for our mobile apps. I've got the Mobility Center of Excellence, which manages about 40-plus apps within our organization. I also lead the consumer digital strategy, which is a multiyear mega-program for the organization where we are really changing the way we engage with our membership, bringing our care to them the way they want it.
I also own the digital delivery for the Kaiser Permanente School of Medicine, which will be opening in a couple of years in Pasadena. I am the Agile SDLC owner for the organization, and I own the DevOps practice for kp.org. Lucky me, I just found out I now own it for the enterprise.
There's a lot going on here, but I think one of the benefits of the role that I'm in is, because I own Agile and DevOps, we can really pair them and get this practice solidified for our first digital asset, which is kp.org. Then the intent is to spread it across the enterprise to a lot of different types of platforms and engineering teams.
A little bit about who we are. Our digital engagement with our membership is specifically about care delivery. If you look at some of these stats, we have over 300 million visits to the site per year. Our registration rates rival banking. We've got about 70% registration rates on our sites and in our mobile apps.
If you look at what we do within a year, these are our 2017 annualized numbers: we expect to fill 25 million prescriptions online. Twenty-seven million emails have been sent to doctors. Fifty million lab tests viewed online. You can see video visits are really starting to ramp up for us. I don't have a stat here, but we recently introduced Chat with a Doc in our digital asset.
So it's not about coming to a medical office building or a hospital anymore. It's about enabling an experience for our members that brings care to them in a way that's convenient to them. In order to be able to realize a lot of this, we've had to take a good hard look at how we're delivering. Speed to market and meeting consumer expectation is really critical to us, and that's what started us down this journey.
I'm going to spend a little bit of time on this slide because this is really the "oh shit" moment. I hope I didn't offend anybody by saying that.
When I took this role about three years ago, we were releasing every two months for kp.org, but every release was red. Every release was in trouble. We barely made the scope. We sometimes slipped the date. It was really a very painful experience for everyone involved.
We were growing pretty fast, and we really had not solidified our engineering practice. I want to talk a little bit about what happened here.
We were doing those big batch releases, and they were exceptionally risky. The scope was introduced late into our releases because our business partners were like, "Well, crap, if I don't get it in now, I don't know when I'm ever going to see it." So scope was always a moving target.
Our teams were very good at optimizing for their silos. "I did this really great thing, now I'm going to throw it over the wall." And the further away you got from the actual delivery, the less people could relate to what was actually going on in their area.
We had a lot of lengthy reviews. If you look at this picture, which kind of cracks me up, we have some great presentation specialists. I believe somebody had an idea, then somebody rained on it, then somebody went to look at something on Google, and then somebody took a coffee break.
The number of breaks, handoffs, and the amount of inefficiency and waste in our pipeline became very evident when we went back and did a value stream mapping against one of our releases. We had no feedback loop for our developers, because by the time we got to the end here, no one had any idea what was going on in here. And we had a lot of snowflake processes, environments, et cetera.
When we executed that value stream mapping against one of our releases, I was working with a vendor partner, and they presented the results to me. The results showed that we had 60%, six-zero percent, waste in our pipeline.
I didn't know if I should bring that and my resignation to my CIO, or if this was going to be a pivotal moment for us. When I showed my leadership those statistics, they were not surprised. At that point in time, our DevOps practice and the idea that we were going to take this journey materialized.
We took that number and quantified it. At the time we did this value stream mapping, we had five scrum teams of about 75 people. That quantified into $864,000 a month of productivity we were losing. So we had absolutely nowhere to go but up.
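Editor's note: the arithmetic behind that figure can be sketched as below. This is only a back-of-the-envelope check, not the calculation Kaiser Permanente or its vendor actually used. It assumes that "about 75 people" is the total across the five scrum teams and that the $864,000 is simply 60% of their fully loaded monthly cost. The function names and the derived per-person cost are illustrative and do not come from the talk.

```python
# Back-of-the-envelope check of the figures quoted in the talk.
# Only three numbers come from the talk: 60% waste, about 75 people,
# and $864,000 per month. Everything derived from them is an assumption.

WASTE_FRACTION = 0.60          # from the value stream mapping
HEADCOUNT = 75                 # ASSUMED to be the total across the five scrum teams
MONTHLY_WASTE_USD = 864_000    # monthly productivity loss quoted in the talk


def implied_monthly_cost_per_person(monthly_waste_usd, waste_fraction, headcount):
    """Loaded monthly cost per person that would make the quoted numbers agree."""
    return monthly_waste_usd / (waste_fraction * headcount)


def monthly_waste(cost_per_person_usd, waste_fraction, headcount):
    """Monthly cost of wasted capacity for a given loaded cost per person."""
    return cost_per_person_usd * headcount * waste_fraction


cost = implied_monthly_cost_per_person(MONTHLY_WASTE_USD, WASTE_FRACTION, HEADCOUNT)
print(f"Implied loaded cost per person per month: ${cost:,.0f}")
print(f"Annualized waste at that rate: ${MONTHLY_WASTE_USD * 12:,.0f}")
```

Under those assumptions the quoted numbers imply a loaded cost of roughly $19,200 per person per month. Plugging a different loaded cost or headcount into monthly_waste gives the equivalent figure for another team.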
I've talked to a lot of folks here and in our journey with this practice, and there are a lot of times where you're like, "Well, how do I tell that story? What can I bring that shows and quantifies where we are and where we need to go?" Believe me, that 864K number got my CIO's attention. That is wasted cycles.
That lack of predictability, lengthy gaps, all of that was where we started.
Then we had to figure out our compelling why. Obviously we've got inefficiency, but why do we want to do this? I alluded to the fact that with this new consumer digital strategy program and the way we're changing our engagement with our members, we wanted to be able to react to the marketplace much more quickly.
If you look at what's happening in the healthcare space right now, I don't think any of you would argue that it's highly efficient, but our competition is really changing. Google just hired doctors. Amazon is getting into big data with healthcare. That's what we have to be able to react to.
As a very large, risk-averse, traditional organization, we had to find a way to accelerate our pipeline. So we looked at our attributes of release. That top release, that complex bimonthly, we will never get rid of that because we're tied to backend electronic medical record systems or very large membership systems that don't want to change every two weeks. It would be disruptive operationally to us if they did. So we'll keep that.
On the bottom, we've got our content releases. Those go daily. We enable our business partners to be able to do that. They can push content whenever they want.
What we were missing was this middle ground, this real sweet spot, and being able to accelerate out. We chose a cadence of every two weeks. So we went in saying we want to be able to push code every two weeks. That quote-unquote "code" might not be ready for consumer-facing, but we're breaking it down into smaller batches, practicing, getting very good at the automation, and doing this kind of regular cadence of getting things out.
I'll show you some statistics later on how we're actually doing. But this was our compelling why. When we sat with our business partners, they got it. Now, they got it conceptually, but of course they're like, "Well, you need to prove it to me that this is going to happen." And we took them on our journey with us.
Being an Agile shop and having product ownership, we had to be able to convince our business partners that this was a risk we would all be willing to take together.
What does our practice look like? Where did we start? What did we focus on?
We have affectionately branded our practice the Factory of the Future. The intent is to build a pipeline with kp.org as the first tenant, and we are going to spread this out to systems that you traditionally would not expect to be able to use a pipeline like this, but it will be enterprise-wide. So the factory is an operational factory that will support multiple applications, platforms, et cetera.
I want to talk about the four basic threads that we have going because they're all really critical.
The platform, pretty obvious: what is the infrastructure? What is our tooling? What are all the basic things that you need to be able to build this pipeline?
The pipeline itself. Right now we've got five pipelines for kp.org. What is it going to take to build those, and how are we going to continually improve those pipelines to increase predictability, to increase quality?
Our products are the way we are reimagining the kinds of work we do. In our past, we have been very project-focused. As an example, we had things like the password reset project. We're now reformulating into a product-centric organization where instead of the password reset project, we have a team that does identity management.
As we introspect our teams and lift and shift them over to this practice, we're going from scrum. We're adopting the Spotify squad mentality, and we're looking at all of the folks who are in those teams to make sure they're appropriate from a skill and behavior point of view as we lift and shift over.
Then finally, culture. I'll just be really frank with you guys. Out of all of those four tracks, that is the hardest track for us to get past.
Very well-intentioned people in our organization. Because we're not-for-profit, we're very attached to our mission. But this requires a change in behavior that is difficult for people who have been at a company for 10, 15 years. And it's not only a change in behavior, it's a sustainable change in behavior.
We may be good for a week, and then we start falling into that muscle memory of going and doing things that are comfortable for us. This was the hardest nut for us to crack in this new DevOps practice.
We did find a sustainable way forward, and it was not to change our culture, because I don't know that it's a culture change. It's an adaptation of the best of your culture in order to work in this model.
We had a very formal, focused change management effort. We created a group of change champions who were typically at the middle management level who went out and worked with their teams, who went out and were saying the same things, had the same leadership message, had the same behaviors.
And we measured. We measured teams before they onboarded to the new pipeline. We measured them after they onboarded. We measured them 90 days later. How was their attitude changing? How were their behaviors changing?
We had formal coaching embedded with the teams. As you all are either in the middle of this or thinking about doing this, please do not ignore that. The technology's easy. The platform's easy. The culture and the people are where you need to spend most of your time.
I'm kind of racing through this, but I'm going to spend some time here because this is the brick wall slide that I referred to. How did things go wrong, and how did we get past them? Not necessarily went wrong, but what were some of the blockers that people brought up?
The first one that we heard a lot as I was new coming in, and we had some vendor partners that were helping us: "You don't know how kp.org works. I've been here forever. That's not going to work. That won't work here. That won't work at Kaiser. Forget it. Ain't going to happen."
That was the first one.
The second one was personally interesting to me: "Oh, I'll just wait till Alice leaves. We have all these people coming in here, and they try to make this transformation. Other people have done it before. It's about a year shelf life. If I can wait it out a year, I'll be good to go back to the old way."
By the way, DevOps is not for prima donnas. I don't know if anyone's told you that yet.
"We do things for a reason, and that won't work here." In a very compliant industry and a regulated industry like we're in, we do have some special hoops we have to jump through, but they should not be limiters to us successfully transforming our engineering practice. That's what we got here a lot.
"Oh, we can't do that thing with change management because someone got fired." I would push back and say, "Well, who and when?" Nobody could answer that question, so clearly we were holding onto baggage that didn't exist.
And then, "Why is this attempt going to work?" Quite frankly, this is a very valid question for folks to ask because you've got folks who have been through two or three of these attempts. They didn't pan out. We really had to drill into why not.
One of the things I discovered was that we had a very entrepreneurial take on DevOps. We had team A in this corner doing kind of their version of DevOps. We had team B over here doing another version. We had the infrastructure team doing it, and nobody was bringing this together. There was no standardization. There was no real, full understanding of the compelling why. We were not driving to the same things.
This is what I heard for about the first six months. Then things started happening, and here's how we got past some of this.
"You don't know how kp.org works." Well, we involved a lot of people who do, and we had a lot of sessions with them. We built trust with them. We listened to them. Sometimes people need a venting session. I'm sorry, we had a few therapy sessions going on.
But we also asked people to, instead of saying "no, but," say "yes, and." If you're going to bring something up, help us figure out a way past that. Suddenly, you're now empowering those people with part of the solution. Some folks grabbed that. Others did not, but it was one way to get past that.
When we talked about the executive sponsor leaving, well, clearly I didn't go anywhere, and you've got to be pretty resilient to lead an effort like this. We've heard several leaders here say that you've got to be willing to be unpopular because you're going to make some decisions that will make some people unhappy.
You've got to be willing to push on your executive leadership and tell them stories they are not going to want to hear, like the one that says, "I need to go a little slower to go faster. I need to slow down some of this consumer-facing work so that we can shore up our engineering practice. Oh, by the way, we're going to make a bunch of screw-ups, but we will get better over time."
You have to be willing to have those conversations and demonstrate that resilience and commit to a formal program. I'm sure you've heard this, and I hate this saying, but this is a marathon. It is not a sprint. We're going into 18 months right now. We expect this to take about three years. For an organization our size, it's like the Titanic. Turning it is a two- to three-year event.
"We do things for a reason." kp.org's a pretty big property, right? Six, seven million users. But we chose a couple of smaller efforts within that very large ecosystem to try some things out, and we didn't wait until we were done.
We put some time in. It was done enough. We moved a team over and said, "Oh, by the way, team, you're going to come over here. You're going to start using the new pipeline. It's going to hurt because we don't have everything figured out, but you guys are going to forge the way for the rest of the groups. You're also going to be able to make some calls for us. You're going to help us make decisions that will influence this entire practice for the entire organization."
That's where you want your really pliable people who have a lot of energy and a lot of passion for this kind of work. And we handpicked them. It was painful for them, but they helped us figure out some really key, critical elements of our practice, and we actually changed our mind on some things when we got the smart people in there and they started using it.
We were able to show some of those early wins. Internally, we use Facebook Workplace at Kaiser, and we have a Factory of the Future group. Every time a release went out, we were putting stuff up, and we were showing it. We had pictures of the team doing their release that only took three hours instead of four days. So people started to really see that this was actually taking hold, and they wanted to become part of it.
Finally, "Why is this attempt going to work?" Again, there is that piece of resiliency. You've got to make sure that you've got the right leaders in place who are willing to take some of those risks. It helped to have a CIO who had been through this before and was willing to back us up. But we really did it kind of at a middle level.
We formalized it. We identified measurable outcomes. We tracked to those measurable outcomes. We were pretty clear about whether we met them. We also were very clear where we failed.
Not only did we celebrate our successes, we acknowledged our failures. We said, "You know what? This didn't work out so well, and here's why." We actually have a Confluence page where we list all of our retros, and we put those links on our Workplace. Anybody can go look at them.
So we're being exceptionally transparent about the fact that some things don't work out. But so what? Look where we've gotten.
Again, I cannot stress, the organizational change management is really, really critical. The other thing we found here is, in our world, we were able to modernize our engineering practice. But at Kaiser, we only have a one-size-fits-all for everything else. So now we're making everybody around us really uncomfortable.
We've got change management, incident management, problem management, financial tracking, sourcing. All of those things have to start adapting as well. It helps to identify the key partners in your organization who are going to want to go there with you.
I had a recent conversation with our change management guy, who thankfully has come from FinTech, where they did this 20 years ago. He is willing to say, "You know what? To accelerate this pipeline, you can't use a change management process that takes two weeks to get through. So how do we set your change records up so that you just execute orders off of them?"
You have to go identify those kinds of gems in your organization who are going to be willing to help and take that risk to go there with you.
So we got past the brick wall, and we've got some critical mass going right now.
Where are we? This is my slide that I'm very, very proud of.
We started releasing on this pipeline in April of this year. We have 600 users, 500 of them developers, using the pipeline. An average of 450 code commits per day as of September. Five automated test frameworks. The five integrated pipelines that I talked about. Ninety-four active projects.
All of that comes from those product teams, and there are only five squads on these pipelines right now. Five of them. We anticipate that at our end state, we will have about 20 squads that are either working on consumer-facing products or on the common services behind them. But right now we've got five, and this is what they're doing.
Twenty production releases since April, every two weeks. Look at the number of wiki articles. We are a learning organization right now. Prior to this, something was in somebody's head, and there was only one of those somebodies.
Look at all of the pipeline service requests we've completed, and look at our averages against industry. So a big, cranky healthcare company that's been in business for 75 years, we were able to do this via kind of just finding the right people in the organization, changing minds, people being flexible, and showcasing that it could be done.
At scale, we are going to be huge. So we're pausing a little bit right now. We've got the five teams on, and I talked a little bit about learning. There are still a handful of things we have to figure out before we go to that 20 to 25, just for the kp.org team.
Things like, again, I talked about the change management and incident management, problem management. What does ops look like for us? We kind of have been doing mostly dev. Our approach was start with dev, not start with ops, because we had a dev problem.
I've heard a lot of people talking about, what problem are you trying to solve for? You need to really understand where you're going to be able to make some significant gain before you just start up a practice. So we started on the dev side, and now we're pulling ops in.
What should we reserve for capacity for ops? I can tell you, my business partner was like, "Nothing. Just do all new." Well, that's not going to work either. So how do you have those conversations and negotiate, should it be 25%, should it be 50%? Our funding mechanism doesn't account for that, because ops is paid one way, investment is paid another. So we have to conquer that.
There are some things we still need to figure out before we finalize our scale here and then get ready to spread this to the rest of the enterprise.
Where are we going to go next after kp.org? We are going to spread this practice to other platform teams inside of Kaiser Permanente that run pharmacy systems, claims systems, membership systems, sales, and, oh, by the way, what about our workforce?
We've done all this great digital work for our members, and quite frankly, on the backend of that sometimes is somebody still dealing with paper. So how are we going to modernize the experience for our workforce as well?
If you look at some of these systems, I mean, pharmacy systems, FDA, how are we going to get past some of that? But we see opportunity in every one of these areas to spread this practice and for these teams to be able to manage some of those gains.
Now, they may not use everything that we've put out, but they should be able to leverage a lot of the learning, a lot of the pipeline work that we've done already, a lot of the cloud work that we've done. We have, as a healthcare organization, moved into the cloud. All of kp.org will be in the cloud by 2019. That's almost unheard of.
Again, there are some decisions and some risks that have to be taken here, and convincing leadership to do those things is backed up by where we're going.
Again, I'm going to go back. Twenty production releases. Actually, we're higher now because this is about two weeks old. But those teams go in, everything is automated, there's nothing manual, and if something goes wrong, they back out and they try again two weeks later, again, as a learning organization.
I raced through. I'm going to thank you, and I think we actually have six minutes for questions in case anybody...
Oh, before I do, there are a number of folks in the room without whom I would not be as successful with this work. So can the folks at Kaiser Permanente who are working on this please stand up? I want to give these guys a hand because they're the ones actually doing the work.
Q&A
I would be happy to take questions and deflect them to all those people who just stood up. Just kidding. Yes.
Q: Can you show the slide, the success slide that you got? That one. You said that you have more than 500 developers, but how many total developers are in the company?
A: Kaiser's IT department alone is about 7,000 people. I would say kp.org is one of the largest development shops that does custom. A lot of our packages are COTS, but we have to have at least 2,000 developers in the organization. At least. Now, the bulk of them will be in kp.org.
Q: I'm just curious, when you said COTS, right? You're talking about packages you bought, or something else as well?
A: Yeah. No. Our COTS applications are primarily like this: we buy a membership system from a company, and we build integration around it, whereas kp.org is purely custom.
Q: But after this, COTS applications are not yet in the pipeline?
A: They're not yet. The only folks on our pipeline right now are the kp.org teams. Those 20 teams, those 20 squads will be kp.org and mobile, primarily our consumer-facing assets, I guess, is the best way to put it.
The next group who are... Believe me, they've been lining up for months, and I'm not ready for them yet on the pipeline. We've got a group from membership systems that wants to build some producer and employer broker portals. We've got a group that wants to build a better "sign me up" experience for our individual line of business. We've got some sales and marketing folks, and our School of Medicine folks will be on here.
Q: What made you choose the squads approach?
A: The question was, what made us choose the squads approach?
I think we have traditionally been challenged by the fact that we're so siloed. We had the frontend team in scrum, then we had kind of the system of record teams that we had to integrate with in a different building, in a different business unit, et cetera.
What we tried to do with the squad approach was to bring the majority of the skill sets necessary together in order for us to deliver independently. Along with the squads, we're going to microservices as well. We are creating vertically deep teams so that we can break the monolith that is kp.org apart, and we can have these microservices teams focus on a product that has longevity and move.
It was more around creating an entity that was independent and can move more quickly.
For those of you familiar with it, we're also adopting the chapter model, where we've got to create standardization and uplift those functions. We will have chapters for things like the different functions. And our first tribe is the kp.org tribe.
So it was really to take advantage of having those skills. We've brought people into those squads from our system of record teams. As an example, we have a pharmacy center squad that has pharmacy... We have a very large pharmacy system. We have people from that system of record in our squad. We have their testers in our squad.
Now what you're doing is you're extending that end-to-end understanding, and really quite frankly, you're gluing those teams together so that they all have the same incentive, which is to make things work for the consumer.
Our pharmacy teams never thought about the consumer before we went to this methodology. They're like, "So what? It takes 40 seconds to get a prescription." I'm like, "Do you understand what 40 seconds in front of a browser means?" This approach we're taking is also breaking down those organizational silos end to end.
Q: How many people are in a squad?
A: This is one of our learnings. I'll talk about our first two-pizza team. I think all of you are probably familiar with the Amazon two-pizza team. Our first two-pizza team meeting, we ate 12 pizzas. I think there were about 36 people in the room.
We had to figure out what was optimal. In our squads, we're vacillating, because remember, we start with frontend. We've got Adobe as our stack: Adobe, Node.js, BFF, Bluemix, system of record, services. We have about 12 to 15 per squad, but when you're covering all of that stack, that's actually a pretty decent size.
Now we're trying to figure out what should be a decentralized service versus a centralized service. Things like database, load balancing, et cetera. That's kind of another level of our maturity. But they're about 12 to 15.
Anybody else? Any question?
Q: This is a pretty big investment, both in time as well as money. How did you get here, and how are you sustaining it?
A: The question was, this is a pretty big investment, and you're absolutely right, both in time as well as money, and how did we get here?
I'll expand a little bit: how did we get here, and how are we sustaining it?
We took advantage of the consumer digital platform or strategy program to build this engineering competence, because without it, we would not be able to meet the goals of that program and get to a more accelerated time to market.
The upfront investment in building the pipeline, defining the practice, and building the top pipeline took about a year. It was funded particularly by that program.
Sustaining it, the conversation we're having now with our leadership is we're moving away from project-based mentality, which I know is a really hard conversation for a lot of organizations. We're now talking about funding teams instead of projects.
Clearly, you don't stop after you do all of this work on digital. You continue to make an investment. We're trying to figure out what should that ongoing investment be for these digital assets that will support the squad.
Really what we're trying to define is what is that end state for those long-term products, and what should we be investing in? Again, this is breaking every financial model we have, so we have to work with our CFO to figure out this is all OPEX, right? There's no capital. And how do you define success measures? NPV is out the window. ROI, it's really hard with some of this stuff.
On that conversation, we've been lucky, because we've got executive leadership that understands where you need to go. But it helps to take advantage of, or tack onto, an effort where you can tie to the outcomes. We tied to speed to market and predictability, to those two outcomes, and said, "This is what it's going to take to get there."
We started with a couple of teams. Now we've showcased that. Moving over, we actually have a pretty good rhythm of onboarding these teams. There's a formal onboarding for these teams. It takes two weeks. They're sequestered in a room, and it's all about behavior change, coaching, and training, and making sure that they can sustain that change.
Backing up to see how much time we have left. Fifty-eight seconds. Who can talk fast? Anybody? Yes.
Q: Did you have to change your technology partners?
A: Did we have to change our tech... That's a really good question. Did we change technology partners?
I would say we took a very fresh approach to the utilization of the ones we had. Let me be specific. In order to get here, we're a very big company, and we love the big, name them, Cognizant, TCS. That's our tier-one kind of vendor. Clearly, we weren't going to get there.
We went with smaller, more niche vendors. Or when we did go with larger vendors to help us build some of the operational pipeline, we changed the conversation and said, "I don't want the people who you just throw into a big delivery shop. I want your A players who are part of your digital team, who are going to help us transform."
So we have two groups. We have the transform team that's helping us rebuild our practice. Then we have the operations team that's operating and building the pipeline.
Again, a completely different mindset shift, especially for CIOs that are used to spending their political capital on companies with three-letter acronyms. I'll just leave it at that.
Okay? All right. Thank you, guys.