NM's SDLC Journey, One Year Later.

Las Vegas 2023

Uma Vandegrift

Vice President, Engineering Shared Services · Northwestern Mutual

Jason Cowdy

Principal Engineer, Engineering Shared Services · Northwestern Mutual

A year ago NM was just beginning our journey of transformation. We partnered across the enterprise to focus on simplification and standardization. A year into our journey, we have made some incredible progress and continued to make this program top of mind for leadership. In the past year we have rolled out multiple opinionated paths for developers, gotten our program included in every engineer's yearly goals, and cut out processes, reducing our lead time from ~2 weeks to 2 hours. In our progress update, we will share more about the milestones we've hit and the strategies we've used to keep the momentum going.

Full transcript

The complete talk, organized by section.

Uma Vandegrift

First of all, thank you so much for that incredibly kind intro. Jason and I are so grateful to be here and share our one-year progress update, and what a busy year it has been.

First, let me give you a little bit of background about Northwestern Mutual. We are the largest direct provider of individual life insurance in the U.S., and we have over $350 billion in assets under management. We've also been around for quite some time. We're over 160 years old, which means our technology ecosystem is incredibly vast and complex.

We pair that with a wonderful culture. We're incredibly caring and empathetic, and we're focused on building the best digital engine.

Now, before we dive in further, let me give you a little bit of context on where Jason and I sit in the organization and our organizational structure. Our technology organization is split into two pillars. We have our CTO organization and our CIO organization. Our CTO organization is where the majority of our engineers sit. Our CIO organization is where we have our infrastructure, our security, and our architecture functions.

Engineering Shared Services, or ESS, the organization I lead, sits within our CTO organization, but we really function between the two. We provide that bridge between our CTO and CIO organization, but aligning to our engineers is so incredibly important to our mission.

That being said, we couldn't do what we do without the incredible partnership from our infrastructure organization and everyone in our technology organization. Because of our unique position, we're often called upon to talk about our engineering pain points and the needs of our engineers. This is a position that we're incredibly proud of and we take very seriously.

However, ESS hasn't always been in that position. The broader ESS organization has been going through a transformation of its own. About a year and a half ago, the mission of ESS was quite different. It was more aligned to testing, and the organization was made up of around 500 people, mostly manual testers, who were then federated out to different delivery teams.

This left ESS with under 200 people and in need of a major strategy and mission refresh. We started that refresh with our SDLC team. We used that team to spark change by focusing on engineering pain points and improving our DORA metrics.

We quickly realized that we needed more engineering capacity and that we wanted to co-create with our engineers, so we started a rotational program to bring in more skill sets. Our very first rotation was incredibly successful, and Jason Cowdy was actually one of our first rotational members. He sparked some incredible change when he joined ESS permanently.

We were so incredibly lucky to have Jason be one of our first engineers in ESS. He's an incredibly talented engineer, and he helped shape our culture. He brought about a culture that was more empathetic and tuned into the needs of our engineers, and that started a snowball effect where we were then able to recruit some of the most talented engineers across NM to ESS.

Now we have a small but scrappy team that is able to focus on engineering pain points, and we are laser focused on outcomes.

In the early days, we called those outcomes that we were focused on moonshots. These were the moonshots that the SDLC team was first focused on. We had these as our anchor and guide and used them for the past year to propel us forward.

These moonshots follow a similar theme of automation, simplification, and standardization. We wanted to have our engineers up and coding on day one with ready-to-pull estimated backlogs, and also give them an opinionated platform to deploy with. And I'm happy to say that that opinionated platform has been one of our biggest successes.

We've called it our Golden Paths platform. And to go into a little bit more detail, I'm going to hand it to Jason.

Jason Cowdy

Awesome. Thanks, Uma.

Last year, when we introduced our Golden Path moonshot, we started by explaining kind of the last six or seven years of how NM has approached adopting the cloud. Over the last six or seven years, our engineers have basically been enabled to use whatever programming languages, whatever tools, whatever frameworks, whatever libraries that they needed in order to get the job done.

And to be very clear, they did. We moved incredibly fast because of that. But that came with consequences, and those consequences were that we had an extremely complex technology environment. We had incredible tool sprawl, and it was extremely hard to automate any sort of large-scale fix or to make broad-scale changes across the company.

So Golden Paths, that's our answer to this problem. And if you're curious what a Golden Path is, it's basically an opinionated and well-supported path. And it's made for engineers to deploy and create a very specific piece of software. This path is intentionally curated, and it uses blessed tools and patterns that we want to encourage them to use.

The outcomes that we're looking for from our engineers using our Golden Paths are, first and foremost, to reduce the toil for our engineers and to give them a better engineering experience. We also want to reduce that complexity and that technology sprawl across our organization. And ultimately, our goal is to deliver for the business. We want to reduce that lead time for change, and we want to decrease our time to market so that we can deliver for them.

And while that might sound great to you and me, and it sounded great to our leadership, our engineers were not nearly as convinced. We heard a lot of feedback from our engineers that, "You're going to ruin everything. You're going to take all the fun out of my job. You're going to kill creativity, and you're going to stifle innovation."

So we had to pause, and we knew that whatever we built and whatever we did, we needed to partner with our engineers, and we needed to bring them along with us. Otherwise, this effort was going to be dead on arrival. No one was ever going to adopt our Golden Paths.

So we started that partnership by grabbing some of the best and brightest from around the company. These are different individuals and engineers from all of our different organizations, from infrastructure to cloud, to DevOps, to security, to each one of our delivery organizations. And we grabbed all of them, and we threw them in a room for two days for a VSM exercise.

We started, and we picked a very familiar path, and that was building an API using Node.js. And we picked it for two reasons. It was a path that we use very broadly across the company, and a path that my team also happened to know very well and have a lot of experience in.

So we dug in, and our goal was to document every single step along the process from, "I have an idea," to, "I have something running in production." And as you can imagine, there were a ton of steps. There were so many steps that we actually had to cut back and narrow our scope even more.

We cut our scope back to the point where an engineer bootstraps their project to the point where they have something running in non-production, where they could start to build the rest of their business application.

As we continued down this path and documenting each one of these steps and each one of the choices that our engineers were faced with, we actually found that there were so many choices that it was incredibly hard to even quantify how long this process would take because there was so much variability.

So we dug a little bit deeper, wrote a bunch of scripts. We went out to GitLab. We looked at our pipelines. We pulled a bunch of metrics, and we found that there were two groups of engineers.

There was one group of engineers, a very small group that we assume are our most senior engineers, the ones with the most experience, and who have probably done this dozens, if not hundreds of times. And that small group of engineers, they could bootstrap a project and get it running in probably about a day, maybe two at most.

But the vast majority of our engineers formed a long tail. And when we looked at the stats, it actually took, on average, 160 hours just to start a project and get something running in our non-production environment. On top of that, it took 10 attempts of our engineers pushing a pipeline, having something fail, fixing it, and then trying it again before they could actually get something to work for the first time.
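The scripts themselves are not shown in the talk. As a minimal sketch of this kind of analysis, assume pipeline records have already been exported (for example from GitLab) with a project name, a creation time, and a status. The `PipelineRun` type, the `bootstrap_stats` function, and the sample data are all hypothetical. The elapsed time here runs from the first pipeline to the first success, which is only a proxy for the full bootstrap time described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PipelineRun:
    """One exported pipeline record (hypothetical shape, not a GitLab type)."""
    project: str
    created_at: datetime
    status: str  # assumed to be "success" or "failed"


def bootstrap_stats(runs):
    """Map each project to (attempts, hours) up to its first successful run.

    attempts counts every run up to and including the first success.
    hours is the time from the project's first run to that first success.
    Projects that never succeeded are left out of the result.
    """
    by_project = {}
    for run in sorted(runs, key=lambda r: r.created_at):
        by_project.setdefault(run.project, []).append(run)

    stats = {}
    for project, ordered in by_project.items():
        for attempt, run in enumerate(ordered, start=1):
            if run.status == "success":
                elapsed = run.created_at - ordered[0].created_at
                stats[project] = (attempt, elapsed.total_seconds() / 3600)
                break
    return stats


# Made-up sample data, for illustration only.
runs = [
    PipelineRun("api-a", datetime(2023, 1, 2, 9, 0), "failed"),
    PipelineRun("api-a", datetime(2023, 1, 2, 15, 0), "failed"),
    PipelineRun("api-a", datetime(2023, 1, 3, 9, 0), "success"),
    PipelineRun("api-b", datetime(2023, 1, 2, 9, 0), "success"),
    PipelineRun("api-c", datetime(2023, 1, 2, 9, 0), "failed"),
]
stats = bootstrap_stats(runs)
```

Averaging the two values across projects would give figures comparable in spirit to the attempt count and hours quoted above, and sorting projects by hours would show the long tail.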

So armed with that information and kind of an idea of where we needed to focus, we moved into our co-development process. And that process starts with my team. And this is one of the teams that Uma was talking about, where it's a very senior team that is intentionally stacked with a lot of our top talent.

We did our initial research. We went and we looked at: What are all the different patterns? What are all the different libraries? What are all the different tools? What are all the different pipelines? What are all the different practices that our teams are using across the company, and why are they doing that?

In addition to that, we did a lot of industry research. We looked at what all of you are doing. We looked for any trends. We looked for any patterns that we wanted to promote. And we also looked for the opposite: What are some practices that we want to stop?

After we did our initial research, our team jumped in, and we actually started developing. And our goal there was to build something as quickly as possible so we could click a button and get something out the other side. It didn't matter what it was. We just wanted to click the button, get something out.

Once we had that working, we moved into our collaboration phase. And this is where Uma mentioned, we brought in engineers from around the company on rotation to join us in that development process. On top of that, we also brought in what we call our subject matter experts.

These are some of the most well-respected engineers from around the company. These are the folks that when you have a problem and you think, "Oh man, I'm really stuck. I really need help. This is super complex," those are the people that you go to.

But as you can imagine, those folks are also the key players in their organizations, and it is incredibly hard to pull them away from their regular day job to join us on rotation and help us build. So I've got to give a shout-out to our leadership, and Uma as well. They had great foresight, and they made those individuals available. They gave them the time to work with us, knowing that they were going to see an ROI in the future.

So those folks jump in, and we start dissecting the problem. And it turns out, when you're building a Golden Path, you actually have like 50 smaller problems that you need to solve, that you need to agree on, that go into that Golden Path that you want to deliver.

Ultimately, as we worked on those problems and we solved each one of those, we went into what we call our influencers review. Our influencers are a group of lead engineers, principal engineers, and architects from around the company. And their job is to be ambassadors. They represent their organization, and they're there to give us feedback. The good feedback and, more importantly, the bad feedback.

We want to know if what we are building is going to work for them. And if it's not, we need to adapt and we need to change.

So after doing a whole bunch of rounds of that, we move into our release process. And that process actually starts a little bit sooner, during our development process. As soon as we have a piece that we think is working and that it's worth sharing, we do what's called an open beta.

And that's where we take what we have, and we put it out in the open, and we say, "Hey, this isn't done yet." We put a nice warning up there that says, "Don't use this in production, but get in there, test this out, kick the tires, give us feedback."

Again, feedback is the most important thing for us, especially the bad feedback. As we move through that process, we keep iterating, we keep looking at that feedback. We keep working on the project until we get to the point where we think we have something that's good enough and that it delivers on what we're looking to accomplish.

And at that point, we move into our white-glove onboarding. So at that point, our team picks a handful of teams across the company that are interested in adopting our pattern. And we take an engineer from our team along with a product owner, and we actually embed them for an entire week with that team.

We get them started. We help them bootstrap their project, get it set up, and then we kind of back off a little bit, and we sit there and we look over the shoulder, and we watch what they're doing for that week. How are they using what we gave them? Is it what we expected? Are they doing some weird stuff that we didn't expect? Do we need to change what we're doing?

We repeat that process probably about three or four times, and each time we iterate, we change a little bit and we try again. And ultimately we get to a point where we think, "This thing is ready."

And at that point, we go for our first release. And it's at that point that it's a signal to the rest of our engineers that this path is now production-ready. We want you to start using this. We want you to start adopting it. Use it in your daily work.

After that point, we move into what we call a little bit of a stabilization phase. About three months after our first release, we have a checkpoint. My team, our leadership team, we all reflect on what we built. We look at our feedback. We look at how people are using it. And basically we have a decision point.

And at that point, if everything looks good, we stamp it with what we call generally available. And what that means is, for a certain subset of our teams, on very core projects, they're actually required to use our Golden Path because it is so important.

So you're wondering, did it work? I think the stats speak for themselves. We took that 160 hours that it took to start a project and get it to non-prod, and we've now cut that down to 20 minutes. We took those 10 failed pipelines and 10 attempts of trying over and over again, and now it's a single button click and a pipeline that works every single time.

Thanks.

And seeing our success with that first Node pattern, we repeated the process, and we've now iterated multiple times, to the point where we are about to deliver our fifth Golden Path. In addition to that, we have baked our Golden Paths into our new engineer onboarding, where every single new engineer, on day one, uses and learns about Golden Paths to deploy a working application on their very first day.

But I think more important than the stats is the fact that we've earned the trust of our engineering organization. And that's because we listened to them, we included them in the process, and we actually helped.

I'm going to hand it back over to Uma.

Uma Vandegrift

Thanks. Thanks.

So our engineers noticed, and our leadership noticed as well. We had incredible support from all of our leadership, from the CTO org and the CIO org. And this has meant some good things for us.

For Golden Paths, this has meant continued funding and even an enterprise-wide adoption goal. For ESS more broadly, this means that we were given more challenges to solve and that we're able to be at the helm of some of the most cutting-edge problems that we're solving. Recently, we were even tasked with rolling out generative AI to all of our developers.

This is a huge shift from where we were a year ago. And throughout that shift, we did learn a lot of lessons. From just this, we learned that each problem has layers. We needed to take apart our problems and make sure that we were looking at them carefully and really decomposing them.

Another thing that was so important to us was our organizational buy-in. We needed influencers from all of our engineers and also heavy leadership buy-in to get what we were doing done. And those folks that we got buy-in from turned into advocates, and we returned their advocacy by staying transparent with what we were doing.

We let them know. We were visible. We made changes, and we were adaptable when we needed to be. We made pivots, and we were proud of the pivots that we made.

Now in typical DevOps Summit fashion, I'm going to close with a couple of areas where we still need some help and would love to chat further.

First are test environments and test data. We are still trying to tackle that challenge, and we would love to talk further about that.

Also, we have so many services and quite a bit of a sprawl that we're trying to rein in. So if you see any of us after this, we'd love to chat further.

Thank you so much.