Las Vegas 2018

Scaling Continuous Delivery to Walmart

As we continue Walmart's journey to accelerate value delivery, incentivizing the outcomes we need from development teams is critical to success. We'll discuss the tight integration of metrics, open source tooling, education, and community we use to drive DevOps at the scale of a Fortune 1 company.


Bryan Finster has been a developer since 1996. In 2001 he joined Walmart, developing warehouse management systems for their global supply chain. In 2017, he joined Walmart’s Software Delivery & Enablement organization as the product owner of Hygieia development to bring metrics visibility to the teams. Currently, he leads the CD Sherpa team who assist product teams with removing constraints to continuous delivery.


Dana Finster has been a software developer since 1998. She joined Walmart in 2015 and currently works on the InfoSec SIRT Tools Support team. In 2016, she organized our 3rd DevOps day, bringing together thought leaders from all over the country to share experiences. She is the founder of our grassroots CI/CD community of practice, Continuous Chai.

Full transcript

The complete talk, organized by section.

Bryan Finster and Dana Finster

About scaling continuous delivery to Walmart.

I am Dana Finster.

I am a CD evangelist and senior software engineer in information security.

I'm Bryan Finster.

I'm a staff software engineer and team lead for the CD Sherpa team, and we work for a small retailer in Northwest Arkansas.

You may have heard of it.

In 1950, Sam Walton opened his first small little Walton five-and-dime.

But today, Walmart employs 2.3 million associates who support almost 12,000 stores in 28 countries worldwide, with half a trillion dollars in sales annually.

This is our scale.

Yeah.

And on the IT...

Ooh, that's loud.

On the IT side, we've got hundreds of development teams worldwide deploying to hundreds of thousands of nodes supporting every business we have.

And we have really diverse tech stacks, everything from mainframe and C to Go.

And we're here to talk about scaling DevOps to this size, to Walmart size.

So let's start with the first rule of DevOps.

Everyone knows the first rule of DevOps, right?

Don't talk about DevOps.

DevOps is overloaded.

The term is interpreted in many different and often confusing ways.

You can't just go out and buy the DevOps.

You can't hire the DevOps.

But at its core, DevOps is really simple, right?

It's people collaborating, using lean processes and heavy automation to deliver quality software rapidly.

But if we don't talk about the DevOps, what is it that we talk about?

And what we do is we focus in on the outcomes that we're looking for and foster the culture to attain them.

We're all here looking for the same outcome, right?

To deliver quality software rapidly.

And the key to the culture change that's needed to attain that outcome is our people and our teams.

And we know from experience, and we spoke about this last year, that we can grow really effective development teams by having the team focus on trunk-based continuous integration, real continuous integration, and on reducing the delivery increments: keep driving down that batch size, asking, "Why can't we deliver today?" and solving those problems.

The act of that team solving the problem not only makes the team really good problem-solvers, but it generates a lot of teamwork.

You get a really effective team that can deliver value very rapidly.

The team that I came from, we went from zero to 12 deliveries a day to production.

Yeah.

So we started by holding annual DevOps events to educate people about the concepts of DevOps and continuous delivery.

Oops, sorry.

Go ahead.

No, go ahead.

The way we're approaching scaling this to Walmart: you can't go team to team to change it, so we're taking an approach of using gamified metrics, a culture of sharing and community, a unified deploy platform, which is really key, and Sherpa guides to help teams with any struggles that they have.

And we started by educating people, holding DevOps days to teach people about the concepts of DevOps and continuous delivery.

These really got people excited, and it started getting the word out.

I went to one a couple of years ago and was really excited to bring continuous delivery back to my team.

I knew that it would make our lives easier.

I knew that it would allow us to work better with our business partners and deliver value faster.

The problem I encountered was that I couldn't find a central area within the organization to learn more, to find out what initiatives were currently going on, and how to actually implement continuous delivery.

I looked around and I found a lot of pockets of really good progress.

We had teams that were building pipelines.

We had teams that were focused in on testing and continuous integration.

And many, many teams were all trying to solve exactly the same problems independently.

I had to figure this out.

So I decided to host another DevOps Day, and I brought in leaders and developers from all over the country to share the vision and highlight the progress that they were making within the organization.

But I knew that this event was going to garner a whole lot more excitement.

People were going to learn more.

They were going to want to accelerate faster.

But these same excited people were going to end up just like I was, looking for that central area to kind of guide them in what those next steps are.

So I built us a home.

I started Continuous Chai, which is a CI/CD user group where people can come to share and learn about continuous integration, continuous delivery, and the myriad of topics that go along with it.

This community of sharing is the first of four initiatives that we want to share with you today.

I believe that when we want to change culture, it helps to use that culture to teach the culture.

Sharing is a key tenet of DevOps, and it's important to share early and often.

We have old habits and human nature that just hold us back, right?

We only want to show people beautiful, shiny, finished products after we're all done, and we say, "Look how successful I was." What we've built in Continuous Chai is a forum where people actually have the freedom to share off-the-cuff ideas, to share their work in progress, and to highlight not just their successes. It's a trusting environment where they can speak honestly about their failures and their struggles along the way. And by sharing early and openly, teams can learn a lot from each other and avoid wasting time on duplicate work, struggling to solve the same problems alone.

So an active, trusting user community is key to enabling large-scale change.

Yeah, and having that network in place has been a really valuable tool.

As people start onboarding the new tech stacks, they start asking the same questions.

We see it over and over again in ChatOps, where, "How do I test this React app?" Or, "How do I plug Sonar into this thing?" And we say, "Well, have you asked in Continuous Chai?" And they go to the community, and you have the community dive in and help them.

You get solutions so much faster than you do by Google or Stack Overflow.

And in fact, when Dana and I were working on this deck together, we went to Continuous Chai to get feedback because we knew not only that we had a trusting environment with friends, but we knew we would get actual, real feedback, not, "Oh, yes, it's wonderful." The positive reinforcement.

And it was a little daunting.

Yeah.

It was a little daunting seeing all the notes come by, but it absolutely helped us improve this material.

So we've talked a bit about where we are and what an asset a user community can be.

I hope that if you don't have an engaged community, that you might be thinking about starting one.

I've learned some things along the way.

First of all, a leader of a user community has to have passion.

It's not something that you can just tell someone to take care of and expect it to be finished.

Building a community takes ongoing work to engage associates, keep a consistent schedule, and bring interesting demos and discussion topics to the group.

Secondly, it takes a lot of patience.

When we first started, there were many times when I was sitting in a room by myself or with one or two people.

But even just a handful of people can brainstorm ideas and start bringing true value to the group.

We have iterated in different formats along the way.

We've done informal coffee chats, we've done demo and discussions focused on specific topics, and even offsite meetings.

And every time, we've come to find that we have the most success in our environment with meetings that have specific demo topics.

We just keep iterating on the format and on the timing, and we currently have over 600 members and offer weekly demo and discussion sessions.

It can't be built in a day.

And when the momentum does periodically slow, because it will, there's one fail-safe way to incentivize people to keep showing up.

Swag.

Free food.

Yeah, and it's true.

It's amazing what we'll do as developers for a T-shirt.

But the important thing is this T-shirt is not something you get for showing up.

You only get this T-shirt for contributing to the community.

It's a badge of honor.

And so people celebrate.

Look, I have a Continuous Chai T-shirt.

It's been really important.

You only get one that looks like this if you're me.

So the other thing is a unified platform.

When we first started on this journey, we had several areas, and the areas that were really digging into CD, they'd go and spin up their own Jenkins instance or whatever tooling they were using.

Other areas weren't focusing on it at all.

They didn't have the tooling, they didn't have the bandwidth to get it done.

But it doesn't scale for every single area, every single team to get their own platform stood up.

Product teams should be focusing on delivering products.

Having a consistent, unified platform they can use, that's easy to use, is absolutely key.

So I work in software delivery enablement.

We're the area that's responsible for building the CD platform, and it's a set of tools we're building.

It's delivery as a service.

We want teams to be able to focus on those products and then just use the automation to deliver them.

We don't deliver it for them, we just build the automation.

We're using open source tools and scaling them to Walmart, and I'll tell you, we break a lot of tools.

And we want to make the right thing the easy thing.

We want you to flow downhill to success.

The initiative we work on is called Irresistible Developer Experience.

We want you to use the tools because you believe that they are better, that they're easier to use.

And we find it's really fast to onboard people on these tools.

Our delivery platform is designed to be implemented by all the teams across all the tech stacks in the organization.

Having this single pipeline allows for security and code standards to be consistent across all the products in the enterprise.

New tools and controls can be injected, and all teams can immediately benefit.

Our platform uses simple configuration files that hide the complexity of the implementation from the development teams.

And not all of the features are able to be configured by the developers.

Things like code scanning and security controls are automatically turned on.

Developers don't have to set those up, and more importantly, they can't be skipped.
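As a rough illustration of the idea that some pipeline features are configurable and others are always on, here is a minimal Python sketch. The step names, the two dictionaries, and `effective_pipeline` are all hypothetical and are not taken from Walmart's platform; the sketch only shows one way enforced controls could be re-applied after a team's own settings so that they cannot be skipped.

```python
# Hypothetical sketch: step names and structure are assumptions for illustration,
# not the real platform configuration.

# Controls the platform always enables; a team's config cannot turn these off.
ENFORCED_STEPS = {"code_scan": True, "security_checks": True}

# Optional defaults that a team is allowed to override.
DEFAULT_STEPS = {"unit_tests": True, "publish_metrics": True, "deploy": False}


def effective_pipeline(team_config: dict) -> dict:
    """Merge a team's config with the defaults, then re-apply enforced controls."""
    steps = dict(DEFAULT_STEPS)
    steps.update(team_config)      # team choices override the optional defaults
    steps.update(ENFORCED_STEPS)   # enforced controls are applied last, so they always win
    return steps


if __name__ == "__main__":
    # The attempt to disable code scanning is silently overridden.
    team = {"deploy": True, "code_scan": False}
    print(effective_pipeline(team))
```

Applying the enforced settings last is the design choice that matters here: the order of the merges, not developer discipline, is what guarantees the controls stay on.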

We actually showed this slide in a Continuous Chai presentation, and one of the developers there noted that he'd never really seen all the intricacies that go on within the pipeline.

To him, it was kind of like magic.

He said, "As a developer, it's almost transparent to me. It goes to Git, it gets built, magic happens."

An example is Concord.

It's our workflow orchestration engine.

It's a general automation tool.

We use it mostly for our CD pipelines.

We also use it for just about any automation we want to do, including signing people up for classes.

It's got plug-ins for all the tools we use.

It's easily extensible for other plug-ins that we need, and more importantly, developers don't need to understand the underlying implementation.

All they need to understand is how to call those things from Concord. And because it's been planned from day one to release this back to the community as open source, it's enabled a use case for Dana's team.

Yeah.

My team actually supports our security infrastructure and incident response teams.

And because of that, we are on a completely segregated air-gapped network.

Because it's designed to be released to the broader community, it's designed to be very easy to install.

We're able to implement our enterprise platform in our segregated network and very easily pull in the new features and still be able to take advantage of all the work going on across the enterprise.

And here's an example of how simple it is to configure the tools from the developer's standpoint.

We simply have a configuration file located right alongside the code using a simple declarative language.

This allows for configuring and versioning individual repositories to be very simple.

It hides the complexity from the developers, and each feature is just a simple function call.

We can see right here that a single line of code calls Hygieia and publishes the build metrics from this repo.
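The slide itself is not reproduced in this transcript, so the following Python sketch is only an analogy for "each feature is just a simple function call." It does not use Concord's or Hygieia's actual syntax or API. The registry, the `step` decorator, the step names, and the config shape are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a step name used in a repo's config to the
# function that hides the real implementation from the developer.
STEP_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def step(name: str):
    """Decorator that registers a function under a config-facing step name."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        STEP_REGISTRY[name] = func
        return func
    return register


@step("build")
def build(ctx: dict) -> str:
    return f"built {ctx['repo']}"


@step("publish_build_metrics")
def publish_build_metrics(ctx: dict) -> str:
    # In a real platform this would call the metrics dashboard; here it is a stub.
    return f"published build metrics for {ctx['repo']}"


def run_pipeline(config: dict) -> List[str]:
    """Run each declared step in order; unknown step names fail fast."""
    ctx = {"repo": config["repo"]}
    results = []
    for name in config["steps"]:
        if name not in STEP_REGISTRY:
            raise ValueError(f"unknown step: {name}")
        results.append(STEP_REGISTRY[name](ctx))
    return results


if __name__ == "__main__":
    # A declarative config that would live alongside the code in the repository.
    repo_config = {"repo": "demo-service", "steps": ["build", "publish_build_metrics"]}
    for line in run_pipeline(repo_config):
        print(line)
```

The point of the sketch is that the repository only names what it wants done; adding a metrics publish is one more entry in `steps`, and the plug-in behind it can change without the config changing.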

And metrics are also very important.

If teams have pipelines but they don't have goals to deliver to, they have no idea what the outcomes are supposed to be.

So it's really important to make those goals clear and make the metrics clear so they understand how they're trending against those goals.

And to do that, we use Hygieia.

So if you don't know about Hygieia, Capital One open sourced Hygieia several years ago.

There's been teams around our building who've been using it for years, and we now have it integrated into our pipeline.

We've also-- I'm sorry.

Wow.

Anyway, it gives you a real-time view of the CD pipelines, and it's a really important tool for the teams.

Yeah.

The product dashboard gives teams the metrics they need to understand the health of each individual repository with metrics including build stability, the frequency of commits to master, static analysis, and test results, as well as code coverage and the frequency of deploys per day within each environment.

We've also added scoring to Hygieia.

The metrics are weighted and aggregated to give an overall health score, and this scoring gamification helps drive improvement, and it allows teams to quickly see which code bases are more hardened and which might need a little bit of attention.

Teams can analyze where they might need to put attention by drilling down into each individual metric widget.

For example, we can see here that most of the code repo score is determined by frequency of merge to master.

But if teams are committing directly to master without using a pull request, then they're going to take a hit on that score.
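To make "weighted and aggregated" concrete, here is a minimal Python sketch of a weighted average over per-widget scores. The widget names, the 0 to 100 scale, and the weights are assumptions for illustration; the talk does not give Hygieia's actual formula.

```python
def health_score(widget_scores: dict, weights: dict) -> float:
    """Weighted average of per-widget scores (each assumed to be on a 0-100 scale).

    A widget that has a weight but no reported score counts as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(widget_scores.get(name, 0.0) * w for name, w in weights.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical widget scores for one repository.
    scores = {"build": 100.0, "scm": 50.0, "quality": 80.0, "deploy": 20.0}
    equal_weights = {"build": 1.0, "scm": 1.0, "quality": 1.0, "deploy": 1.0}
    print(health_score(scores, equal_weights))  # prints 62.5
```

Drilling into the individual widget scores, as described above, is what tells a team which input is pulling the overall number down.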

Yeah.

And I was talking to Scott here from Columbia Sports last year, and he was saying we should have taken a psychology class to get this done, because getting teams to change is what he calls hacking the biggest undocumented API.

You poke and prod and see what the outcomes are going to be.

Metrics are incredibly dangerous things if used inappropriately, right?

So you really need to understand those metrics and understand how people react to those metrics.

Just because you put a metric in place and expect an outcome doesn't mean you're going to get it.

Go and investigate.

As an example, we had teams come to us when we implemented the scoring and say, "Okay, now I'm having to go and make changes to repositories that currently don't need changes just to keep the score up." Well, that just generates waste.

We don't like waste, right?

We want value.

And so we made some additional changes in response to that to make things better.

We created a higher level view on top of Hygieia that aggregates the metrics up and averages them across the team.

We have a tool inside Walmart that tells us how big a development team is, how many engineers are on that team, and so we can average the scores based off of the team size, and we can get those deploys per day per developer, commits to master per day per developer, and find out how teams are doing and use that to say, "Okay, here's our goals. Here's where you are. How do we help you achieve those goals?"

But even then, currently, all those scores are weighted equally, but commits and deploys are far more important than code coverage.

And we have teams that are right now trying to raise their scores by raising their code coverage, which is incredibly easy to do.

So we have it on our backlog to shortly drop the weighting on code coverage and increase the weighting on commits and deploys to get the outcomes we want.
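The team-level view described here can be sketched in a few lines of Python: normalize activity by team size, then see how a change in weights changes which behavior the score rewards. The function names, the sample numbers, and the specific weights are assumptions for illustration, not values from the talk.

```python
def per_developer_rate(total_events: int, days: int, team_size: int) -> float:
    """Events (deploys or commits to master) per day per developer."""
    if days <= 0 or team_size <= 0:
        raise ValueError("days and team_size must be positive")
    return total_events / days / team_size


def weighted_average(scores: dict, weights: dict) -> float:
    """Weighted average of metric scores; every weighted metric must have a score."""
    return sum(scores[name] * w for name, w in weights.items()) / sum(weights.values())


if __name__ == "__main__":
    # A hypothetical team of 6 with 60 deploys over 5 working days.
    print(per_developer_rate(60, 5, 6))  # prints 2.0

    # A team with high coverage but little delivery activity (scores on 0-100).
    scores = {"coverage": 90.0, "commits": 40.0, "deploys": 30.0}
    equal = {"coverage": 1.0, "commits": 1.0, "deploys": 1.0}
    reweighted = {"coverage": 1.0, "commits": 3.0, "deploys": 3.0}  # assumed weights
    print(weighted_average(scores, equal))
    print(weighted_average(scores, reweighted))
```

With equal weights, raising coverage is as effective as delivering more often. Once commits and deploys carry more weight, the same team scores lower until its delivery cadence improves, which is the outcome the reweighting is meant to encourage.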

Now, one other thing: all of these scoring and widget changes, we have pushed back to Capital One, and you can find them on their master today.

Yeah.

And this team view also adds some competitive fun over and beyond with the teams because there's a view above this one where we can see the scores of all the teams in the enterprise.

And I know this firsthand because I have a tech lead who pulls up this dashboard every day to make sure that we're still winning, that we have the highest score of all the teams in our area.

So it really goes a long way when you provide the visibility, especially in this case for the more competitive teammates or teams. And while Hygieia does give us the metrics to evaluate how we're doing as a team and where improvements might be needed, sometimes teams need a little extra help to actually determine how to implement those next improvements and get to the next level.

So we have Sherpa guides.

We're a group of developers who've done this before, and we can embed with teams and help them out.

So the team that I lead is the CD Sherpa team.

We have been up and down the mountain.

We know where the ravines are.

We know where the landmarks are.

We don't want you to become one of the landmarks.

And so we will help with anything required to get it done.

We do platform support for the tools to make sure you understand how to use the tools.

But we also run tech workshops on domain-driven design or strangulation.

My favorite one's agile rehab.

That was a really popular one.

We do leadership outreach, where we explain that this is a change in how teams should work.

You need to understand how this impacts how you incentivize the teams to get the outcomes you want.

And we do team boot camps, where we will embed with a team for six weeks, run two and a half day sprints.

It's very similar to the other dojos you may have heard about from other companies.

And help the team with whatever their biggest constraint is, help them move the needle, show them that improvement's not only possible, but it can be really fun and build teamwork.

And we are pi-shaped developers, not T-shaped developers.

We have to have breadth and depth, and it's really hard to find people for this team.

You can talk to anybody trying to build teams like this.

It's incredibly difficult.

We tell the teams we're not agile coaches.

We can coach you in agile, because you have to be good at this stuff to get this done, but we will also help you with planning out a legacy strangulation, or if you need help with test architecture.

We'll pair program with you to teach you how to unit test if you need.

We'll do anything required, and if our team doesn't know, we've been here for a long time, we know people that do, we'll bring them in and get that knowledge to you as fast as we can.

So at Walmart, we're focusing on outcomes and fostering the culture change to attain them.

It's not an easy task, and it's taking work from all directions.

Growing an excited base of people to advocate every day is important.

We're reaching out to leadership and developing a strategy that includes a single enterprise deployment pipeline, metrics that are focused on the right outcomes, and teams dedicated to training and enabling teams to deliver value safely and quickly.

The single pipeline across all teams and tech stacks makes the right thing the easy thing to do.

And standardized metrics make progress visible and understandable at different levels because they're standardized.

And the gamification makes it competitive and fun.

Our people really keep the momentum going to enable large scale change.

Now, this works for us.

We're seeing a lot of improvement using this process where we were not seeing it before, but context is really important.

You need to make sure that you understand that nothing you see is a cookie cutter solution.

You need to find out what works in your culture.

If people are incentivized by badges, give them badges.

If they're incentivized by certifications, do that.

Whatever it takes to move the needle, get it done.

Cash.

Cash, money.

Money's good.

But also understand, give people permission.

Dana and I are not management.

We are developers.

Okay?

And Walmart has a strong culture of grassroots improvement, and we didn't ask for permission.

When Dana needed a DevOps day to learn more, she said, "How do I reserve the auditorium?" Not, "May I have a DevOps day?" When she decided to start Continuous Chai, she just got meeting rooms together, spun up a Slack channel, started Continuous Chai, and then made all of the leadership help.

And there are people that are passionate about this in your organizations.

Make sure they know they have permission.

Don't assume that they think that they do.

Find those people, elevate them, and give them all the runway that you can give them to bring everybody else along.

Now, like everybody else here, we came with a problem, and it's a common problem we hear.

How do we get non-technical people aligned to the changes required on the technical side?

How do we build that empathy?

We're looking for effective ways to get that done.

I'm a technical person and I can do a little bit, but how do I get them really to understand the change required?

If you have any ideas, if you've had any success on that, please come see me afterwards and I would love to hear what that is.

Go ahead.

Okay.

We want to just wrap up by sharing some of the outcomes that we have seen so far.

First of all, teams are collaborating.

Lots of collaboration between teams is helping to remove duplication of efforts and really shortening the learning curve for teams.

Teams who are focused on continuous delivery are delivering faster, and they're delivering with higher quality.

And when they see that and start to do that, they realize that CD removes the drama from delivery.

And improvement is addictive.

Teams are actively trying to improve and using metrics to measure that progress.

And teams are having more fun.

The motto of my team is deploy more, sleep better.

And when I get to sleep at night, I can find more entertaining ways to get things done at work, and I have time to just find joy.

And here's an example of a team finding joy.

So this is literally how this team deploys to production.

That toggle switch there, if it's not blinking, that means that Hygieia shows that everything is good.

He then flips the toggle switch. If it's still good, Looper says, "CI build is green, you're good to go." Hits the button, deploy, and Concord sends it to production.

And that's exactly how he gets it done every single day.

Actually, version two is all steampunky with...

It's really cool.

And if you'd like to have fun like that, we're always looking for good people, especially on my team. careers.walmart.com.

Feel free to come talk to me after.

And we're going to be in the speaker's corner at 3:15.

We're always happy to share.

That's the best way we know how to learn.

Thanks very much.

Thank you.

Q&A

Now we have a few minutes for questions if anybody wants.

I think we have four minutes and 39 seconds.

And there's a mic coming up right behind you.

I think just a quick question.

Great talk by the way.

Thank you.

Appreciate that.

Was there any governance model in place?

Did you have to go through any kind of a governance model to improve that adoption of DevOps culture?

Not so much a governance model, but there is an ask, and this is super important.

It's not enough to have developers pushing for this.

You've got to have executives pushing for this as well.

And we have an ask from our CTO that every team deploys at minimum to production once a day.

Right?

And having him drive that and having him look at the metrics on Hygieia to say, "Okay, this is where we're at today. What do we need to do to get there?" Right?

And you have the bottom pushing up because it's better for us as developers.

You have him pushing down, and then we just get it done.

As far as safety goes, we bake the safety in the pipeline.

We don't use process.

We automate the safety.

So as you release the code, do you store that metadata somewhere, like in configuration management, so that management could go back and see when a patch was applied or when a defect was fixed, et cetera?

Yeah.

We know what revision went to production, okay, and we can trace that all the way back to GitHub.

I honestly would prefer more tracking, but we're building that all the time.

We're hardening the pipeline constantly.

Yeah.

We have the same challenge from release engineering team, and we have a release challenge right now.

If you're running or you have it, let me know.

Sure.

Yeah.

I'm curious about your Sherpa team.

Is that a permanent role?

Is that a full-time role for people on that team, or are they balancing that with their regular workload?

Yeah.

Ross Clanton from Verizon presented last year on the DevOps dojos at Verizon, and I gave that to my VP.

I said, "We really need something like this." And he said, "Okay." And so now it's my job, right?

Yeah, it's our full-time job, and it's a really hard job.

And it's hard, number one, to find those people, and it's hard not to get them burned out.

The other challenging part is we also have to do some development work.

We've got to keep our fingers dirty because you lose that so easily.

So I'm trying to make sure on a backlog that we have development work that also adds value for training.

If we want to teach people how to test, we just build an application that's tested appropriately, that has all the information in it, and show it to them.

Right?

But it is a full-time job.

It used to be my hobby.

How do you guys navigate the challenges of getting consistent trunk-based development versus development teams that may want to say, "Oh, no, no, the GitFlow workflow is going to work really well. We're going to do it that way"? You know very well it gets into religious wars over that topic.

Outcome metrics, right?

Yeah.

You're not going to be able to go to production every single day if you're running GitFlow.

You're just not going to do it.

It's going to cause too much drag.

You're going to be spending all your time on trying to deal with merge conflicts or all that nightmare.

Right?

Also, your score is going to suck.

To get a good score on the SCM widget, you have to use trunk-based development with a feature branch that is less than 24 hours old going to master.

Right?

And it'll be locked.

If you're not merging to master, you won't even show any score because we're only going to measure master.

And so the scoring of that widget and the outcomes you're looking for help drive that behavior.
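The rules described in this answer (only master is measured, short-lived feature branches score best, direct commits take a hit) can be sketched as follows. This is a minimal Python illustration, not Hygieia's implementation: the `MasterCommit` type, the point values, and the treatment of older branches are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# From the talk: a feature branch should be less than 24 hours old when it merges.
MAX_BRANCH_AGE = timedelta(hours=24)


@dataclass
class MasterCommit:
    """One change that landed on master (the only branch that is measured)."""
    merged_at: datetime
    via_pull_request: bool
    branch_created_at: Optional[datetime] = None  # None for direct commits


def commit_points(commit: MasterCommit) -> float:
    """Full credit for a PR from a short-lived branch; half credit otherwise.

    The 1.0 / 0.5 point values are assumed; the talk only says other paths
    "take a hit" on the score.
    """
    if not commit.via_pull_request or commit.branch_created_at is None:
        return 0.5  # direct commit to master without a pull request
    branch_age = commit.merged_at - commit.branch_created_at
    return 1.0 if branch_age < MAX_BRANCH_AGE else 0.5


def scm_score(master_commits: List[MasterCommit]) -> Optional[float]:
    """Score from 0 to 100, or None when nothing has been merged to master."""
    if not master_commits:
        return None  # no merges to master means no score is shown at all
    total = sum(commit_points(c) for c in master_commits)
    return 100.0 * total / len(master_commits)


if __name__ == "__main__":
    now = datetime(2018, 10, 22, 12, 0)
    commits = [
        MasterCommit(now, True, now - timedelta(hours=3)),  # short-lived branch via PR
        MasterCommit(now, False),                           # direct commit to master
    ]
    print(scm_score(commits))  # prints 75.0
```

A team working on long-lived GitFlow branches would either show no score (nothing reaching master) or a reduced one (old branches), which is how the metric nudges teams toward trunk-based development without a mandate.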

And then it's just like, "Well, I can't because..." Okay.

Well, let me help you with why you can't.

From a team buy-in standpoint, I also feel like being able to show it.

If you have teams that are doing it and they've had those aha moments going, "Wow. Yeah. We really need to work this way because look at these outcomes and look what we're doing." If you can share that with the teams that are not doing it and try to build them into having those same kind of...

I call them aha moments.

It's like when the team has that, and then there's no turning back.

Continuous delivery all the way.

Well, the other thing is because my team, we're all developers that come from product teams, and we've worked this way.

We know this is a better way of working, so we can absolutely tell them, "For reals. This isn't theoretical. Your life will be better if you just let us help you move this way."

How many people are on your team?

Right now?

Right now there's four. careers.walmart.com.

And I think our time is up, so thank you very much.