DevOps at Nike: There is No Finish Line
Full transcript
The complete talk, organized by section.
Scott Boecker
I joined Nike about two years ago, and we are on this digital transformation. And so we want to spend a little bit of time today talking about where we've come from, where we are now, and give you a little bit of a taste of where we're headed.
Nike's mission has always been to bring inspiration and innovation to every athlete in the world. But you'll notice a little asterisk up there, because at Nike, we define an athlete this way: if you have a body, you're an athlete. So it expands our consumer base very well.
But as you all know, on and off the pitch, the field, the court, our world is evolving and changing at a pace that we've never seen before. We are communicating constantly and in new ways. Our expectations for personalized service have never been greater. We pay for goods with our phone. We pay our friends back without cash. Commerce happens just about anywhere, everywhere, and at any time. And the technology to support the Internet of Things is changing to support all of this.
These trends were apparent in 2014, 2015, and we knew that in order to keep up with our athletes' needs, we had to change how we build and operate digital experiences. And that change started with our team. So we first focused on creating a very tight partnership between product and engineering, with a shared vision for exceeding our consumer expectations.
Then we looked for a business problem that directly impacted our consumers, our business, and would also be able to prove how a new platform and a way of working could bring scale and leverage to the business globally.
We did not have to look far. I'm not sure how many people in the room are aware, but basically on any given Saturday at 7:00 a.m. or 10:00 a.m. Eastern, we often release new shoes that are very highly coveted, and we have hundreds of thousands, if not millions, of consumers simultaneously trying to cop the newest Jordans, Dunks, Kobes, or LeBrons.
I think Gene mentioned my prior history at Ticketmaster, so not unlike the new U2 concert going on sale, we have a very, very similar challenge and problem in how we scale and how we serve demand that hits all at once, which I know all of you are very well aware of.
So the experience that normally happens on a Saturday, or what was happening in the past when all of these people hit, you'd come in super excited, on time, find your shoe, go into the process, and this is what you'd get.
In order to manage that, we were queuing and throttling everyone at the edge, and therefore managing that flow in order to manage the consumer experience and how they would come through, get the inventory, go out. If you bounced, if you left this, you were out of line. Imagine being on the phone, sitting there waiting, and you get a call. If you take the call, you're out of the line.
And we had moments with high demand, with a large consumer base showing up, where some of these lines would last anywhere from two to three hours. We would literally have people waiting in a line like this for hours.
And unfortunately, because the demand of the shoes far outweighs the supply, the result was often this.
So as you can imagine, this is not an ideal consumer experience. This does not leave people feeling the warm and fuzzy success of getting that shoe they needed. And so we knew we had a challenge and a problem that we had to solve for our consumers, for our brand, and fundamentally for the business to be able to meet the need that was clearly there.
So we pulled together with product and technology, and we created shared principles. How do you approach this together? And we created our own kind of hierarchy of needs and said, in order for this to succeed, it fundamentally needs a reliable, secure, and stable foundation. If the site's not working, if the app's not working, everything else is basically irrelevant.
Next, we wanted it to be fair. Fairness is a really large challenge when you're talking about limited units for a lot of people. So there was a process by which we would understand the selection process and the different ways to buy, depending on the inventory. We created unique lottery systems, and then we created unique first-in, first-out lines based on what we had.
Finally, it's got to be fast, especially because, as we were developing this, we wanted to build a set of services that would be used on the web, in an iOS app, and in an Android app simultaneously. So they had to be fast for people on different connections and with different devices.
And then once you get all that right, you get to make it fun. And so that was kind of the start and how we set this up.
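As an illustration of the two selection styles mentioned above (a lottery and a first-in, first-out line), here is a minimal Python sketch. The function names and the one-entry-per-person rule are assumptions made for this example; the talk does not describe how Nike's systems are actually implemented.

```python
import random
from collections import deque


def fifo_allocate(arrivals, units):
    """First-in, first-out: the earliest arrivals get the inventory."""
    line = deque(arrivals)
    winners = []
    while line and len(winners) < units:
        winners.append(line.popleft())
    return winners


def lottery_allocate(entrants, units, seed=None):
    """Lottery: every entrant has the same chance, whatever their arrival time."""
    rng = random.Random(seed)
    # Assumption for this sketch: one entry per person, duplicates dropped.
    unique = list(dict.fromkeys(entrants))
    return rng.sample(unique, min(units, len(unique)))
```

With limited units, the FIFO version rewards whoever connects fastest, while the lottery version removes arrival time from the outcome entirely, which is one way to read "fair" when demand far exceeds supply.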
And I'm going to turn it over to Ron now to talk a little bit about how we approach the technology to deliver against this.
Ron Forrester
Morning, everyone. Thanks, Scott.
Before I jump into the main part of my talk, I want to highlight something that I think everyone needs to sort of internalize, if you haven't already in your organization: a key to how we got from where we were to where we are today was the relationship between engineering and product.
We call it two in a box. Maybe some of you guys use the same sort of terminology, but it's super critical that you have that collaboration and that understanding across those boundaries. There's lots of times when Scott and I should be in a meeting together, but we can't be. But we know if one of us is there, we're both representing the shared interest and the shared success of what we're trying to get to.
So a little bit about what Nike was like when I got there. Let's call it Nike Digital, about six years ago. It's called Nike Digital now. Back then it was called, with a really sexy name, North American Lean Business Solutions.
Pretty awesome. But note the use of the word lean. So we were cutting edge back then.
So when I got there, the culture was very much one of vendors, agencies, contractors, data centers. And we were doing amazing business. We had transformed Nike, even at that time, from really a very physical product-oriented company to the beginning of a digital offense.
And the key to that really was looking to see, how did we want to transform our platform? And at the time, the platform was very monolithic. Again, it was driven much by contractors, lots of big iron. We'd throw solution architects at it, and we'd have nike.com at the end of the day. Did great business.
And monoliths get a bad rap. The monolithic system that we had at the time wasn't really the problem. The problem was that we couldn't scale our monolithic system, and we couldn't innovate on it. It was serving our business needs, it was driving a lot of revenue, but it wouldn't take us to the next level.
So the first thing we needed to do was come up with a set of aspirations around how do we want to move forward? How do we want to transform our digital footprint in the world? And that started with internalizing a lot of what our product owners were talking about, what they wanted, what they had, what they saw as a future for our consumers.
And being engineers, we tried to decompose that into the simplest statement possible, which was premium experiences at scale. That was kind of our mantra. And scale had a lot of different meanings. It wasn't just infrastructure scale, it wasn't just scaling to our consumer. It was also scaling to the needs of innovating on that platform. It was scaling the number of internal experiences we had on that platform, et cetera. So that was a key insight at the time.
And the next step to move forward in that was, because we didn't have a strong in-house engineering team, we needed to go out and hire some talent. We needed to attract people to Nike, let them know that we're not just a shoe company, we're not just an apparel company, we're serious about digital.
And so we started to do that, and we made some great hires in the early days. And as a matter of fact, starting about four years ago, we were hiring an average of about 250 people into our technology group a year, which was pretty astounding for a shoe company.
And then we started to look around the industry for luminaries and people we could learn from and ride on the success of. And some of the obvious ones that came to the front, for us, were Netflix, especially from a big infrastructure, high-scale standpoint. And then I think Spotify. We tended to look to Spotify for that cultural touchstone and how they built their teams and how they operated internally. And so that was another key insight for us.
And then we talked about, okay, what do we want our DNA to be? Nike uses the term DNA a lot. What do we want as we move into this era of developing software ourselves? How do we want to look at it?
And there was a lot. We sat for days and days in rooms with the top 30 technologists in the company, which sounds like a small number, and it was back then. And we had lots of bullet points, but these are sort of the high-level talking points or aspirations that we wanted to have.
So first of all, obviously, we're going to make a massive shift to in-house development.
And secondly, obviously, the power of the platform that we wanted to create was that we could do small releases every day. Again, when we were part of that sexy North American Lean Business Solutions, we were doing, at best, monthly releases, and oftentimes it was more like quarterly releases, and we'd have maybe two big enterprise releases a year. We could not move at the speed of the business. So small releases every day.
Open source first. We used open source back then a little bit, but it wasn't a concerted effort to always look at open source first before we started to develop our own products or before we started to buy things.
And then also publish. And if you guys have been to engineering.nike.com, you see that we have a really great presence right now in the open source community with a lot of great projects across different types of platforms, mobile and back-end and front-end.
And that's a whole other talk. I could talk about how you get a company like Nike, whose brand is so valuable, to allow you to put out what they think is IP into the world. It was an amazing journey, but they've embraced it and we have a great presence now.
And then, of course, consistent engineering principles across all the different teams that we're working with. These are things like security, maintainability, reliability, performance. And we had seven of them, and we purposely put features at the end. So everything else needed to come before features.
And then, as I talked about a minute ago, product and engineering own the solution. Two in a box. We have to own it together. We have to understand it. Our success is tied to it.
And if you haven't caught up yet, I'm using a sports analogy. I'm sure you're not surprised. So we need to come up with a game plan.
Where do we start? There's process, tooling, patterns, architecture. We could jump in anywhere. I could go deep into our architecture and our tech stack, but I don't think it would be very surprising to most of you. I think more what I want to talk about is the cultural disruption that we did.
Just as a quick aside, we jumped straight into the Netflix stack for sure: Karyon, Eureka, Hystrix, all of that. We started to build our architecture around those tools at scale. We obviously embraced microservices, and we created our own internal blueprint for that, so that if you were firing up some services, you could quickly grab our blueprint and get started, and it would have all the tools and build systems that you needed to do that. And then again, some strong principles.
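For readers who have not used Hystrix: the core idea it popularized is the circuit breaker, which stops calling a failing dependency for a while and returns a fallback instead. The sketch below shows that pattern in plain Python. It is not Hystrix's API (Hystrix is a Java library), and the class name, parameters, and thresholds are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: fail fast after repeated errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()         # open: skip the dependency entirely
            self.opened_at = None         # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                 # success closes the circuit
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_inventory, lambda: cached_inventory)`, where both names are hypothetical. The point of the pattern is that one slow or failing service degrades gracefully instead of tying up every caller upstream of it.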
And then the next few minutes I'm going to spend popping a word up here. And all these words will be very familiar to you guys, but I'll talk a little bit about what they meant at Nike and maybe a couple anecdotes around them as well.
So obviously, Agile. Again, four or five years ago at Nike, this was fairly disruptive. We had a very IT enterprise-driven culture, lots of giant requirement documents, all of that, and needed to embrace Agile.
And for those of you who have started Agile in an enterprise environment, you know that Agile often, to the business, means, "Yeah, we can change anything anytime we want, and we don't have to tell you when." So that was an interesting... And to be honest, right, that's kind of true. Maybe that isn't how it works nuts and bolts and mechanically, but we had to adapt to that environment. We had to adapt our processes and our ceremonies and the way we worked to kind of make that possible at Nike.
Regardless of how the business internalized the Agile mindset, we had to make that possible.
Pizza squads. I just threw that up there because I like pizza, and I think that makes everything go better in a business situation. So I think if you actually put pizza in front of all these words, pizza Agile, pizza... Everything would just go much faster.
But obviously, again, Amazon, Spotify, other companies were talking about two-pizza squads or teams and stuff like that, and we embraced that as well.
Sprints. This one's awesome because, and this is a true story, elements of the Nike business thought, "Oh, my God, this is amazing. The engineers are embracing our legacy running culture, and they're using terminology that means something to us." They had no idea that this was already in place. So it was amazing. It was great.
We ended up, the way we work in our day-to-day process is we do six one-week sprints that we put into milestones. The sixth week is generally supposed to be for things like innovation, stretch, a little bit of planning, stuff like that.
And the reason we settled on that, and I'm sure it's obvious, is that if you're going to do one-week sprints, oftentimes the ceremonies around Scrum and Agile can overtake the amount of time you're spending actually doing work. And so we wanted to accelerate that and make sure that we were putting the ceremonies in one place at the milestone level and do all of our planning at that point and then let people actually sprint forward. But we still had our stories and epics broken up into the sprint boundaries.
Obviously, cloud-native, without a doubt, wanted to get out of data centers. Data centers were killing us. We're very seasonal. We couldn't scale without actually adding a bunch of iron, and then we were paying for that iron the rest of the year. So it made sense to get into the cloud.
Microservices, already talked about.
Automation, and I'll talk more about this later, but clearly a key unlock for everything. And DevOps, in my mind, is nothing without automation.
Continuous delivery, everyone gets that. Again, at a big brand that's very valuable, this sounds to them like, "Oh, my God, you're going to push changes all the time, and I don't get to tell you when or look at them or stop them." So that was an interesting disruption for our business as well.
But once the automation gets in place and we can prove quality through the pipeline, et cetera, the business gets pretty excited about being able to have an idea one night, get up the next morning, and have it deployed by the end of the day, worst case.
Canary deploys. I threw this up there because this is actually a really hard problem. It's not so hard if you have one experience that you're supporting with a platform, but it gets really difficult when you have many experiences on one platform. That platform is global. You're trying to follow the sun with that platform. Canary deploys are very complicated in that area. And again, I threw this up because I think it's critical to the way we need to work in the future.
And I'll tell you, we haven't cracked the problem yet, but we're working hard on how to get there for the way we do business.
Decentralized quality. This one's fun. Typically, a centralized quality organization gets to be the one that is responsible for the quality of your product. They don't like that idea, right? But that's what happens. It's sort of that one throat to choke.
And when we talked to our partners and people throughout the company, our stakeholders, "Hey, we're going to decentralize quality. We're not going to have a QA org anymore. The engineers are going to own quality. Product's going to own quality. We're all going to own quality together," they're like, "Well, who do we go to when the product is broken? And we can't really talk to the engineers because they're super fragile and emotional, and if we tell them that it's their fault, they're probably going to quit, and they'll stop doing their magic."
So that was a fun one, but we're well through that, and I think it's working really well. There are some pockets where we still have a bit of centralized quality, and we run that more as a center of practice or a community of practice. But we don't have a central QA org anymore. The engineers own that.
Monitoring and alerting, again, huge unlock, for obvious reasons, but a huge unlock for DevOps culture and just doing your business better.
And then I'll throw DevOps up there. So we did a lot of this before we talked about DevOps. A lot of this was very disruptive to a company like Nike and the way we did our digital business. And so it was a little bit of a piecemeal, kind of capture some value a little bit at a time through each of these technical and process plans.
But when we started talking about DevOps... I should say, when we started talking about monitoring and alerting really carefully, the engineers were really paying attention, and they knew something was up. They were like, "Yeah, we do need monitoring and alerting, but do we really need it at that level?" And they started to sense that something was going on.
Then when we said DevOps, they were like, "Timeout. What is going on here? Are you telling me that I have to wear a pager? Are you telling me that I'm responsible for the infrastructure? This is getting a little worrisome for me."
And this was sort of summed up with this statement, like, "Why can't we just have a DevOps team? We already have a production support team. We could just call them DevOps, and it'll be fine, right? And we don't have to do all that stuff, right?"
So this was a very culturally disruptive idea, and that's what I really want to highlight to you guys. A lot of you guys know this, but it's not about technology. It's not about the tools. It's not even so much about the process. It's more about this cultural accountability for the work that you do.
And I try to think about it as we spend a lot of time decomposing our technical problems into pieces that we can solve with software. Part of this is decomposing our organization into the simplest autonomous units, which are the people who do the work, and giving them all the power that they need to do that work, but also giving them responsibility and accountability for how they do it and its quality.
But a lot of legitimate questions come back. It's fair to say, why can't we have a DevOps team? The engineers are talking about, "Well, what does production support do now if we're responsible for what's in production? What happens when my service is deployed to five regions around the world? Am I really on the hook for when it goes down in all of those regions and has consumer impact? How do I follow the sun? When are we going to get our features done? I'm spending a lot of time deploying infrastructure and managing it. When do we get our features done?"
Our answer back, and it was a cultural mechanism by which we wanted to effect this change, was: sorry, there's not going to be a DevOps team. There are no DevOps roles. There are no people with DevOps in their title. DevOps is accountability.
Thank you.
And that's just a forcing function. It's something that you kind of have to pull the Band-Aid off and say, that's what it is.
There's lots of stuff for production support to do. Production support can change in an organization that has a longer view into what's going on with your infrastructure and how consumers use it. They can collaborate with you to create and use their experience to create the dashboards and the monitoring and alerting that they need. There's a lot of value in that production support organization. It doesn't go away just because the engineers own the infrastructure they deploy to.
And more importantly, I think, for me, is the idea of a DevOps team is really just the idea of putting another wall in place where people can throw stuff over it and say, "It's no longer my responsibility." And we have to squash that. It can't be part of the way we work.
So, overtime. Sport, right?
Just wanted to put up a few things that I think are still on our mind, and things that we'll really never stop working on. And if you saw the title of the talk, it's No Finish Line. There is no finish line. That's a famous Nike sort of statement at work where we're never done. And yeah, there are finish lines for each little race, but there's lots of races. And so we internalize that as just, we've got to keep improving. We never stop. We just go after every little detail that we can.
So these are, again, some obvious spaces that we're working in.
Better monitoring and alerting. I think what we're looking for there is finer-grained monitoring and alerting.
So what happens a lot when you have a platform, which we do, that's used by many experiences, when one of the experiences starts to seem to fail, even if it's a platform issue, it's obviously going to be visible in the experience. And so the first people to get paged are the experience engineers. Experience engineers get up, they spend an hour looking to see what's going on, or five minutes, whatever it takes, and they're like, "Wow, this has nothing to do with the experience. This is a problem with inventory or payment or whatever. It's a service level."
So how do we get a monitoring-alerting system in place where we're alerting so carefully and so accurately that the right people are alerted to the problem that's happening, and we don't wake up teams? Because what happens is the interface or experience teams become sort of the L1 support, and they get really tired.
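One way to picture the goal described above is to walk the dependency graph from the failing experience down to the deepest unhealthy component and page that component's owner. The Python sketch below does this with made-up data. The service names, team names, and the `teams_to_page` function are all hypothetical and only illustrate the routing idea, not Nike's actual alerting setup.

```python
# Hypothetical topology: what each component calls, and who owns it.
DEPENDS_ON = {
    "snkrs-app": ["checkout"],
    "checkout": ["inventory", "payment"],
    "inventory": [],
    "payment": [],
}
OWNER = {
    "snkrs-app": "experience-team",
    "checkout": "checkout-team",
    "inventory": "inventory-team",
    "payment": "payment-team",
}


def teams_to_page(failing, unhealthy):
    """Return owners of the deepest unhealthy components beneath `failing`."""
    roots = set()

    def walk(node, seen):
        if node in seen:
            return
        seen.add(node)
        bad_children = [d for d in DEPENDS_ON.get(node, []) if d in unhealthy]
        if not bad_children:
            roots.add(node)  # nothing unhealthy below: likely root cause
        for child in bad_children:
            walk(child, seen)

    walk(failing, set())
    return sorted({OWNER[r] for r in roots})
```

In this sketch, if the app, checkout, and inventory all look unhealthy, only the inventory owners are paged. The experience team is paged only when nothing beneath it is unhealthy.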
The next thing is, hey, we have a great platform. Can we all use it now? Next is all about dependency management. You can no longer just deploy code and know that you're only going to impact one experience. You're impacting 15 experiences globally. So how do we make sure that we have the right test automation in place, that we understand the dependencies between all of the platform pieces?
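To make the dependency question concrete, the sketch below inverts a dependency map and finds every experience that transitively depends on a changed service, which is the set whose tests would need to pass before a deploy. The graph, names, and `impacted_experiences` function are invented for illustration and are not taken from Nike's platform.

```python
from collections import deque

# Hypothetical graph: each component lists what it calls.
DEPENDS_ON = {
    "snkrs-ios": ["checkout", "feed"],
    "snkrs-web": ["checkout"],
    "nike-app": ["feed"],
    "checkout": ["inventory", "payment"],
    "feed": [],
    "inventory": [],
    "payment": [],
}
EXPERIENCES = {"snkrs-ios", "snkrs-web", "nike-app"}


def impacted_experiences(changed):
    """Experiences that directly or transitively depend on `changed`."""
    # Invert the graph: for each component, who calls it?
    dependents = {}
    for node, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)
    # Breadth-first walk upward from the changed component.
    seen, queue = {changed}, deque([changed])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & EXPERIENCES)
```

Here a change to the hypothetical `inventory` service touches both SNKRS experiences through `checkout`, even though neither calls `inventory` directly.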
Tool consistency. For me, that's mostly about, let's stop developers working on commodity stuff. I don't need another Jenkins. I don't need another pipeline. I don't need another test automation framework. They're out there. There's plenty to use. I don't need another service discovery layer, all that stuff. Let's stop working on commodity. Let's create features. That's what our consumers want and that's what we need to do. So it's more about gluing that stuff together.
And then the internal telemetry automation, for me, is really about... It's less about monitoring and alerting at the service level. It's more about on those Saturday mornings that Scott talked about, are we scaled correctly? Did everything return to sort of nominal after the previous launch? What is our readiness for that?
And it's not just readiness of our services and systems. It's readiness for our operations, our content, all the things that go into a successful launch. So it's really kind of surfacing that view into the infrastructure and into the business overall operationally, and we need to spend more time doing that.
So I think that's just a quick view of where we're going with that, and I want to turn it over to Scott to finish it up.
Scott Boecker
Great. Thanks, Ron.
So to pull back to the story we started in terms of where we were implementing all of the great things that Ron and team have been doing, what we did together to drive a cultural change, to organize the teams between product, design, engineering, how did that net out?
We ended up launching a new platform for what we call our SNKRS platform, which is the SNKRS website and the SNKRS iOS and Android apps. We had that running on a new platform and an old platform simultaneously for about six to nine months, where we were able to test our way into it. That culminated in Christmas of 2016, a year ago, with the new Air Jordan 11, inspired by Space Jam, where we basically had the largest and most successful shoe launch in the company's history.
We went from waits of, as I mentioned before, up to three hours, down to minutes. And that is now the current platform that we have built on.
So if you go back to the principles and the hierarchy, we made it stable. We made it secure. We introduced the fairness with new ways of buying, and we satisfied the speed. So we were able to deliver on all that, and that has become the core way in which we are moving forward with how we operate and with re-platforming the entire nike.com platform. And it has brought tremendous business value to the company.
But it's not the end. As Ron said, there is no finish line.
This foundation also allowed us to do more. During that time, we acquired a company out of New York that focused on community line-based services. It was called Virgin Mega. They focused on this idea that people are standing in lines, whether it's for ticketing for concerts, at festivals, or for shoes, and that during that time, there's a community, and you have an opportunity to engage that community and inspire that community.
And so we've stood up a new digital studio in New York. So we have remote teams working on this, connected back to our groups in Beaverton and Portland, working together on this platform at a global scale.
And so what we wanted to leave you guys with was a short video, because if you remember the top of the triangle, it was fun. And that's ultimately what our consumers and people want from the Nike brand, is that inspiration and that innovation, and how innovation brings inspiration to them.
And so with that, we will leave you with this video showing just a taste from our studio in New York of what's happening and what's coming.
[Video plays with music and a Nike SNKRS-style launch message: "Kenny here. Look out for a push notification for an exclusive chance to cop my Air Foamposite One. Five hundred pieces."]
Thank you, guys.
No, go for it.
Appreciate it.
Take care, everyone.