Scaling Personalization: DevOps at Stitch Fix

San Francisco 2017

Stitch Fix takes a unique approach to retail that combines art and science: we send our clients clothes we think they will love; they keep what they want and return what they don’t. Based on years of data science and machine learning, we develop personalized algorithmic recommendations for each client, and one of our thousands of human stylists hand-curates those recommendations to choose what goes in each box. By continually iterating and improving how we serve our clients, we have grown to 6,000 employees and $700M in yearly revenue.

This talk will discuss the technology approach and modern development practices we have put in place to make this model succeed. To reduce the coupling that stifles velocity and innovation, we are actively rearchitecting around microservices and event-driven approaches. To get the agility that comes from rapid iteration, we practice TDD and Continuous Delivery. To get the ownership that drives optimal results, we organize around small, independent teams that run what they build (DevOps). To take advantage of our rich data, we maintain a robust data pipeline and offer data as services. All of these practices work synergistically, and each benefits from and reinforces the others.

Similarly, these scalable practices have allowed us both to support a large organization with a relatively small engineering team and to grow that engineering team seamlessly from 25 to 75 over the course of a year.

The talk will conclude with lessons we learned on this journey, and will offer concrete ways other organizations can do the same.

Full transcript

Randy Shoup

All right, cool.

I am, again, still Randy Shoup. I'm the VP of Engineering at Stitch Fix. I want to talk to you about scaling personalization, DevOps, and AlgoOps at Stitch Fix, and I will tell you what I mean by AlgoOps as we go along.

First, a little background about me, just so you can hear and understand where I'm coming from. Right now, like I say, I'm VP of Engineering at Stitch Fix.

Who has heard of Stitch Fix in the room? Wow, everybody. That is awesome. I am modeling it a little bit: the shirt and the pants. Because we now have men, in case you did not know that, and plus-size women in addition. Great stuff.

We use technology and a ton of data science to help you find the clothes that you love.

Before that, I was sort of a roving, my friends used to say, CTO-as-a-service. For about a year and a half, I helped a bunch of small startups in the Bay Area, a few larger companies in Europe and Asia, do the kinds of things that we're talking about at this conference: scale their organization, scale their technology, be fast like the unicorns.

Before that, I was Director of Engineering at Google for Google App Engine. App Engine is Google's platform as a service, like Heroku or Cloud Foundry or Engine Yard or something like that. Earlier in my career, I was Chief Engineer at eBay for about six and a half years, and I spent most of that time working on eBay's search engine.

A bit about Stitch Fix. You've all heard of it, so that is awesome. For the one or two people that have not, Stitch Fix is sort of the reverse of a normal clothing retailer. Rather than coming and shopping on our site or shopping in a physical store, what if you had an expert to choose your clothes for you?

You come to our site, and you fill out a really detailed style profile. That is your size, your height, your weight, your age, your parental status. Do you like to flaunt your arms? Do you like to hide your hips? All these really detailed and pretty personal things.

Why do we ask you such personal questions? It's because if there's somebody in your life that knows how to choose clothes for you, and maybe that's you, or maybe that's a spouse or a sibling or a best friend, what does that person know about you that makes that person good at choosing clothes? Those are the kinds of things that we want to ask.

You give us that information. We're going to send you a box with five handpicked items personalized to you, actually handpicked for you by a real human. You're going to keep the things that you like, and you will pay us for those, and then you return the other ones for free.

Behind that is a lot of data science and cool technology. We have a bunch of inventory in our warehouses, as you might imagine. We're an actual physical business.

We do a ton of machine learning models over understanding our clients, understanding our inventory. Every day, we take every piece of inventory times every client, and we compute a predicted probability of purchase. What is the conditional likelihood that if we send Randy this shirt, he will keep it? I don't know, 65% chance for this shirt, 72% chance for the pants, 47% chance for the shoes, something like that.

Those scores are personalized to me. They'll be different for everybody here in the room. There's a ton of machine learning models that go into that, and so that generates these scores, or we say personalized algorithmic recommendations.
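
To make the "every piece of inventory times every client" scoring concrete, here is a minimal sketch. It is not Stitch Fix's actual model: the logistic form, the two features, the weights, and names such as `pair_features` and `top_candidates` are all assumptions chosen for illustration.

```python
import math


def purchase_probability(weights, features, bias=0.0):
    """Logistic score: estimated P(client keeps item) from a weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pair_features(client, item):
    """Hypothetical features for one (client, item) pair."""
    size_match = 1.0 if client["size"] == item["size"] else 0.0
    style_overlap = len(client["styles"] & item["styles"]) / max(len(item["styles"]), 1)
    return [size_match, style_overlap]


def score_all(clients, items, weights, bias=0.0):
    """Score every client against every item: {(client_id, item_id): probability}."""
    return {
        (client_id, item_id): purchase_probability(
            weights, pair_features(client, item), bias
        )
        for client_id, client in clients.items()
        for item_id, item in items.items()
    }


def top_candidates(scores, client_id, k):
    """The k highest-scoring items for one client, ready for a human to curate."""
    ranked = sorted(
        ((item_id, p) for (c, item_id), p in scores.items() if c == client_id),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Made-up data: one client, three items.
clients = {"randy": {"size": "M", "styles": {"casual", "classic"}}}
items = {
    "shirt": {"size": "M", "styles": {"casual"}},
    "pants": {"size": "M", "styles": {"classic", "formal"}},
    "shoes": {"size": "L", "styles": {"formal"}},
}
scores = score_all(clients, items, weights=[1.5, 1.0], bias=-1.0)
```

The scores are per pair, so the same shirt gets a different number for a different client. A real system would learn the weights from purchase history instead of hard-coding them.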

Those algorithmic recommendations are then surfaced to a human stylist. We have 3,500 of them all around the United States. They are mostly part-time, but they are all employees, so they get benefits and all that good stuff. They are the ones that are curating those recommendations and deciding the five things that go in the box.

Does this make sense? Cool. All right.

Behind that is a bunch, like I said, of data science. I believe this is unique or near unique in the world. We have a near one-to-one ratio between data scientists and engineers. We have about 125 engineers that work in the team that I'm on, and we have 80 data scientists and algorithm developers. That, again, is a completely unique ratio in our industry, certainly in the retail space, because we know all those companies.

What do we do with all those data scientists? Well, one thing is we personalize clothing recommendations for you, but we also do a ton more. It turns out that if you are smart, it actually pays.

We use that same model of machine-driven recommendations with human curation. We do that for deciding what to buy. We have human buyers that come out of the fashion industry. They are very fashionably dressed all the time. It's great to be on that floor. They are actually making the buying decisions, but they are powered and empowered and augmented by machine-driven recommendations based on all the number crunching that the machines can do about things that went well last year and so on.

Inventory management. What warehouses should we store those things in? When somebody is picking those five items in that box, they are wandering through the warehouse. That is a traveling salesman problem.

If you're a computer scientist, you have heard of this idea of traveling salesman. I can explain it offline if you like. But it is a computationally intractable problem. The only way to get the optimal correct solution is to enumerate every possible combination of different things. When you're choosing five, it's not so hard to enumerate all those things. When you choose 25 or 100 or 1,000, the choices are more than the number of atoms in the universe. So we do heuristics, and we use data science to figure those out.
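
A minimal sketch of the trade-off described above, assuming warehouse locations are plain (x, y) coordinates and using the nearest-neighbor rule as a stand-in heuristic (the talk does not say which heuristics are actually used). The function names and the sample coordinates are made up. For scale: 5 stops give 5! = 120 orderings, 25 stops give about 1.6 × 10^25, and by 100 stops the count passes 10^80, a common estimate for the number of atoms in the observable universe.

```python
import math
from itertools import permutations


def route_length(start, stops):
    """Total walking distance from start through the stops in the given order."""
    total, here = 0.0, start
    for stop in stops:
        total += math.dist(here, stop)
        here = stop
    return total


def brute_force_route(start, stops):
    """Exact answer: try every ordering. Only feasible for a handful of stops."""
    return min(permutations(stops), key=lambda order: route_length(start, order))


def nearest_neighbor_route(start, stops):
    """Greedy heuristic: always walk to the closest stop not yet visited."""
    remaining, route, here = list(stops), [], start
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(here, s))
        remaining.remove(nxt)
        route.append(nxt)
        here = nxt
    return route


# Made-up bin locations for a five-item box.
start = (0, 0)
stops = [(5, 1), (1, 1), (3, 4), (0, 3), (4, 0)]
best = brute_force_route(start, stops)
greedy = nearest_neighbor_route(start, stops)
```

The greedy route visits every stop and is cheap to compute, but it is not guaranteed to be the shortest; the brute-force search is only there to check it on small inputs.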

We do logistics optimization. We want that box to show up on your doorstep on the particular date that you'd like it. We want to ship it to you with least cost and most likelihood that it's going to get there.

The styling recommendations I described, and then also demand prediction. We're a physical business, as I mentioned, as many of you are, I'm sure. At eBay and Google in my past, which were virtual businesses, if we had 2x the number of clients or customers, we'd be super excited, and we'd have a party. If we have 2x the number of clients that come to Stitch Fix, that's a disaster, because it means that we can't serve half of our clients well at all. We don't have the inventory. We don't have the warehouses. We don't have the people.

Does it make sense? Cool. All right.

The whole model here is humans and machines working together. Use the machines for what the machines do best. Use the humans for what the humans do best.

I want to talk about how we have scaled this personalization business. First, I want to talk about organizing our company for speed. Then I want to talk about what to build, and even more importantly, what not to build. I want to talk about how to build it, and then I want to talk about continuous experimentation to continue to improve.

First, we're going to talk about organizing for speed. I suspect this is not the first time in the conference you have been introduced to the idea of Conway's Law. I will reintroduce it to you the n plus oneth time. The idea is that the organization that you have in your engineering group determines the architecture that you are going to build. More particularly, Mel Conway in 1968 observed that the design of software systems reflects the communication paths among the people in the organization.

He meant it as a descriptive law, but we can use it in a normative way to form the architecture that we want. Perversely, and maybe strangely, we can actually form the organization in a modular way: small teams, well-defined areas of responsibility, which we'll talk about in a moment, and that will engender a modular architecture.

Does it conceptually at least make sense? Cool.

The idea is that we can engineer the software system we want to build by first engineering the organization to reflect that thing that we want it to be a mirror of, if that makes sense.

What do I mean by that? I mean small service teams. All the unicorns that you might have heard of, the Amazons, the Googles, the Netflixes, all of them are organized in the same model. To borrow Jeff Bezos's phrase, two-pizza teams. Teams that are individually no larger than can be fed by two large pizzas. That's typically, ish, four to six people. Four to six Americans, I guess. Hungry Americans.

The funny little corollary, which my Amazon friends tell me about, is that it means you can't have too many younger people because they eat too much pizza.

The other idea that's co-located with this is that the team should be full stack. They should have all the skill sets, all the capabilities within that team boundary to do the things that they need to do.

That doesn't mean they should do everything in life. We're not building our own chips and writing our own operating system. But if the application that that team is building has a user interface component, a persistence component, all those skill sets are in that team for some value of n.

Those teams are aligned with business domains, so a team has an actual function. Hopefully, there's an actual business metric associated with the output of that team. They have a clear, well-defined area of responsibility. They typically will write a single service or application, or a set of related services and applications. The wonderful corollary of that is that they then develop this deep understanding of the particular business problem.

My team that builds software for the merchandisers, the people that buy the clothes: when we have new merchandisers come in, they're like, "Oh, you're going to need to meet with the engineers to tell them what you want them to do for you." And they start with, "Okay, here's how the buying process works." And my team's like, "We've been doing this for five years. We know how you do stuff. We know your business. We know your domain."

Which is wonderful because it means that the engineers can be just as creative as the business people on figuring out how to solve the problems, and even more importantly, understanding what problems need to be solved.

Does it make sense? Cool.

When we grow these teams, we don't just let them get bigger and bigger and bigger. They grow through, remember high school biology and cellular mitosis? You take this cell, and you divide it into two, and each of those cells divides yet into two, et cetera. That's how we grow the teams: not by making the teams bigger, but by subdividing the responsibility of the team into more and more granular areas.

This is sort of a heuristic that I like to use as I form the teams. I like to think of how can we form the teams in such a way that on the order of 80% of our project work is done within that team boundary, and a minority of work on the team is done cross-functionally.

Cool. So that's about organizing for speed.

Now I want to talk about what to build and what not to build. Mary and Tom Poppendieck, who I hope have been quoted here, but they really should have if they have not been, wrote a fantastic book called Lean Software Development. Mary, in particular, is one of the most brilliant, and lucid, and clear thinkers in our industry.

She says, or they say, "Building the wrong thing is the biggest waste in software development." That's the first waste. Figure out the right thing to build and build that, because it doesn't matter how well or quickly you build a thing that doesn't matter. If it doesn't matter, don't build it at all. Right?

Here is the way. If you take nothing away from my talk, it's going to be this one question: what problem are you trying to solve?

This often happens to me as an engineer, and this is totally legitimate, by the way. A business person or a product person will come to me and say, "Hey, Randy, can you and your team add this button to do this thing for me?"

Okay, we could do that. Let's take a step back for just five minutes. What problem are you trying to solve?

When you reframe it in that way, once we talk through the problem, the solution may legitimately be that we add that button, and we're very happy to do that. But often, it may be that we don't build any technology at all.

What I like to think of is helping to frame the problem in partnership and collaboration with the business person, with our business partner, as we like to say at Stitch Fix. That is the first step to doing the right thing.

Charles Kettering, General Motors, they know a thing or two about building stuff. "A problem well stated is a problem half solved." Once we can think about what the problem is, the engineer mind... Because one of the things we learn as engineers is how to problem solve. If the only solution hammer we have is typing into an editor and building software, that's not great. What we can do, though, as engineers, is help the business people and product people think through in a disciplined, structured way.

Okay, here's a thing I could build you now, or here's a thing you could do yourself, and then maybe the next step we'll do another thing, and the next step we'll do another thing.

Does that conceptually make sense? Yeah.

A thing that I think engineers underappreciate about themselves, about what they bring to the table, is even if they don't understand the domain, they do have this trained, structured way of thinking and discipline around approaching problems. That's a thing that we, I don't know about uniquely, but engineers and scientists tend to approach problems in that way.

I also have a liberal arts degree, and I did not learn that from that side of my thing. That's not wrong. I have a political science degree. I'm really happy about that. But yeah, I learned other stuff. I learned subtlety and nuance and a bunch of other things, but not structured approach to problem-solving.

This is obvious, but common sense isn't so common. Focus on the problems that are actually important for your business. As I mentioned before, we might be able to solve the problem without any technology at all.

I and my teams love to type into editors. We will build you software from here to eternity, but that might not be the best thing for the business. Obviously, that's why you hired us, but again, like I say, if we can maybe redefine the problem: what if we change the business?

"Oh, I have this business process. It's got to work this way, and I need to have this person click this button at the end of the assembly line of putting the box together."

Well, I don't know. What if you put that in a different spot? Just have this conversation and ask questions and be curious about it. Again, at the end of which, we're happy to build you a button, but maybe through this exploration of problems together, we can figure out an even better way to do it.

Great. That is about forming the problem. Now we have decided that there is a solution and it involves somebody typing into an editor. But maybe that somebody doesn't need to be us.

What if we buy/borrow instead of build software? Again, super obvious, but common sense isn't so common.

A thing that one could do in 2017 is not build your own data centers and not build your own infrastructure. I do appreciate regulated industries. We can talk offline about solutions that we might have for you. And I'm not a cloud vendor anymore, so I can be very objective about it.

But starting from, hey, maybe have Amazon, Google, Microsoft, Joyent, somebody else help with things that they are expert at and maybe you are not, would be a good way to jumpstart stuff that you need to do.

Stitch Fix has no owned physical infrastructure anywhere on the planet. We have our laptops, and then in our warehouses, we have a few Wi-Fi devices. That's it. Everything else is in the cloud. We are more a unicorn than a horse, I'll openly say that. We started in 2012, but we took advantage of that, and we've been able to move fast because of it.

The other thing is, hey, in 2017, there is a lot of really excellent open source software. There's a lot of software that is, in many cases, better than the commercial alternatives, and let's start there.

Open source container management, so Kubernetes, Docker, all that kind of stuff to help us make deployable units that we can easily ship through our deployment pipeline and deploy to production. Open source databases that don't cost us and don't have license audits and don't double their prices. And I used to work for that company.

Now you're laughing because you know who I'm talking about. We'll talk about lock-in and how they poisoned the well for everybody over beer.

Machine learning models. Again, as I mentioned, we do a ton of machine learning. We should not be re-implementing logistic regression. We should not be re-implementing neural nets. That is not where our data scientists are best used.

Rather, hey, let's have somebody else, or maybe collaborate with other people, what the heck, and build models in a language that works well for us and well for them. Then we just share that, because there's nothing proprietary for us about building neural nets or deep learning or decision trees. All those things are just commodity, and why are we building those things rather than solving business problems?

Again, as I mentioned, usually in 2017, these open source projects are at least as good, if not better, than commercial alternatives.

The other thing I like to quip is that I am not going to pay you for software. I'm just not. If you're a software vendor, I'm sorry. I'm sure you're wonderful. I love you. I'm not going to pay you for software, but what I will pay you for is a service. I will absolutely pay you for a service. Make this entire problem not mine anymore.

At Stitch Fix, we use more than 50 third-party services. Everything from the operational side, like logging and monitoring and alerting, to project management and bug tracking and billing and payments and fraud detection. All these things that are not our core competency as a clothing retailer, but are somebody else's core competency, and I am very happy to pay them for the privilege of leveraging their services.

Does it make sense? It allows us, again, it's so obvious, it allows us to have our small, resource-constrained team, as everybody's are, focus on the things that matter for our business, rather than on undifferentiated heavy lifting or things that other people are even better at.

The quip that I like to make in the cloud area is soon it's going to be just as common to run your own data center as it is to run your own electrical power generation. There are companies, Google is one of them, that do actually run their own electrical power generation because they are at the scale for which that makes sense. Chemical plants, there are companies for which that makes total sense, and equivalently, there are companies for which it makes sense, even in 2017, to build a new data center.

But for the most part, for the vast majority of us, even a lot of the horses, I think not. Again, I'm happy to explore that more with you in the breaks if that's helpful. Cool.

The other thing I want to talk about, again on what to build, is that we need to do some experiments. That's a theme, I'm sure, of this conference. I've been at another conference down the road, which I've been running. I would've loved to spend all week here, and would have if it weren't overlapping exactly with my conference.

Let's have some experimental discipline. Let's state a hypothesis. Let's be scientific about it.

I'm going to try this new thing. I'm going to add a new widget to the website. Okay, cool. Susan, what do you expect to happen?

"Oh, well, I expect conversion to go up," or, "I expect this metric to change."

Cool. All right. Well, we're going to track that metric, and then we're going to see if that actually worked. Obviously, step zero is understanding what the baseline is of the metric.

Now we run an actual A/B test. We don't say we did a thing in October, now we're going to turn a new thing on in November, and every difference between the November metric and the October metric, well, surely that was because of the change. No, super not. There's lots of stuff going on in November. Certainly, you can remember Novembers, right? Novembers, lots of things happen that are different from other things, particularly a year ago.

You need to make sure you have a good sample size. You need to isolate the treatment and the control groups. And this is important: no peekies. You don't look at the results until you have achieved the full sample, until you have a statistically significant thing upon which you can base a principled decision.
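
The discipline above can be sketched in a few lines. This is an illustration under assumptions, not Stitch Fix's experimentation framework: the hash-based assignment, the two-proportion z-test, the 0.05 threshold, and every function name here are choices made for the example.

```python
import hashlib
import math


def assign_group(client_id, experiment):
    """Deterministically place a client in control or treatment for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to detect
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def evaluate_experiment(control, treatment, planned_n, alpha=0.05):
    """Refuse to report a result until both groups reach the planned sample size."""
    (conv_a, n_a), (conv_b, n_b) = control, treatment
    if n_a < planned_n or n_b < planned_n:
        return "keep collecting"  # no peeking at partial data
    z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    if p_value >= alpha:
        return "no detectable effect"
    return "treatment better" if z > 0 else "treatment worse"
```

Each group is passed as a `(conversions, sample_size)` pair, for example `evaluate_experiment((200, 1000), (260, 1000), planned_n=1000)`. Fixing `planned_n` before the experiment starts is what makes the "no peekies" rule enforceable in code.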

Of course, what you've done is you've obsessively logged and measured all those metrics. You understand the consumer behavior. You understand the system's behavior. Now even if your experiment doesn't do what you hoped it would do, now you have much more detail to diagnose why it did the thing that it actually did, and that will help you to design the next one, and the next one, and the next one.

Does it make sense?

I hope this was quoted somewhere else here, but the wonderful number that I have in my head comes from Ronny Kohavi, who I think is still running Microsoft's experimentation group. They did a bunch of experiments at Microsoft, meta-experiments about how well experiments worked, and they learned that basically a third of new features actually improved a business metric, a third of them had no effect, and a third of them actually decreased or worsened the business metric they were trying to move.

If you're not measuring, you don't know which is which. Microsoft, they're no dummies, right? Even they can only get a hit a third of the time. I don't think I can do any better. It's super important for me to not implicitly assume I'm going to get 99% hits, but rather more like 33%, and to obsessively log and measure to see how I can improve the experiment next time, and next time, and next time.

Listen to the data. Again, hope is not a strategy. Intuition is not science. We look at the data and follow what that does.

Here is the art and science combination. Thinking of what to experiment about, that is absolutely the art. That is the creativity. Evaluating the results of the experiment, there's no creativity there. That is science.

Does it make sense? Cool. Now rinse and repeat. Do it again, and again, and again, and again.

Great. That was about what to build and what not to build. Now let's talk about how to build things very briefly.

Microservices are a wonderful fit here. Again, if you apply Conway's Law (small teams, well-defined areas of responsibility), those teams will be much more likely to produce an architecture that looks like this.

A microservice is single purpose. It has a simple, well-defined interface. It is modular and composable, and also, critically, independently deployable. Now I can make changes to that service without releasing the whole rest of the world.

One thing that we have learned now in 2017 is that having those services not share a bunch of data, not share really any data with anybody else, is a great way of making sure they continue to be isolated and continue to be independently deployable and independently modifiable.
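
As a rough illustration of "simple, well-defined interface" and "no shared data", here is an in-process sketch. Real services would sit behind a network API with their own datastore; the class names, the `StyleProfile` fields, and the in-memory dictionary are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class StyleProfile:
    client_id: str
    size: str


class StyleProfileService(Protocol):
    """The narrow, published interface that other teams code against."""

    def get_profile(self, client_id: str) -> Optional[StyleProfile]: ...

    def save_profile(self, profile: StyleProfile) -> None: ...


class InMemoryStyleProfileService:
    """One implementation. Its storage is private; nobody else reads it directly."""

    def __init__(self) -> None:
        self._profiles: Dict[str, StyleProfile] = {}

    def get_profile(self, client_id: str) -> Optional[StyleProfile]:
        return self._profiles.get(client_id)

    def save_profile(self, profile: StyleProfile) -> None:
        self._profiles[profile.client_id] = profile


class RecommendationService:
    """A separate service that depends only on the interface, never on the store."""

    def __init__(self, profiles: StyleProfileService) -> None:
        self._profiles = profiles

    def sizes_to_consider(self, client_id: str) -> List[str]:
        profile = self._profiles.get_profile(client_id)
        return [profile.size] if profile else []
```

Because `RecommendationService` only sees `get_profile`, the team that owns profiles can change how and where they store data without coordinating a release with anyone else.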

Not every company starts with microservices. In fact, few do, and few should. I will give you some examples of companies you've heard of that have evolved from where they were to where they are now.

I used to work at eBay, as I mentioned. eBay is on its fifth complete rewrite of its infrastructure. It famously started as a monolithic Perl application that the founder, Pierre Omidyar, wrote in a three-day weekend over Labor Day weekend 1995. He was exploring this new cool thing called the web, and he wanted to build a thing that would... He was playing around. I mean, he worked at Adobe at the time. Whatever.

That ran on his, I always want to say laptop, but it wasn't a laptop for sure in '95. It ran on his 486 tower or whatever.

Next generation was a monolithic C++ application, which at its worst grew to 3.4 million lines of code in a single DLL. They were hitting compiler limits on the number of methods per class, which is 16K. So if you guys think you have a monolith, I guess I challenge you. That's pretty bad. I don't say it's not bad, but that was really bad.

The next iteration was a rewrite in Java. Not microservices, but like mini applications. Here's the search application, which served the search pages. Here's the buying application that served the buying pages, et cetera, times 220 different parts of the site. Now it's fair to characterize eBay as a polyglot set of microservices.

Twitter has gone through a similar evolution. They started as a monolithic Rails application, which they called the Monorail. Awesome. They extracted a bunch of the front end out into JavaScript, a bunch of the back end out into services. They were early adopters of Scala, so mostly written in Scala, and now it's fair to characterize Twitter as a polyglot set of microservices.

Amazon has gone through a similar evolution. They started as a monolithic C++ and Perl application, which you can still see evidence of in the URLs. There are some product details pages where you will see Obidos, O-B-I-D-O-S. That was the code name of that original monolithic application. It's a city in Brazil on the Amazon, so that's why the name. It's still in the URLs. Why? Because of search engine optimization and search engine ranking.

Does this seem like a pattern here? Yeah. So no one starts with microservices. If there was any eBay competitor or Amazon competitor that in 1995 started out building a distributed system, there is a reason why we have not heard of that company. Because they did not focus on having a business model, product-market fit, meeting the needs of their early customers, like everything you should learn from lean. Instead, they built the distributed system that they super didn't need.

But past a certain scale, which legitimately may be only 0.1 or 0.01% of companies are going to actually get to, everybody ends up on something that we would now call microservices.

Does it at least conceptually make sense? Cool.

Here's what I like to say. If you don't end up regretting your early technology decisions, you probably over-engineered.

Re-architecture is not a sign of failure. It is a sign of success. It means the thing that you built is actually worth reinvesting in, and it also means that you have the resources to make that reinvestment possible.

I want to say a bit about quality and building solid stuff. Move quality to the left, quality from the beginning, all that.

I like to think of quality and reliability as priority zero features. Why do I say priority zero? It's because if the thing isn't up, it doesn't matter how beautiful it is.

The developers in those full stack teams that I form are responsible not just for the features that they build, but also for the quality of the software, the performance of it, its reliability, and also its manageability. We co-locate all those responsibilities in a what? DevOpsy way. That's what allows us to move quickly.

A thing that we do at Stitch Fix, which is part of our core engineering culture, is we do test-driven development. What that means is developers are writing tests and they are writing the code at the same time. I'm not super religious about I have to write the test first before the code, but you are not done with a piece of code work, like a feature or whatever. You are not done unless you also have written tests to make sure that the thing you built actually continues to work.

I don't do this to slow everybody down. I do it to speed them up. Tests make better code because it gives us the confidence to do otherwise crazy things, like let's refactor everything underneath this interface because we figured out a better way to do it. It actually increases our development velocity over time, as good investments do, because we are not built on quicksand. We are built on solid ground.

Tests make better systems because they catch bugs earlier, and it allows us to fail faster with a tighter feedback loop. Does that sound familiar to lots of things we've talked about at this conference? Yeah.

It also optimizes developer effort. Microsoft did a study of what do developers actually spend their time on, and it turns out it is not all, or even most, or even a little bit writing new code. Most of the time they spend reading and understanding existing code. Next, they spend modifying the existing code to make it possible to add the new thing. At the 5% level, maybe they're writing new code. So you can imagine that test-driven development helps with both of these first two.

Does it make sense?

We have made this investment up front so that we are never going to manually test whether this thing works again. We're just going to run the automated test and make sure that it works.
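
A small sketch of what "not done unless you also have written tests" can look like, using Python's standard `unittest` module. The `amount_due` function and the prices are invented for the example; it borrows the keep-and-return rule from earlier in the talk rather than any real Stitch Fix code.

```python
import unittest


def amount_due(box, kept):
    """Charge only for the items the client keeps; returns are free."""
    unknown = set(kept) - set(box)
    if unknown:
        raise ValueError(f"items not in box: {sorted(unknown)}")
    return sum(box[name] for name in kept)


class AmountDueTest(unittest.TestCase):
    # Made-up prices for a five-item box.
    BOX = {"shirt": 40, "pants": 60, "shoes": 80, "scarf": 20, "jacket": 120}

    def test_pays_only_for_kept_items(self):
        self.assertEqual(amount_due(self.BOX, ["shirt", "pants"]), 100)

    def test_returning_everything_costs_nothing(self):
        self.assertEqual(amount_due(self.BOX, []), 0)

    def test_rejects_items_that_were_never_shipped(self):
        with self.assertRaises(ValueError):
            amount_due(self.BOX, ["hat"])
```

Saved as a module, this runs with `python -m unittest`. Once the tests exist, refactoring the body of `amount_due` is safe: the suite says within seconds whether the behavior still holds.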

Okay. I'm being given the hook or whatever, so I'll move it forward.

Has anybody heard, "We don't have time to do it right"? Here's what I say: do you have time to do it twice?

That's it. That's it.

I would much rather build one really solid feature than two half-assed features.

That's it.

This is the perverse but really true thing that I have learned over the, look at me, very many years I've been in the industry. The more constrained we are in time and resources, the more important it is to build it right the first time. Why? Because we super do not have the time to go back and do that thing again.

I will finish with this thought. Build one great thing instead of two half-finished things.

Right does not mean perfect. I mean build the minimal viable feature or whatever. Build the 80/20. Build the thing that is mostly solid, where what you have left out is not whether the thing works, but other embellishments. Does it make sense? Don't build the gorgeous thing. Build the thing that just works. And then if the requirement, the next ladder up, is it needs to be gorgeous, cool. Come back and do that.

The implication that has on development at Stitch Fix is this: we basically don't have a global bug tracking system. Do we produce bugs? We super do. Absolutely we do. We are just as fallible as any engineers on the planet. But because of that test-driven development, we discover those bugs very much earlier in the development process than I've ever seen in other places, and that includes, to be honest, Google and eBay.

Our backlog, we absolutely have a backlog of things that we're going to do in the future, but that backlog is not a list of all the features we have released and are broken. It's the things that we want to do to repay intentional technical debt that we've taken on, or new features that we want to build.

Does this make sense? Would you like to work in an environment like this? Yeah. Yeah, it's pretty cool.

I will just finish with this thought because it is the... No, I don't have it. I was going to have the shameless plug that we're hiring, which we are. It doesn't have to be you. I hope you guys are all very happy in your own jobs, legitimately. But you might have friends that would be interested in working in a company like ours.

We are based here in San Francisco, but a minority of my engineering team is here. A majority is all over the United States, so we are a very, I like to say, remote-first culture. Even the people that live and work here in San Francisco act and behave as if they are remote engineers, and there are whole talks that we've given publicly about how that works for us.

Slack, Google Docs, video chat, all those things are tools that we use in our daily lives. I love the ability to hire excellent engineers that live in rural Iowa, that live in legitimate nowhere, beautiful horse country in Kentucky, the Carolinas, all these places where otherwise these awesome people are going to have to take jobs that don't really exercise their skills.

Cool. Thank you very much.