From 6-Eye Principle to Release at Scale – adidas Digital Tech 2021
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
Thank you, Peter. Our second presentation today is from adidas. Four years ago, the team from adidas attended this conference, and they left inspired by the presentation from Jason Cox from Disney, and I have been so amazed by the adidas journey since then. Fernando Cornago, who is now VP of Digital Tech, has presented at this conference for the last three years, describing the progression of their incredible journey. Last year, Fernando presented on their response to the global pandemic, which included both cost control, as well as a focus on improving their e-commerce capabilities, which has suddenly become one of their most important channels for generating revenue. This year, he will present on how top business leadership has chosen e-commerce to be a critical part of their strategy, where they are choosing not just to compete, but to dominate. But it is also a story that has some challenges and setbacks.
Describing how they overcame those challenges, Fernando will be presenting with Vikalp Yadav, Senior Director and Head of Digital SRE Operations, and Andreia Otto, Senior Platform Engineer and SRE, in charge of SRE for one of the most critical parts of the adidas commercial ecosystem. Here's Fernando, Vikalp, and Andreia.
Brand Video
This is for all the optimists. All those brave enough to be optimistic. We will not stop until we are done. Rebellious optimism. It's what allows us to see possibilities. Germany are the champions of Europe. Where others only see the impossible. Seeing possibilities is putting spikes on a shoe. Man, they call it- Changing sports and minds forever. Man, I can go all over the world. All over the world. As-salamu alaykum. Seeing possibilities is opening the door for all to play. This is who we are, and we're going to own it. We will never shut up and dribble. Let's do this together, baby. Let's go! Seeing possibilities is doing something that's never been done before, and then going even further. You have to be willing to seek. We're at the very edge of what's possible. Seeing possibilities is saying, "Why can't we leave the planet better than when we found it?" Shaping a better future together.
We're building a whole new mindset of what is possible. Seeing possibilities is taking three bold stripes and making them a cultural icon for all people. So many three stripes. Seeing possibilities. That's when we're at our best. adidas. adidas. We are adidas, and we see possibilities where others see impossible.
Fernando Cornago
Fantastic. Impossible is nothing. And welcome, everyone. I'm blessed to be here for the third year in a row. And as usual, I will start with the progress report. And it's a good timing because we just released our company strategy for 2025. And this year, you will see, is heavily, heavily linked to technology. Afterwards, like every year, we will deep dive in one topic. And after three years talking about growth, DevOps, and freedom, we will tell you the story about outages, control, and release fitness at scale. So my name, for the ones connecting this year for the first time, is Fernando Cornago, and I am, since this month, May 2021, heading up the digital technology department for adidas. After six years in the company foundations, working into the platform, platform engineering, and technology, engineering expansion and talent across our tech hubs. This year, as I said, we deep dive into operations at the scale. And for this, I brought with me Vikalp, the head of digital operations and SRE, and Andreia, one of our super talents and currently in charge of the SRE team in one of our most critical areas, web and mobile services.
That area contains some of our most critical value streams, the ones that most need to be stable, like content and product data management or our checkout API. You all know adidas, but we, as a company, have a revenue of 20 billion euros per year, with a net income, more or less, of 429 million. And we do this thanks to our 62,000 employees around the globe. But, well, let's start from where we're coming from. So we had four record years between 2016 and 2019. And the relentless execution of Creating the New, our former strategy, delivered exceptional brand momentum and financial results for us. Then 2020 came and allowed us to prove our resiliency as a company in all aspects, not only in technology, and to prepare ourselves for the future, despite, of course, unprecedented challenges coming from COVID-19. It was also a year that accelerated structural trends in our industry, making the sporting goods market even more attractive in the future. But 2020 was especially tough for technology, where we had to strip down our budget by 20%, more or less, while in our most demanding area, the mentioned area of direct-to-consumer, digital touchpoints, we experienced the biggest growth in our history, with 53% revenue growth.
So this is an area moving from barely 500 million in revenue in 2015 to four billion in 2020. And it needs to go to $9 billion in 2025, being almost half of our business. It was also, last year, a year without a CIO, and that situation forced all the tech management team, including myself, to move towards value-driven conversations with our board members. The final result, even if it was painful, you will see, is, in my opinion, three years of progress in one single calendar year. As I said last year, our product domain map and our product areas are the center of every decision that we make. And last year, the differentiation was clearer than ever: the red areas are where we win and differentiate from our competition. So this is where our change budget goes, this is where we build unique solutions with our more than 1,500 engineers, with our extended network of partners and vendors, et cetera.
Blue and gray areas, however, are areas where we compete, where we maximize our efficiencies, where our selection typically ranges from renting or buying software, and these help us with regards to time to market and simplifying operations. These also, green areas and green movement, is also my personal journey. So I've been moving, and I'm moving now this month from the foundations of the company to the execution and shaping the future of our digital touchpoints. Definitely, this is the area where we should win as a company. This is our dot-com experiences, this is our different mobile applications, this is our digital retail experiences, and all this is fueled through our membership strategy and, of course, our data coming from consumer analytics. So let's deep dive into our five-year strategy. Own the game, that is full of technology, as you will see. Own the game is rooted in the sport, because sport is at the end our past, our present, and our future.
And we own the game because we have what it takes to control our destiny in a growing and very attractive sporting goods market in the future. We put the consumer at the heart of everything that our people are doing. And we have three strategic focus areas that are increasing our brand credibility, elevating the experience for our customer and consumer, and to push the boundaries on sustainability. Two enablers will set us up for success. Innovation across the entire company, and the acceleration of our digital transformation throughout our entire value chain. But let's focus today only on two things. Our experiences, I told you, is where we win, and our digital. That is how we are going to win there. With regards to experiences, by 2025, our experiences will no longer think in channel. So we move beyond online and offline to own the game by embracing consumer experience as our strategic focus area.
How do we do it? First, we will become a members-first company. We will connect with 500 million members through personalized experiences and amazing brand moments. Five hundred million adidas members means that one in every four WhatsApp users across the globe is an adidas member. Second, we will evolve our operating model to address consumers more directly. So we move from being a traditional wholesale company into a 50% direct-to-consumer business, and this is dramatically changing our company operating model. So we come from a wholesale model where our customers or wholesalers knew the final consumer, to ourselves knowing the trends and adapting quickly to them. We come from a wholesale-based two-year calendar, from ideation to selling in the store, and move into taking real-time decisions, business or tech decisions, on a hype drop or a special sale, and these are decisions that we need to make in a second, not even in a minute.
Last but not least, our key cities strategy keeps being our global amplifier of trends, and therefore, we will expand our portfolio from six key cities to 12. And how do we do that? We do it through technology and through digital. And the new strategy dramatically changes our positioning and requires tech to reposition from being a tech service department into a competency and value driver. And this is a journey that luckily we started five years ago in adidas, in tech. And thanks to this, we are at the decision-making table. So we are significantly investing in building up our internal tech talent in the seven tech hubs that we have built around the globe. And already in 2021, we will hire more than 750 people, increasing our engineering footprint by 50%. To ensure that these people are also driving the highest impact, tech and business are integrated in a product-led operating model, driving end-to-end accountability into the product ownership for the different products. We are also investing in digitalizing the core processes of our company. So by 2025, the vast majority of our net sales should come from products that are created and sold digitally to our customers. For this, we create capabilities including our 3D creation engine, our digital go-to-market, or a complete reimplementation of our ERP ecosystem. That's S/4HANA.
Last but not least, with a digital-first mindset, we need to leverage data and analytics, not only like we are doing in our digital consumer analysis, where we really use advanced analytics for our e-com business. We need to use these capabilities across our end-to-end cycle, from creation to manufacturing, logistics and, of course, sales. But let's deep dive on the people topic. So onboarding 750 people in seven different locations is not easy. We make every effort for every single engineer that we are hiring to fill a critical role for the company's success. So they need to feel, and you can see on the left side of the screen in this triangle, they need to feel this duality between their product and the value that they are bringing to the company, and their capability, their technical capability, their craft, their career. They need to feel that they have a long-term career in adidas because they can develop their skills, which in the end is super important for our technical people.
And all of this is empowered by our hub strategy. Our tech hubs cannot be places where we simply throw people to sit at a desk. Each one of our tech hubs has a model, has a purpose, and has a vision. And this will help not only with hiring but also with retention, innovation, the sense of belonging, and performance. Two examples. Our Zaragoza tech hub was the first one, where I started working for adidas six years ago, and it is moving from being the development hub for the company into owning globally some of our platforms, some of our critical services, and it is going to be the hub that takes one step forward in our site reliability engineering capability, as you will see later. The other example is the new tech hub in India, where we are growing by 450 people this year alone. It's going to be the hub that is going to build or rebuild our future capabilities in dotcom, retail, or even planning.
So what happens then when you invest in technology and you move from being a service department into a competency enabler? What happens when you integrate smoothly with your different businesses? Then our business leaders are proud of us and are proud of tech. And you can see here on the screen the example of the two business leaders that are interacting the most with technology these days, that are Scott Zalaznik, our head of digital, that basically for him, there is no barriers. There is no reporting lines. There is no organization. So we need to be as progressive in tech and data as we are with our digital strategy or our e-com strategy. We need to speak up. He's asking us to speak up. Or Nigel Griffiths, he's the head of sales of the company. He said very bluntly, he's not going anywhere to any meeting. He's not taking any decision without tech on the table. So my little piece of advice to other tech leaders that could be in my place around the globe, invest in what you do better. Invest in engineering, invest in data, invest in creating architectural roadmaps with regards to your products, your team, interaction models, your platforms, build versus buy decisions. Speak up. Show the data and facts for your value stream, lead and flow time, mean time to detect and recover. Be on the table, speak up, but please push the decisions into the product owners, into your business owners. Don't be a service department anymore.
Good. But let's stop talking a little bit about the strategy, and let's get into the story about volume. And yes, you can see on the screen Jesse Owens, and yes, by using Adi Dassler's unique technology with the spikes in his shoes, he was fast. Well, not fast. He was the fastest, right? It's like if you can see a small startup, they really can adopt technology very fast and put it to life. But even Jesse Owens could have never run 400 meters in less than 40 seconds by himself. For that, he needed a team. He needed three other colleagues, and he needed to synchronize very well the handovers in these relay races that he was running in the '30s. You can see this as a scale-up, right? Or also reflecting back is a little bit the journey of adidas in our growth in the digital ecosystem, growing from a bunch of teams to the more than 150, 200 teams that we have right now in the team. But also, the reality is that growth can kill you.
So imagine all these people on the screen crossing the river in the New York City Marathon. If one of them accelerates or jams or stops, 10 others might fall on the floor, and this is basically what happened to us in November 2020. And we call our volume growth and our strategy for this growth mindset 10X. Because why 10X? Because 10X is more or less our growth rate over the last years. Typically, we are growing almost 50% in digital business in revenue. But the reality is that in order to grow this, we bring between two or three more times visitors into our platforms. But the reality is that we store more data, we track more data from them. At the end, this is creating 10 times more technical traffic and technical load in our different platforms. But it's not only technical load and the numbers that you can see on the left side of the screen, it's also our number of teams and capabilities constantly growing.
As well, of course, as the dependencies among them and connections among them, as per Brooks' Law, right? And this is how we got to a point last November where after years of stability, after years of growth and freedom, we really got into a crisis mode. So we got four or five really, really bad outages, four or five outages telling us that we had reached our limit, and we didn't have things under our control, right? And it was at the worst time ever, when we were already tired after a year of COVID, and also before our peak sales period. So in November and December, we have Cyber Week, we have Black Friday, we have Cyber Monday, 11/11 or 12/12 in China, and we do almost half of our digital revenue every year in these two months. And then, with all the pain in my heart after these outages, we had to bring three VPs into a room to approve every single change or release during those two months. And definitely, if you ask me, it's not the best option by the book. And I was one of them.
I was one of these three VPs in the room, and I can tell you the reality how clueless at the end we were on some of the details. Let's not forget that we have more than 550 million lines of code in our ecosystem, and as I told you, almost 2,000 engineers counting our partners, right? But at the end, this crisis was good because the peak period was a success, and also this crisis was the beginning of a new wave of how we do operations and release management for adidas. And for that, Vikalp and Andreia will give you some details going forward.
Vikalp Yadav
Thank you, Fernando. Hi, all. Have a look at this slide for once. Read through it. Hi, I'm Vikalp Yadav, and I'm part of Fernando's team, currently spearheading digital IT operations and taking it to the next level. Why do you think we had three VPs personally sitting in that room in November and approving each individual change that goes into a productive environment? Yes, we had a challenge, but in this scenario, there is one part that is ingrained in our DNA. Right from the point in time when adidas, that you see here, made that shoe for Jesse Owens, which Fernando showed earlier. The consumer experience has to be as reliable as our products, and that's the motto, that's the main element of the DNA that we drive. In order to do so at adidas, you would look at the volumes at which we are scaling, and at that mind-boggling scale that we operate, it requires a 10X mindset, which is a complete shift from what we are doing right now.
These are some of the statistics that you see from the recent past, where on a peak day, the order rates that we were achieving for a 4 billion shop is at 3,000 orders per minute. Now, just for a second, imagine a scenario what happens that as part of our strategy, when we want to double that order rate. For those consumers that we are trying to reach out to, 168 million as of today, and which Fernando earlier mentioned, we are planning to reach to a 500 million by end of 2025. We are sending out 11 billion touch points every day. Imagine what would happen when we double the scale. We are driving a complete new generation of sneakerheads through our hype drop products. Imagine a scenario, we are shooting for 1.5 million hits per second when we drop those amazing shoes on our sites, and that too, when we are doing just 200 hype per year. Today, when we plan one hype drop per day, what will be the scale that we are looking at?
Ultimately, this leads to us deploying multiple times in a day, making changes, and addressing the experience for the customer. This requires a complete shift on mindset and a journey behind it. And this is a journey that I would like to share with you, how we evolve this journey from a place where we had reliable aspects of our experience and how we made that more stable and resilient. We, like any other organization, when we evolved into the DevOps model, we had angry birds fighting over dev and ops. But at that point in time, we really made that realization that if we want to stand out ahead of our competition, reliability is our key. With that realization, we adopted a bunch of ITIL processes, which involved measuring aspects of stability in terms of how fast am I closing off my P1 interruptions, how do I define my P1s, how do I have my SLAs being put in place, leveraging those best practices of ITIL.
As a result, we evolved to a certain extent, but still, there were some challenges that were brought in. And with the product-led setup, which Fernando had shown earlier in one of his slides, the opportunity that we had was to implement some of the SRE practices, the site reliability engineering practices, where we were able to answer three key questions on stability. First question: How do we detect interruptions that happen in our productive environment as fast as possible? Second question: How do we make sure that when such interruptions happen, we fix them at a very fast pace? And the most important question of all: How do we make sure that these interruptions don't even land in the productive environment? While answering these questions, we adopted some of the best practices on observability in our SRE operational setup, and we scaled them across these products.
Through this learning, we also recognized some additional challenges as part of the journey. The first challenge that we saw was that individually, the products were super happy. When you look at the consumer journey, right from the point in time you log in to the point in time you check out, you go through a set of experiences across those 53-odd products, which we had shared earlier. And each of the products was performing fantastically when it comes to mean time to detect and mean time to restore. But the challenge was: How do we connect these experiences across? In a VUCA world, in a complex world, what you're seeing is that interruptions do not sit individually in one product, but are connected across products. And that's where we realized that everything is connected. From that came a second realization: we need to think in terms of value streams.
Why does it matter for us? For example, imagine an outage in a very small organization, in a very small business organization, like something in Brazil or a small market. What we observe is that the interruptions are lower in value, even though they are slightly longer in duration. By contrast, in bigger markets, like Europe, the interruptions are shorter, but the hit that we take during that mean time to restore is very high. And these two things we brought together and put ourselves in the shoes of our business stakeholders. And that's where we invented the KPI of percentage revenue bleed versus net sales. So the learning that everything is connected in the form of a value stream, and the impact of that value stream being measured with a KPI, percent revenue bleed versus net sales, ultimately led us to the three key pillars of success while we are answering the questions.
Observability. The key motto there is: How do we detect interruptions before they happen, leveraging the principles of AIOps? Resilience. How do we make sure that we have the least amount of friction in the consumer experience and make their experience more stable? And last but not least, release excellence. How do we ensure that our products buy the freedom of releasing new features into the environment? And this is exactly what Andreia is going to talk about in the next segment.
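Editor's illustration of the revenue bleed KPI mentioned above. The talk does not give the formula, so the definition used here (revenue lost during an outage divided by net sales for the same period, as a percentage) is an assumption, and every figure is made up for the example rather than taken from adidas data.

```python
def revenue_bleed_percent(revenue_per_minute: float,
                          outage_minutes: float,
                          net_sales: float) -> float:
    """Assumed definition: revenue lost during the outage as a % of net sales."""
    if net_sales <= 0:
        raise ValueError("net_sales must be positive")
    lost_revenue = revenue_per_minute * outage_minutes
    return 100.0 * lost_revenue / net_sales


# Hypothetical figures, not adidas data.
DAILY_NET_SALES = 10_000_000  # net sales across all markets for the day
small_market = revenue_bleed_percent(500, 120, DAILY_NET_SALES)    # long outage
large_market = revenue_bleed_percent(20_000, 15, DAILY_NET_SALES)  # short outage

print(f"small market, 120 min outage: {small_market:.2f}% bleed")
print(f"large market, 15 min outage: {large_market:.2f}% bleed")
```

With these made-up inputs, the shorter outage in the larger market bleeds more revenue than the longer outage in the smaller one, which is the contrast described in the talk.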
Andreia Otto
Thank you, Vikalp. So hello, everyone. I'm Andreia, an SRE lead in web and mobile services, as Fernando mentioned before. And now, I will walk you through the release fitness concept. So what is it about, how we use it, and why we need it. Right? But first, I want to just share this big picture with you. This is our web and mobile area. As you can see, we have many different teams, many different services. And for the last couple of years, we grew a lot. So our environment became huge, and the complexity grew as well. So this is not only about how my system is, but also how is the whole ecosystem. So it might be that my application will impact some other applications, some other applications might impact me, and also something in the environment itself is happening that I should be careful with. So that's why in this big picture, you can see that there are many variables right now. That's why we need the release fitness.
We need to know how is the environment, how is everything before we release. We had some challenges, as Fernando and also Vikalp mentioned before. So from a couple of services in production, we started developing digital products at scale. So then back in November 2020, we had three VPs in a room. Can you imagine how expensive, how time-consuming, and how boring it could be? Definitely something was wrong, and we had to work on the release process. So the release process needed to be standardized so all the product teams could work the same way. So that's what we did. So we worked with product teams and service management team to find what's a good set of KPIs or factors that need to be checked before any release. And as you can see in this picture below is an Excel sheet where we have all the factors, all the checks that the team need to do, and then it's a self-assessment, that at the end you're going to have, it's a go or it's a no-go.
But as you can see, it's manual, right? And we have that amount of product teams, and we want to be agile, so we want to deploy as much as we can, if possible, daily, if possible, multiple times daily. And can you imagine how the teams reacted when they received this spreadsheet, and they were told that they needed to fill that every time before a release? Of course, it was not the best reception. So that's why release fitness came into place. What if we can automate all those checks, and at the end, we have a single signal, go or no go? So we can remove all the communication, all the manual process, and try to get all this information, all those KPIs and factors, we can bring them automatically. And it's exactly what we did. It's exactly the release fitness concept. It's a release based on KPIs. What we have is a unique signal based on a set of KPIs. And we see that from three different angles.
So we have the system level. That's how my product is. We have the value stream. So value stream, one example at adidas is the checkout flow. For instance, for checkout, you have different product teams working for the whole value stream. So we have the front end. We have checkout API. That's where I am particularly. We have another back-end service. We have payments. So we have the whole value stream and other dependencies, of course. And we also have the environment. So for the environment, we have all the platforms. We have all the events that might be happening during that time. So a combination of those three will give me go or no go. We can do an analogy with our bonus and company bonus, for instance. We have how I am doing, how my team is doing. We also have how my company is doing, and we also have how the industry is going. So then a combination of those three different angles will give us one single go.
And for that, we start on the product level. So on the product or system level, what are we measuring? It's not only one KPI, but a set of KPIs. So we have the error budget. Error budget is one KPI that is very well-known in the SRE community. It's basically defined with the service level objective and service level indicators. For instance, if we take availability as an indicator, depending on the criticality of my system, I can define that I want to have, for example, availability of 99.9%, and that's okay for my service. So it means that I have 0.1% of expected unavailability: it is okay to be unavailable for 0.1% of the time. And the error budget is a very special measurement, because we can take it from the product level to the whole value stream. And then we have CI/CD, for instance. How is my application code? How are my quality gates in the code? We have QA KPIs, for instance, functional testing coverage, defect detection percentage. We also have blocker issues.
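Editor's illustration of the error budget arithmetic described above: a minimal sketch, assuming availability is measured over a rolling 30-day window. The window length and the function names are assumptions made for the example and are not from the talk.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO tolerates in the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (0.0 means exhausted)."""
    budget = error_budget_minutes(slo_percent, window_days)
    if budget == 0:
        return 0.0  # a 100% SLO leaves no budget at all
    return max(0.0, 1.0 - downtime_minutes / budget)


budget = error_budget_minutes(99.9)  # about 43.2 minutes in 30 days
left = budget_remaining(99.9, downtime_minutes=30)
print(f"budget: {budget:.1f} min, remaining: {left:.0%}")
```

A 99.9% objective therefore leaves roughly 43 minutes of tolerated downtime per 30 days. Once the remaining fraction reaches zero, the service has no budget left to risk on a release.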
Is there any blocker, anything that the QA team found that will block me to release? So everything is automated and nobody needs to check with different teams, with different tools. So it's all in the same dashboard. We have the value stream, as I explained it before. So we have different systems, downstreams, upstream systems that might depend on me or that I depend on. So they all need to be checked. And for the environment as well. For the environment, we have two key aspects. So we have the platforms. So the technical platforms is my Kubernetes cluster health, is my Jenkins health and all the other, and maybe some service that I use from AWS. Is it all up and running so I can use? And we have one very important and particular for adidas, that's the releases of the day. As of now, you already know that our environment is very big, and it might be that some event that's happening might impact my application. So for instance, there is one event we call as hype sales. And during hype sales, our system is expected to receive lots of load.
So it's one time that we don't want to add another variable to that. So we need to make sure that nothing will change during that time. So all this communication is automated, and the ultimate goal is that we have this single signal that is validated during the promote pipeline. So before I release anything to production, I have this check automatically done, and if everything is fine, it goes to production. If it's not fine, we have to stop. We block the build, and then the team that is responsible for the release will check the dashboard and see exactly what's going on. It might be that there is an event going on. It might be that I don't have error budget, or it might be anything else. So the next step is to onboard all the teams and have the whole value stream regulated with error budget. With that, I can hand over to Fernando for the closing part.
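Editor's illustration of the single go or no-go signal described in this section: a minimal sketch, assuming each automated check reports a pass or fail at one of the three levels (system, value stream, environment). The class names, check names, and the rule that a level with no checks blocks the release are all hypothetical. The talk does not describe the actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    SYSTEM = "system"
    VALUE_STREAM = "value stream"
    ENVIRONMENT = "environment"


@dataclass(frozen=True)
class Check:
    level: Level
    name: str
    passed: bool


def release_fitness(checks: list[Check]) -> tuple[bool, list[str]]:
    """Combine all checks into one go/no-go signal plus the blocking reasons."""
    reasons = [f"{c.level.value}: {c.name}" for c in checks if not c.passed]
    # Assumption: a level with no reported checks is treated as a no-go.
    covered = {c.level for c in checks}
    reasons += [f"{lvl.value}: no checks reported" for lvl in Level
                if lvl not in covered]
    return (not reasons, reasons)


# Hypothetical check results a promote pipeline might collect.
checks = [
    Check(Level.SYSTEM, "error budget remaining", True),
    Check(Level.SYSTEM, "no open blocker issues", True),
    Check(Level.VALUE_STREAM, "upstream and downstream services healthy", True),
    Check(Level.ENVIRONMENT, "platform health", True),
    Check(Level.ENVIRONMENT, "no hype sale in progress", False),
]

go, reasons = release_fitness(checks)
print("GO" if go else "NO-GO", reasons)
```

In this sketch, a pipeline step would block the build whenever `go` is false, and the list of reasons plays the role of the dashboard the team consults to see what is blocking the release.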
Fernando Cornago
Thanks a lot, Vikalp and Andreia. So this is fantastic and definitely taking us to the next level. I personally love systems that self-adjust and self-regulate, like in this case. So on one hand, we have strict release guidelines, and on the other hand, having automated business checks and error budgets telling every engineer if they can or cannot deploy. And this is especially important on systems as big as our ecosystem with more than 1,500 developers putting code into production. And I just want to end with a reflection that Gene Kim asked me the other day. So why adidas keeps coming every year to the DevOps Enterprise if we are already at the forefront? So for me, it's definitely reflecting. It's helping me to reflect the year successes, the year progress, the year failures, and also learning by teaching others the things that we do well. But it's definitely learning from others, things that they do better. There's always people doing things better than you. Let's accept that, right?
It's especially also being close to the community, people like Jason Cox, Jonathan Smart, Matthew Skelton, Mik Kersten. So our colleagues from Jaguar, from BMW, learning from them. So it's not only on the DevOps Enterprise. So whenever we have a question, we reach each other, and it's really a very enriching experience always. And new people, new companies, small startups, small digital companies, or big legacy enterprises are reaching you after watching your talk and asking you for one topic or the other. And last but not least, on top, I also get access and have the luxury of being an early reviewer of some of the nicest books of IT Revolution, which are fantastic. So in essence, I will be here next year again, and I hope that in person, in a big venue, and not remotely, because I hope that this effing shitty virus disappears from our lives and we can go together in a community face-to-face. Thank you.