Las Vegas 2024

Renovate to Innovate: Fundamentals of Transforming Legacy Architecture (NETFLIX)

Renovating old buildings and homes is commonplace, but why is technological renovation often overlooked? Just like a big home renovation adds to the quality of life, a successful architectural renovation has an outsized impact on the pace of innovation. Yet, why are software migrations perceived negatively? Frequently, it stems from past encounters with projects that were disruptive, costly, and executed poorly. In this talk, I outline my learnings on gracefully outgrowing technology and architectural choices, based on my experience scaling payment orchestration at Netflix to 250M members and preparing for the next 250M. You'll leave this session equipped with cognitive frameworks for evaluating architectural health and tactics to overcome common hurdles to transforming legacy architecture. I share battle-tested strategies for successfully navigating a ground-up architectural revamp to unlock innovation and enhance business value.

Full transcript

The complete talk, organized by section.

Rashmi Venugopal

Hello everyone. Welcome to my talk, "Renovate to Innovate: The Fundamentals of Transforming Legacy Architecture." I'm excited to be speaking here today at the Enterprise Technology Leadership Summit. So thank you for having me.

Just a little bit about myself first. I'm Rashmi Venugopal. I'm a Senior Software Engineer at Netflix, and my team focuses on enabling seamless payment experiences for anyone signing up for our services. I've worked with Microsoft and Uber in the past, and as a software engineer, I spent the last decade building and operating reliable distributed systems at scale.

Agenda

Okay, with my intro out of the way, let's dive right in. We will begin with what legacy means in the context of this talk. Next, we'll talk about why systems become legacy in the first place. We'll review what a technical renovation is and when to apply one. We'll wrap up with guiding principles and best practices for technical renovation. This is also where we will be spending the majority of our time today.

What does "legacy" mean?

What's the first thing that comes to mind when you hear the word "legacy"? Can I get a quick shout-out? Old. Old. Outdated. Don't touch it. I love that. Let's see how y'all match up with OpenAI's word cloud for the term legacy. Funny enough, I don't see "old"… oh yes, it's here. Old is on… oh, right there. Okay, I should know my slides better.

As you can tell, the term legacy is overloaded — so let's spend a couple of minutes getting on the same page about what I mean by legacy systems.

I define a software system as legacy, or on its way to becoming legacy, if it hinders the organization's ability to meet and keep up with the business requirements. Let's make that definition more concrete and talk through some telltale signs of systems that are unable to keep up with business requirements.

Decrease in innovation velocity. As your systems get more and more complex, a simple change can take a disproportionate amount of time. Product and project managers expect productivity to scale linearly. Engineers know better. We're close to the ground, and our past experience has primed us to be more pragmatic. It is a sign of a legacy system when the time it actually takes to roll out a change far exceeds expectations, and this is driven by complexity.

There are numerous sources of complexity in software engineering. Engineers that work across multiple teams are slowed down by the coordination tax that they have to pay. A lack of established boundaries between teams makes it difficult to isolate changes and roll them out quickly. Insufficient investments in observability and testing automation can lead to high operational costs just to maintain the existing features. To recap: there are various sources of complexity, and as it increases, developer productivity and innovation velocity go down.

Degradation of quality of experience. Another sign of a legacy system is when the quality of experience starts to degrade. QoE measures the overall satisfaction of an end user when they interact with the system. We've all experienced the very real frustration of waiting many seconds for a page to load. Amazon has a well-known study that quantifies the impact of latency on their business. They found that every 100-millisecond increase in latency reduces their sales by 1%. So a dip in quality of experience that persists despite our best efforts to tune the system is a sign of a legacy system.
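
As a rough illustration of the arithmetic behind that figure, here is a minimal sketch in Python. It assumes the effect is linear, which the talk does not claim (only the per-100 ms number is quoted), and the function name and the baseline revenue are hypothetical.

```python
def estimated_sales_loss(added_latency_ms: float, loss_per_100ms: float = 0.01) -> float:
    """Return the fraction of sales lost for a given latency increase.

    Assumption: the effect scales linearly (1% per 100 ms by default).
    The talk quotes only the per-100 ms figure, so the extrapolation
    to other values is illustrative.
    """
    if added_latency_ms < 0:
        raise ValueError("added_latency_ms must be non-negative")
    # Cap at 1.0: a business cannot lose more than all of its sales.
    return min(1.0, (added_latency_ms / 100.0) * loss_per_100ms)


baseline_revenue = 1_000_000.0  # hypothetical figure, for illustration only
for added_ms in (100, 250, 500):
    loss = estimated_sales_loss(added_ms)
    print(f"+{added_ms} ms: {loss:.1%} of sales, about {baseline_revenue * loss:,.0f} lost")
```

Even under this simple model, a regression of a few hundred milliseconds translates into a revenue number that is easy to put in front of stakeholders.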

Scaling challenges. Another signal is when it becomes consistently difficult to scale your software for increased user load. As your business evolves, you will find that the software infrastructure set up to serve a couple thousand users doesn't scale well for a million users, which again doesn't scale well for a billion users. So a system that is unable to keep up with an increase in user volume or data load is on its way to becoming legacy.

In summary, it's a sign of a legacy system when it is consistently difficult to add new features or even make changes to the existing ones — both of which are required to keep up with the business needs.

Legacy… but why?

Now that we've covered what I mean by legacy systems, let's talk about why legacy systems exist in the first place.

Advancement in technology. The most obvious reason is the rapid pace at which technology advances today. Who here has used two or more of these devices? Raise your hand at the risk of revealing your age, just like I am. Systems that were once considered cutting-edge struggle to keep up with modern industry standards just a few years down the line.

But in addition to this obvious reason, there are two schools of thought that explain software degradation.

Bit-Rot Theory. The first school of thought is the Bit-Rot Theory. It states that software gradually degrades over time due to changes to itself or its surroundings. Examples: code or feature duplication as your organization grows rapidly; outdated documentation; or loss of institutional knowledge.

Law of Architectural Entropy. The second school of thought is the Law of Architectural Entropy. It states that software systems eventually lose their integrity when features are added without consideration to the architecture, both existing and future. Imagine the growth of an e-commerce company in the early stages. They're focused on establishing a successful business. Evolving the architecture to be perfect is not a priority. Changes to the software architecture are primarily driven by business needs. And in this example, you can see how order history or the recommendation engine have just been added to the existing monolith, steadily increasing the architectural entropy for their systems.

In the real world, software systems are affected by all of these phenomena. This explains why outdated and legacy systems are more commonplace than we'd like them to be.

Do we proactively improve legacy systems?

Now that we've agreed legacy systems are commonplace, let's ask ourselves: do we always proactively improve the state of our software systems? I wish. On one hand, software degradation is inevitable because of entropy, bit rot, and technical advancements. On the other, we are often not proactive about it. Combined, the two can leave us with systems that are difficult to understand, maintain, and extend.

Technical Renovation: what and when

That brings us to technical renovation. But what does technical renovation actually mean? I define technical renovation as the act of upgrading or replacing outdated systems and technology to improve the state of your software systems and software architecture.

Often I get asked when I bring up technical renovation: how does refactoring fit into the picture? I view refactoring as organizing my closet. When I refactor my closet, I move things around. I make it easy to access all pieces of my clothing, and sometimes I even get rid of some stuff to make space for new things. It also has a side effect of reminding me what I already have, which influences my future wardrobe investments.

Sticking with the closet analogy, renovating my wardrobe is when I break down my walls to go from having a regular closet to a walk-in one. Renovation goes beyond moving things around: it is when you make a drastic change that shakes things up, and the end result gives you capabilities that you didn't have before. Renovation is also usually a larger undertaking and therefore occurs less frequently than refactoring. That said, refactoring is still important. It is a valid strategy to maintain a healthy code base, and there are numerous benefits of maintaining a healthy code base. Disclaimer: don't sleep on an opportunity to refactor.

When does technical renovation become compelling?

Okay, now that we've established what technical renovation is and how it's different from refactoring, let's talk about when technical renovation becomes a compelling approach.

Diverging business needs. Attempting to reuse existing systems to solve for drastically new use cases introduces complexity, and we've already talked about what complexity does to innovation velocity and developer productivity. Let's talk about an example of a business-driven renovation: Netflix evolved from a DVD distribution company to a streaming service. The capabilities required to deliver DVDs are very different from the capabilities required to stream video on demand. So the systems that served Netflix well in the DVD era weren't going to be sufficient to run a successful streaming service. So this drastic change in business needs will eventually call for a technical renovation.

Modernize technology stack. Technical renovation is also a valid strategy for ecosystem-driven needs like modernizing your tech stack. If you're trying to go from having a Python service that hosts REST APIs to having a Java service behind a GraphQL gateway, a renovation is in order. Just refactoring isn't going to cut it.

Accumulation of tech debt. Sometimes software systems turn out to be more successful than anyone imagined they would. While this specifically is a good problem to have, this unexpected longevity also causes significant accumulation of tech debt — and this makes technical renovation a valid approach to improve the state of your software systems.

Guiding principles for technical renovation

Now that we've identified a few scenarios that warrant a technical renovation, let's talk about how to approach one. Today I'd like to share five guiding principles to maximize the success of your technical renovation initiatives.

Principle 1: "Make it work, make it right, make it fast"

This quote from Kent Beck encapsulates my approach to any new work — not just renovations — and it is my first guiding principle for a good reason. Technical renovation is usually a huge undertaking, which means prioritizing can be a daunting affair. Identifying what we should be doing first and what can be saved for later is a question that has kept me up at night.

Kent suggests that our first priority should be to focus on making it work. Beat the analysis paralysis by doing the simplest thing that just works, because you're out of business if you don't have a system that gets the job done.

Then make it right. This is a good time to prioritize adaptability, extensibility, and readability. But this is also when you have to be mindful to resist the temptation to over-engineer. As a rule of thumb, if you don't regret any of your early decisions, then chances are you over-engineered.

Finally, make it fast. Now is a good time to prioritize performance. Make the necessary tweaks to hit the quality-of-experience metrics that are critical for your business.

This structured approach to tackling the various aspects of a technical renovation breaks up a daunting endeavor into trackable and manageable milestones.

Principle 2: Evolutionary Architecture

My second guiding principle is evolutionary architecture. Internalize the fact that complex systems cannot be fully designed up front. Building software in a rapidly changing environment comes with many unknowns.

Identify fitness functions. Start by defining the set of fitness functions that represent the desired quality of your end state — such as performance, security, scalability, et cetera. Once you've picked the system qualities that matter the most to you, use them to guide your decision-making. Make trade-offs that align with your priorities.
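
One way to make fitness functions concrete is to express each one as an automated check over measured metrics. The sketch below is a minimal illustration in Python, not the approach used at Netflix: the metric names, the thresholds, and the `FitnessFunction` type are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class FitnessFunction:
    """A named, automatable check for one desired system quality."""
    name: str
    check: Callable[[Metrics], bool]


# Hypothetical qualities and thresholds; pick the ones that matter to you.
FITNESS_FUNCTIONS: List[FitnessFunction] = [
    FitnessFunction("performance: p99 latency under 300 ms",
                    lambda m: m["p99_latency_ms"] < 300),
    FitnessFunction("reliability: error rate under 0.1%",
                    lambda m: m["error_rate"] < 0.001),
    FitnessFunction("scalability: sustains 5,000 requests/s",
                    lambda m: m["max_rps"] >= 5000),
]


def evaluate(metrics: Metrics,
             functions: List[FitnessFunction] = FITNESS_FUNCTIONS) -> List[str]:
    """Return the names of the fitness functions that the metrics fail."""
    return [f.name for f in functions if not f.check(metrics)]


# Hypothetical measurements for a candidate architecture change.
candidate = {"p99_latency_ms": 280.0, "error_rate": 0.002, "max_rps": 6500.0}
failures = evaluate(candidate)
print(failures or "all fitness functions pass")
```

Run as part of a build or release pipeline, a check like this turns "make trade-offs that align with your priorities" into a visible pass or fail signal for each change.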

Continuous delivery and experimentation. Now you know what matters to you, and you're well set up to make trade-offs based on them. Focus on creating an environment and an infrastructure that is conducive to effective execution. Adopt continuous delivery practices to be able to release your changes frequently. Automate the steps between developing a feature and releasing a feature. This helps your team to experiment your way into making changes — and experimenting is key for high innovation velocity.

Incremental changes. Now that you have the infrastructure to release changes quickly, focus on making incremental changes. The big-bang approach to making changes makes it difficult to get things right — especially when something goes sideways, which it will sooner or later. So incremental changes make it easy to course-correct as you go.
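
A common way to make a change incremental is to route a small, stable percentage of traffic to the renovated path and raise that percentage over time. The following Python sketch assumes a hash-based bucketing scheme; the function names, the salt, and the member ID format are hypothetical and not taken from the talk.

```python
import hashlib


def bucket(member_id: str, salt: str = "renovation-rollout") -> int:
    """Map a member to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_renovated_path(member_id: str, rollout_percent: int) -> bool:
    """True if this member should be served by the renovated system."""
    return bucket(member_id) < rollout_percent


def route(member_id: str, rollout_percent: int) -> str:
    """Pick the code path for one request (both paths are placeholders)."""
    return "renovated" if use_renovated_path(member_id, rollout_percent) else "legacy"


# Raise the percentage step by step; lower it to course-correct.
members = [f"member-{i}" for i in range(1000)]
for percent in (1, 10, 50, 100):
    served = sum(route(m, percent) == "renovated" for m in members)
    print(f"{percent:>3}% rollout: {served} of {len(members)} members on the renovated path")
```

Because the bucket depends only on the member ID and the salt, a member who is moved to the renovated path at 10% stays there at 50%, and dropping the percentage back down is the rollback.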

The cycle of evolutionary architecture involves making incremental changes, validating your architecture against your fitness functions, releasing changes quickly to experiment, and iterating to course-correct by making more incremental changes. In summary, building an evolvable architecture is critical for the long-term success of your renovation initiative. As requirements grow and expectations evolve, it ensures your software systems adapt in line with the business needs.

Principle 3: Innovating while Renovating

On to the third guiding principle. No organization or company will stop doing new work. So it's wishful thinking to hope for a dedicated window for renovation that is uninterrupted by any new feature requests. So the ability to maintain feature development velocity in parallel with your renovation initiative becomes a prerequisite for success.

It's equally important to also strike a balance between the extent of overlap between feature innovation and feature migration. Imagine a scenario where you combine the innovation and renovation steps for a large number of components, and you then A/B test your changes. There are two outcomes to an A/B test. One is that it's a success, which is great. But if your A/B test did turn out to be negative, fixes need to be rolled out. To be able to make the fix, you need to identify what is causing the negative results in the first place. Is it the innovation components causing the regression, or are the renovation changes at fault here? Debugging to pinpoint the problematic changes becomes a very painful endeavor when they're tightly entangled. So the takeaway here is: the two-birds-with-one-stone approach is not always appropriate.
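
To see why entangled changes are hard to debug, compare a single combined A/B test with a 2x2 split in which innovation and renovation are toggled independently. The 2x2 (factorial) design is a swapped-in technique for illustration, not something the talk prescribes, and the conversion rates below are made-up numbers.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Key: (innovation_on, renovation_on). Values: hypothetical conversion rates.
Cells = Dict[Tuple[bool, bool], List[float]]

cells: Cells = {
    (False, False): [0.50, 0.52, 0.51],  # control: old feature, legacy system
    (True, False): [0.55, 0.56, 0.54],   # new feature on the legacy system
    (False, True): [0.46, 0.47, 0.45],   # old feature on the renovated system
    (True, True): [0.50, 0.51, 0.49],    # both changes combined
}


def main_effect(data: Cells, factor_index: int) -> float:
    """Average outcome with one factor on minus the average with it off."""
    on = mean(mean(v) for k, v in data.items() if k[factor_index])
    off = mean(mean(v) for k, v in data.items() if not k[factor_index])
    return on - off


# What a single combined A/B test would report: both changes versus control.
combined_delta = mean(cells[(True, True)]) - mean(cells[(False, False)])
innovation_effect = main_effect(cells, 0)
renovation_effect = main_effect(cells, 1)

print(f"combined test delta: {combined_delta:+.3f}")
print(f"innovation effect:   {innovation_effect:+.3f}")
print(f"renovation effect:   {renovation_effect:+.3f}")
```

With these numbers the combined test shows a small regression and gives no hint of its source, while the split shows the new feature helping and the renovated path hurting. The cost is more test cells and more traffic per experiment, which is part of why isolating every change is not always practical.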

On the other end of the spectrum is enforcing each new change to be either an innovation step or a renovation step, but never both. This is not practical. Not only is it expensive to create an environment where only one thing changes at a time, this approach is also very time consuming and has a negative impact on execution velocity.

So it's worth the trouble to sequence innovation and renovation for very business-critical features, but lean into risk and combine both for the less sensitive features in the spirit of increasing the overall innovation velocity. In summary, use good judgment to pick the right extent of overlap that is appropriate for your problem space.

Principle 4: Deprecation-Driven Development

The fourth guiding principle: deprecation-driven development focuses on what we gain by deprecating, as opposed to what we lose. This approach makes the case that systematically removing obsolete technology is a prerequisite for a healthy software engineering lifecycle.

Be honest about components that are better off left in the legacy system because the return on investment doesn't justify the effort that is required to renovate them. These could be features that are in maintenance mode, or features that are not significantly contributing to your business revenue anymore.

Netflix winding down DVD.com is a good example of a decision to deprecate that was based on business trade-offs. As the DVD business continued to shrink, it became increasingly difficult to justify the cost of providing the best service experience for DVD members. Once the long-term strategy for a product is decided, it provides clarity for engineering priorities. We hadn't invested in renovating technology related to DVD in the years leading up to the shutdown.

Principle 5: Intentional Organization Design

My fifth guiding principle is intentional organization design. Innovation is equal parts discovering new products and iterating on existing ones. An inspiration for discovery or iteration can occur in any part of the company, so making it easy for ideas and inspiration to flow through the organization is important for innovation velocity.

In addition to flow of ideas, organization design also has an impact on software architecture. Conway's Law explains this synergy between the two. It suggests that the way our teams are organized influences the architecture and design of the systems they create. So for an organization that's undertaking a technical renovation initiative, it's a good idea to first take a step back and assess the existing org structure to identify any changes that might be beneficial for streamlining communication and minimizing cross-team coordination.

Let's look at an example that compares two ways of organizing teams. Design A groups engineers based on their function — like front-end, middle tier, back-end, et cetera. Design B organizes engineers based on common deliverables, but as full-stack teams. Each design has its pros and cons. Design A optimizes for engineers to ramp up quickly and provides a space for them to be experts at their craft. But if every new functionality requires making changes in all three parts of the stack, then they will need someone external to coordinate and manage dependencies and align tasks for them. In that case, Design B might be more efficient.
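
The trade-off between the two designs can be sketched by counting how many teams a change touches. This Python model is a deliberate simplification: the team naming, the deliverables ("checkout", "signup"), and the use of pairwise paths as a proxy for coordination cost are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

Change = Tuple[str, str]  # (deliverable, layer) touched by a piece of work


def owner_design_a(deliverable: str, layer: str) -> str:
    """Design A: teams are grouped by function, so the layer decides the owner."""
    return f"{layer} team"


def owner_design_b(deliverable: str, layer: str) -> str:
    """Design B: full-stack teams, so the deliverable decides the owner."""
    return f"{deliverable} team"


def teams_involved(owner: Callable[[str, str], str],
                   changes: Iterable[Change]) -> Set[str]:
    """Set of teams that must take part in a piece of work."""
    return {owner(deliverable, layer) for deliverable, layer in changes}


def coordination_paths(team_count: int) -> int:
    """Pairwise communication paths between the teams involved."""
    return team_count * (team_count - 1) // 2


# A new feature that touches every layer of one deliverable.
new_feature = [("checkout", "front-end"), ("checkout", "middle tier"), ("checkout", "back-end")]
# A change to one layer that spans two deliverables.
ui_refresh = [("checkout", "front-end"), ("signup", "front-end")]

for label, work in (("new feature", new_feature), ("UI refresh", ui_refresh)):
    for design, owner in (("A", owner_design_a), ("B", owner_design_b)):
        count = len(teams_involved(owner, work))
        print(f"{label}, design {design}: {count} team(s), {coordination_paths(count)} path(s)")
```

In this model Design B needs no cross-team coordination for the full-stack feature, while Design A does better for the single-layer change, which mirrors the point that each design has its pros and cons.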

In summary, you have to choose to strengthen the communication paths that are most important for the nature of your organization and its software architecture, because every communication path cannot be the strongest.

Closing: embracing the growth mindset

While this brings us to the end of the guiding principles that I have for us today, I'd like to talk about a critical piece of the puzzle that is important for all the work needed for a technical renovation to actually come together. Embracing the growth mindset.

It starts with having the right perspective: the perspective that legacy systems are more often than not a byproduct of success. So it should come as no surprise when legacy systems emerge as your organization continues to grow and succeed. So building the muscle to work with and renovate legacy systems is imperative for the continued success of your organization.

The ideal approach to tackling renovation is highly context-dependent. Your strategy and decisions should be deliberated on a case-by-case basis to account for the unique challenges, goals, and circumstances that are relevant for you and your organization. So unfortunately, that means that there's no silver bullet — only guiding principles. Rest assured that the path to successfully transforming legacy architecture will be a bumpy and thorny one. So champion the growth mindset by learning from your mistakes, valuing feedback, seeking feedback, and continuously improving throughout your renovation journey.

The principles that I shared today may seem ambitious. That is because they're intentionally aspirational, in the spirit of shooting for the stars and landing on the moon. Thank you all for coming to my talk today and for your engagement.