How Architecture Drives Value

Las Vegas 2022

Randy Shoup

Engineering Leader

Leadership Talks

Host Intro (Gene Kim)

So the next speaker is Randy Shoup, whose work I've admired for almost a decade. I met him at the Jez Humble SoCal conference in 2012, and it's difficult to overstate just how much I've learned from him.

Randy Shoup is actually the most cited person in The DevOps Handbook, which included references to his work as the engineering director of App Engine at Google and as chief engineer at eBay over a decade ago. During the pandemic, he presented this amazing talk on his return to eBay, when he was VP of engineering and Chief Architect, describing how he was increasing productivity at an engineering organization that includes over 4,000 engineers, along with his colleague Mark Weinberg, who is VP of Core Product Engineering.

Over the years Randy helped me understand why architecture is so important, and I've asked him to share those lessons with you as well. Here's Randy.

Randy Shoup

All right. Hi, I'm Randy Shoup, and as Gene just told you, I want to talk to you about how architecture drives value.

My background, as Gene quickly summarized: I spent about a decade at eBay, back from 2004 to 2011, mostly as an individual contributor working on the search engine infrastructure. I spent some time at Google running engineering for Google App Engine, and I'll tell some stories from there. Then I was VP of engineering at Stitch Fix, leading engineering up to and through the IPO. Then I went to WeWork as VP of engineering, up to and through the not-IPO. Most recently, for the last two years, as Gene mentioned, I was back at eBay as Chief Architect and VP of engineering, which I recently left, and we'll talk about that.

I want to start with a couple of architecture stories so you can see where architecture made a difference. Then we'll talk about some reasons why architecture matters all the way through, and maybe end with some ideas about how leadership can be helpful in this area.

I want to start by telling the eBay story. eBay started, famously, in 1995. The founder, Pierre Omidyar, was playing around over a three-day weekend, the Labor Day weekend almost exactly 27 years ago now, with this new cool thing called the Web, and he built the thing that later became eBay. It was built in Perl, monolithic Perl. Every item was a file. I always imagine it lived on this little 486 tower underneath his desk in Redwood City. It didn't scale, but it wasn't intended to.

The next generation of eBay, which they cleverly called V2, was a monolithic C++ DLL. It was an ISAPI DLL, an Internet Information Server plugin from Microsoft. It grew, at its worst, to 3.4 million lines of code in that single DLL. They were hitting compiler limits on the number of methods per class. I'm ashamed to tell you I know that limit for the Microsoft compiler of that era: 16K methods in that one class.

You can imagine how horrible it was for developers to work in there. It wasn't just that they were all working in the same repo. It wasn't just that they were all working in the same file. They were actually all working in the same class. And this was like a thousand engineers for this fast-growing Internet company. That was bad.

So they did a migration to what they called V3. It wasn't really microservices yet, but it was Java mini-applications: 200 different applications, each of which took a part of eBay's site. There was the application for the search pages and the selling pages and the buying pages and the payment pages, et cetera. Those Java applications were backed by a sea of databases that were shared among a bunch of those different applications.

In 2012 they started moving to something that was more what we would now call microservices, and then there's also been another iteration of that microservices thing. If you look back on eBay's stock price, it was exactly those years after that V3 migration that eBay really started to take off. So architecture really mattered for the history of eBay.

Amazon has gone through a similar evolution, and I just learned last week a lot more scary stuff about this Obidos application. Amazon started very similarly, same time and same kind of ideas as eBay. It was a monolithic Perl and Mason front end over a C back end. It was a four-gigabyte application inside a 32-bit, four-gigabyte address space, so the application didn't even all fit in main memory. They were regularly breaking the new linker every time they were trying to build new stuff. They were having to restart every 100 to 200 requests because it leaked memory so effectively, and because it was so painful to work in this environment, they ended up releasing once per quarter.

The very famous migration to service-oriented architecture started then. From 2001 to 2005, Amazon basically retooled its entire infrastructure around service-oriented architecture. They built services in C++, Java, and a bunch of other languages, and they had no shared databases. There was a famous dictate from Jeff Bezos saying services can only communicate through public interfaces. There are no backdoor ways, no secret ways of getting at it.
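To make the "public interfaces only" rule concrete, here is a minimal sketch in Python. It is an illustration under assumed names, not Amazon's code: `OrderService`, `ShippingService`, and their methods are hypothetical. The point it shows is that the consuming service depends on the owner's methods and never touches the owner's storage.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderService:
    """Hypothetical service that owns order data."""

    def __init__(self) -> None:
        # Private storage: stands in for a database only this service uses.
        self._orders: Dict[str, Order] = {}

    def place_order(self, order_id: str, total_cents: int) -> Order:
        order = Order(order_id, total_cents)
        self._orders[order_id] = order
        return order

    def get_order(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


class ShippingService:
    """Hypothetical consumer that sees only OrderService's public methods."""

    def __init__(self, orders: OrderService) -> None:
        self._orders = orders

    def can_ship(self, order_id: str) -> bool:
        # Goes through the public interface, never through the private storage.
        return self._orders.get_order(order_id) is not None
```

Because `ShippingService` only calls `get_order`, the order team could swap its storage without the shipping team noticing, which is the property the no-shared-databases rule is protecting.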

If you know the history of Amazon well, after 2005 when they were done with this, 2006 was when they started releasing the first services in AWS and they started their inexorable takeover of the world's online retail and a lot of our software as well. So again, if you ever think that architecture doesn't matter, this Amazon story really shows that it does.

Okay, but why does it matter? First, I want to talk about how a good architecture provides boundaries and constraints for the development teams that work in it. Next, I want to talk about how it enables a fast flow of change for the organization. Last, since this is a leadership talk, I want to talk about developing and maintaining an architectural sensibility as a leader in an engineering organization.

Let's talk first about boundaries and constraints. Before we do that, though, I just want to say the mainline, table-stakes purpose of an architecture is to provide these fundamentals of the organization. Lots of people will have slightly different ones. These are mine: I think we should do correct work. I think we should perform well and scale, and be reliable and operable, et cetera. These are table stakes. If an architecture doesn't do this, we should go back and start from scratch.

But an architecture provides more value in terms of boundaries and constraints by providing a common platform, and you've seen that theme through many of the talks here today already. That common platform, in most places, is going to represent shared infrastructure: compute, storage, data storage, maybe eventing.

That common platform is going to improve the developer experience: source control systems, development and testing environments, continuous delivery pipelines for deploying things to production. It's going to provide common capabilities either in the form of libraries or shared services. It's also going to provide standard frameworks. When I'm building a new service, I would have a chassis, maybe, that would make it easy for me to hook up to the monitoring system and be deployable. It would also standardize on, or at least limit us to a small number of, communication protocols and data formats for sharing data between the services and applications.
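As a rough illustration of what a service chassis might look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `ServiceChassis` class, the `/health` path, and the request counter are hypothetical stand-ins for whatever monitoring and deployment hooks a real platform would provide.

```python
import logging
from typing import Callable, Dict


class ServiceChassis:
    """Hypothetical shared base that gives each service logging,
    a request counter, and a health check without extra work."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log = logging.getLogger(name)
        self.request_count = 0
        self._handlers: Dict[str, Callable[[dict], dict]] = {}

    def route(self, path: str) -> Callable:
        """Decorator that registers a handler for a path."""
        def register(handler: Callable[[dict], dict]) -> Callable[[dict], dict]:
            self._handlers[path] = handler
            return handler
        return register

    def health(self) -> dict:
        return {"service": self.name, "status": "ok",
                "requests": self.request_count}

    def handle(self, path: str, payload: dict) -> dict:
        # Every request is counted, including health checks.
        self.request_count += 1
        if path == "/health":
            return self.health()
        handler = self._handlers.get(path)
        if handler is None:
            self.log.warning("no handler for %s", path)
            return {"error": "not found"}
        return handler(payload)


# A team writes only its domain logic; the chassis supplies the rest.
search = ServiceChassis("search")


@search.route("/query")
def query(payload: dict) -> dict:
    return {"results": [payload["q"].upper()]}
```

The team that owns `search` wrote one function; the health check and counting came from the shared chassis, which is the kind of cognitive-load reduction the platform is meant to provide.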

Again, that's sort of table stakes and obvious. Now I want to take a page out of the Team Topologies book. Mick mentioned Team Topologies a moment ago, which taught us a great way of talking about teams. I think the other really great lesson from the Team Topologies book is how important cognitive load is for teams in development organizations.

In an effective architecture, the architecture is going to bound and limit the cognitive load on the individual teams working in it. In an ideal world, and I've lived in this world at some places, the team is mainly thinking about their own domain problem and it's not thinking about extraneous or accidental complexity from other areas of the system. Changes that are local to that team have localized effects, and you can reason about them. When I'm making new changes and trying to add new features, it's pretty clear where those changes go, and then I have a limited number of architectural patterns that we've all agreed to as an organization.

By contrast, an ineffective architecture is where the team can't just live in their own domain world. They have to consider the entire system because God knows what they might break. Local changes have unexpected remote effects that are not easily predictable, or at least not obvious or explicit. It's maybe hard to find what and where to change to get the work done, and maybe I need to change many components across lots of areas of the system in order to get my feature done. And by contrast to a limited number of well-understood, well-documented architectural patterns, we have a bunch of snowflake systems where it's random for each system.

The other aspect of architecture that I want to talk about in terms of constraints and boundaries is that word governance. It always makes me shiver a little bit when I say this word because I really don't like it. Usually it implies something I don't want.

In an effective architecture, governance in quotes is more like enablement. We want to prioritize enablement and knowledge sharing. How can we help individual development teams be more productive and get their work done more quickly and more simply? That help comes in the form of discussion and advice. Central people, to the extent that we have them, are offering help to development teams to make them better.

The best way that governance and standards happen in the most common case is through tools and libraries and standards. Here's code. Here's a mechanism that you can do in self-service to get what you want to get done done. Those tools and libraries and standards that come from this organization are strictly better than any team's other alternatives, which would be building or buying or borrowing.

By contrast, an ineffective architecture, instead of being enabling and doing knowledge sharing, is about approval boards. I've lived in this world at old eBay, and that was pretty bad. Instead of discussion and advice, suggestions come as a mandate, centralized one-way pronouncements. Instead of having these super experienced engineers and architects working on tools and libraries and standards, they are in their ivory tower producing documents and PowerPoints. Those standards are enforced through a mandate rather than just being the better thing to use.

Okay, so we talked about boundaries and constraints. Now I want to talk about the fast flow of change. Again, I'm going to take a page out of the Team Topologies book. It has this wonderful framing of a stream-aligned team. You can think of an application team or a feature team that's providing direct value to the customer or the business, and those teams in a good architecture are aligned directly around a customer and business problem. It's very clear what boundaries they have, and they have very clear, very explicit interactions between their service or services and the other services on which they depend.

One of those teams is going to be building a single service or application, or a set of maybe related services or applications. That team can design, can develop, can deploy, can operate its services and applications independently of other teams. In an ideal world, that team owns those services end-to-end, from cradle to grave.

The way I like to say it, as an engineering leader and as, most recently, leader of architecture, is ideally we want 80% of the work that the team does to be within that team boundary, and they don't need to have fine-grained coordination with other teams.

The overall organization with a good architecture is going to look like this: small individual teams that are organized around a domain, have all the skill sets that they need to do their work, and have very fast feedback loops in order to make forward progress and generate a fast flow of change.

When I'm making those individual changes in the team, a good architecture can help as well. In an effective architecture, as I'm making my individual changes, those changes are testable and deployable in isolation. I don't have to coordinate with other teams in order to test and deploy. Those changes are forward and backward compatible, whether we do that with feature flags or dark launches or a bunch of other techniques. Those changes are reversible, so in case something goes wrong, because software is hard, we can roll things back relatively straightforwardly. As a consequence and as a goal, we're able to produce small iterative units of work that continue to move forward.
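A feature flag is one simple way to get the reversibility described here. The sketch below is a minimal Python illustration, not any particular company's flag system: `FeatureFlags`, the `new_ranking` flag, and the two ranking functions are hypothetical names chosen for the example.

```python
from typing import Dict, List


class FeatureFlags:
    """Hypothetical in-memory flag store; a real one would read from config."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code ships dark.
        return self._flags.get(name, False)


def rank_old(items: List[str]) -> List[str]:
    """Existing behavior: alphabetical order."""
    return sorted(items)


def rank_new(items: List[str]) -> List[str]:
    """New behavior under test: shortest names first."""
    return sorted(items, key=len)


def rank(items: List[str], flags: FeatureFlags) -> List[str]:
    # Both code paths are deployed; the flag picks one at runtime.
    if flags.is_enabled("new_ranking"):
        return rank_new(items)
    return rank_old(items)
```

Deploying `rank_new` changes nothing for users until the flag is turned on, and turning it off again is the rollback, with no redeploy and no coordination with other teams.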

In contrast, an ineffective architecture is going to be hard to test and deploy. I'm going to have to do coordinated releases because I have a distributed monolith between lots of different services and applications that all have to deploy as one big unit. It's hard to undo the changes that we make, and we're doing large batches of work that then come out in large releases.

The last thing I want to talk about here is, as a leader, particularly as a leader of engineering organizations as most of us are, trying to develop an architectural sensibility. Here I want to take a page out of Ron Westrum's presentations over the last several DevOps Enterprise Summits. I love this idea that he shares around a technological maestro. If there's anything I'd love to be in my job, it's this.

As defined, a technological maestro is someone who comes in with high energy. That person asks the right questions of the teams involved. That person has high standards, and that person is good not only in the large but also on the details.

As Gene and I were chatting a couple weeks ago when he asked me to do this plenary, we started brainstorming around the idea of what does it really mean to be a technological maestro, and we came upon this idea that really it means sociotechnical systems thinking.

What do I mean by sociotechnical systems thinking? We need to be able to understand the entire system, the overall system and how it behaves in the large. We need to understand the individual components, what their responsibilities are, what their interfaces are to other components, and how they behave. We want to focus on those interfaces, responsibilities, and interactions when we want to make changes in the system. We make changes locally in individual components, and then they have system-wide effects. Because it's a complex system, we want to notice and exploit both reinforcing and balancing feedback loops. Then I think this is critical as somebody who has gray hair or no hair or both: using the previous experience that we've developed over our years in the industry to notice patterns and anti-patterns so we can make good suggestions to development teams.

Along these lines, I want to talk very briefly about the work that I talked about in great detail last year at the DevOps Enterprise Summit 2021, taking this idea of an architectural sensibility and sociotechnical systems thinking. When I came in as Chief Architect at eBay two years ago, you would think that the first thing I would do is improve the architecture, and there are a bunch of aspects about the architecture that do need to be improved. But instead, what I did with the help of a bunch of my colleagues, some of whom are here, is we did a value stream map with a couple of selected teams and tried to understand what their issues were.

It turned out resoundingly that the big issue, the big bottleneck for eBay at the time and now, is software delivery. So we spun up an initiative to improve software delivery across the company. We were working very closely and collaboratively with those individual teams to identify and then remove bottlenecks from them. We would often ask them the question, what would it take to deploy your application every day? They would give us a big long list of all the things that they needed to do, and because I was VP of platform, we could say, great, you just gave the platform team our backlog.

We were able to double engineering productivity, as Gene mentioned. I'm really proud of this, actually. We moved the DORA metrics way beyond that. Deployment frequency and lead time moved 5x. Change failure rate and mean time to restore improved as well, even though we weren't focusing on them; we were only focusing on the speed side. That is what the State of DevOps Reports research would predict: going faster brought higher quality too.
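For readers who have not computed the four DORA metrics before, here is a small Python sketch of one way to derive them from deployment records. It is illustrative only: the `Deployment` record, its field names, and the choice of median for lead time are assumptions made for the example, not how eBay measured them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_metrics(deploys: List[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a window of deployments."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at is not None]
    mean_restore = (sum(restore_times, timedelta()) / len(restore_times)
                    if restore_times else None)
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": mean_restore,
    }
```

The first two values are the speed side the initiative targeted; the last two are the stability side that improved alongside it.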

Again taking this sociotechnical systems thinking approach, the reason why we started by focusing on the software delivery area is that it was the prerequisite for making large-scale architectural and other behavioral changes in the system. That's the kind of thinking we need to have as engineering leaders in one of these organizations.

As I was leaving a couple months ago, one of my directors said, a little bit ruefully, you're the first VP who's ever cared about the details. That's both sad and nice. I don't know.

Okay, so we talked about boundaries and constraints. We talked about a fast flow of change, and we talked about developing an architectural sensibility as an engineering leader.

As I mentioned, I don't work at eBay anymore, so Gene strongly encouraged me to say here's the gig I'm looking for. The gig I'm looking for is in a growing company. I really want to be proud of the organization and my contribution to it. I want to be able to make a big difference in the organization. That's the kind of thing that really gives me a lot of energy in my career. Also, just like Courtney, I want to work with people I like, respect, and trust.

Thank you very much. I hope you have a great conference.