Team Topologies in Action: Early Results from Industry
Since the book Team Topologies was published in 2019, organizations around the world have started to adopt its principles and practices, such as stream-aligned teams, modern platforms, well-defined team interactions, and team cognitive load as a key driver for fast software delivery and operations.
We will look at examples from the following organizations:
Gjensidige Insurance, a leading Nordic insurance company with 4,000 employees and business in the Nordic and Baltic countries, uses the four fundamental team types to clarify team responsibilities and interactions and is moving towards several “thinnest viable platforms” with stream-aligned teams as internal customers.
PureGym is Britain’s largest gym chain and the first to gain over 1 million members. As PureGym expanded, so did the need for software to enable their members to book and manage gym sessions. Since 2019, PureGym has re-aligned its teams and team interactions based on Team Topologies patterns, helping to scale the engineering teams and improve flow.
uSwitch / RVU, one of the UK’s leading consumer price comparison websites, has grown a modern platform from scratch. This allows stream-aligned teams to focus on consumers’ needs by offloading infrastructure provisioning concerns to the platform, which also provides cross-cutting services around scalability, security, and data management.
Visma is one of the leading software development companies in Europe with nearly 1 million customers in 21 countries. Team Topologies has helped to define and accelerate a transformation begun in 2015 to improve service ownership and speed of changes.
Wealth Wizards is a UK company making financial advice affordable and accessible to everyone through online tools and apps. The engineering division at Wealth Wizards has used the Team Topologies ideas around team cognitive load to help right-size their teams and align teams to the most important flows of business change.
For each of these examples, we explore how the ideas and patterns in Team Topologies were useful to the organization and the results of the changes.
Full transcript
The complete talk, organized by section.
Matthew Skelton and Manuel Pais
Matthew Skelton: Hi. Thank you, everyone, for attending this talk on Team Topologies in Action: Early Results from the Industry here at the DevOps Enterprise Summit, London Virtual Edition.
Hi, I am Matthew Skelton. I am one of the co-authors of the book Team Topologies, and I am the founder of Conflux.
Manuel Pais: And I am Manuel Pais. I am an independent consultant and trainer. I am also co-author of the book Team Topologies.
Matthew Skelton: The book was published in September 2019 by IT Revolution Press. You can find it at teamtopologies.com/book. Some of the praise we have received for the book talks about a kind of new digital operating model.
We think the book helps address some problems that we hear about often. Some of the questions we hear are: why is our transformation not achieving the kind of fast flow that we were expecting? What is our purpose as a team and what is our mission within the wider organization? How do we interact with other teams if we do not have clarity around what we are supposed to achieve ourselves as a team?
Another common question is: why are our teams not able to respond quickly to business needs? Here, we are talking not just about new features, but also responding quickly to problems and changing customer needs. How can we safely remove low-level complexity from customer-facing teams in order to make more space and effort available to focus on the actual business problems and solutions?
Today, in Team Topologies in Action, we are going to look at some examples. The book has been around for about nine months since publication, and we have five case studies that we want to share with you.
As a reminder, there are four team types that we talk about in Team Topologies: stream-aligned teams, enabling teams, complicated subsystem teams, and platform teams. There are also three core interaction patterns: collaboration, X-as-a-service, and facilitating. You will find these shapes represented in the diagrams of the different case studies we are going to see.
This is an example diagram, a snapshot in time. Crucially, you should look at this as a flow of change being represented from left to right.
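For readers who prefer code to diagrams, the vocabulary above can be written down as a small data model. This is only an illustrative sketch: the four team types and three interaction modes come from the talk, while the class names, the `interactions_for` helper, and the example teams are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "X-as-a-service"
    FACILITATING = "facilitating"


@dataclass(frozen=True)
class Team:
    name: str
    team_type: TeamType


@dataclass(frozen=True)
class Interaction:
    source: Team
    target: Team
    mode: InteractionMode


def interactions_for(team, interactions):
    """Return every interaction in which the given team takes part."""
    return [i for i in interactions if team in (i.source, i.target)]


# Hypothetical snapshot in time; the team names are illustrative only.
checkout = Team("checkout", TeamType.STREAM_ALIGNED)
platform = Team("infrastructure platform", TeamType.PLATFORM)
sre = Team("site reliability", TeamType.ENABLING)

snapshot = [
    Interaction(checkout, platform, InteractionMode.X_AS_A_SERVICE),
    Interaction(sre, checkout, InteractionMode.FACILITATING),
]

for interaction in interactions_for(checkout, snapshot):
    print(f"{interaction.source.name} -> {interaction.target.name}: "
          f"{interaction.mode.value}")
```

Because a snapshot is just a list, a later snapshot can be a different list, which matches the idea that the interactions are expected to change over time.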
Matthew Skelton
Matthew Skelton: The first case study is from Gjensidige. This is an insurance provider founded in 1847, headquartered in Norway. They have a strong presence in Norway and the Baltics. They have 4,000 employees, and they have been on a cloud and DevOps transformation from 2015 to the present day.
Back in 2015, software delivery there was project-based and waterfall, so it was slow and not particularly responsive. They knew they needed to change. They have been on a journey since then, and a key aspect of their transformation was stream-aligned teams: teams with multidisciplinary skills that focus on a particular business product or service. These teams have responsibility for the full end-to-end lifecycle of that product or service. They have KPIs on business outcomes, development speed, quality of operation, operability, and security: everything to do with that product or service.
These teams are supported by enabling teams in architecture and information security. There is a complicated subsystem team around the core mainframe system, which is considerably older and needs specialist skills. Then there are platform teams, multiple platforms in fact: CRM, infrastructure, application analytics, and also a design platform.
Over the last five years, this way of working has delivered 40% growth in digital sales for Gjensidige. The amount of customer service handled on digital channels has doubled, and claims handling is now 80% online, with 40% automatically resolved. Those are great results for Gjensidige, and we would like to thank Christian Moe, the chief digital officer, for sharing that with us today.
The second case study is from PureGym. PureGym is the UK’s largest gym operator, with more than one million members and 230 gyms throughout the UK. They launched in 2009 and are currently expanding into other countries in Europe. They have recently provided details of how they are going to keep their gyms safe for people during the COVID-19 pandemic, with the separation that you can see in the photograph they released recently.
Since 2015, there has been huge growth in numbers, with more than a million members now, particularly people joining using a mobile app, but also payments and bookings and so on. The diagrams in this case study come directly from PureGym, so this is exactly how they see it.
Back in 2015, there were fewer than 10 people and a pretty straightforward way of building software. By 2017, the team had grown to 15 people and there was quite a lot more work. At that time, it was still a case of defining some projects and then passing that work to a business-as-usual, BAU, team to run it and find bugs and things.
By 2019, the team had grown to 40 people and they were starting to see problems, particularly with handing over these projects into the BAU and run teams. It was difficult to resolve issues in the live environments. Inter-team communication became a problem. There was too much specialist knowledge in different areas, so there was a real awareness of a need to change.
The software was arranged as a monolith. This made it difficult. Teams were working on different services or different parts of the codebase, but it was still inside a single code repository, so there was quite a lot of crossover and difficulty separating different concerns for different teams.
Later in 2019, there was an attempt to break up this monolithic codebase into different areas. It took around three months to do this design, working out which areas of the website were distinct from the point of view of people using those features and from the point of view of what the business needed.
Early in 2020, the teams at PureGym looked at the Team Topologies book and found really useful patterns, terminology, and phrasing to help them work out what kind of behaviors and responsibilities different teams should have. For example, they realized that what they called their Membership Management Gateway, MMG, really was a platform, and it should behave as a platform because it served many other teams and helped to accelerate delivery. They realized that their developer experience team and their site reliability team should be enabling teams. They had many other teams that were best suited to the stream-aligned type of team.
Having made these decisions, they redefined the responsibility boundaries for these teams. What happens next? How do we actually make this work? There was a very high collaboration phase across multiple different teams. The developer experience team, site reliability team, platform team, and mobile team collaborated very closely with the stream-aligned teams to work out where the boundaries should be and what kind of responsibility boundaries needed to be in place.
After that, it was possible to start thinking about the kind of services that should be provided by the platform. The teams started to have an X-as-a-service relationship, particularly with the platform. But there were still the two enabling teams, developer experience and site reliability, facilitating the understanding of multiple stream-aligned teams inside the organization to help them get up to speed with new ways of working, new technologies, and new practices.
This is a snapshot of how things look today. It does not always look exactly like this, but the point of the diagram is to emphasize that there is some collaboration happening in certain areas and some facilitation happening in other areas, depending on what exactly is happening at the moment. The idea is that there are multiple different kinds of interactions between different teams, and these are clearly and very specifically defined. We understand what we are trying to achieve by a particular kind of team interaction.
What is interesting at the top of the diagram is that the mobile team does not quite fit into these team types. They are still getting a feel for how the user experience and the kind of product features that the mobile app provides should be managed. That is an ongoing conversation, and that is a good thing because they are using the Team Topologies interaction patterns and sensing mechanisms to try to work out how that team should behave.
John Kilmister, principal software architect at PureGym, says: “Team Topologies helped us at PureGym to evaluate the relationship between our teams and the business strategy, to increase team efficiency and evolve away from a monolith.”
The results are that the technology teams are more responsive to the business. They have gone from projects and BAU to principally stream-aligned teams, which means separate services can evolve at different rates. In particular, the joining service needs many more deployments than other services, and it can now be deployed at its own pace.
There is more balance in the ownership of services, team morale is higher, and the architecture is much better in terms of the long-term evolvability of the platform. I would like to say thank you to John Kilmister and Rich Allen, who have both been key to this transformation and shared their experience and learning for this case study today.
Manuel Pais
Manuel Pais: The next case study is from uSwitch. uSwitch is the leading comparison and switching service in the UK, helping customers switch between different providers of gas, electricity, broadband, and so on. They have had quite an interesting journey in the last 20 years. There are two points that are interesting to look at. Around 2010, they started introducing autonomous teams, moving away from generic engineering teams and organizing with teams that had more freedom and were aligned to some stream of work. In 2017, another important change around team organization had to do with the introduction of a platform.
For context, back in 2015, they had stabilized around having these autonomous stream-aligned teams on things like energy and broadband, the different services they provide to end customers. These were teams with the autonomy to decide almost everything they needed for software delivery and operations of their service. This meant they were dealing with a lot of the infrastructure parts. The only constraint was that everyone was using AWS, but each team had the autonomy to decide how they wanted to set up their accounts, which services from AWS they wanted to use, and so on.
The interesting thing about the graph is that it shows, for a period of two years, the number of direct calls to AWS APIs from the different services at uSwitch. During this two-year period, they noticed that at least half of the teams had doubled the effort they were spending on a regular basis around infrastructure management, those kinds of services, and managing the usage of AWS services.
As Paul Ingles, now CTO at uSwitch, said at the time, people were spending more time having to interact with relatively low-level services, thus spending their time on relatively low-value decisions, as opposed to spending more time focusing on business problems and solutions. This is when they decided to introduce a platform, and essentially change the way the teams were organized in order to benefit from the platform services.
They did this in a very product-oriented way, thinking about the platform as a product and therefore thinking about how to engage with their internal customers, the stream-aligned teams. One thing they did was identify their first customers: the stream-aligned teams that were experiencing the most pain around managing infrastructure, and some teams that were lagging behind a little on practices like centralized logging, metrics, and autoscaling. Very quickly, this new platform was adopted by those teams. One team in particular benefited greatly and collaborated very closely with the platform.
Because they wanted to grow the platform in a non-mandatory way, with stream-aligned teams adopting it because they saw the value, they looked at metrics that helped them understand whether they were building the right platform product. They looked at how many teams were adopting the platform, how many applications were using platform services, and even the traffic: how much of the interaction with AWS was going via platform services rather than directly through AWS APIs.
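The three adoption measures just described (teams, applications, and traffic share) can be sketched as a small calculation. This is a minimal illustration, not uSwitch's actual tooling: the `AdoptionSnapshot` class, its field names, and every number below are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AdoptionSnapshot:
    """Platform adoption counts at one point in time (hypothetical fields)."""
    teams_total: int
    teams_on_platform: int
    apps_total: int
    apps_on_platform: int
    calls_via_platform: int
    calls_direct_to_aws: int

    def team_adoption(self) -> float:
        return self.teams_on_platform / self.teams_total

    def app_adoption(self) -> float:
        return self.apps_on_platform / self.apps_total

    def platform_traffic_share(self) -> float:
        total = self.calls_via_platform + self.calls_direct_to_aws
        # Guard against an empty measurement window.
        return self.calls_via_platform / total if total else 0.0


# Invented numbers, purely to show the calculation.
snapshot = AdoptionSnapshot(
    teams_total=10, teams_on_platform=4,
    apps_total=50, apps_on_platform=10,
    calls_via_platform=250, calls_direct_to_aws=750,
)

print(f"teams on platform:      {snapshot.team_adoption():.0%}")
print(f"apps on platform:       {snapshot.app_adoption():.0%}")
print(f"traffic via platform:   {snapshot.platform_traffic_share():.0%}")
```

Tracking the same three ratios over successive snapshots is one way to see whether voluntary adoption is actually growing.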
From 2017 to 2018, they realized more and more teams were using the platform. By early 2018, they realized they could start using more well-defined SLAs and SLOs, so the teams understood exactly what level of reliability, latency, and so on they would get from the platform. By late 2018 and early 2019, traffic against the platform was reaching the same levels as direct AWS API traffic.
In 2019, uSwitch was able to use the platform approach to address critical cross-cutting needs across multiple teams, including GDPR, more awareness about security, consistent alerts, and SLOs as a service.
But there were still some teams that were not adopting the platform, for a good reason. One team had the highest maturity around engineering and brought in a substantial part of the revenue for the organization. They were reluctant to adopt the platform because they knew their success with end customers depended strongly on the performance and reliability of the service they provided. If they now depended on the platform, they were not sure whether they would still meet those same requirements around performance, scalability, and reliability.
The platform team had to start measuring and demonstrating their own reliability, performance, and scalability numbers. They did that by identifying their SLOs, service level objectives, and creating dashboards visible to all the other teams. They even created a service in the platform that provided these dashboards to anyone who wanted to create and manage their own SLOs. Once they proved the platform was reliable and could meet the same expectations the stream-aligned teams had, those last teams also adopted the platform.
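To make the idea of measuring against an SLO concrete, here is a minimal sketch of an availability indicator and the remaining error budget. It uses the common request-based formulation and is not a description of uSwitch's dashboards; the function names, the 99.9% target, and the request counts are assumptions chosen for illustration.

```python
def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that met the objective (the SLI)."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing was violated
    return good_requests / total_requests


def error_budget_remaining(slo_target: float, good_requests: int,
                           total_requests: int) -> float:
    """Share of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        return 1.0 if actual_failures == 0 else 0.0
    return 1 - actual_failures / allowed_failures


# Hypothetical measurement window: 100,000 requests, 50 of them failed.
target = 0.999
good, total = 99_950, 100_000

print(f"availability: {availability(good, total):.3%}")
print(f"SLO met: {availability(good, total) >= target}")
print(f"error budget remaining: "
      f"{error_budget_remaining(target, good, total):.1%}")
```

Publishing numbers like these on a shared dashboard is what lets a consuming team judge whether the platform meets its own reliability requirements.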
uSwitch is still relatively small, so we can have this broad view of their teams and interactions. But again, it is more important to look at the snapshot in time and understand what is happening at this moment, what we are trying to do, and how we should evolve. The diagram shows the current situation, more or less, where the cloud infrastructure team is working on a canary deployment service and collaborating with some stream-aligned teams to understand how that service should look. Meanwhile, they have other services that are consumed on a regular basis by all teams.
On the left, there is also an SRE team working as an enabling team. Because this team is quite small, only two people, they will at most be facilitating a couple of stream-aligned teams to understand the performance and reliability of their services. Interestingly, this SRE team works as a sensing mechanism, identifying problems common to several teams that might be good candidates for the platform to provide a service around and solve for multiple teams.
Another snapshot in another part of the organization shows different types of platforms. They have an affiliate marketing platform, which is not specific to traditional IT services. It took a while for them to identify the right boundaries around this marketing platform and what should be provided by the platform versus what should be the responsibility of stream-aligned teams. But once they figured that out, this proved to be very effective.
Paul Ingles says that the engineering principle that guided the way they organized teams was to have loosely coupled and highly cohesive teams. The great thing about Team Topologies is that it ties a lot of these ideas together and, most importantly, gives them a common language.
Some results uSwitch achieved include a curated platform experience from understanding what different teams actually need from the platform; reduced complexity, especially around infrastructure, so teams can free up more time to focus on the business; and addressing cross-team needs that perhaps no individual team would put in the effort to solve elegantly, but the platform team has the capability to solve for many teams.
They moved away a little from the idea of fully autonomous teams, understanding that having a dependency on platform services is okay as long as the team is still self-sufficient: they can self-serve and are not dependent or blocked in their flow of work. The patterns they started applying within IT were useful beyond IT, as in the marketing platform example. Throughout this time, they have successfully balanced fast flow with sustainable reliability and operations. To learn more about this case study, go to teamtopologies.com/examples. We want to thank Paul Ingles and Tom Booth for providing the information in this case study.
Matthew Skelton
Matthew Skelton: The next case study is from Visma. Visma is a software solutions provider based in Norway. They provide software in the accounting, ERP, and HR spaces for companies. They have 1 million customers, mostly in Europe and Latin America, and around 11,000 employees.
Before 2015, they had very large software releases about every two months. This made it very difficult to be nimble and respond to market needs and customer needs. In 2015, they realized they needed to make some changes, and one of the sets of changes they made was in how their teams were organized. They adopted what they called service delivery teams, which had end-to-end responsibility for a product or a service or some part of a product or service.
As their thinking and the shape of the organization evolved, they were reading books like Continuous Delivery, The DevOps Handbook, and Accelerate. In September 2019, Team Topologies was published, and they looked at that book too. One particular thing they took from it was team cognitive load as a key design principle for thinking about the responsibility boundaries of teams.
The chief cloud architect at Visma, Tinius Alexander Lystad, puts it like this: Team Topologies has changed how we form development teams in Visma. Does the team get the right support from enabling teams? What is the sum of the cognitive load on this team? And so on. It has really helped them think about how they design teams, how they allocate work, and the responsibility boundaries around those teams.
The results of these changes going back to 2015 are that they have changed from six deployments per year in 2015 to two deployments per week in many areas, and in some areas even multiple deployments per day. They have increased the sense of ownership and responsibility within the teams. You can find more information at teamtopologies.com/examples. Thank you to Tinius Alexander Lystad for sharing that experience with us.
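To put the deployment figures above on a common scale, the short calculation below converts "two deployments per week" into a yearly rate and compares it with the 2015 baseline. The only assumption is a 52-week year; the multiple-deployments-per-day areas are left out because the talk gives no exact number for them.

```python
WEEKS_PER_YEAR = 52  # assumption: deployments continue every week of the year

deploys_per_year_2015 = 6                   # from the talk: six per year
deploys_per_year_now = 2 * WEEKS_PER_YEAR   # from the talk: two per week

increase = deploys_per_year_now / deploys_per_year_2015

print(f"2015: {deploys_per_year_2015} deployments per year")
print(f"now:  {deploys_per_year_now} deployments per year")
print(f"increase: roughly {increase:.0f}x")
```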
The final case study today is from Wealth Wizards. They provide financial advice online. Founded in 2009, their customers are both consumers and companies, and they have been increasingly successful over the last few years. Prior to 2019, there was significant growth in the number of different products available. Going back to 2017, there were just two products, one on pensions and one on retirement advice. In 2018, additional products around other investment advice, other pension advice, and so on came along. Effectively, there was a sixfold increase in the number of different products and services that needed to be offered, so complexity and codebase size were both increasing.
This all came to a head in the middle of 2019 when, in the words of the CTO, they ground to a halt. Releases were taking a long time, sometimes months to appear. Teams were having to look after many different microservices. The platform they had built at the time made it very difficult to have a consistent user experience across multiple different products. Teams were having to think about many different things.
Fortunately, in September 2019, Team Topologies was published. Again, in the words of the CTO, it was just at the right time. Wealth Wizards found the patterns and language in the book really useful for a new way of thinking about this challenge, team boundaries, who should be responsible for what, and so on.
Here is an example of the new architecture they came up with after reading the Team Topologies book. At the top of the diagram there are some stream-aligned teams. Towards the bottom in blue, there is a series of platforms. On the right-hand side, there are some enabling teams. There is a complicated subsystem team in there for doing complex calculations. Richard Marshall, CTO at Wealth Wizards, said Team Topologies has given us the tools we were looking for and helped us build a plan and confidence that we know where we are going and how to get there.
The results of applying this Team Topologies approach are that they have clear patterns and language for conversations between different groups in the organization. They have a framework for making design decisions, and they have confidence in their approach to scaling the technology part of the business. Thank you to Richard Marshall, CTO at Wealth Wizards, for sharing that.
Matthew Skelton and Manuel Pais
Matthew Skelton: We are going to do a quick recap of the main points we talked about today. We saw these five different case studies. Some common aspects were that the stream-aligned team should be the starting point if we want to achieve fast flow and fast feedback from running systems.
We saw how Team Topologies provides a common language and a set of patterns for the whole IT organization. We need to explicitly design for team cognitive load, and have that always in mind when making design decisions. The platform is not just a set of services but is a curated experience for engineers to accelerate and simplify software delivery.
We are looking forward to seeing what is next and hopefully sharing more case studies in the near future. At the moment, we are working on a free workbook around applying Team Topologies for remote teams, given the current context with the pandemic, most people working from home, and the expectation that this will continue for some time.
We have a number of free resources. If you want to hear more about remote-first interactions, go to teamtopologies.com/remote-first. We have tools and templates available on github.com/teamtopologies. We are open to feedback and other examples. You can contact us by email at info@teamtopologies.com or on Twitter, @teamtopologies.
We have remote-friendly training available. If you want to keep up to date on news around the workbook, training, or other industry examples, sign up for news and tips on teamtopologies.com. We will be answering questions on Slack now. Later today at 17:55 there will be an Ask Me Anything session on Zoom. Hope to see some of you there. Thanks for joining the session today. Thank you very much.