Paving the Road for 30,000+ Developers
At SAP we have more than 30,000 people working in our development organization, with more than 1,000 products on our price list using various technology stacks.
How do you increase developer productivity at this scale?
Backed by in-house user research and industry trends, we decided to lower our teams' cognitive load by introducing an in-house CI/CD platform called 'Hyperspace'.
Dirk will talk about the obstacles of creating 'Hyperspace' with a "platform as a product" approach in an organization that was highly fragmented. The concept of Paved Roads (a.k.a. Golden Paths) helps us provide guidance to teams, with the aim of reducing team cognitive load and decreasing the support load on central teams.
Join Dirk's talk for lessons learned, impacts that we already see, and an outlook on what we envision in the Hyperspace.
Dirk Lehmann
Hello everyone. My name is Dirk Lehmann. I work in product management for the internal CI/CD platform at SAP.
I have worked for the company for over 21 years in various roles. For example, I was part of the first team that established the first full continuous delivery approach at SAP. Today I want to tell you a little bit about our journey toward an internal CI/CD offering serving roughly 30,000 engineers. Let me bring up my slides.
Here we go.
01 SAP context and scale
Some of you might know SAP and our products, and some might not, so I thought it would be a good idea to give you some context on who we are and what we do. That sets the stage for the challenges that we face when we create a CI/CD platform offering for an enterprise of this scale.
SAP was founded over 50 years back with the idea to standardize enterprise software. Every company has to deal with financials, employees, customers, products, and so on; the idea was to build a product to manage all of that.
Today our ERP suite covers 25 industries, from oil and gas to retail, manufacturing, public sector, and what have you. Even though 80% of our customer base is from the SME market, 99 out of the 100 largest companies in the world run SAP. I'm not sure who the one without SAP software is, but I'm pretty sure my sales colleagues are trying to get them excited.
We have over 112,000 colleagues around the globe in 130 locations, and out of that roughly 30,000 are engineers.
Our products are written in several technologies. Some of our products are written in our own proprietary technology stack that comes with its own programming language called ABAP, which has its own lifecycle methods and tooling all integrated in the platform. Other products use technologies which are maybe more familiar to you, like Java, which is roughly 30% of our code, plus JavaScript, C, C++, Python, Go, and some others.
We sell our software and services through various channels: on-premise, cloud, hybrid (cloud and on-premise mixed), and mobile. We run our own data centers, but we also support the big hyperscalers that you all know.
The reason why I tell you all this is that the figures are quite impressive, and some of them were not known to me before I compiled the slide deck. But they also give you context for the situation in which we operate our CI/CD platform.
If you just take the highlighted facts that I just outlined, you get some idea of the challenges. We were founded in 1972, so we have quite some heritage and a huge active customer base that we have to safeguard. Whatever change or larger innovation we do, we always have to make sure that we do not disrupt our customers' business.
I spoke about many industries, and you know the challenges that companies often have, especially in highly regulated industries, to comply with legal requirements and such. Well, we have all of them, right? We have the sum of all of that: the banking, the oil and gas, the healthcare, what have you. We serve with our software all of them, so we need to support all the requirements in our software.
Our customers come from everywhere in the world and do business with basically every country in the world. That implies the known internationalization concerns like translations and supporting right-to-left languages and such, but also the legal requirements that our customers need to adhere to, and hence our software needs to adhere to.
To handle all of this, our engineering force is quite huge, with 30,000 people in around 130 locations and thousands of teams. The sheer number of people and teams that we have to deal with is already a challenge on its own.
We work with very heterogeneous technologies. Tools and processes that work in one tech stack do not necessarily work in another tech stack that we support. SAP promotes certain programming languages simply by better tooling and process support, but we do not limit or mandate programming languages or technologies that our engineers use. Yes, we do have discussions again and again whether we should mandate certain tech stacks, but currently we don't have that. I think it has pros and cons.
We serve various delivery channels for our software, and that implies slightly different development models for each channel. Think about things like feature toggling, which is a common approach in cloud-based software. It simply does not work in on-premise software. Shipping changes as fast and as early as possible means something completely different in an on-premise approach than in a cloud or mobile app. And we have to serve all of that.
02 Existing tools, requirements, and cognitive load
Now, how did we do that in the past? First of all, we offered quite a bunch of central tools to the development teams so that each team did not need to set up its own tools. We offered them centrally to various teams to cover their needs.
For example, we run multiple GitHub instances of our own, which are quite large already. We run a farm of around 2,000 centrally provided Jenkins instances, and Jenkins is just one CI/CD orchestrator; many teams run their own instance of their own CI orchestrator. We run our own central artifact repository that handles around 250 million requests per day. So what my colleagues are doing here is pretty much the heavy lifting.
Sometimes the tools that we serve basically target the same thing, like pipeline orchestrators, of which we have multiple, or build tools, of which we also support multiple. Sometimes we have the same tool but in multiple instances because they are in different network segments or have different configurations. Then it simply depends on the team's technology stack, programming language, location, what have you, which one is the best for them.
Also worth mentioning is that the tool ownership was in the past spread across the company. Security tools came from the security people, and if you had an issue with a legal tool, your legal colleague was your friend to fix that.
We have broad process requirements, which are basically best practices for how to create enterprise-grade software. This ensures that our software is following all the legal requirements, security standards, accessibility requirements, and data privacy requirements that all our customers around the world have in all those various industries that I mentioned earlier.
Some of them are treated as best practices, like good advice, and some of them are a bit more mandatory, so they are not negotiable. They differ.
We have guidelines, good advice, and recommendations for all kinds of things, like architecture, support, operations handbook, service management, and such. Again, some of them are very specific. Some of them are applied only to some products. Some of them are globally applied to our products.
Engineering had to deal with all of this: choosing the right tools; keeping up to date with the latest and best tooling; ensuring fulfillment of all the changing requirements; making sure not to miss an update to the requirements, because otherwise your delivery could be stopped; and staying close to the latest guidelines of the organization.
Back when releasing software was a matter of months or years, this was somehow manageable for the teams. Also, they had huge support from central organizations that took some of those requirements off their hands.
Lately we seek higher delivery frequency to get into closer feedback loops with our customers, which I think we agree is a good idea. In order to deliver faster and reduce the handoffs, we empowered our engineering teams so they can take over more responsibilities. Shifting left is the keyword here, which puts more and more load on engineering, and this is a problem: the team's cognitive load explodes.
Cognitive load can be described as the total amount of mental effort being used in a person's working memory. As a team consists of multiple persons, we can apply that idea to the whole team: the team's cognitive load.
If the team's cognitive load gets too high because of too many unrelated tasks that we put onto them, their ability to deliver customer value will go down.
So is shifting left something that we, the whole industry, got wrong? I think not. James Governor, co-founder of RedMonk, got to the point when he wrote in one of his recent blog posts that you need to have a good developer experience in place that allows you to shift left all the things. If your developer experience is broken and you shift things left, cognitive load will blow up. Developer productivity, developer happiness, customer value, all of that goes down, and the bad things, failure rate, stress, burnout ratios, all of that goes up.
It is important that you first have a good developer experience, and then you shift things left. A broken process doesn't get fixed because you shifted left; it remains broken, and it just increases the team's cognitive load.
03 Hyperspace as an internal CI/CD platform
Two years back, we started a program to reduce the team's cognitive load by implementing an internal CI/CD platform offering following the platform-as-a-product approach, which is nicely described in the Team Topologies book by Manuel Pais and Matthew Skelton, along with many other important things like the team's cognitive load. If you haven't read that, definitely worth reading.
We named that CI/CD platform Hyperspace, as we had a predecessor project called Hyperpipe and we somehow never got rid of the hyper naming thing.
The idea of Hyperspace is, first of all, to have one entry point for the development teams: the developer experience portal. If you know Spotify Backstage, which is now with the Cloud Native Computing Foundation, you know what we have here. It is one entry point where the teams get and expose all their vital information to deliver value.
Below that, on the left, we have the tools and services that we already had in place, but we have reorganized them now into one organization alongside a newly created product management side that takes care of all the tools getting integrated, harmonized, and aligned.
We renovated the process requirement framework so that it fits better to the tools and services, and so that we get into a higher degree of testing, scanning, compliance checks, and automation, and that it is all better tailored to the needs and the situations of the teams.
Also, the ownership of that process framework is now within the same organization as the people that own and build the tools and the whole platform.
Centrally, we have a new component, which is the paved road, and I want to elaborate a little bit more on this approach with the upcoming slides.
04 Paved roads and development procedures
We call them paved roads, some companies call them golden paths, and I have even seen mixtures like golden roads and paved paths. Whatever you call them, the idea is always the same: giving teams clear end-to-end guidance for a complete process, in our case a complete delivery process.
We wanted to give teams concrete, detailed answers when they asked us: how shall we use this tool in order to comply with this and that? Then we could say, look here: in your context, the best way that you could use this tool for fulfilling this requirement is this and that, because we own and understand all of the tools and processes now after the reorganization. Well, that's at least what we thought.
The first idea to get to a paved road was: let's just bring all the experts of the tools and processes into one room, give them some canvas and some time, and then they will tell us what the best end-to-end way to build and deliver software at SAP is. That did not work out very well.
What you see here from far above is a simplified version of the canvas that we worked on, and we didn't even finish. Every colored box that you see here is a tool or a service, and the lines indicate implicit or explicit dependencies. As I said, this is a simplified version. At some point in time, we left out all the obvious dependencies, all the well-known dependencies to basically everything, because we were unable to read the canvas if we drew all those lines into it.
As I said, we didn't even finish. We had to admit that the situation was way more complex than we thought it was.
But this exercise also had some good points. We learned that we need to set ourselves clear guardrails and constraints in order to tackle the problem. Given our technology stacks and the legacy stuff, it is impossible to cover all tool and process combinations at once. But we could start with an ideal case with few legacy constraints and clear technology choices to have a better chance of handling the complexity.
So we set ourselves a clear context and took a divide-and-conquer approach. We separated the paved roads into various segments that we called development procedures.
In plain words, one development procedure describes a thing that an engineering team needs to do in order to ship software at SAP. Let me give you an example. If a team wanted to use an open source library in their application at SAP, they touched various tools and processes, and still touch various tools and processes, which sometimes is a pretty neat nightmare for them because they need to figure out a lot. Licensing: the open source component might have a copyleft or "infectious" license, or the license might ask them to disclose its usage somewhere in the application. Global trade and export compliance: the component could use cryptographic methods which fall under some trade sanctions. A central software bill of materials: SAP wants to know at every given point in time which software is used within its products in case any security or legal issue pops up. Security: is there a known vulnerability in that open source component? And is there a mechanism in place so that if a newly found vulnerability comes up, we can tackle it immediately?
For sure, all those processes and tools are not always very well integrated. Some are, some are not. Now a development procedure describes exactly this, and we formulate it with two things. The first is a trigger, describing what brings the team into action, like: hey, we want to use an open source library. The second is the value that the team receives after going through all the steps that the development procedure describes, like: okay, the value is you can now ship your code faster by using open source software securely and compliantly.
Importantly, in the first version of the development procedures, we describe the as-is state, the situation as it is now. We look at which configurations, which settings, in which sequence, which tool needs to be used today. We do not attempt to optimize the current situation in the first version, because describing a development procedure while improving it at the same time increases complexity, and that complexity would blow up in our faces.
So the first version simply describes the as-is situation in all its beauty, or its not-so-beautiful sides. Then we have multiple development procedures, such as how do I manage my backlog tasks, how do I release a feature to customers, and such things.
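To make the shape of a development procedure more tangible, here is a minimal sketch in Python of the open source example as a data structure: a trigger, a value, and an ordered list of steps. All class, field, and tool names are hypothetical illustrations and not how Hyperspace actually stores its procedures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One action in a development procedure (field names are hypothetical)."""
    tool: str          # tool or service the team touches
    action: str        # what the team does with it
    requirement: str   # requirement or guideline the step helps fulfil


@dataclass
class DevelopmentProcedure:
    name: str
    trigger: str       # what brings the team into action
    value: str         # what the team gains after completing all steps
    steps: list[Step] = field(default_factory=list)

    def checklist(self) -> list[str]:
        """Render the steps in the order a team would follow them."""
        return [
            f"{i}. [{s.tool}] {s.action} ({s.requirement})"
            for i, s in enumerate(self.steps, start=1)
        ]


# Hypothetical instance of the open source example from the talk.
use_oss = DevelopmentProcedure(
    name="use-open-source-library",
    trigger="We want to use an open source library",
    value="Ship code faster by using open source securely and compliantly",
    steps=[
        Step("license-scanner", "check the component's license", "licensing"),
        Step("export-check", "classify cryptographic usage", "trade and export compliance"),
        Step("sbom-store", "record the component", "software bill of materials"),
        Step("vulnerability-scanner", "scan and enable monitoring", "security"),
    ],
)

for line in use_oss.checklist():
    print(line)
```

Keeping the steps as an ordered list mirrors the point that a procedure prescribes a sequence, not just a set of tools.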
The development procedures use certain tools and describe exactly in which sequence certain tools are used, how to configure them, and when in the lifecycle you should approach them in order to fulfill certain requirements and guidelines, always in the team's given context.
If you have multiple of those development procedures and combine them, you have a paved road.
Importantly, a paved road is more than the sum of its development procedures, because the paved road also ensures that the development procedures used in it are internally consistent, meaning that one development procedure does not contradict or conflict with another. The tool configuration in development procedure number one must not contradict the configuration of the same tool in development procedure number five.
Also, each development procedure has a clear ownership, a development procedure owner, who is the subject matter expert in the whole development procedure topic. The paved road has an owner, someone who watches the internal consistency of the whole paved road.
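The consistency rule can be illustrated with a small check. This is only a sketch under assumptions: it supposes each development procedure declares the tool settings it relies on as simple key/value pairs, and the names `ToolSetting` and `find_conflicts` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSetting:
    procedure: str   # development procedure that requires the setting
    tool: str        # tool being configured
    key: str         # configuration key (hypothetical)
    value: str       # required value


def find_conflicts(settings: list[ToolSetting]) -> list[tuple[ToolSetting, ToolSetting]]:
    """Return pairs where two entries configure the same tool key differently."""
    first_seen: dict[tuple[str, str], ToolSetting] = {}
    conflicts = []
    for s in settings:
        # Remember the first setting per (tool, key); compare later ones to it.
        earlier = first_seen.setdefault((s.tool, s.key), s)
        if earlier.value != s.value:
            conflicts.append((earlier, s))
    return conflicts


# Hypothetical paved road: procedure 1 and procedure 5 disagree on one setting.
paved_road = [
    ToolSetting("procedure-1", "build-tool", "dependency-cache", "enabled"),
    ToolSetting("procedure-3", "scanner", "fail-on", "high"),
    ToolSetting("procedure-5", "build-tool", "dependency-cache", "disabled"),
]

for a, b in find_conflicts(paved_road):
    print(f"{a.tool}.{a.key}: {a.procedure} wants {a.value!r}, "
          f"{b.procedure} wants {b.value!r}")
```

A check like this would be something the paved road owner could run, since that role watches the consistency of the whole road, while each reported conflict goes back to the owners of the two procedures involved.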
Now we are at the DevOps Enterprise Summit, and I believe you have recognized what I actually showed you on the last slides: value streams. One larger value stream that we call the paved road, and smaller ones which we call the development procedures.
That is the trick. We did not invent anything new here. It is the same old value stream idea, but now it has clear benefits to all parties. The engineering teams have, for the first time, an end-to-end description of how to deliver software at SAP in their context that gives them clear advice about what to do, when to do it, and how to do it.
What you see here is how the paved roads and the development procedures appear to the teams as documentation. We did not copy or move existing documentation, but we link to the parts of the existing tool and process documentation which are valid in the context of this specific paved road. So the teams don't get lost in gazillions of documentation wiki pages whatsoever. We point them to the paragraphs that are important in this context: first go here to that tool documentation, that paragraph is important; then go there, read this paragraph, and do that.
As a central unit, we finally now have a systemic description that the teams follow for their delivery process, the paved road, and we can optimize along that. We can see the bottlenecks, we can see the constraints, and we can finally improve the system as a whole. Spelled with a W.
The feedback that we have received for our first paved roads is extremely positive. The team that we worked with as a validator and reviewer of our work said that if this had existed one and a half years earlier, it would have saved them weeks of work. I'm not sure what that is in money numbers, but it is a lot because it is multiple people that would have had an easier life.
05 What comes next and lessons learned
So what is it that we want to do next with the paved road? The first paved road has been generally available since the end of September, so everything that I tell you here is quite fresh experience.
We want to extend the scope of the paved roads, with more development procedures describing operations, portfolio process, and even cultural transformation. At the same time, we want to keep the number of core development procedures as small as possible and recombine them to create more context-specific paved roads for the various technology stacks and programming languages that are out there. We will always reuse the existing core development procedures so that we only have a very minimal set of domain-specific development procedures that fit only one or two paved roads.
Three learnings in creating the paved roads have been: first of all, you have to limit your scope, otherwise complexity will kill you. Focus on small parts of the overall value stream. Divide and conquer. Make the first scope as narrow as feasible, even if the scope looks too optimistic or utopian or ideal and has too few adoption cases in the real world. Adoption is not the goal in the first place when you start. The goal is creating transparency and gaining insight into complex structures. Adoption will follow that.
Start with the as-is situation. Improvements will be done in later versions. Don't try to improve the situation while describing the system. It just adds too much complexity.
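The reuse goal can be made measurable with a small sketch: given which development procedures each paved road combines, separate the widely shared core from the procedures that fit only one or two roads. The paved road names, procedure names, and the threshold of two are assumptions taken loosely from the wording above, not actual Hyperspace data.

```python
from collections import Counter


def classify_procedures(
    paved_roads: dict[str, list[str]], max_specific: int = 2
) -> tuple[list[str], list[str]]:
    """Split procedures into (core, domain_specific).

    A procedure counts as domain-specific when at most `max_specific`
    paved roads use it; everything else is treated as core.
    """
    usage = Counter(p for procs in paved_roads.values() for p in set(procs))
    core = sorted(p for p, n in usage.items() if n > max_specific)
    specific = sorted(p for p, n in usage.items() if n <= max_specific)
    return core, specific


# Hypothetical paved roads for three technology stacks.
roads = {
    "java-cloud": ["manage-backlog", "use-open-source-library", "release-feature", "java-build"],
    "node-cloud": ["manage-backlog", "use-open-source-library", "release-feature", "node-build"],
    "go-cloud": ["manage-backlog", "use-open-source-library", "release-feature", "go-build"],
}

core, specific = classify_procedures(roads)
print("core:", core)
print("domain-specific:", specific)
```

Tracking the size of the domain-specific list over time would be one way to see whether new paved roads are really being built by recombination.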
And have clear ownership for the various development procedures or paved roads, the value streams and the value stream parts, so that you always have an expert on how to go on and how to improve in the development procedures and in the paved road.
Paved roads are just one very central component of our Hyperspace CI/CD platform offering. Let me use the remaining minutes to share some learnings that we had on the overall platform so far.
Currently, we have roughly 30,500 pipelines on Hyperspace. Important for us is that we do not mandate the platform, and we do not mandate the paved roads. If teams want to create their own delivery process and pipelines with our tools or without our tools, that is just fine. The main criticism of mandating a platform, tools, or paved roads is that it kills innovation. Processes and platforms describe known things, best practices, how it already worked in the past. Innovation tries to describe something that is somehow unknown, where we do not know how it works best. Innovation needs freedom, and mandating a platform kills innovation.
Put all tools and processes into one ownership. This helps a lot to avoid unnecessary friction and handovers.
Paved roads, or golden paths, are a pretty cool thing, as I hopefully could outline in the last minutes. Take your time. We started this two years back and it still feels like we just started. There are still a lot of things that we have to learn and a lot of things that we have to do.
Take the whole team that works on the platform along with you. I have to confess that we could have done better in the last years, but we are improving on this.
One thing we did was create a storyboard, like a visual comic strip that describes a day in the life of a persona, showing how we imagine work using Hyperspace will be in the future. This helped us see whether we all have the same vision or picture in our heads of how the future using Hyperspace will look to a certain persona. It also helps us communicate to stakeholders: look, this is how we envision developers' lives will look in the future when they use Hyperspace. We are not sure whether this is 2025 or 2030, but this is the vision; this is the thing that we will work on.
I hope I could give you some insights into our internal CI/CD platform and how we work, and maybe one or the other idea could inspire you in your daily work.
If you want to reach out, feel free to use any of those listed social networks, and thank you very much for your attention.