Developer Productivity Engineering – The Next Big Thing in Software Development

US 2021

Field CTO and Chief Evangelist · Gradle, Inc.

IDC has predicted that by 2022, 65% of global GDP will be digitally transformed. Two-thirds of the products and services that you pay for will be driven by software. There has never been a more important time to foster developer productivity, but many of our methods have not evolved.

In this keynote-style talk, you will learn why DPE is the most important development in the software engineering world since the introduction of Agile and DevOps concepts and tools. DPE is a new software development practice that uses acceleration technologies to speed up the software build and test process, and data analytics to improve developer efficiency by as much as 10x. The ultimate aim is to achieve faster feedback cycles, more reliable and actionable data, and a highly satisfying developer experience.

Justin Reock, Field CTO and Chief Evangelist at Gradle, is pioneering DPE as a practice and set of technologies and is one of the world’s leading advocates. Specifically, Justin will provide an overview of the key concepts and tools, the impact on key business objectives like time-to-market, cost, and quality, the business case for DPE, and the role of AI/ML in DPE moving forward.

Full transcript

The complete talk, organized by section.

Justin Reock

[00:00:12.860] Hi, everyone. Thanks for joining today.

[00:00:14.680] I'm Justin Reock, the chief evangelist and field CTO at Gradle. You may have heard some of my other talks before around open source and developer productivity in general. My background is predominantly in software development, moving into enterprise architecture, but now I really focus more on evangelizing developer productivity engineering, which I'm going to talk to you about today. So let's jump right into it.

[00:00:39.080] I'd like to start this talk with the following quote from Eric Pearson, who was the CIO at InterContinental Hotels Group. And he says, "It's no longer the big beating the small, but the fast beating the slow."

[00:00:54.740] Okay? And I think that this quote adequately sums up the state of the art and the state of the industry right now. Okay? It's no longer the big, giant software companies that are just dominating, right? It's the disruptive, nimble companies that are able to respond to customer and market feedback quickly and bring their features to market fast. Right? And that's obviously what we're here talking about today with DevOps. Right? That's the whole purpose, is to get around various constraints and to unstick bottlenecks and to convert our code into throughput faster.

[00:01:31.200] So we're going to talk today about a very specific problem around feedback cycles for developers and how that impacts developer productivity. Because the practice of developer productivity engineering is very practical and very pragmatic and very straightforward. It's about identifying bottlenecks and friction in the software development process, and then using technology to mitigate those bottlenecks.

[00:01:57.260] Okay, so here's your average developer doing their thing. They're in their local environment, but this could also apply to remote environments or even building out in CI environments. And they're waiting on feedback. Right? They've written some code, they're going to run a build, and they want the build system to tell them something. Well, sometimes the feedback just takes too long. Right? Sometimes the build takes minutes, in extreme cases, even hours. Right? Or sometimes the test cycle takes too long. Maybe there's thousands of tests that have to be run. Right?

[00:02:30.020] Maybe the feedback is okay, but it's a failure. Right? There's some type of failure that has to be investigated. Right? Maybe it takes too long to fix that failure or to try to collaborate on the fix with that failure with other engineers. And maybe the worst case, it's a pitfall or a problem or a bottleneck that could've been avoided altogether. Right? Maybe a flaky test, a test with a non-deterministic outcome, or a failure that is actually impacting lots of developers in the organization, but nobody's talking about it because nobody's tracking it. Right?

[00:03:04.960] So think about these pain points, these bits of friction, and these bottlenecks, then multiply that by hundreds of calendar days per year, then hundreds of developers times that. We end up sinking productivity and increasing cost, which is antithetical to constraints-based work and productivity theory.

[00:03:24.780] So backing it up just a second, okay? Software development is still a creative process. Right? Even as practitioners, even as folks who do this for a living, right? This is still creative work. Now, it's not purely creative work. It's also scientific work. I think it's fair to say that it's a bit of both, right? We experiment with our code, we form a hypothesis in our mind, we run the code, and then we get the results. Right? And we want to understand whether the work we did accomplished the task that we wanted to accomplish.

[00:03:57.500] And I think for a lot of us, when we started in this industry and we started down this path of learning how to code, the experience felt like this very often for us, right? These were learning moments, right? We were achieving. We were running a Hello World program for the first time in a new language, and we would run it, and we would get the output, and that feedback would happen so quickly, right, because they were very simple programs, right? Maybe we're doing a standard Hello World, or maybe we're just changing the background color on a shape or something like that. But the point is that we would get feedback fast, and that fast feedback is part of what led to the delight of the experience.

[00:04:40.080] But as a lot of us have moved into more professional roles with developing software, especially in the enterprise, the success of software projects, ironically, creates a new set of challenges for us. Right? As the projects become more successful in an enterprise, as more users start to use the project, as the project grows in size and the number of developers working on it, or the number of repositories that are necessary to support the development, just the number of dependencies that are being used, and just the overall diversity of the tech stack, again, as enterprise projects get larger and larger, the toolchain efficiency starts to degrade if we're not managing it. Right?

[00:05:26.380] So the build time gets longer and longer and longer. The test cycles get longer and longer and longer. And ultimately, the delight of the process and the happiness involved in the craft begins to diminish and begins getting replaced with some frustrations. Right?

[00:05:43.860] So I think now, if you look at the day in the life of an average enterprise developer, their calendar might look a little bit like this for organizations that haven't fully invested in productivity engineering as a practice. Right? The developers might come in. They're ready to go, and they start coding. Great. And then we're going to wait for our local build to complete. We made a bunch of changes and something failed in the local build, so we're going to spend some time debugging that build failure. Maybe we fix it and then go to lunch.

[00:06:15.588] Great. We're going to come back, and we're going to code for a little while longer. We're going to wait for our local build to complete. Now, we debugged the build failure that we had last time, so this time it's successful, but, ugh, now we push that build to CI, and there's a flaky test. Right? There's a test that passed in our local environment, it did not pass in the CI environment. Now we're spending time investigating that. All right?

[00:06:39.308] So how much of this was spent doing what developers love to do, right? Coding. Not a lot, right? A lot of this is actually just spent with various frustrations, supporting the build, troubleshooting the build, debugging the build. So an outcome of developer productivity engineering then is to give developers literally hours back in their week to be able to work on valuable solutions.

[00:07:01.868] And I think that we are at a point where we can make a global appeal for this now. Okay? There's a study that, I think, was published in October of last year, and some of you may have seen it, you may have even been quoting it in some of your presentations today. It's a very powerful statement, and it was from IDC, and it's a prediction that by 2022, right, coming up by the end of this year, 65% of the global GDP will be digitally transformed. 65% of the goods and services that we pay for will be software services.

[00:07:36.668] Okay? So this is now still this relatively small group of craftspeople, developers, who are quite literally lifting all boats in the harbor, and I think as much as possible, we owe it to this part of the workforce to have a happy and productive work experience. And that's very much the purpose of developer productivity engineering, is to increase productivity by increasing developer happiness.

[00:08:03.068] Because right now, in the current state, teams work far from their true potential, and the productivity of developers absolutely affects their happiness, right? Again, going back to when you first learned about coding. It's neuroscience, right? Those little rewards that you get for running the experiment and seeing the results that you want, it's a dopamine hit. Right? It's a small reward that leads to overall joy and happiness, and if you can't be productive, if your toolchain is blocking you from getting the feedback that you want and from building code at the rate at which you want to build it, you're going to be more frustrated than you are happy. And so, as a result, low developer productivity is blocking business innovations.

[00:08:47.868] It's not that DevOps hasn't done amazing things in terms of increasing our ability to get code to market faster. Of course it has. But what we're talking about is a new set of bottlenecks that are actually further left in the process now, that are actually part of the developer experience in writing code.

[00:09:08.148] So this is not a replacement for DevOps or anything by any means, right? This is more of a continuation. Right? It's a constraints-based theory, just like just-in-time manufacturing, business process re-engineering, ultimately moving into things like change management and then evolving into practices like Agile and DevOps. We know these processes deeply. We understand Goldratt's theory of constraints. And if we fast-forward now to what developer productivity engineering is trying to do, it really is just taking a look at the same types of bottlenecks that have been identified by other practices like this in the past, and then taking pragmatic solutions to do something about it.

[00:09:56.528] Okay, so let's talk about what those solutions are. That's a lot of theory, but the solutions are actually very straightforward and pragmatic. So when we say that DPE is the next thing, like this continuation of DevOps, though, it's quite literal. Right? We really say, okay, well, this is the next set of bottlenecks and friction and pain points that need to be addressed.

[00:10:18.588] So we use acceleration technologies and data accumulation and analysis technologies. All right? We use acceleration technologies to speed up the build itself, to decrease the amount of time it takes for a developer to get feedback about the build, and that applies to local builds, remote builds, and CI builds. We also apply technologies to the testing process to allow more tests to take place in parallel, and we're working on a machine-learning, gradient-boosting-based technique called predictive test selection, which was actually pioneered at Facebook to avoid running certain tests that probably won't produce any valuable feedback for us to begin with, right?

[00:11:04.488] So really, we just take these acceleration technologies, we apply them to the build and the test to speed up the feedback cycles. And then just as important, we monitor metrics like build times and test cycle times and failures and test results so that we can determine flaky tests. And we use all this to paint a picture of overall build performance so that it can be monitored over time by a group of production engineers. Okay?

[00:11:36.108] So that is fundamentally different than some other approaches to productivity that we've seen in the past. And I think we can talk about two categories of productivity work. We have developer productivity management, and we have developer productivity engineering. And I think it's really important to separate the two. We're talking about developer productivity engineering in this talk, which has a different focus.

[00:12:01.080] Whereas productivity management might focus on the people, so how many lines of code are being produced, how many story points are being generated by it, or how many story points does a team's capacity have, what's that team's velocity? DPE focuses on the process and the technology. It says, "Can we make the build times faster?" Not are they fast enough, but how fast could they possibly be given the right types of acceleration technologies?

[00:12:29.160] And so our metrics then are based on outcomes. Symptomatically, a lot of them are the same as you'll get from a productivity management solution. This will absolutely increase a team's overall velocity. It will allow them to work through more story points per sprint, or however you measure your productivity. But it's going to look at those raw concrete outcomes.

[00:12:52.600] The SDLC focus for productivity engineering right now really is just in the build and test parts of the SDLC, because that's where the majority of the developer experience happens. This landscape is changing a little bit, certainly. I think as we start seeing more infrastructure as code, and we start seeing more developers actually getting feedback from production for very fast releases, so pushing things out to a service mesh where they can weight network traffic and do things like blue-green testing or canary releases very easily as part of the release process, then we may start seeing DPE sort of creep into more of the CD and deployment side. But right now, it's really focused on test feedback cycle times, build feedback cycle times, and tracking those over time. And as a result, the ROI from taking on DPE initiatives is very straightforward and easy to calculate. It's hard, proven ROI.

[00:13:55.260] So just to give an overview of the overall solution and the five pains that we try to address with developer productivity engineering as a practice. We already talked about idle and wait time as a pain, and as a result, we want faster feedback cycles, and so build caching and test distribution are the two technologies that we utilize there. We're going to do a super quick demo of a build cache so that you can get a better idea of what that is, and I'll show you where you can link to a video on test distribution. Unfortunately, we don't have enough time today to do a demo on that one.

[00:14:29.720] Then the next thing, inefficient troubleshooting. So we mentioned that the build time is just one part of it. The build time and waiting on feedback is just one part of it. What if that feedback is negative feedback? What if it's a failure? How can we improve a developer's ability to debug that failure, collaborate on that failure with other engineers so that they can get to a root cause? And with that, we utilize something called a build scan, and you may be familiar with this if you've built with Gradle in the past.

[00:14:56.500] If you haven't run a build scan, but you build with Gradle, you can do it right now. It's part of the open source tool. You can just do `--scan` at the end of a Gradle build, and it will run a scan. You'll kind of get a chance to look at that. We'll see one at the very end. And these are also available for Maven through the free Gradle Enterprise Maven extension, kind of a freemium feature for Maven, so you can run build scans against a Maven build as well.

[00:15:20.080] So what the build scan does is just collect a whole bunch of forensic data about the build itself and context around the build, and puts it in a shareable form, in a URL that can be passed around the business. Failure analytics allow us to detect proactively things like flaky and non-deterministic tests, and then other types of avoidable failures that may be occurring, again, for multiple developers, but there's no visibility. No one's actually tracking that, and so because no one's tracking it, that failure effectively never leaves that developer's workstation.
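The flaky-test part of failure analytics can be illustrated with a minimal sketch. It assumes, hypothetically, that each aggregated test result is a tuple of test name, an identifier for the exact code that was built, and a pass/fail flag. The function name and record layout are illustrative only and are not Gradle Enterprise's data model. A test is flagged when the same code produced both a pass and a failure, which is the non-deterministic outcome described earlier in the talk.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return names of tests that both passed and failed on identical code.

    `results` is an iterable of (test_name, code_version, passed) tuples.
    This record shape is an assumption made for illustration.
    """
    outcomes = defaultdict(set)
    for test_name, code_version, passed in results:
        # Group outcomes by test and by the exact code that was built.
        outcomes[(test_name, code_version)].add(passed)

    # A test is suspicious only if one code version saw both outcomes.
    flaky = {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


if __name__ == "__main__":
    sample = [
        ("testLogin", "abc123", True),   # passed locally
        ("testLogin", "abc123", False),  # failed in CI on the same code
        ("testCart", "abc123", False),   # failed, then passed after a code change
        ("testCart", "def456", True),
    ]
    print(find_flaky_tests(sample))
```

A test that fails on one version and passes on the next is not flagged, because the code change is a plausible explanation for the different outcome.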

[00:15:58.820] There's no metrics or KPI observability. No one's actually paying attention to how well builds are performing in a lot of organizations, and so that's another pain point. No one is really paying attention to how much time developers are sitting waiting for builds to complete or test feedback to come back from testing frameworks. And so part of this is making sure that you're rolling up, aggregating all that data centrally and being able to visualize it.

[00:16:27.140] And then a side effect of this, which really doesn't have anything to do with productivity: since things like caching, failure analytics, and flaky test detection are literally making the build systems do less work, we can utilize our CI resources more efficiently. So if we're building out on cloud and we're worried about CI costs on cloud, which we all know is a creeping cost, by literally asking the build system to do less, we can save on those resources. So it's just a side effect of DPE. It's really not part of productivity, but it's worth mentioning.

[00:17:06.720] All right, so those are really the five pillars of DPE, and if you're aware and conscious of these pains, and if you're using technology and process to address those pains, then you're going along the path of developer productivity engineering.

[00:17:22.360] So let's look at what kind of impact this can have, because very fast feedback cycles are really important. Let's take a look at two separate developer teams and just do a little thought experiment here. We have 11 developers on one team with a four-minute build time, which if you ask any Java enterprise developer if a four-minute build time's killing them, they're going to say no. They're going to say it's fine.

[00:17:41.660] But compare that to a team of six with a one-minute build time, and look how much more often they're able to build. Look how many more local builds they can run in the same unit of time. Right? This second team will, in all likelihood, be able to ship more, better features because they're able to build more frequently. They're able to have a smaller change set per build. They're able to avoid merge conflicts more often because they're able to build smaller change sets more frequently, and they're able to experiment on the code base more frequently.

[00:18:14.524] When we start looking at the savings per year at very large teams, when we take this same principle, look at 100 developers doing 12,000 local builds per week with a nine-minute build time. Reducing that with our acceleration technologies like caching to a five-minute build time can translate to 5,200 days a year in engineer savings.
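The 5,200-day figure can be reproduced with simple arithmetic. The sketch below assumes 52 weeks of builds per year and 8-hour engineering days. Neither assumption is stated in the talk, but together they yield the quoted number.

```python
def engineer_days_saved_per_year(builds_per_week, minutes_before, minutes_after,
                                 weeks_per_year=52, hours_per_day=8):
    """Engineer-days saved per year when every build gets shorter.

    weeks_per_year and hours_per_day are assumptions, not figures from the talk.
    """
    minutes_saved_per_week = builds_per_week * (minutes_before - minutes_after)
    hours_saved_per_year = minutes_saved_per_week * weeks_per_year / 60
    return hours_saved_per_year / hours_per_day


if __name__ == "__main__":
    # 12,000 local builds per week, nine-minute builds reduced to five minutes.
    days = engineer_days_saved_per_year(12_000, 9, 5)
    print(f"{days:,.0f} engineer-days per year")
```

Step by step: 12,000 builds times 4 minutes saved is 48,000 minutes, or 800 hours, per week. Over 52 weeks that is 41,600 hours, which is 5,200 eight-hour days.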

[00:18:37.024] Okay? So we talked about this build cache, and it's just a tool for fast feedback cycles. It was introduced to the Java world by Gradle in 2017. It's not the only build cache technology out there, but it is available for Maven and Gradle, and it's important to understand that this is very different than a dependency cache. Right? A dependency cache, like an Artifactory or a Sonatype Nexus, those hold your binary dependencies, fully compiled binary dependencies that need to be downloaded to various projects. And they're useful. Right? They're complementary to a build cache, which actually caches outputs from various tasks or goals within the build.

[00:19:20.044] So, as a Gradle task or a Maven goal completes, its inputs are effectively just cryptographically hashed, and a key is generated. If code hasn't changed, or if tests haven't changed, in a way that would affect the output, then when we actually go to run the build, we just generate the key, look in the cache first to see if we have literally the exact same output that would've been generated from those inputs, and if we do, we just pull it from cache, and it's usually a lot faster. Okay?
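The lookup described here can be sketched in a few lines. This is a toy in-memory illustration of the idea, not how Gradle or the Maven extension implement it. The names `cache_key` and `BuildCache` and the choice of SHA-256 are assumptions, and real build caches hash more than file contents.

```python
import hashlib


def cache_key(task_name, input_files):
    """Hash the task name and every input file's path and contents.

    `input_files` maps a path to its bytes. Inputs are visited in sorted
    order so the same inputs always produce the same key.
    """
    digest = hashlib.sha256()
    digest.update(task_name.encode("utf-8") + b"\0")
    for path in sorted(input_files):
        content = input_files[path]
        digest.update(path.encode("utf-8") + b"\0")
        # Length prefix keeps adjacent files from blurring into each other.
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()


class BuildCache:
    """Toy in-memory cache keyed by the hash of a task's inputs."""

    def __init__(self):
        self._store = {}

    def run(self, task_name, input_files, action):
        """Return (output, from_cache). `action` runs only on a cache miss."""
        key = cache_key(task_name, input_files)
        if key in self._store:
            return self._store[key], True
        output = action(input_files)
        self._store[key] = output
        return output, False


if __name__ == "__main__":
    cache = BuildCache()
    sources = {"App.java": b"class App {}"}

    def compile_step(files):
        # Stand-in for real work.
        return b"".join(files[p] for p in sorted(files))

    print(cache.run("compile", sources, compile_step)[1])  # first run: miss
    print(cache.run("compile", sources, compile_step)[1])  # unchanged inputs: hit
```

Changing any input byte, or the task name, produces a different key, so the stale output is never reused.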

[00:19:52.204] Several open source projects, more than are in this list, are using this technology now, and they're seeing, I think, very dramatic results. The Commons IO Java library is using caching and brought their build from a minute 23 down to four seconds. Spring Boot from 21 minutes down to six, and this is even better now, actually. We can go and we're going to look at this dashboard as the very last demo. We've got about five minutes left. The Atomix project, which builds lots of stuff, this is Tomcat, but then also things like ActiveMQ for JMS, from an hour and 27 minutes down to 20 minutes with caching.

[00:20:27.404] So we're going to take a look at a super quick demo really fast. Okay, so this is just a very simple Maven project. It's actually the Camel Spring Boot Router archetype that you can get from the Maven Central Archetype Repository, and I've modified it to add the Gradle Enterprise Maven extension, a freely redistributable extension that you can just add to your project. So we're going to do a Maven clean verify, and this should just be like a normal run.

[00:21:03.864] I think generally this project takes 20 to 30 seconds to build and test on a normal run. Right? About 10 seconds probably, because I built it pretty recently. But let's run this again now with caching turned on and look how much faster. Okay? We pulled several of these tasks from cache, and the build time only took three seconds. And we can take a quick look at a Maven, excuse me, a Gradle build scan here on a Maven build, and take a look at our performance, and we can see that we avoided 79%, almost 80% of our overall build time using the cache. Okay? Let's jump back into the presentation since we are getting near the end here.

[00:21:54.844] So build caching is one acceleration technology. Test distribution is another. Test distribution is the ability to distribute test workloads across multiple agents in sort of an elastically scaling way. We don't have time to get into it, but we do have a really good video on how we've applied this to the Apache Cassandra project, and you can view it here.
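As a rough illustration of what distributing a test workload can mean, the sketch below balances tests across a fixed number of agents using historical durations. The greedy longest-first strategy and the name `distribute_tests` are assumptions chosen for illustration. The talk does not describe the scheduling algorithm, and this sketch does not model elastic scaling.

```python
import heapq


def distribute_tests(durations, agent_count):
    """Assign tests to agents, longest first, always to the least-loaded agent.

    `durations` maps a test name to its historical runtime in seconds.
    Returns a dict of agent index to the list of tests assigned to it.
    """
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")

    # Min-heap of (accumulated seconds, agent index).
    loads = [(0.0, agent) for agent in range(agent_count)]
    heapq.heapify(loads)
    assignment = {agent: [] for agent in range(agent_count)}

    # Longest tests first; ties broken by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for test_name, seconds in ordered:
        load, agent = heapq.heappop(loads)
        assignment[agent].append(test_name)
        heapq.heappush(loads, (load + seconds, agent))
    return assignment


if __name__ == "__main__":
    history = {"IntegrationTest": 30, "RepositoryTest": 20, "UtilTest": 10, "ParserTest": 10}
    for agent, tests in distribute_tests(history, 2).items():
        print(agent, tests)
```

Placing the longest tests first keeps any single agent from ending up with several slow tests, which is what limits the total wall-clock time of the test cycle.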

[00:22:20.024] Okay, now the other part of this we said was observability and data. What gets measured gets improved, right? We all know that. Performance regressions are very easily introduced into any type of build infrastructure, right? Changing office locations, refactoring the code, changing the way that we manage binary dependencies and things like that, all of this can have an impact on build performance, and so it's really important that we maintain vigilance over that part of the build and make sure that those metrics are really well understood.

[00:22:54.704] And so those are the other two practical parts of this practice: doing failure analytics and really making sure that we can detect things like flaky tests, and then making sure that we're watching build performance over time. So we're going to take a look at another very quick demo, and then we're going to wrap up. This is the Spring Framework Gradle Enterprise dashboard. We give this technology to a number of open source projects for free. Gradle Enterprise is an enabling technology for the practice of developer productivity engineering, and Spring has chosen to use it to augment their build process.

[00:23:37.324] So I want to point out two things here. First, let's take a look at failures. These are failures that have been aggregated across the build process for all Spring developers. And look at this one right here. So we have these two types of failures, non-verification and verification failures. The non-verification failure is like an infrastructure failure, right? It's something from maybe a network timeout, whereas a verification failure is an assertion that wasn't met properly. So like a programmatic failure.

[00:24:06.552] But we can find out, for instance right now, that 26 builds have failed with this particular failure. We can see and drill down to all the various builds where it's happening. These are different local builds taking place for different users. And we can drill down right into the failure and really try to understand what's happened, and we can even take this link, and we can link to really any part of this build scan, but we can take this URL, copy it, we could paste it and give it to somebody else, and it'll take them right into the failure. All right? So this is one way that Spring is using it.

[00:24:40.872] And then another thing I want to point out is the trends dashboard. And this is what allows us to take a relative amount of time, maybe the last 28 days, four working weeks, and take a look at how our build performance has been. How many builds have taken place, what's the cumulative build time? If we scroll down a little bit, we can see how much time our cache has saved. So this is the way that productivity teams can remain vigilant over the overall productivity of the development group.
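The kind of roll-up behind such a trends view can be sketched as follows. The record fields (`finished`, `duration_seconds`, `cache_savings_seconds`) and the function name are hypothetical, chosen for illustration. They are not the Gradle Enterprise schema or API.

```python
from datetime import date, timedelta


def summarize_builds(builds, today, window_days=28):
    """Summarize builds that finished within the last `window_days` days.

    Each build is a dict with hypothetical keys: 'finished' (a date),
    'duration_seconds', and 'cache_savings_seconds'.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [b for b in builds if cutoff < b["finished"] <= today]
    return {
        "build_count": len(recent),
        "total_build_seconds": sum(b["duration_seconds"] for b in recent),
        "seconds_saved_by_cache": sum(b["cache_savings_seconds"] for b in recent),
    }


if __name__ == "__main__":
    sample = [
        {"finished": date(2021, 6, 29), "duration_seconds": 180, "cache_savings_seconds": 240},
        {"finished": date(2021, 6, 10), "duration_seconds": 300, "cache_savings_seconds": 60},
        {"finished": date(2021, 4, 1), "duration_seconds": 540, "cache_savings_seconds": 0},
    ]
    print(summarize_builds(sample, today=date(2021, 6, 30)))
```

Comparing the same summary across consecutive windows is one simple way to notice a build performance regression.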

[00:25:18.412] Okay. Let's close this out and wrap it up. I think we're coming right up on time. So just as some next steps, we do have an e-book on this subject that you're welcome to download for free. You can go and hit this link or just look up the "Developer Productivity Engineering" e-book. You'll find it. Take a look. It was written by Hans Dockter, the inventor of Gradle, and the person who sort of coined the developer productivity engineering practice and term.

[00:25:44.652] So that's a good start if you want to learn more. Try a free Maven or Gradle build scan. If you have Gradle, just do `--scan` or just try the Maven extension, check out our documentation, read more about build scans, and of course, feel free to reach out to me, Justin Reock. That's jreock@gradle.com if you have any questions, and we'll take some live Q&A now.