VendorDome: Reaching Peak Productivity in Production
The latest edition of the DORA report put elite-performing organisations further ahead than they’ve ever been. Most notably, the speed at which they deploy now matches the speed at which they recover.
During this session, hosted by Jessica Cregg, Developer Advocate at LaunchDarkly, Martin Woodward, Senior Director of Developer Relations at GitHub, and Andy Bold, Senior Engineer at LaunchDarkly, will discuss the factors that help maximise deployment frequency and minimise mean time to recovery (MTTR) in everyday software development.
They will explore how developer experience drives our decision-making and why it doesn’t have to stand at odds with productivity-enhancing measures.
Full transcript
Jessica Cregg (Moderator)
Hello everyone, and welcome to the DevOps Enterprise Summit Europe 2022 VendorDome. My name is Jessica Cregg, and I'm a developer advocate with LaunchDarkly. We are here today to talk to you about speed and recovery. As you may recall from the latest edition of the DORA report, elite-performing companies were further ahead than ever before, and most notably, the speed at which they deploy is now matching the speed at which they recover.
Today I am joined by two panelists who are about to go head-to-head, or hopefully side-by-side, in discussing how we can reach peak productivity in production. This discussion is fully interactive, so please put questions in the Ask the Speaker Track Four channel. We have Martin Woodward, Senior Director of Developer Relations at GitHub.
Martin Woodward
Hey, how are you doing? Good to see you. I'm glad we got through those difficulties. It's great to be here. As you mentioned, I work at GitHub. Before that, I was helping a few different teams roll out DevOps and DevOps culture change across their organization. I'm looking forward to chatting.
Jessica Cregg (Moderator)
Great to have you. And we're also joined by Andy Bold, Senior Engineer on the Platform Squad. Sorry, Andy, I merged your name with Martin's and created our first merge conflict of the session.
Andy Bold
Great to be here. I've been with LaunchDarkly just coming up on two years. I'm a senior engineer working on the Platform Squad and just moving into an engineering manager position to help the team succeed.
Q&A
Jessica Cregg: Let's kick things off by talking about developer productivity. Andy, how long did it take you to feel productive when you joined LaunchDarkly, and what are your thoughts about being productive as you move into your new position?
Andy Bold: It was actually quite quick. It was one of the best onboarding experiences I've had. Day one was the usual onboarding and meeting lots of strangers. Day two was getting the laptop set up: go here, download this repo from GitHub, run the scripts, and I was up and running. On Thursday of my first week, I did my first deploy to production.
Jessica Cregg: Super fast. Martin, how does that compare with GitHub?
Martin Woodward: We've got a lot better. GitHub was always pretty good, but the codebase has been around a while. It is about 11 million commits, and when you clone it your local Git repository folder is about 22 gigabytes, so it can take 45 minutes just to clone the code, and then you have prerequisites to set up. That was still much better than when I was at Microsoft, where the time to get something into production, the time to null ship, was about nine months because we were shipping software in shrink-wrapped boxes.
Martin Woodward: The goal at LaunchDarkly and GitHub is to have people deploying code to production with customers on day one. At GitHub we did work on virtualized container environments and productized it so everyone can use it. We got setup down from a couple of days to four hours, and then down to 15 seconds. If you want a new dev box, you can stand it up in 15 seconds. It is faster than I can clone the code, and it has all the prerequisites. It is a 32-core machine. In terms of getting code shipping to customers, we can do it same day now. Feeling productive is a different thing that involves culture and other factors.
Jessica Cregg: What difference did that make, and Andy, I'd love to hear about your container experience as well.
Andy Bold: There are two aspects. As a new person in a business, you're finding your place, meeting new people, and dealing with a lot of new context and cognitive load. Having an easy, stress-free process to get up and running matters. In past companies I've been there two, three, four, or five weeks before being productive, and that's horrible because you want to deliver value and you can't. From a self-confidence perspective, being up and running and doing a deploy on Thursday was important. It was scary because it was a brand-new company and if it goes wrong, what happens? But it was fine.
Martin Woodward: To get there, you need psychological safety, safety within the systems, and resilience within the systems. For me, it meant that I have been able to check in code to GitHub. I'm in DevRel, so I'm a PM, and most of the time engineers are trying to delete my code rather than let it run in production. But because I could easily get a clone of the machine, do development, test it in a dev container, and make sure it worked before sending a pull request to an engineer I had never met, I could submit it with a high degree of confidence that it did the small thing I needed and would not bring down production.
Martin Woodward: It also means that instead of begging favors or trying to nerd-snipe people into doing things, I can send a terrible PR that does what I want it to do. Then somebody can say, "Martin, that's awful. Here's a better PR," and I can actually contribute. Especially in a remote world, if you are productive and can get stuff shipped, you are more satisfied. There was a Stack Overflow survey showing a direct correlation between frequency of ships, time to ship, and developer satisfaction. We love shipping things and seeing people use them in production to solve problems.
Jessica Cregg: Speaking of metrics, what are your thoughts around deployment frequency and how it has become a measure of a successful team or individual?
Andy Bold: You need to go as fast as you are able, and you shouldn't measure yourself against somebody else. If somebody else does thousands of deployments a day, that's fantastic because it works for them, but you should aim to do as many deployments as you need to do and can do safely. If you do lots of deployments and they are all bad experiences, people will want to stop deploying. Deployment frequency is a good metric, but aim for it in the context of where you are and making it better.
Martin Woodward: We closely monitor the time from when the PR is merged to when it is running in production. We're not measuring how many we do in a day because it depends on what you want to change and the size of the change. Ideally, you want lots of small pull requests and the ability to deploy frequently. As GitHub grew, we had to modernize our engineering practices. When you have an order-of-magnitude increase in people, you have to change process to keep doing things productively.
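Martin describes tracking the time from merge to production rather than counting deploys per day. A minimal Python sketch of that measurement follows. The function name and the `(merged_at, deployed_at)` input shape are hypothetical, and taking the median rather than the mean is an assumption made here so that one unusually slow rollout does not dominate the number.

```python
from datetime import datetime, timedelta
from statistics import median


def merge_to_production(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time between a PR merging and that change running in production.

    Each event is a hypothetical (merged_at, deployed_at) pair.
    """
    if not events:
        raise ValueError("need at least one merged-and-deployed change")
    lead_times = [deployed_at - merged_at for merged_at, deployed_at in events]
    return median(lead_times)
```

Watching how this number moves from week to week fits the panel's point that small, frequent pull requests matter more than a raw deploy count.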
Martin Woodward: One thing we did was move to a queue-based model for deployments. We had many engineers doing deployments, and we were serializing deployments because it takes finite time to roll out to a huge global estate. Now we batch a few together: whoever is ready this second gets pushed through. The downside is that those people are now involved in the incident if there is a live-site issue and telemetry drops, so we had to improve observability to detect where problems are happening. It's the traditional DevOps pattern: do the thing that hurts until it doesn't hurt anymore, then do the next thing that hurts and keep iterating.
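The queue-based model Martin outlines keeps rollouts serialized, but each rollout picks up everything that is ready at that moment. It can be sketched as a small simulation. This is an illustrative sketch only, not GitHub's deployment tooling; `PullRequest`, `plan_batches`, and the fixed rollout duration are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    name: str
    ready_at: int  # minute at which the PR is merged and ready to ship


def plan_batches(prs: list[PullRequest], deploy_minutes: int) -> list[list[str]]:
    """Serialize rollouts, but ship every PR that is ready when the pipeline frees up."""
    pending = sorted(prs, key=lambda pr: pr.ready_at)
    batches: list[list[str]] = []
    clock = 0
    i = 0
    while i < len(pending):
        # If the pipeline is idle and nothing is ready, wait for the next PR.
        clock = max(clock, pending[i].ready_at)
        batch: list[str] = []
        while i < len(pending) and pending[i].ready_at <= clock:
            batch.append(pending[i].name)
            i += 1
        batches.append(batch)
        clock += deploy_minutes  # the rollout occupies the pipeline for this long
    return batches
```

With a 20-minute rollout, PRs ready at minutes 0, 5, 12, and 25 ship as three deploys: `a` alone, then `b` and `c` together, then `d`. The trade-off Martin mentions shows up directly. If the second deploy causes an incident, the authors of both `b` and `c` are involved, which is why better observability was needed.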
Jessica Cregg: One question from the chat asks what made you confident sending out your first PR.
Martin Woodward: For me, it was being able to test locally. When I submitted the PR, I knew there was a full test suite and the tests were reliable and not flaky. When the PR passed checks, I had more confidence to open it for discussion and ask for human code review because there were enough computer checks before that, and I had done enough testing to know it worked.
Andy Bold: For me, it was cultural. I had a nice onboarding experience, met great people, and had an onboarding buddy in the Platform Squad. I asked, "What if it goes wrong?" He said, "What if it does go wrong? It's fine. We've got everything around this to make it safe. If it goes wrong, it goes wrong. Nothing bad will happen, and we'll fix it and move on." Having somebody to borrow confidence from was really good. Then doing it and seeing it go well meant I gained confidence each time.
Jessica Cregg: Gene Kim asks how many engineers you dedicate to developer productivity infrastructure.
Martin Woodward: It depends, and it goes in cycles. We have a lean team and mostly pull from feature teams to improve the core product, then cycle back into the core product. We don't have a dedicated developer engineering team at GitHub that I'm aware of, but we scale it up. When building Codespaces, the container-based development environment, we basically had the whole team working on the problem, around 100 people. We created the GitHub Computer Club and focused on the problems stopping people from moving from local Mac-based development to container-based Linux development. When people joined the experiment, they agreed to tell us if they were going to bail out and go back to local Mac development so we could fix the reason.
Martin Woodward: We don't have a huge dedicated team. In developer tools companies, it is hard to say because we build products that we also ship to customers. In banks and similar organizations, I've seen larger teams because they integrate things from vendors rather than building the thing themselves.
Andy Bold: I couldn't give a percentage because it changes. We put the number of people we need on the project at the time. For our project shipping ephemeral development environments for delivery squads, it started with me, then two engineers did the work, and then it was handed to another engineer to manage and take forward. The Platform Squad is about nine people out of roughly 60 developers.
Martin Woodward: On telemetry, we have access to Splunk logs and production telemetry, and we invest heavily in observability so individual feature teams can identify the problem and fix it. We have strict restrictions on read access to customer data, with break-glass mechanisms and logging if you need to look at customer data. For observability and logs, the whole company can see current requests per second, throughput, and logs. We treat PII leakage into telemetry as a P0 bug so we can enable observability safely.
Andy Bold: We monitor so many metrics and log files, pushing them into Datadog, with smart dashboards for focused views of system state. We can start from a high-level view and dig down into detail. We have mature alerts. When I am on call one week in four, the alert contains what I need: click here to see what happened, go to the dashboard, and follow the runbook. We measure RAM and CPU, but also traces through the service, and use Honeycomb for that.
Martin Woodward: There is a difference between telemetry to know if the platform is healthy and observability for diagnosing a problem and doing forensics. Measurement affects the system you measure, like the Heisenberg uncertainty principle. We are careful that high-level measurements and incentives are balanced, broad, and do not incentivize wrong behaviors. Mean time to deployment and mean time to remediation are metrics we judge ourselves on, along with how quickly we can stand up a development environment and how much it is used.
Jessica Cregg: Are metrics tightly governing how you measure efficacy, or are you developing different metrics as they make sense?
Martin Woodward: It is more about trends than hard numbers. We have goals and want to get as close to zero as possible, but we care whether the trend is going in the right direction. If you put something in to avoid an issue, you want to monitor how it affects mean time to remediation. If it moves from seconds to days or hours, maybe it should not be in the path, or maybe it should happen after the fact. DevOps done right should feel like an eye test: better or worse now? Then adjust and iterate.
Andy Bold: Same for us. We look at trends. We have a weekly review meeting where service teams discuss service health, what is trending up or down, what they found, and what others can feed into. We look at raw metrics and also things like PagerDuty alerts over the last seven days, whether that is up or down, support tickets, and support ticket resolution time.
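Both panelists judge metrics by direction rather than absolute value, which is Martin's "better or worse now?" eye test. The hypothetical helper below compares two windows, such as last week's and this week's recovery times in minutes or daily alert counts. The name `trend` and the simple comparison of means are assumptions, and a real review would also account for noise before reacting.

```python
from statistics import mean


def trend(previous: list[float], current: list[float], lower_is_better: bool = True) -> str:
    """Compare the means of two windows and report the direction of change."""
    before, after = mean(previous), mean(current)
    if after == before:
        return "flat"
    # For MTTR, alert counts, or ticket resolution time, a smaller number is an improvement.
    improved = after < before if lower_is_better else after > before
    return "better" if improved else "worse"
```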
Martin Woodward: One metric I ask about when people want to improve internal developer tools is their developer NPS, or satisfaction score, for their own development environment. Have you asked developers? Are you regularly asking them? Are you making changes? Often people never talk to their own developers, so they do not have that number.
Jessica Cregg: Philip Day asks about issues that get normalized and do not get yelled about: a sort of snafu experience.
Andy Bold: Yes, and it is something we actively look for. We want to look for toil: things people do day to day that cause friction, get in the way, and slow them down, but they have always done them and keep doing them. We are keen to get rid of that because it is not good for anybody. It distracts you from what you should be doing.
Martin Woodward: That's where trends help. If setting up developer workstations takes four hours, you can decide that is terrible and you need to get it down to seconds, which requires a different approach. When you spot toil or hero culture, such as people pulling all-nighters, doing unsustainable work, or the same teams being called for incidents repeatedly, and it happens two or three times in the same area, then you have a systematic problem. Engineering leadership and the team need to ask how to avoid it in the future. Amazing work will always happen, but it has to be sustainable.
Jessica Cregg: Incident response and on-call reveal culture. They test whether blameless culture and psychological safety are real in practice. Eliza asks whether you set standards such as spending 20% of time improving things, and how you protect capacity for continuous improvement.
Martin Woodward: I've done different things in different teams. It depends where you are in the lifecycle. One thing I want people to take away from DevOps Enterprise Summit is that it doesn't need to be amazing before you do anything; you can take baby steps. In organizations just getting more agile, we monitored bug counts, avoided scrum waterfall, avoided long integration and stabilization phases, and fixed defects as we went. We also do periods where a team focuses on engineering optimizations, like app performance or front-end response time. We tend to do that cyclically on a team basis. We also have a monthly day of learning where engineering shares across the organization through brown bags and similar practices.
Andy Bold: From a strict practice point of view, we do not dedicate one day a week to improvement. We plan it week by week. Items from post-incident reviews and postmortems get prioritized at the top because they need to be fixed. For technical debt, every quarter we have a focus week where squads work on things that normally do not reach the top of the backlog because features tend to win day to day. It is a forcing function. We are also planning to spin out a developer enablement team, starting with two or three people focused 100% on these issues.
Jessica Cregg: David from BT asks how much platform work comes through InnerSource contributions versus requests, and whether that ratio helps keep teams smaller and productive.
Martin Woodward: In Microsoft's internal engineering systems, where InnerSource is mature and shared by default, about 90% of changes to a system come from people paid to work on that system. About 9% come from close teams, meaning partner or sister teams, and about 1% come from people far away in the org chart making random changes. InnerSource helps build empathy because you can access code, see how hard another team's problem is, and sometimes realize a requested field is already in the table and only needs to be wired through. At GitHub, everything is InnerSource, with a big monolith for github.com plus services. Cross-team PRs happen, but maintainability is always the question: can we maintain this feature, and does it add significantly to maintenance?
Andy Bold: We try to avoid dependencies because as soon as you have a dependency, you have something that slows down delivery. In my delivery risk matrix, one dependency gives you a one-in-two chance of delivering; two dependencies gives one in four; three gives one in eight. We keep dependencies as few as possible. Otherwise teams work in their own domain, build their own things, ship to their own schedule, and release when ready.
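Andy's figures follow a simple pattern if each dependency is treated as an independent coin flip that must go your way for the work to ship. That independence assumption is an interpretation, not something stated on the panel. With a per-dependency success probability p and n dependencies, the chance of delivering is p^n:

```latex
P(\text{deliver} \mid n \text{ dependencies}) = p^{n}, \qquad
p = \tfrac{1}{2}: \quad P(1) = \tfrac{1}{2}, \quad P(2) = \tfrac{1}{4}, \quad P(3) = \tfrac{1}{8}
```

Each added dependency halves the odds, which is why the team keeps dependencies as few as possible.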
Jessica Cregg: Richard Foden asks how you deal with hero culture.
Andy Bold: I don't think we have that problem at LaunchDarkly. It is a collaborative culture. The people who know the most are the people who share the most.
Martin Woodward: That is usually the sign of a strong engineering culture. For engineering leadership, the biggest problem is changing the questions you ask. If you ask the same questions as before, you get the same behaviors. Stop asking questions that do not matter. I don't care what a team says its velocity is, whether they estimate with T-shirts or Fibonacci numbers. I care about what is shipping in production, how quickly we can fix production, and customer satisfaction. With hero culture, thank people, but do not over-glamorize heroics. In the blameless retrospective, ask what to put in place so nobody has to do that again. Do not give the impression that heroics are what you want because that is where attention goes.
Martin Woodward: Culture is what happens when everybody isn't looking. You need a culture of collaboration and working together. GitHub and LaunchDarkly have similar engineering cultures, with trust-first and ship-to-learn mentalities, much of it from InnerSource and open source ways of working. At Microsoft when I started, divisions were like different companies. Microsoft changed incentives so that to get 100% of rewards, you had to demonstrate not just what you did, but what you did that helped others and what you built on from the work of others. That encourages an economy of sharing and giving credit broadly.
Jessica Cregg: That leads us to feature flags. What practices do you use to keep flags cleaned up and prevent unnecessary ones from building up?
Martin Woodward: We use feature flags heavily. We are always integrating into main, always shipping from main, and then after deployment we use feature flags to enable features to test segments and observability to test whether we made things better or worse. You have to set criteria and definitions of done. Is a feature done if the feature flag is still there? For short-lived features, does the flag need to be removed before it is fully done? If you are not careful, you build up feature flag debt. You can run tooling to bring it back down, but ideally you keep on top of it as you go, with flag cleanup in your definition of done.
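Enabling a feature for a test segment after deployment usually relies on deterministic bucketing, so the same user gets the same result on every request. The sketch below shows one common approach: hash the flag key and user key into a bucket from 0 to 99. It is a generic illustration, not LaunchDarkly's or GitHub's actual algorithm, and `in_rollout` is a hypothetical name.

```python
import hashlib


def in_rollout(flag_key: str, user_key: str, percentage: int) -> bool:
    """Deterministically place a user in bucket 0-99 and enable the flag below `percentage`."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).digest()
    # Use the first 8 bytes of the hash as an integer, then reduce to a bucket.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage
```

Including the flag key in the hash means different flags split users differently, so one segment is not always the first to receive every new feature.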
Andy Bold: LaunchDarkly has facilities to help with feature flag debt. It can detect when a flag has not returned a different value for a period of time: if it has always been true for a month, you get a notification asking whether you still need it. Code references integrate with your GitHub repo or other Git repo, so in the interface you can see where a flag is used in code and then go remove it when it is no longer needed. We also have timed changes, approvals, and workflows for flags once they are in code and out there.
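The stale-flag check Andy describes can be approximated from an evaluation log. This hypothetical sketch does not use LaunchDarkly's API. It reports flags whose evaluations in the last 30 days all returned the same value, and the `(flag_key, value, evaluated_at)` tuple format is an assumption.

```python
from datetime import datetime, timedelta


def stale_flags(
    evaluations: list[tuple[str, object, datetime]],
    now: datetime,
    window: timedelta = timedelta(days=30),
) -> list[str]:
    """Return flag keys whose evaluations inside `window` all produced one value."""
    values_seen: dict[str, set[object]] = {}
    for flag_key, value, evaluated_at in evaluations:
        if now - evaluated_at <= window:
            values_seen.setdefault(flag_key, set()).add(value)
    # A single distinct value over the whole window suggests the flag may be removable.
    return sorted(key for key, values in values_seen.items() if len(values) == 1)
```

This simplification has one limitation. A flag created yesterday that has only served `True` is also reported, so a fuller version would also check that the flag has been live for the whole window.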
Martin Woodward: Automated changes to feature flags require reliable tests and good coverage. Safety nets give confidence for refactorings. Dependabot is similar: it keeps open source dependencies up to date, but introducing a new dependency is scary, so good test automation and CI that gives a fast yes/no answer helps confidence.
Martin Woodward: Flaky tests are the worst. We have measurements: a flaky test is worse for productivity than no test at all. On one team, every night we ran the entire test suite 50,000 times against code running in production. If any test went red once, we took it out of rotation automatically and flagged it as a defect for the team to analyze. Usually it is a timing issue. Flaky tests destroy confidence, waste build resources, and create a Heisenbuild where you do not know which result to trust. They hurt new developers too, because if tests fail before you make a change, you do not know whether the system works.
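Martin's nightly approach is to run the suite many times against code already in production and pull any test that fails even once. It relies on the idea that a failure against known-good code points at the test rather than the product. A minimal sketch follows. `run_test` stands in for whatever test runner is in use and, like `quarantine_flaky`, is hypothetical.

```python
from collections.abc import Callable, Iterable


def quarantine_flaky(
    tests: Iterable[str],
    run_test: Callable[[str, int], bool],
    runs: int,
) -> tuple[list[str], list[str]]:
    """Run each test `runs` times against known-good code; any single failure quarantines it."""
    healthy: list[str] = []
    quarantined: list[str] = []
    for name in tests:
        # all() stops at the first failing attempt; one red result is enough.
        if all(run_test(name, attempt) for attempt in range(runs)):
            healthy.append(name)
        else:
            quarantined.append(name)  # to be filed as a defect for the owning team
    return healthy, quarantined
```

In practice `runs` would be very large (the panel mentions running the suite 50,000 times a night), and quarantined tests would be filed as defects for the owning team rather than silently dropped.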
Andy Bold: That matches my experience. Four times out of five the test is okay, one time it isn't, and then you rerun it. That slows you down because you revisit work to check it again and again.
Jessica Cregg: When you do not trust tests, you do not deploy. It hurts deployment frequency and mean time to deploy. It is never good to gamble with deployment practices.
Martin Woodward: It also hurts hypothesis-driven work. If too many things are changing, it is hard to tell whether the thing you did made the system better or worse. Removing flakiness helps you know that the change you made is what you wanted.
Jessica Cregg: To close, what is one piece of advice for making development teams more productive and increasing productivity into production?
Andy Bold: Be happy with good enough. Do not try to gold-plate everything because you end up in a never-ending circle of making the thing more perfect, and you will never ship it. Ship it when it is good enough, then make it better.
Martin Woodward: Don't ask permission. It is your job to make the system better. That is why you are paid: to solve problems and fix things. What one thing can you take from DevOps Enterprise Summit and do next week at work to make developers' lives and customers' lives a little better? What can you do in a day or a week? Go do that. Then next week, do the next thing that gets value into customers' hands quicker. If you do one thing a week, one thing a sprint, or one thing a day, iterative improvements build up quickly. Over a year you see a massive culture change. Write down where you were so that in retrospectives six months later you remember how bad the pain was, because people focus on the pain ahead.
Jessica Cregg: Combining those two pieces of advice, ongoing improvement lets you redefine good enough and keep that as a working definition across teams.
Andy Bold: Exactly. Your good enough always gets better.
Jessica Cregg: Where can everyone find you?
Martin Woodward: @martynwoodward on all forms of social media. I also want to promote DevOpsDays Kyiv, a free conference happening next week with people coming together for an important cause and raising money for charity.
Andy Bold: I'm going back into the DevOps bunker after this, but go to launchdarkly.com/careers. We're always hiring. Come and work for us.
Jessica Cregg: Thank you so much, everyone. Have a good DevOps Enterprise Summit Europe 2022. Enjoy the next couple days, and thanks for joining us.