Agile Coaching with Impact: What coaching advice really makes a measurable difference?


Las Vegas 2024


The presentation will explain how Agile coaches are currently challenged to know which advice will have the most cost-effective impact on their teams. The audience will be presented with burn-up charts from teams that show significantly differing results. The presentation will explain the guidelines under which the teams are operating. The audience will be asked what issues they see with those teams and to choose what they would suggest to help the teams succeed. Then it is revealed to the audience that the teams are simulated and there is only one simple source of randomness. The potential benefits of learning more about Agile dynamics from simulations will be explained. Just as a pilot learns from hours in a simulator, so could an Agilist learn a lot more from literally thousands of iterations in a day than they might from a whole career as a Scrum Master or coach.


Full transcript


Dr. Anthony Earl

Thank you so much for choosing this presentation to come to — I really appreciate it. It means a lot to me. The ideas I had for this presentation came to me the day before last year's conference, literally in Vegas. So it's pretty special for me to be able to talk about these.

I'm Anthony. I am an Agile coach at Lockheed Martin in the space division. I've been there four years. Before that, I think in 2013, I first became a Scaled Agile coach, and I was working in agile ways for ten years before that. So that's my background. But let's jump into what we're going to talk about today.

So we're looking for ways to improve. If you're a scrum master, if you're an agile coach, if you're a release train engineer on an agile project, you're looking for ways to make things better. Even a small percentage improvement, for scaling up many teams — there are hundreds, many hundreds of teams at Lockheed Martin — if we can make a difference, five, ten percent, that's a big impact on the bottom line. Helps our shareholders, keeps everybody happy. It's a win-win situation for everybody.

But what was frustrating me as a coach was choosing which ways could we make an effective decision about making things better. There's a collection of ideas that you'll find from the world of lean, the Toyota production system, obviously agile, scaled agile, and so on. You can group these things together. You can put them into faster feedback. Feedback loops are great for making systems work — that makes a lot of common sense. If you want a better production system, you probably want faster feedback. You get your negative or positive feedback loops, make corrections, it becomes better.

There's work on flow. Don Reinertsen's book is really good on improving your product development flow. I can't remember its exact title, but that combination of words is in it. And whether you're looking at shortening the queues, making things flow in a smoother way — again, that makes a lot of sense. If you're focusing on the flow of value, you're going to see improvements.

And let's not forget the people involved with this. People are happier, they produce better outcomes. Having them take responsibility for what they're doing, giving them a purpose for their work — that's going to work in making things better. So all of these things are going to make things better. There's no real question about that.

But how to choose which things you make better is one of the key difficulties we face. We do not have, as far as I'm aware, a rational method to approach this. Scaled Agile's number-one principle is take an economic view. You want to base that on real data. Where are you going to get the real data?

Well, in an agile system, one of the key places to look is the number of points a team produces. And I know you're all cringing inside going, "Oh no, not points again." Just let that stand in for now. Live with me on this for a few more slides, and you'll see why I'm so intrigued by this. Just let it stand in for the amount of work a team can do. However they're doing the work, there's got to be some measure of what they're producing. So let's call it points for now. Let's not have a religious argument about points, as the previous speaker mentioned.

There's a cost to change as well. There really is a cost in something being disruptive. It's going to change the way people work. It's going to maybe result in a U-curve of less production at first, and then it's going to ramp up. Hopefully that's what you're hoping for, but it's a cost. So if you're spending money as an investment — and Lockheed has spent a lot of money on Agile — then you want to make sure that you're getting good value in a timely manner.

So we found some teams. These teams were conforming to the guidelines really, really well. It's pretty tough to read the single bullet points on this, so I'm going to turn around. Apologies for this. They were planning within capacity. They were basing the capacity on the team's average from the previous four or five sprints. They were working through stories in priority order and just taking one story at a time.

So these are going to sound like perfect teams. They were using Scrum at a small size, like five engineers and testers. The product owner and the scrum master were separate people, not one person holding both roles at the same time. So we're talking about maybe a team of seven. They were using relative estimates for story points. Again, don't lose any sleep about the story points, but they were using what Scaled Agile recommends for teams getting started. Choose a very small story that you really can finish in a day, and estimate everything relative to that. So a two-point story took two days, a five-point story five days, an eight-point story eight days, and so on.

This was an opportunity to really compare and contrast data across teams that were following some guidelines. If people weren't busy, they were taking work. If the sprint backlog was empty, teams were taking work from the next sprint. And if they didn't finish the work in that sprint, they did not get any credit for that work until they finished it in the next sprint. It was always moved forward to the next sprint.
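The planning rule described above (capacity taken from the average of recent sprints, stories pulled strictly in priority order) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's model: the function name `plan_sprint`, the five-sprint window, the stop-at-first-misfit rule, and every number in the example are hypothetical.

```python
from statistics import mean

def plan_sprint(backlog, velocity_history, window=5):
    """Return (capacity, planned_stories) for the next sprint.

    backlog: story sizes in points, highest priority first.
    velocity_history: points finished in earlier sprints, oldest first.
    window: number of recent sprints to average (assumed; the talk
            says "four or five").
    """
    capacity = mean(velocity_history[-window:])
    planned, total = [], 0
    for points in backlog:
        if total + points > capacity:
            # Assumed reading of "priority order": stop at the first story
            # that does not fit rather than skipping ahead to a smaller one.
            break
        planned.append(points)
        total += points
    return capacity, planned

# Hypothetical numbers, chosen only to exercise the function.
history = [38, 42, 40, 44, 36]
backlog = [5, 3, 8, 2, 5, 1, 8, 3, 8, 5]
capacity, planned = plan_sprint(backlog, history)
print(capacity, planned, sum(planned))
```

With these made-up inputs the plan stops short of capacity, because the next story in priority order would overshoot it.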

So these were conventional sprints. There are no tricks here. Just two-week sprints, eight-hour days. I know not all of Lockheed works on eight-hour days, but again, for the purposes of this, let's live with this. We can't discuss the actual details of what they were working on, but there were no excessive dependencies, no substantial dependencies between the teams that they had to take account of.

So let's look at the burn-up charts. This was the first sprint of one of the teams. What do you think is going wrong in this team? Because they estimated nearly 40 points, which was within their capacity, but it resulted in around 20 points. Were they not identifying bottlenecks? Were they not T-shaped teams — they didn't have enough range of skills to make things work? Were they not visualizing the work? What was going wrong? I won't ask you to put your hand up or anything, but keep in mind your favorite choice among those — what number did you choose for those?

Let's look at the second sprint. They did a lot better. They finished 50 points worth of work and they had a fairly steady burn up. They were finishing work on a somewhat regular basis. A little flat in the middle there, but a pretty good outcome.

So what do you think they would work on next? Again, make your choice from that list: act on feedback, make sure they have trained scrum master and product owner roles, and so on. Again, remember the number of your instinctive choice for this team.

It's the same team, next sprint. It was okay. They actually finished the work that they had planned in the first half of the sprint, and then they flatlined. So something odd is going on with this team. Again, if you've got some ideas of what they could do to make that better: reduce queue length, synchronize their planning with other teams, remediate legacy processes — you know, they're working at Lockheed, there's plenty of processes that come from pre-agile days. They may be not pulling the right size work from the backlog.

And how about the fourth sprint? Here they knocked it out of the park. They finished 70 points worth of work in this sprint. So they've been taking the advice from some of those yellow bubbles. Well, no, not really. Because I'm not even going to show you the next sprint — it was not anywhere near as good as 70 points.

So what's going on? Do you think they were improving overall in their outcomes? I don't really think so, because it seems they're kind of all over the place. Did anyone see that they were applying any specific pieces of advice that I offered as potential? You see the difficulty of being a coach — you've got to decide these things.

Maybe it's going to be easier if we look at not just one team but four different teams, and we look at not just one sprint but five sprints' worth of work, a whole scaled agile increment. Does it get any better? Here are results from that. Coincidentally, two teams, team B and team C, scored 228 points, which was more than 10% better than one of the other teams at 202 points. And I know you're going to feel uncomfortable comparing points across teams, but please live with me for a few more slides. You'll see why. And if you read the introduction to the talk, which is why you came here, you actually already know the answer. But note: these teams have got different patterns. They don't all follow the same pattern. They're completing work and flatlining at times in different ways. There's nothing common about those.

So with these perfect teams, they only took 15 minutes of one person's time to plan each given story. They followed just about all of the guidelines that I listed earlier on. They were perfect, and they didn't generate any rework. Any work they finished was finished without any bugs, any problems. Their estimations were exactly accurate. If they said it was a two-point story, it took two days plus the 15 minutes for planning. They had no interruptions, no email, no messages, no meetings to go to. They just focused on their work, and they took the work in exactly priority order.

So there you go. It was a simulation. I gather some of you already knew that. They were not perfect. There was nothing in the simulation that implemented improvements. They were not really executing scrum, they weren't going to any meetings, they were not doing retrospectives, they were not doing demos, they were getting no feedback. So that's not perfect. They didn't refine any of the stories into smaller stories.

But these were all exactly the same team. When I said these were different teams, they were not different teams — it was exactly the same model executing just another time. In fact, I could do a thousand executions. This was the simpler version of the model. I was able to do a thousand iterations of this simulation in 30 seconds on a regular laptop — a regular HP laptop, nothing special. But it wasn't the same outcome for each iteration, for each team. So what was different? There was just one source of randomness in the model — a single place in the model where something was different.

And that resulted in a 10% difference in performance across the teams, or across the same team at different times. It was simply the order in which the work was arriving to the team. The different sizes came in equal proportion: a 20% chance each of a one-point story, a three-point story, a five-point story, and so on. The only difference was in that. So if you are making an improvement, or you're making a change to your team, and you think you've made an improvement, one thing that suggests is that, even if they were a perfect team, it could just be random that you've seen some improvement. It may not be a real improvement. It may simply be randomness.
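One way to act on that warning is to compare any claimed improvement with the spread already seen between identical runs. The sketch below is a rough illustration only: the helper names `relative_spread` and `could_be_noise` are hypothetical, the baseline list holds the three increment totals quoted earlier in the talk (202, 228, 228), and the before and after values are made up. A real check would use many more replications and a proper statistical test.

```python
def relative_spread(totals):
    """Largest relative gap between runs of the same, unchanged team."""
    return (max(totals) - min(totals)) / min(totals)

def could_be_noise(before, after, baseline_totals):
    """True if the observed change is no larger than the baseline spread."""
    change = (after - before) / before
    return abs(change) <= relative_spread(baseline_totals)

# Increment totals quoted in the talk for identical simulated teams.
baseline = [202, 228, 228]
print(round(relative_spread(baseline), 3))

# Hypothetical before/after totals for a team that changed something.
print(could_be_noise(210, 220, baseline))   # small change
print(could_be_noise(200, 260, baseline))   # large change
```

The range-based comparison is deliberately crude. It only shows the shape of the argument: a change smaller than the run-to-run spread cannot be told apart from luck in the order of the work.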

So that's one benefit of this. I'm going to mention here the tool I used to do this was called JaamSim. It's open source under the Apache 2.0 license. I'd not used it before. When I mentioned earlier on that I got this urge to do this the day before last year's conference in Vegas, I spent a whole day learning how to use JaamSim. It's Java. It will run on your favorite Mac machines, on your Linux machines — and yeah, it'll even run on those Microsoft machines. So it will scale very, very well. A model like this does not push this at all. It claims to support 40,000 active objects, and there was nowhere near that in this model. The more advanced model — I'm going to show you a little bit of it — took five minutes to do a thousand iterations.

So let me show you more of the details of that, and then I'll move on to the more complex model. Here was the model I just showed you the results from. The one source of randomness is up at the big red arrow there. That was just creating the sizes of the stories. The stories were placed in the team backlog, which is at the top of the triangle there. And from there, the work was pulled off in each sprint by the teams for planning to the sprint plan. We're going to look at the details of that on the next slide.

The other bits and pieces there — most of the model, almost every parameter in the model, is just read in from a settings file. So that's at the top left-hand corner there. So if you built a model like this, you'd be able to change just about any of the parameters to your liking. It's got the teams there, and the sprint calendar to keep things within the time box.

And then when work is done, there are just those 15 planning minutes spent by a team member to understand the story, then the story gets placed on the sprint backlog. It's worked on by one of the team members, or things from the backlog are worked on by the team members in parallel. When that's finished, if the sprint's still within the time box, it will be scored for that sprint. If it isn't, the time that has been spent on the story is recorded and it loops around back into the backlog. And this work that's still on the sprint's backlog is the first work that's taken into the next sprint's backlog, and it is prioritized to be finished first.
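The sprint mechanics just described can be approximated in a short Python sketch. This is not the JaamSim model and makes simplifying assumptions that should be read as such: the story sizes (1, 2, 3, 5, 8) are a guess at the full set, team members are treated as never idle, so explicit capacity planning drops out, and unfinished work simply continues across the sprint boundary with its time retained. All constant and function names are hypothetical.

```python
import heapq
import math
import random
from statistics import mean

# All values are assumptions for illustration, not the speaker's settings file.
STORY_SIZES = (1, 2, 3, 5, 8)  # equal odds each; the talk names 1, 3 and 5
HOURS_PER_POINT = 8            # one point = one eight-hour day
PLANNING_HOURS = 0.25          # 15 minutes for one person to understand a story
SPRINT_HOURS = 10 * 8          # two-week sprint of eight-hour days
TEAM_SIZE = 5

def simulate(num_sprints, seed):
    """Return the points credited in each sprint for one run of the toy team."""
    rng = random.Random(seed)           # the single source of randomness
    horizon = num_sprints * SPRINT_HOURS
    velocity = [0] * num_sprints
    free_at = [0.0] * TEAM_SIZE         # min-heap: when each member is next free
    while True:
        start = heapq.heappop(free_at)
        if start >= horizon:
            break                       # even the earliest-free member is done
        points = rng.choice(STORY_SIZES)    # next story in priority order
        finish = start + PLANNING_HOURS + points * HOURS_PER_POINT
        if finish <= horizon:
            # Credit goes to the sprint in which the story finishes, so work
            # that straddles a boundary scores nothing in the earlier sprint.
            velocity[math.ceil(finish / SPRINT_HOURS) - 1] += points
        heapq.heappush(free_at, finish)
    return velocity

totals = [sum(simulate(5, seed)) for seed in range(1000)]
print("one run:", simulate(5, seed=0))
print("five-sprint totals: min", min(totals),
      "mean", round(mean(totals), 1), "max", max(totals))
```

Running many seeds and comparing the minimum and maximum five-sprint totals gives a feel for how much variation comes from story order alone in this toy version.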

That's how an individual sprint works. So what can we learn from this? I've mentioned already that if you've made a change, it might be random if it's within 10%. My estimates, at least for the parameters I used, were just based on my decades of experience in software engineering. I just used typical numbers; there's nothing special in those. There are no tricks in this.

But I think we can let people who are not experienced with agile start to learn how systems work — people, teams working in agile ways. We can learn what affects them. And some of the scenarios that can affect them are listed on the top right list here. I've implemented all of those in scenarios in the more advanced model: taking bigger features — what happens if you make your features bigger? So that's going against the recommendation for small batch sizes. What happens if you have rework? That's obviously not going to be good. All of these things are not good for your perfect team.

Interruptions: more email, more meetings, unplanned work being introduced. That is something that we see no matter how well we try and define work and plans — somehow, especially at Lockheed at least, somehow unplanned work leaks in. People have got many talents, and their time is treasured, and people find out how to access that treasure and start doing some other work that isn't contributing to the features that were planned.

Context switching — that's a real thing. We're going to see, it's not such a big impact, but I did add that to see how big an impact it might be. And also blockers. That's one way of looking at blockers — saying, well, you didn't plan for some of those cross-team dependencies that really do exist in the real world. But there could be blockers for other reasons as well: server went down, just some unexpected problem that the team encountered.

So we can look at it quantitatively. I know it's parameters I chose, but I wanted to give you a sense of what impact they could have. But before we do that, I want to show you this is real. I wanted to show you the actual full model in full operation, but my manager wouldn't let me do that. What he did say, though, is that he wanted to assess the level of interest. I said there's a lot of people in the room. He said that if I gained enough interest, he would consider letting me do a virtual event. So if you are interested, contact me. You can either do that on the Slack channel (I think there's one for the room) or, if you want to just meet me afterwards, give me your email address and I'll give you mine. Then you can send me a request if you want to dig deeper into this, to see if it's something you'd be interested in helping me get more enthusiasm behind. It was something I started by myself and then got very limited extra hours to work on. So that's a way you can help me. I know that's one of the themes of this conference. So if you want to see more of it, we can do that.

But I'm going to show you a very short video. There's no sound, so don't worry that your hearing's gone. The two scenarios we're going to see a little bit of: when there's a blocker. The normal stories are green — the green dots moving around this model. It's a discrete event model, so they go into activities and then they pop out when they're finished. We're going to see them get paused at the top and some extra work done with those. We're also going to see an example of the extra work — they're going to be red dots, they're going to just go through the normal work pattern. They've been introduced though during the sprints, rather than onto the team backlogs. So they end up interfering with the work that does contribute to features. So fingers crossed that this works. I'll describe what we're about to see here.

Those are the five different teams. I know you'll count six there, but one is just a template. The submodel part of JaamSim's not very well documented. You can start in the top-left-hand corner. PI planning happens to create the features with random sizes within limits. Then the work goes out to the teams, and as the stories get finished, it stacks up the finished features there. We're going to zoom in to one of the teams working on a couple of sprints. And what you'll see near the top, in the top left, as we zoom in, you'll start to see there are a couple of stories being blocked. They're going through just a pause loop there that indicates the delay as you wait for those. You'll also see the red dot being worked on. And you can see those graphs. The graphs I was showing you on the earlier slides are just screenshots of those results that come from those teams. And you may have seen the line switch as it picked up the work that wasn't finished in one sprint and it went to the next. So it's a real model. I really just wanted to make sure you actually saw it in action.

These are just initial observations from that model, running each of those scenarios one by one, and they give me some satisfaction that this is a way you can make quantitative assessments of which improvements you would make first. If you had a team that wasn't perfect and was suffering from some of these problems, where would you get the most significant improvements if you made changes?

Well, the worst two, numbers one and two, are very close in scale: each is about a 20% reduction in velocity. The first is blockers. I'm sure that doesn't come as a surprise to many of you, but that's what the numbers seem to show to me. And that was just about as bad as rework. So rework and blockers cost your teams quite a lot in terms of velocity. So if you invest in practices that avoid blockers, that avoid rework, then you're probably going to see about a 20% improvement, which is obviously significant.

Then large batch sizes — large features. And surprisingly, this is about the same as unplanned work. So if you increase the size of the features that you're planning to do, the more they grow in size, the more stories that are involved, and the less chance that every story that you need for a particular feature will get finished. So I saw about a 70% completion rate for features as you grew the size significantly — the completion rate dropped to about 70%. Which coincidentally was very similar to allowing unplanned work. And the amount of unplanned work I let into the model was about one or two medium-sized stories coming into each team per sprint — well, actually it was about one per week, one extra story per week — and you started to see a significant impact on the feature completion rate. So yeah, one or two pieces of extra work don't sound like much, but when you add them up across all the teams on a five-team train, you start to see some significant problems.
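The batch-size argument above (more stories per feature means less chance that every one of them finishes) can be illustrated with a back-of-the-envelope calculation. This is not how the simulation computes completion rates. It assumes, purely for illustration, that each story finishes independently with the same probability, and the value 0.98 is hypothetical rather than a figure from the talk.

```python
def feature_completion_probability(story_finish_prob, stories_per_feature):
    """Chance that every story in a feature finishes, assuming independence."""
    return story_finish_prob ** stories_per_feature

# 0.98 is a hypothetical per-story finish probability, not a figure from the talk.
for stories in (2, 5, 10, 20):
    print(stories, round(feature_completion_probability(0.98, stories), 3))
```

Even with a high per-story success rate, the chance of completing the whole feature falls steadily as the feature grows, which is the qualitative effect described for large features.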

And then number five: avoid interruptions. A lot of people — at least my intuition, which was wrong — said, "Oh, interruptions are a problem because of the context switching that you get." But it's not. The biggest impact of interruptions is the time spent away from the real work. So if you go into a corporate meeting, if you're answering email, you're not actually focusing on your story. The context-switching problem wasn't as big as I expected. It's noticeable, but it's not significant. I think if you had all these other problems that were causing more context switching, then the context-switching percentage would grow. But just in and of itself, it doesn't happen as much as you might think if you're in a somewhat healthy team. So it doesn't seem to be one of those things you would prioritize to make things better.

So in conclusion: I've already warned you that if you've got an improvement and you think you can show it's 5% better, that might just be because of randomness in the sizes of the stories the team has taken. It seems just to be a property of the flow and of the way accounting for points happens. So that is something to be aware of. But the big opportunity I see for this is that you can let people experience thousands of sprints and see what happens if they make changes, if they're introduced to these scenarios, or to more scenarios that could be implemented. There are some other scenarios that really would be quite straightforward to implement in the model I have. Some would be much more difficult, and obviously there are some things that are just very, very difficult to simulate. But you could let people train in these simulators just like we let our pilots train. You'd like your pilots to train, and not just your Southwest pilots, but all of your pilots. I came on Southwest, so I was happy to see that this morning.

So that's one thing. And then if you dug deeper and put in some more effort, and made your parameters for the model match as closely as you can the results you're seeing from your real teams, then you may be able to make much more accurate decisions about which changes will provide you with value. So again, if you want me to have the opportunity to talk to you more about this, reach out to me and I will try to convince my manager we have enough interest in that.