Time Theft: How Hidden and Unplanned Work Commit the Perfect Crime
Invisible work competes with known work. Invisible work blindsides people, leaving teams unaware of mutually critical information, until it’s too late.
Married to this problem is the question: how does one plan for, or allocate capacity for, the invisible? It’s tough to analyze something you can’t see. Incognito work doesn’t show up in metrics. Hidden work stalls and blocks important priorities and masks dependencies. Risk accumulates from work delivered late and started late.
The solution is to put conditions in place that allow unplanned work to be seen and measured -- particularly high-risk work involving far-reaching decisions. This talk shows you how to do just that.
Full transcript
Dominica DeGrandis
So it's been my pleasure for many years now to help people make their work visible. I teach teams how to see and how to measure the problems that prevent them from getting things delivered. And it doesn't matter if it's a 50-person startup in LA or a 150,000-person company in Amsterdam, everyone suffers from time theft.
So I want to provide you with some countermeasures today for how to deal with these thieves that rob us of our time.
There's a reason our offices are plastered with whiteboards. We are visual learners. And when we can bring our visual sense to solving problems, we can get clarity, and it's easier to make decisions.
So let's have a look at how to make these thieves visible so we can do something about it.
There are five thieves whose impact, if we could see and measure it, could help us improve our performance. And by that, I mean reduce our cycle time and be more predictable.
There's thief unplanned work in yellow on the left. This is a sneaky thief. These turn into fires.
Neglected work in blue there. These are important things, but they tend to get benched by urgent work.
Unknown dependencies: these are the ones that have a high coordination cost, and they are very, very expensive.
Conflicting priorities: this is when Brent says yes to everything.
Thief too much WIP: this is the ringleader of all the other thieves.
Now, because I'm under a time constraint here of 30 minutes, I need to prioritize. So for each thief, I'm just going to give you one essential bit of info that you need to know. It'll be sort of like a crash course in what I teach people.
So now that I've introduced you properly to all the five thieves, let's take a look at the ringleader, the first item, the first thief, and that's too much WIP.
In textbook terminology, too much WIP is when the demand on the team exceeds the team's capacity. It's a rather boring way to say that our teams are drowning in work, and it's often because they're fully allocated to 100% resource utilization.
This equates to people doing their full-time job on top of troubleshooting two hosts that have gone missing. And it's why people can't even get started on their to-do list till 6:00 PM. And I like to say that we don't let our servers get to 100% capacity utilization. Why do we let our people?
So why does too much WIP matter? It matters because when we take on more new work before finishing old work, things take longer to do, cycle time goes up, and so does cost of delay. If you can't deliver a new feature because another request gets added on, then there's a cost toward that delay.
On top of that, WIP is a leading indicator. Mean time to repair, velocity, cycle time, lead time, throughput, those are all trailing indicators. Right?
There's a relationship between the number of things in your work queue and your cycle time. It's called Little's Law, and it's why the primary factor of time is the amount of work in progress.
For example, when you go to get on the freeway, you know immediately if it's jammed that your commute home is going to take longer than if there's no rush hour traffic.
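As a rough illustration of the relationship just described, Little's Law is commonly written as average cycle time = average WIP / average throughput. The sketch below is not from the talk: the function name and the throughput of 5 items per week are assumptions chosen for illustration, and the law itself describes long-run averages in a reasonably stable system.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput."""
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Assumed for illustration: a team that finishes 5 items per week on average.
THROUGHPUT_PER_WEEK = 5.0

for wip in (5, 10, 20):
    weeks = average_cycle_time(wip, THROUGHPUT_PER_WEEK)
    print(f"WIP of {wip} items -> average cycle time of {weeks:.1f} weeks")
```

With throughput held constant, doubling the work in progress doubles the average cycle time, which is why WIP can be read as a leading indicator.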
You know thief one steals time when you find yourself context-switching all the time, or when you get asked that five-word question: "Do you have five minutes?" And you say, "Yes."
Two weeks ago, Liz sent me a Slack message and said, "What do you think about this? Can you take a look at it?" And because I really like Liz, I adore her, and I want to work with her, I said yes.
It was in the morning, and what could possibly go wrong? I'd have plenty of time to help her out. But it ended up taking much... Nothing takes five minutes. And so then I end up working late and working the weekend, and it's so annoying. I mean, I teach this stuff, and I still let it happen to me. I still take on too much work.
We do get endorphins from saying yes, and so it's just human nature. We tend to want to say yes. Even grouchy people like to say yes, but it's not sustainable. It's not sustainable to work evenings and weekends all the time.
So here's an idea for what to do about that. Simply try and hold each other accountable with some transparency.
This is a simple three-swim-lane board. It's got silver bullets in the top lane, teamwork, and then business requests in the bottom lane, and each one has its own WIP limit.
Silver bullet: these are the things that come directly from the CIO. Sometimes people call this the VP lane. They ask you to do special things, and they don't realize what the disruption is. And so here we're saying, "You know what? We can have one of these at any one time, but don't throw any more than one of these at us at a time." That's what that WIP limit is.
Sometimes you'll see WIP limits at the top of columns. They don't have to be like that. They could be by swim lane. They could be by work item type. They could be by the number of people. There's many ways to set work in progress limits. This is just one way to do it.
Lane number two there, teamwork. I like to call this revenue protection kind of work. This is fixing technical debt. This is security work. And we have a WIP limit of three there, which we are adhering to.
And then the bottom swim lane there is business requests. This is usually revenue-generating kind of work, and it's shaded in pink here because they've gone over their WIP limit. They had a WIP of five, and now they've got eight things in there.
And when we bring visibility to that, we can have other people look at our board, because we're being transparent, and they can say, "Hey, what's going on? Can we help you?" It helps if others keep us honest. It's like trying to go on a diet and staying off of sugar. If other people are around, you're not as inclined to eat dessert.
The feedback column there is where this shows up for the items in the business request swim lane: because we brought new work in and we didn't finish the stuff in feedback, that stuff's going to take longer to be delivered.
So if you want to be more predictable, then limit WIP to the team's capacity.
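The three-lane board described above can be sketched as a small data structure that flags any lane over its WIP limit. This is only an illustrative sketch: the class and function names are hypothetical, and the in-progress counts for the first two lanes are assumed (the talk only says those lanes are within their limits). The limits of 1, 3, and 5, and the 8 items in business requests, come from the talk.

```python
from dataclasses import dataclass


@dataclass
class SwimLane:
    name: str
    wip_limit: int
    in_progress: int

    @property
    def over_limit(self) -> bool:
        # A lane is over its limit only when it holds more items than allowed.
        return self.in_progress > self.wip_limit


board = [
    SwimLane("Silver bullets", wip_limit=1, in_progress=1),     # count assumed
    SwimLane("Teamwork", wip_limit=3, in_progress=3),           # count assumed
    SwimLane("Business requests", wip_limit=5, in_progress=8),  # from the talk
]


def lanes_over_limit(lanes):
    """Return the names of lanes that exceed their WIP limit."""
    return [lane.name for lane in lanes if lane.over_limit]


for lane in board:
    status = "OVER LIMIT" if lane.over_limit else "ok"
    print(f"{lane.name}: {lane.in_progress}/{lane.wip_limit} ({status})")
```

The per-lane check mirrors the pink shading on the board: the point is to make the breach visible, not to block the work automatically.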
Thief number two is unknown dependencies.
Architecture is a major target of thief two. A friend of mine works for a $23 billion company where product team X deployed a component that broke product team Y's product, and now team Y's customers have to fork out $5 million on the new part. They have a PR disaster, and they're losing significant market share because two product teams didn't talk to each other about their dependencies. Team Y had zero visibility on that decision.
It's very expensive when teams are mutually unaware of critical information.
Why dependencies matter. They matter because every dependency increases the probability that you're either going to start something late or finish something late by 50%.
So imagine you go out to dinner with four people at a fine dining restaurant. There's 16 possible outcomes if you aren't going to be seated until everybody gets there. Sometimes that happens: "We won't seat you until all your party is here." Well, there's 16 possible outcomes, and if you chart that out, that's only a one in 16 chance that you're going to be seated on time.
Dependencies are asymmetrical in their impact. With four dependencies, it's not a 25% probability that you won't be seated. It's a 93% probability that you won't be seated, because there's a greater chance 15 out of the 16 outcomes is happening. I put a chart in here so you could see that.
This chart's just with three dependencies. With three dependencies, you only have one chance in eight, so that's a 12.5% probability of arriving on time. And then if you add one more dependency, that's one in 16, or just a 0.06 probability, unless they work in IT Ops, and then they'll never leave work in time to get there.
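The dinner arithmetic above can be checked by enumerating every combination of on-time and late arrivals. The sketch assumes, as the example implicitly does, that each person is independently as likely to be on time as late, so all outcomes are equally probable; the function name is hypothetical.

```python
from itertools import product


def on_time_probability(num_dependencies: int) -> float:
    """Chance that every dependency is on time, assuming each one
    independently has a 50/50 chance of being on time or late."""
    # Each outcome is a tuple such as (True, False, True): one flag per dependency.
    outcomes = list(product((True, False), repeat=num_dependencies))
    all_on_time = [outcome for outcome in outcomes if all(outcome)]
    return len(all_on_time) / len(outcomes)


for n in (1, 2, 3, 4):
    p = on_time_probability(n)
    print(f"{n} dependencies: {2 ** n} outcomes, "
          f"on time {p:.4f}, late {1 - p:.4f}")
```

For four dependencies this gives 1 in 16 (0.0625) on time and 15 in 16 (0.9375) late, which is the roughly 93% figure quoted in the talk. Real dependencies rarely have exactly even odds, so the numbers are an illustration of the asymmetry rather than a forecast.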
You know that unknown dependencies are stealing time from your team when the local pizza company delivers more than two pizzas to the same meeting room. If three two-pizza teams need to have a joint meeting to discuss their interdependencies, then that can be a problem. That means you got high coordination costs.
Also, when a change to your toolbar changes your filter functionality, that's a problem, too. That's a true story, by the way.
In general, when people aren't available when you need them, thief two is stealing things from you.
So something to do about it. Show of hands, how many of you have a dependency matrix?
One. I thought all these hands would go up.
Okay. Those of you, maybe there's some in the back. I forgot that I won't be able to see people with the lights here.
Of those of you who have a dependency matrix, keep your hands up if they involve string. Okay. Keep your hands up if they're still currently accurate.
All right. So this is for if you're struggling with dependencies.
The hardest thing we do is communicate across teams. And when you have a bunch of small teams, like two-pizza teams, with lots of dependencies between them, how much time is being spent coordinating?
I know we like small teams because they can move fast. That's true. But just realize that by having small teams move fast individually, you're paying a price of not moving very fast as a whole organization.
On this board here, we're just exposing the dependencies. We're just asking the question.
The first swim lane there is for expedites or unplanned work. We see this a lot on boards where we've got a lane for incidents or emergencies that need to flow through really quickly. And then here we've just got a lane to call out high-risk dependencies and maybe show how long they've been waiting.
That's what that circle is there for, eight days. It's just showing you how long this dependency has been waiting to hear back from a third party or another team that's going to be impacted.
Thief number three: unplanned work.
Sometimes unplanned work comes in the form of strategic direction: "Let's stop marketing to everybody, and let's just focus on marketing to large enterprises."
But often it comes in the form of expedites, right? The fires that stem from some kind of failure, and we call that failure demand. Failure demand is based off of some problem. It's not there because we're delivering something new and fresh and exciting that we can sell.
And it seems to come out of nowhere. Where were you when Dyn got DDoSed last month? I expect some of you were impacted by that.
So why unplanned work matters. Unplanned work, expedited work, it steals time away from work that's creating value.
This is an excerpt of the State of DevOps 2016 report on the left. And it's survey data here, but it shows that high performers spend 11% more time working on planned work versus unplanned work. And they're suggesting that we need to consider the amount of unplanned work as a measure of quality.
It's hard to measure quality, but if you've got a ton of unplanned work, it means that your performance has likely taken a hit because you're spending so much time fighting fires instead of working on value work. All-hands-on-deck incidents tend to reduce performance.
You know thief three is stealing time from you when someone joins your Slack channel and within two minutes, four people are just sucked into the vortex.
Events like this critical issue where nobody can log in, they add variability into our everyday work. It's interruptions, it's context-switching. It's what causes things to take longer than expected. And if this happens frequently, chances are that thief unplanned work, which is mostly failure demand, is stealing not only your time, but it's stealing your predictability.
So what to do about it? Expose unplanned work. Just find a way to visualize the things that interrupt your day, that take you away from delivering important work because you're putting out fires.
There's usually resistance to this, and it comes in the form of, "I don't have time to log every single time I get interrupted. I don't have time to do that." Usually, Platform Ops manager Eric will say that.
But after weeks of interruptions, the CIO wants to know why the Azure platform isn't up and running in production. And what does Eric say? He says, "Well, we've been busy."
If it's not tracked, there's no evidence, and it's a perfect crime when there's no evidence.
When you can make unplanned work visible, other people can see it, and they can understand why other stuff didn't get done.
Thief number four: conflicting priorities.
So I was working with the marketing team. Yes, DevOps is starting to spread to business teams and companies. And they were working on a report, and it was taking ages. The leadership would've liked for that report to have been delivered six months earlier than it was.
So we looked at all their demand. It turns out they had 13 initiatives. They had more initiatives than they had people on their team, and their priority meetings were an hour long every week.
So we reduced the initiatives to seven, and now they've got a better focus on their priorities, and their meetings are shorter. One cause of too much WIP is failure to prioritize properly.
And it matters, because if we can't prioritize properly, then we do too many things at once. If everything is a priority one, then nothing is a priority one, and everything takes too long.
It could be that the single greatest value for the business today would be to go help somebody else finish something. Like, I could go help Julia finish this A3 training she's working on instead of pulling something new in.
When people can't prioritize effectively, we try and do too much at once, and that just causes more WIP, and that causes a longer cycle time. It's just a domino effect.
You know thief four is stealing time from you when you spend countless hours in meetings discussing priorities, and when you keep adding more new work into your queue before finishing old work. And when people ask you, "Are you done yet? Are you done? Ha ha ha."
Thief four is a close cousin of unplanned work.
So here's a way to expose or see the things that are conflicting with each other.
We've got unplanned work or expedites in the very top lane flowing through. We've got a couple projects that we're working on. We've got some maintenance work we're doing, and they're all competing with each other for the same attention. And so some of that work goes on hold, and we're just bringing visibility to this work down in the bottom swim lane.
This work down here is stuff that we were asked to do, but then another thing became a higher priority over it, so it got pushed down and it's being delayed. We're not working on it. It's on hold.
It shows, for example, that you've been asked to implement a new security vulnerability fix, but you can't get to it because the merit reviews have been prioritized. They're due on Friday.
That's the kind of visibility we want to bring to conflicting priorities. The idea here is just to have an explicit policy for how you're going to prioritize. Otherwise, how does that get done? Is it by the loudest person? Is it by the highest-paid person's opinion? Is it cost of delay? However you prioritize, make it explicit so people know what to do.
Thief number five: neglected work.
Neglected work, this is the planned work that often gets delayed because the business wants to work on value demand instead of failure demand. And that happens. We need to be able to adjust to that.
So the new feature gets prioritized over fixing technical debt. Thief four and thief five are very close cousins. And so neglected work doesn't get the attention or the budget or the resources needed to be successful.
It's like you know you need to get rid of all the XP machines in your environment.
And it matters because important work just sits waiting. Wait, wait, wait, until eventually it becomes an emergency, and it causes distractions and interruptions.
Neglected work is perishable. It ages. It's like rotten fruit. It's wasteful. Rotten fruit is expensive these days. It sits on the counter, it takes up space, it gets old, it gets moldy, and it smells bad.
It's like if you don't change the oil in your car every so often, you eventually end up with a blown-up engine. It's not a good thing, neglected work. And you know you've got it when you delay important things that'll eventually become emergencies.
It's like you decide to take your better half out for an anniversary dinner, but then you decide to just batch it up with next year's anniversary.
Expose neglected work.
So in the top lane, we've got expedites or unplanned work. And then in the second lane, we've got revenue-protecting work. So revenue-protecting work is a major target of thief neglected work.
And here's a way to see that they're sitting on the bench. We're just going to identify how many days those things have been sitting there not moving. They're sitting there untouched. They're consuming space. They're consuming energy.
This is what leads to the ability to create an aging report. Like, what if you queried your ticketing system today and said, "Show me everything that hasn't been touched for 30 days." How long would that list be? That's a lot of money on that list. That's a lot of inventory that's just sitting there consuming space and not getting done.
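The "show me everything that hasn't been touched for 30 days" query can be sketched as a simple filter. The ticket records, field names, IDs, and dates below are all hypothetical stand-ins for whatever a real ticketing system exports; only the 30-day threshold comes from the talk.

```python
from datetime import date, timedelta


def aging_report(tickets, today, max_idle_days=30):
    """Return tickets untouched for at least max_idle_days, oldest first."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = [t for t in tickets if t["last_touched"] <= cutoff]
    return sorted(stale, key=lambda t: t["last_touched"])


# Hypothetical export from a ticketing system.
tickets = [
    {"id": "OPS-1", "title": "Retire XP machines", "last_touched": date(2016, 9, 1)},
    {"id": "OPS-2", "title": "Patch build server", "last_touched": date(2016, 11, 10)},
    {"id": "OPS-3", "title": "Rotate credentials", "last_touched": date(2016, 10, 1)},
]

today = date(2016, 11, 15)
for ticket in aging_report(tickets, today):
    idle_days = (today - ticket["last_touched"]).days
    print(f"{ticket['id']}: {ticket['title']} (idle for {idle_days} days)")
```

Sorting oldest first puts the work most likely to turn into an emergency at the top of the list.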
So the guidance here with neglected work is to make it transparent on your board so people can see it. If you can see it, then you can start to do something about it.
It's hard to see the big picture impact when all the thieves are secretly attacking all the different teams. Very hard to see that. It's hard to visualize that. But if you can shine a light on them and expose them by tagging them, then now we can use our visual sense to discover how can we put these thieves out of business.
Here's an idea. What if you tagged all your thieves on your Kanban board? If you just tagged them. Yellow's unplanned work, red is when you went over your WIP limit, purple's all the unknown dependencies.
Once you have all these things tagged, then you can visualize them, and you can count them, and you can begin to see the patterns and connections that would otherwise be scattered across multiple team boards.
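Counting tagged cards across boards, as described above, can be sketched in a few lines. The card records, board names, and tag strings here are hypothetical; a real Kanban tool would supply its own export format.

```python
from collections import Counter

# Hypothetical cards gathered from several team boards.
cards = [
    {"board": "Platform Ops", "title": "Login outage", "thief": "unplanned work"},
    {"board": "Platform Ops", "title": "Upgrade database", "thief": "neglected work"},
    {"board": "Marketing", "title": "Quarterly report", "thief": "conflicting priorities"},
    {"board": "Web", "title": "Toolbar change broke filter", "thief": "unknown dependencies"},
    {"board": "Web", "title": "Hotfix checkout", "thief": "unplanned work"},
    {"board": "Web", "title": "New landing page", "thief": None},  # untagged card
]


def count_thieves(cards):
    """Count cards per thief tag, ignoring cards that carry no tag."""
    return Counter(card["thief"] for card in cards if card.get("thief"))


for thief, count in count_thieves(cards).most_common():
    print(f"{thief}: {count}")
```

Pooling the cards before counting is what surfaces patterns that stay hidden when each team looks only at its own board.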
And then you can have a time thief-o-gram.
It's a brand-new idea of mine, and it's incomplete. It's not finished yet. I'm actually looking for help for how to nail down the specifics of this. But I think it could be very useful.
The intent here is, out of all the data out there and all the thousands of metrics out there, what is the most important information to focus on? A lot of you measure velocity and mean time to repair and defects and all these things that are great information to have. But what I want to argue for here today is that we need to measure the things that cause the problems, so we can prevent them in the first place.
What causes those problems in the first place? Stuff that prevents your team from getting things done quickly.
Unplanned work, neglected work: they're all intended to be the number of cards that you actually tagged on your Kanban board. The two other thieves, like those, you probably need some kind of criteria. You're going to have to probably create a card on the board so that can get measured and get put into this chart here.
But the idea is, if we can just count these things and measure them, then we can see where our time is going.
Here's an aggregated time thief-o-gram trend. Then we can start looking at risk week over week or month over month. We could see the fluctuation in the amount of uncertainty across the whole system.
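One way the week-over-week view might be built is to group tagged cards by ISO week and count them per thief. The talk describes the thief-o-gram as an unfinished idea, so this is only a guess at the bookkeeping: the field names, dates, and function name are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_thief_counts(tagged_cards):
    """Group tagged cards by (ISO year, ISO week) and count them per thief."""
    weeks = defaultdict(Counter)
    for card in tagged_cards:
        iso = card["tagged_on"].isocalendar()
        weeks[(iso[0], iso[1])][card["thief"]] += 1
    # Return the weeks in chronological order.
    return dict(sorted(weeks.items()))


# Hypothetical tagged cards.
tagged_cards = [
    {"thief": "unplanned work", "tagged_on": date(2016, 11, 1)},
    {"thief": "unplanned work", "tagged_on": date(2016, 11, 3)},
    {"thief": "neglected work", "tagged_on": date(2016, 11, 2)},
    {"thief": "unknown dependencies", "tagged_on": date(2016, 11, 8)},
    {"thief": "unplanned work", "tagged_on": date(2016, 11, 9)},
]

trend = weekly_thief_counts(tagged_cards)
for (year, week), counts in trend.items():
    total = sum(counts.values())
    print(f"{year}-W{week:02d}: total {total}, breakdown {dict(counts)}")
```

A raw count treats every card as equally costly, so a trend like this shows fluctuation in volume, not in damage.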
I recognize that this view is probably going to be too difficult to manage because the heights of the different bars don't represent equal damage. If people look at this visually, they're going to think, "Oh, blue, neglected work. That was our biggest pain point." But we don't know. That might not be the case. It may be that the yellow unplanned items caused a lot more damage.
So seeing all these five adjacent might not be the most impactful. So then I thought, what if we looked at a segregated time thief-o-gram, that we could at least compare apples with apples a bit more.
And then I thought, "Wait a minute. Let's view this as a balanced scorecard." This way, people can understand which thieves to focus on first. You can see which ones you're doing well with, and which ones are robbing you of your time and your predictability left and right. And we can track these and measure them. Now we can do something about them.
In yesterday's Lean Coffee, the morning session about leadership change, the first thing that people put to discuss, that got the most votes on it, that got prioritized first, was: how do we influence CIOs to do DevOps? What metrics do we need to get buy-in?
The CIOs that I've talked to, they want two things. They want more predictability, and they want reduced risk.
The thief-o-gram shows risk. Well, actually, it shows uncertainties, because risk is in the eye of the beholder. The thief-o-gram shows WIP, and since WIP is a leading indicator, we can see predictability or not within the system.
So what I want to ask here is: what would that be worth to your CIO? How can this help you in trying to get buy-in from your CIO if they have this kind of transparency on risk and predictability measures?
I'm going to leave you with more theory and books, and just great overall reading on time theft and uncertainty. Especially want to call out Troy Magennis' book in the middle there. He's probably the world's leading expert on using Monte Carlo simulation for forecasting and being predictable in our IT space.
And lastly, if you want a copy of this deck, please send me an email: dominica@leankit.com. I work at LeanKit with Julia Wester, who I want to mention is giving a talk this afternoon on how to be more predictable. Highly recommend you catch that one.
So send me an email and I'll give you a copy of this presentation and a copy of a Kanban for IT Ops paper that I just wrote with Kaimar Karu. We'll give you a discount code for DevOpsDays Seattle that's happening in April 2017, and also a copy of LeanKit's new Lean Business Report.
And just put "Flow" in the subject line. I'll get right back to you.
And thank you so much.
Gene Kim
Thank you, Dominica.
Actually, if I can just tell one quick story about Dominica. If you ever get a chance to be coached by her, definitely take it.
And I remember my assistant at the time and I showing you my Kanban board, and the look of concern and surprise on your face was memorable.
So, highly encouraged. Thank you so much.
Dominica DeGrandis
Thank you.