Forecasting Using Data—Using Historical Data for Demand, Capacity & Project Planning


Las Vegas 2019


This session teaches you how to forecast capacity or delivery dates using a team's historical data. Probabilistic forecasting lets planning take into account uncertainty and the things that might happen (risks), and it helps communicate those plans.

It will cover:

- Probability and Probabilistic Forecasting basics

- How much data is needed for reliable forecasting

- Predicting the arrival rate of incoming work

- Predicting the capacity of teams receiving work

- Building achievable plans across one or multiple teams

- Tracking progress and communicating status


Full transcript


Troy Magennis

Hi, everyone. Welcome. Welcome to the forecasting talk. And the first rule of forecasting: if you start late, you end late. So there you go. My work here is done.

You can download the slides right now. I know that's what most people want: download them now, leave, and get to lunch early. To save you taking photos of the important slides, I thought I'd just put them out there straight away.

So we start all of our talks here at DevOps Enterprise Summit, or DOES, with our passions. I'm a little bit too passionate about data visualization, which is why I got married in my 40s. Helping people see data, understand it, interpret it, and make good decisions with it is something I paid too little attention to as a developer. As I've gone through my career, I've realized just how important it is. Sharing information with others is a very important skill to have.

And the other passion I have, which is more of a pet peeve, though I had to write it in a positive fashion, is that I see people trading away value for predictability all too eagerly. It's all about predictability, predictability, predictability, which is what a lot of you come to a forecasting talk to learn how to improve. But when we do that, we trade away possible value, because sometimes the things that will drive our company's value the most are really risky and really uncertain. So we need a way to help others understand that we're doing the most valuable stuff, and that we're going to screw up from time to time. This isn't live, is it? I don't know if I'm allowed to swear.

So again, everything I mention in this talk is free and available. You can go and grab it and contact me. You can get the slides here, and there are cards on the edge of the AV table and by the water cooler on the way out, so you don't have to take notes and photos unless you really, really want to.

I want to restate what forecasting is about to me. I know forecasting is often used to get people to set commitments so they can be belted up later for missing them. But I don't want you to think about forecasting in that vein. Think about it this way: we need a forecast to discover that we don't know enough about a delivery system, or about an input such as where our help desk tickets are going to come from. We need something to compare against to gain the insight that we don't know as much as we thought we did, or that the system has changed underneath us and we don't know why.

So forecasting is about setting a baseline of what reality could be. And it's not a one-off event. You have to model, forecast, and compare against the reality that actually occurred. Reality is right. The forecast can never be right. The forecast will always be wrong. The model will always be incomplete because new factors come into play. So our job is to whip around this cycle as fast as we can. And you can't do that if you forecast three months before you start a project and then, nine months later, complain that you didn't hit the date. Doesn't make sense, does it? But it's exactly what we do, day in and day out.

So I don't care that models are wrong. I expect them to be wrong. I expect my forecast to be wrong, because I'm using it as a learning tool to tell me whether I understand the system well enough to even go on the record about it being predictable. And if it goes wrong because we were delivering something of high value, good job, team.

This is what I want you to think about. When you're forecasting something in the future, you should only believe it if you were able to forecast something in the past. So the first check on any forecast, comparing it against reality, is to go back in time. Go back six months and take three months of data from there, so months minus six to minus three. Build a model from that data and see whether it would have predicted what you know actually happened in your system over the most recent period. Only once you have that model, and have gone around the loop a few more times to work out the biggest factor behind the error, should you even consider forecasting into the future.
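The backtest described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet from the talk: it assumes weekly throughput counts (oldest first), uses the 13 weeks from roughly six to three months ago as history, and uses a simple Monte Carlo resample of that history as the "model". The function name `backtest` and the 10th to 90th percentile band are hypothetical choices made for the example.

```python
import random


def backtest(weekly_throughput, train_weeks=13, test_weeks=13, trials=10_000, seed=1):
    """Forecast a known recent period using only older data, then compare.

    weekly_throughput: items completed per week, oldest first (assumed input).
    History = the train_weeks that precede the final test_weeks.
    Reality = the total actually completed in the final test_weeks.
    """
    if train_weeks < 1 or test_weeks < 1:
        raise ValueError("train_weeks and test_weeks must be at least 1")
    if len(weekly_throughput) < train_weeks + test_weeks:
        raise ValueError("not enough history for this backtest")
    history = weekly_throughput[-(train_weeks + test_weeks):-test_weeks]
    actual_total = sum(weekly_throughput[-test_weeks:])

    rng = random.Random(seed)
    # Each trial: total completed over test_weeks, drawing each week from history.
    totals = sorted(
        sum(rng.choice(history) for _ in range(test_weeks)) for _ in range(trials)
    )
    low = totals[int(0.10 * trials)]        # roughly the 10th percentile
    high = totals[int(0.90 * trials) - 1]   # roughly the 90th percentile
    return {
        "low": low,
        "high": high,
        "actual": actual_total,
        "within_range": low <= actual_total <= high,
    }
```

If `within_range` comes back `False`, that is the signal to go around the loop again and look for the factor the model is missing before trusting any forecast of the future.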

Then, over time, your models will drift, because the system will change, more unplanned work will come in, or you'll take on riskier work, and reality will no longer align with your forecast. That's great information, because it means you now have to think about whether that was an intended consequence or an unintended one, and react accordingly.

A lot of forecasting, when we look at the work that Shewhart and Deming did on statistical process control, was about telling people to calm down when things were wrong. Just let the system settle. What was special cause variation and what was common cause variation? We want to be able to detect very quickly that something special, something new, something unanticipated has happened.
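One common way to separate special cause from common cause variation, offered here as an illustration rather than something the talk prescribes, is a Shewhart-style process behaviour chart. The sketch below computes the limits of an XmR (individuals and moving range) chart using the standard 2.66 scaling constant; points outside the limits are candidate special causes worth investigating, and everything inside is treated as common-cause noise. The function names are hypothetical.

```python
def natural_process_limits(values):
    """XmR chart limits: mean +/- 2.66 * average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    centre = sum(values) / len(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = sum(moving_ranges) / len(moving_ranges)
    return centre - 2.66 * avg_moving_range, centre + 2.66 * avg_moving_range


def possible_special_causes(values):
    """Indexes of points outside the natural process limits."""
    lower, upper = natural_process_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]
```

In practice you would usually compute the limits from a baseline period and then check new points against them, so that a single big outlier doesn't widen its own limits.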

So here are the first two takeaways. All my takeaways are going to be on black slides, and all the content is going to be on white. Forecasting is about detecting earlier that you're wrong. It's not about setting a date and expecting to hit it. It's about knowing, on the journey towards that date, that you've deviated from the path you expected. And until you know you can forecast a period slightly back in history, don't contemplate forecasting the future with any anticipation of success. It's not going to happen.

We're going to move on now. In a half-hour session, I'm not going to be able to teach you how to do forecasting, and that's not my intent here. I'm going to try to set the scene: what you should expect, what you can't expect, and who to ask when you've got questions and problems, okay?

In our world, the most common way of forecasting is to extrapolate what has happened in the past forward. In this case, this was the cumulative throughput for 100 teams inside a software organization. At the beginning of 2012 they were doing roughly 2,000 items a week, and by the end they were doing about 100,000 items per week. Looks like a nice stable project. Who thinks that's a nice system to forecast? Yep, got a few hands going up there.

So here's the data that drove that cumulative projection. Notice it's got some weird stuff going on, right? As a data analyst or a forecaster, which trend would you like me to project, sir? Whichever gives you the answer you most likely wanted? So depending on the timeframe you're forecasting over, the method of just taking a projection line and extending it may or may not be the right tool for the job.

And we all worry about, well, we've got to estimate stuff. If we just estimated better, it would all turn out well and worthwhile. Yeah, okay. Yeah. Stick with that belief.

All right, so that's around the December time period. Anyone got a guess as to what that was? Holidays? Not quite, but it's related to holidays. It was an HR policy.

Now notice that the throughput downturn happened at the same time every year, and after it finished there wasn't a huge burst of pent-up, half-finished work. And it was only 3 or 4% of the staff taking vacation over that period. The HR policy, of course, was "use it or lose it" by the end of the calendar year. So your superheroes, and the people in the most constrained teams, didn't have time to take their vacation during the year. Faced with "use your annual leave or lose it," they did the right thing and used it.

But because of the dependency chains between those groups, and because those people were the constraint, you had zero flow for that period. What does that cost with, say, 100 teams of eight people? That's what, 800 people? What's the cost of their salaries alone? You might as well have just set fire to it.

And it's an easy fix, right? Make annual leave expire on the anniversary of the date you joined the company, or something like that, which spreads it out through the year. Or better still, don't put staff in situations where they don't have time to take their annual leave during the year. As a manager, encourage them to take it at other times.

Anyway, what looks stable over the long term isn't stable in the short term. And depending on what you're forecasting, you're going to have to get good at understanding that.

Now, when we set out to forecast what a future value might be in this type of problem, there are three components to it. The first is trend, which is what you saw in that nice blue, curvy line. It's a long-term increase or decrease. Sometimes it's not linear. Sometimes it grows exponentially: as we add people, we get better, and so forth.

On top of that, there'll be some sort of pattern, or more than one. Think of traffic flow, for instance, right? What would be a pattern in traffic flow? Maybe day of week: it's slower on Saturday and Sunday than it is during the week. Maybe time of day, right? These are predictable patterns that alter the trend. They bump it up or down by a fixed amount. So everything follows the trend line, but the day-of-week and time-of-day patterns move it up and down.

And then on top of that, you've got special causes, or noise, like a blizzard. It doesn't happen all the time, but when it happens, it has a huge impact. Whenever we're trying to predict a future value, we've got to think about these three components, and which one matters most will vary with our forecasting domain.
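The three components can be pulled apart with a classic additive decomposition. The sketch below is illustrative, not the talk's tool: it assumes a daily series with a weekly (7-day) pattern, uses a centred 7-day moving average as the trend, averages each weekday's deviation from that trend as the pattern, and treats whatever is left as noise. Position 0 in the pattern is simply the weekday of the first day in the series, and the function name `decompose` is hypothetical.

```python
from statistics import mean


def decompose(daily, period=7):
    """Split a series into trend + repeating pattern + noise (additive model).

    trend:   centred moving average over one full cycle (None at the edges)
    pattern: average deviation from trend for each position in the cycle,
             shifted so the pattern averages to zero
    noise:   whatever is left; large values are candidate special causes
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods, e.g. 7 days")
    if len(daily) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    n, half = len(daily), period // 2

    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(daily[i - half:i + half + 1])

    deviations = {k: [] for k in range(period)}
    for i, t in enumerate(trend):
        if t is not None:
            deviations[i % period].append(daily[i] - t)
    raw = {k: mean(v) for k, v in deviations.items()}
    offset = mean(list(raw.values()))
    pattern = {k: v - offset for k, v in raw.items()}

    noise = [
        None if trend[i] is None else daily[i] - trend[i] - pattern[i % period]
        for i in range(n)
    ]
    return trend, pattern, noise
```

Large values in `noise` are the ones to take to the business and ask, "what happened here?"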

Long term, the trend is probably the right thing to look at. You don't want to micromanage the abnormalities of pattern or noise. But if it's medium-term, a month out or so, you might want to look at trend and seasonality, because if your project started just before Christmas, you're a month late before you've even started. And that's going to matter. We think we can use the same model throughout the year. You can't, because the special causes are going to start overwhelming your model.

So you've got to start thinking about your own work: okay, we're growing and adding people at a certain rate, and that's probably going to affect our trend. Or, if we're a help desk, we know that on Monday mornings we get a lot more password reset requests, because people's passwords have had two days to expire, not one, and so forth. So it matters which problem you're trying to solve and forecast.

If you're doing it yourself, I'm going to give you a tool to start off with. If you have a data science team, they can quickly build you a model that probably captures the trend and a little bit of seasonality. You need to help them, and train and coach them on what your noise and special causes are. And those will be different depending on where you are.

So what factors might impact demand? Take ice cream sales. What factors are going to increase demand for ice cream? Temperature. And you're right, temperature does impact ice cream sales. What's the simplest way to model temperature change? What might be a nice way to put it into, say, four easy categories? Season. All right. So if you were trying to forecast ice cream sales, you would want to make sure you adjusted for the seasonal pattern: during summer, we sell more ice cream. During winter, only I buy ice cream.
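To make the season adjustment concrete, here is a minimal sketch of a multiplicative seasonal index. It is an illustration, not a method from the talk: it assumes sales tagged by season and ignores trend entirely, so in practice you would combine it with a trend estimate. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_indices(observations):
    """Multiplicative seasonal index per season (1.0 = an average season).

    observations: iterable of (season, sales) pairs.
    The baseline is the mean of the per-season means, so seasons with more
    observations don't dominate it.
    """
    by_season = defaultdict(list)
    for season, sales in observations:
        by_season[season].append(sales)
    season_means = {s: mean(v) for s, v in by_season.items()}
    baseline = mean(list(season_means.values()))
    return {s: m / baseline for s, m in season_means.items()}


def seasonal_forecast(deseasonalised_level, season, indices):
    """Scale a season-free forecast level by that season's index."""
    return deseasonalised_level * indices[season]
```

For example, with sales of 200 in summer, 80 in autumn, 40 in winter and 80 in spring, summer gets an index of 2.0 and winter 0.4, so a season-free level of 110 becomes a summer forecast of 220.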

Special cause variation. This is the big one. I worked as VP of technology for Sabre and Travelocity, and the things that kept me up at night were, "We're going to run a new ad campaign." Ugh. And then The Amazing Race. "Oh my God, we're going to sponsor The Amazing Race," right? Do you think that didn't have an impact on the load on our servers and the capacity we needed to be ready for?

And even then, it got downstream, because Lastminute.com used to run "when it's gone, it's gone" campaigns. It would go from nothing at 8:00 PM to three and a half million hits in the next six minutes, and back to nothing again. You try managing that server farm. That's why I'm gray. I had hair.

But it's important stuff, right? No matter how good you are, you need to help any data science team understand the factors that are going to affect you, and you need to work with the business to understand how this stuff is going to affect demand on your servers and how many people might need to be on the help desk. Because further down the track, after people have bought these tickets, we get an uptick in help desk calls when a plane's been canceled, there's bad weather, and so forth.

I managed to preside over the biggest Michael Jackson concert ticket sale. He tragically died. We had enough capacity to handle six million tickets being sold in four and a half minutes. We could refund about one a week. So no matter how good you think you are at it, you're going to get caught by these special causes. The Iceland volcano came out of the blue, not predictable by us. We knew it might happen at some point, but we certainly didn't know it would happen when I was in the UK trying to get back to Dallas. And it changed the way hotels were sold. Even inside hotels, the whole demand structure for the housekeeping staff they needed completely changed over time. Context, context, context.

To get you started doing this easily, I built a spreadsheet. I do that sort of thing on date night. My wife and I sit down and knock together a spreadsheet for various purposes. This one takes a series of dates of things that have happened in the past and projects them out. It doesn't just project forward; it projects back as well. The orange line is the trend over time, and you can see we're getting slightly more work over time. The blue line is the actual data, and the forecast is the dotted line. Notice I also ran it backwards over the history, so that it could automatically highlight which points are possible special causes. Because if they happened once, they might happen again.
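The spreadsheet itself isn't reproduced here, but its core idea (fit a trend, project it forward, and also run it back over the history to flag points that sit unusually far from it) can be sketched as follows. This assumes a straight-line trend and a simple "more than k standard deviations from the line" rule with k = 2 by default; both are illustrative choices, not necessarily what the spreadsheet does, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_trend(values):
    """Least-squares straight line through values indexed 0, 1, 2, ..."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def project(values, periods_ahead):
    """Extend the fitted trend line forward."""
    slope, intercept = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(periods_ahead)]


def flag_possible_special_causes(values, k=2.0):
    """Run the trend back over history; flag points more than k std devs away."""
    slope, intercept = fit_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]
```

The flagged indexes are the conversation starters: go and find out what happened in those periods.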

Now, the forecast is as good as we can do. Those special causes were, to a large extent, unpredictable, unforecastable, unknowable in advance. So those errors and bad forecasts are absolute gold. I see people trying to minimize the number of outliers they find, when they should be trying to maximize it, because that's where the most learning is.

Whoever you work with in the data science teams inside your organization, you want to sit with them and work out what these factors are and whether you should incorporate them into the model in the future. And they change over time. Again, this is a spreadsheet with no macros. You put in a series of dates or a series of numbers, and it does the time series forecasting for you. It looks for several types of pattern, such as day of week and week of month, just in case there's a work-cycle cadence in your organization.

Special causes, I went through that. Know which factors are important in your context: trend, patterns, and special causes. The best predictor will vary depending on what you're trying to predict. The shorter term it is, the more you have to worry about seasonality and special causes. The longer term it is, the best you can do is trend lines and standard progression lines, and know what you're getting.

Now, forecasting duration and dates. Everyone says, "Okay, thanks for that, Troy, but how long is this thing going to take? And be within a week over six months, please." That's what we're being asked to do: forecast these very unpredictable events with supreme accuracy.

So when you're faced with a problem like this in forecasting, you go and copy someone else. That's what we do as developers, and it's what we certainly do as data scientists. Go and look at the way Google Maps has presented its forecast for travel times, which is a time-based forecasting system. They do a couple of things which are quite valuable. They don't give you one option; they give you multiple options. Why? Because they're leaving the context to you. If it's date night, take a bit longer, stay away from the kids for an extra couple of minutes. It's okay taking a bit longer to get to where you're going, or public transport if it's the second date.

The second thing they do is not commit to an arrival time until you actually leave. What do we do in software? We forecast to a date before we've even formed the team or placed the ads to hire staff. Everything you'd do with a date when choosing between options, you can do with a duration. In fact, people convert dates back into durations in their heads anyway: "Oh, so that one's shorter." So stick with duration as long as you can. Don't commit yourself to a date until you start.

But when you do start, keep going through that cycle of modeling and refining over time, so that you understand when you're deviating and when your model is being affected by special causes or seasonal patterns you don't understand. If we had been continuously forecasting for that 100-team organization, we would have had an early indication during the Christmas period that our dates were going to slip. But if we had just stuck with the average over the 12 months, that would have been invisible to us. It's the errors where the value is.

So, contrasting software planning with Google Maps: if you give one forecast now, even though your teams considered multiple approaches for delivering it, stop. Start giving multiple options and getting someone to say, "This has these benefits, that has those benefits, over to you." If you're giving a calendar date for work whose start isn't defined yet, like when the team's not formed but it's going to be done by the 16th of November, stop. Do the analysis and compare the options using duration only. Then, when you do start, forecast continuously, so you find out earlier, when you can react with a much smaller push or nudge and have many more ways of solving the problem than if you waited until it was late, near the end.

So why, Troy? Why not just project that trend line all the way up? Well, something happens when we use an average trend line to forecast the future. Most outcomes end up on a symmetrical distribution of some kind, a normal distribution in this case. I stole the picture from Wikipedia. If you project out an average line, 50% of the outcomes will happen on or before the date you give, and 50% will happen after. So you've really got a coin-toss chance. I know we're in Vegas and those are fair odds, but you might want more than that if it's the team developing the software for your heart pacemaker. You might want a bit more rigor to make sure it's going to be ready when it's inserted into your body.

So this is how it works. To solve that problem, and to be able to go on the record with a higher degree of certainty about our predictions, we do something called probabilistic forecasting. It's nothing more than this. With normal math, we take the amount of work we have to do and divide it by the average rate at which we do work, and that gives us a duration. With probabilistic math, we say, "Well, it's about 20 to 30 stories, and we get between about one and five done a week, so that's between four and 30 weeks." People get uncomfortable with that wide range and a bit stressed about it. But given the variability the team has estimated, because you haven't got real data just yet, it is in fact the right answer. The answer lies somewhere in that range. It's just unusually, and unusably, wide.
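The range arithmetic in that example is just two divisions: the shortest duration is the least work at the fastest rate, and the longest is the most work at the slowest rate. A tiny sketch (the function name is hypothetical):

```python
def duration_range(work_low, work_high, rate_low, rate_high):
    """Shortest and longest duration implied by ranges of work and delivery rate.

    Shortest: least work at the fastest rate. Longest: most work at the slowest rate.
    """
    if rate_low <= 0 or rate_high < rate_low or work_high < work_low:
        raise ValueError("need 0 < rate_low <= rate_high and work_low <= work_high")
    return work_low / rate_high, work_high / rate_low


print(duration_range(20, 30, 1, 5))  # (4.0, 30.0) weeks
```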

What we're going to get you to do in this session is put your team's actual pace, from its actual data, into that denominator, so that you can start saying, "At the rate this team is currently delivering, this is when it will be done." Then we get a bit more precise, using some fancy math to work out whether we're 85% sure, 90% sure, or 95% sure, depending on where you want to be. The higher the probability we ask for, the closer we get to the 30 weeks. But there are very few instances where 30 weeks is the right answer. Let me show you how it works.

You've got a team with some historical data: story points, velocity, tickets completed, or work completed. And you know there's a backlog of work you have to get through. We could just take the slowest of our previous throughputs, or we could take the average and project it out. That's the regression line, and it has about a 50% chance of occurring: 50% of the outcomes will be to the left of that date, and 50% will be to the right of it.

Then we can take the worst case, the slowest we've ever been, and project that out. We can take the fastest we've ever done and project that up to the line. And then we can get all crazy, have a few drinks, and start throwing up random ones. If we sample randomly from the historical data and project those samples up to see where they cross the "you finished everything" line, we start forming a distribution of possible outcomes.

When someone says they're 85% certain in a probabilistic forecast, of rain, of snow, of anything, all we're saying is that 85% of those lines crossed to the left of that point and only 15% of them to the right. So now we can go on the record with a more precise probability.
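Here is a minimal Python sketch of that random-sampling process. It assumes weekly throughput counts as the historical samples and a fixed backlog size; the function names, the 10,000 trials and the fixed random seed are illustrative choices, not the talk's tool. The 85th percentile of the results is the "85% certain" number: 85% of the simulated lines finished in that many weeks or fewer.

```python
import math
import random


def simulate_weeks_to_finish(backlog, weekly_samples, trials=10_000, seed=42):
    """Monte Carlo: each trial draws random past weeks until the backlog is done.

    Returns the number of weeks each trial needed, sorted ascending.
    """
    if not any(s > 0 for s in weekly_samples):
        raise ValueError("need at least one historical week with completed work")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        done = weeks = 0
        while done < backlog:
            done += rng.choice(weekly_samples)  # replay a random historical week
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)


def percentile(sorted_outcomes, pct):
    """Nearest-rank percentile: pct% of trials finished in this many weeks or fewer."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_outcomes)))
    return sorted_outcomes[rank - 1]


history = [3, 5, 2, 6, 4, 1, 5]  # hypothetical items completed per week
outcomes = simulate_weeks_to_finish(40, history)
print("50%:", percentile(outcomes, 50), "weeks; 85%:", percentile(outcomes, 85), "weeks")
```

The 50% figure is roughly what projecting the average would give you; the gap between it and the 85% figure is the conversation to have before committing.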

Again, there's a tool to get you started. It's not the most complex tool; it simply does that division math you just saw. You enter a start date when you know it, a low and a high guess for the amount of work you have to complete, and, initially, an estimate of how fast you're going to deliver that work. It does exactly what that other sheet did: you put your actuals on it, so you can see when an abnormality happens and a special cause is slowing the team down.

You read it by stepping back 14 feet and roughly running a line down. I was a VP for many years, and I did that once. I said, "It's about December," and that was not seen as being professional. But it was actually better than what they were doing, which was taking the average and projecting it forward. They just didn't realize it. I left.

So that's what we're doing when we go from the 50% chance of that regression line to an 85% chance. Initially, the durations will be longer than people like, and you'll have to tell them why. Then you'll have to say, "Well, at the average we'd be missing half the time. I think we should only miss about 15% of the time. Can we make a deal on that?" And you're in a much better position to set capacity, because you're using your team's actual historical rate to get there.

That's hard to read and looks mathy and geeky, so I convert it to a table with red, amber, and green. Best practice. And I try to spell out in words what it means. Again, this is the simplest possible MVP for probabilistic forecasting of your work, and you need to do better: start here, don't finish here. And that's the date you would have given using the average. That's about 14 days, two weeks, between the 50% date they expected and the date by which I think we really had a chance of getting it done.

As your teams start working, you throw in their real data. It goes into the samples of your history, and the tool stops using the estimate and starts following the same pattern as your teams. So if there is a seasonal pattern in the data, it will be replicated in the forecast this way. We do some fancy stuff with months and the like. But you start off with a nice wide-range estimate to compare options. When you choose one, you do a bit more analysis to maybe narrow that range, but you still want a range, and you still want it wide. When you start getting actual data, you remove the estimates and use real data as you go along, to get your system profiled and modeled correctly.
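A rough sketch of that progression, from estimated ranges to the team's own samples, under stated assumptions: uniform whole-number draws stand in for the low/high estimates of work and weekly rate, and the switch to actual data happens once at least three weekly samples exist (echoing the "fewer than three, use a guess" rule of thumb later in the talk). The function name and the threshold parameter `min_actuals` are hypothetical, and this does not reproduce the tool's handling of months.

```python
import random


def forecast_weeks(work_low, work_high, rate_low, rate_high,
                   actuals=(), min_actuals=3, trials=10_000, seed=7):
    """Monte Carlo forecast that starts from estimates and switches to real data.

    Backlog size: a random whole number between work_low and work_high.
    Weekly throughput: drawn from `actuals` once there are at least
    `min_actuals` samples, otherwise a random whole number in the estimated range.
    Returns the weeks-to-finish for each trial, sorted ascending.
    """
    use_actuals = len(actuals) >= min_actuals and any(a > 0 for a in actuals)
    if not use_actuals and not (1 <= rate_low <= rate_high):
        raise ValueError("estimated rates must satisfy 1 <= rate_low <= rate_high")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        backlog = rng.randint(work_low, work_high)
        done = weeks = 0
        while done < backlog:
            if use_actuals:
                done += rng.choice(actuals)                  # replay a real week
            else:
                done += rng.randint(rate_low, rate_high)     # guess from the estimate
            weeks += 1
        outcomes.append(weeks)
    return sorted(outcomes)
```

With no actuals, `forecast_weeks(20, 30, 1, 5)` stays inside the 4 to 30 week range from the earlier example; once real weekly counts are passed in, the spread reflects how the team actually delivers.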

All right. Now, how do you present that data? That nice colored chart: I know a lot of people are using it. They've stopped me in the hallways here. Stop using it. There's another spreadsheet that does this en masse. The idea is that when we're in the room and someone says, "We want to change the order," or, "We want to do something else as well," the more immediately we can say, "No, we don't have the capacity to do that," the bigger the impact it has.

Because once we leave the room, they've already got it in their head, "Oh, okay, that problem's solved. I'm going to get A and B," when really they're only going to get A or B, and that's what the analysis is going to tell you. So you've really got to find ways to bring the capacity argument into the rooms where people are starting to set expectations. Because the moment it leaves the room, their expectation is set.

This spreadsheet gives a nice simple tick or cross across each feature. When they get upset about missing feature four, you say, "Well, do you want us to start that sooner?" You change the start order, something else immediately gets an X, and they go, "Ooh, I really want that as well." And you're sort of saying, "It sucks to be you. Work with me to make that so," which might mean splitting features, A and B, or one and four, and delivering the most important part of what they really wanted.
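The tick-or-cross idea can be sketched with the same Monte Carlo approach: simulate how much work gets delivered in the time available, and check which features in priority order are covered. This is an illustrative sketch, not the spreadsheet's logic; it assumes features are worked strictly in the given order, sizes and throughput are in the same units (for example, stories per week), and "OK" means at least an 85% chance. Function names are hypothetical.

```python
import random
from itertools import accumulate


def feature_odds(feature_sizes, weekly_samples, weeks, trials=10_000, seed=3):
    """Chance each feature (in priority order) is finished within `weeks` weeks.

    Feature i is done in a trial if the total delivered work covers the sizes
    of features 0..i, i.e. work is assumed to happen strictly in priority order.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(feature_sizes))
    hits = [0] * len(feature_sizes)
    for _ in range(trials):
        delivered = sum(rng.choice(weekly_samples) for _ in range(weeks))
        for i, needed in enumerate(cumulative):
            if delivered >= needed:
                hits[i] += 1
    return [h / trials for h in hits]


def tick_or_cross(odds, threshold=0.85):
    """Spreadsheet-style verdict: 'OK' if at least `threshold` likely, else 'X'."""
    return ["OK" if p >= threshold else "X" for p in odds]
```

To mimic changing the start order in the room, pass the features in a different order and see which one now picks up an X.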

I hear the objection: how much data, though? How much data do I need before this works? You're balancing two things. I know that in the real world of statistics, we want a large amount of data. But in our world, data gets stale so quickly that using out-of-context, stale data actually increases the error too much. We pay too high a price for that data.

There are some fancy statistical reasons why seven samples, whether you're dating or forecasting software, is the right amount of data to understand roughly what good and bad, better or worse, look like. So who dated more than seven people before getting married? I'm a coder. I dated one and married her. But you sexy people have much more rigorous standards than I do.

With fewer than three samples, you're better off using a guess, because the data isn't good enough yet to give you a useful range. About seven to 15 samples is the right amount. Please delete every bit of data you have beyond that, because the worst thing is that someone uses it, forms an average, and it skews your projections. People using this system ask, "Why don't you connect to tools?" Because I don't want you to grab all the data. I want you to type in the seven that match. So: seven samples, and don't look back too far.
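The talk doesn't spell out the statistics behind "seven samples". Two standard order-statistics results, offered as an illustration rather than as the speaker's exact reasoning, show why a handful of recent samples already says a lot. If samples are independent draws from the same continuous distribution, the next sample falls between the smallest and largest of n earlier samples with probability (n - 1) / (n + 1), and the true median lies between them with probability 1 - 2 × 0.5ⁿ. The sketch also includes a hypothetical helper that keeps only the most recent samples, in the spirit of "don't look back too far".

```python
def chance_next_sample_in_range(n):
    """P(next sample lies between the min and max of n earlier samples).

    Assumes independent samples from one continuous distribution (no ties):
    the new sample is equally likely to take any of the n + 1 ranks, and only
    two of those ranks fall outside the existing range.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    return (n - 1) / (n + 1)


def chance_median_in_range(n):
    """P(the true median lies between the min and max of n samples)."""
    if n < 1:
        raise ValueError("need at least one sample")
    # The range misses the median only if all n samples land on the same side.
    return 1 - 2 * 0.5 ** n


def most_recent(samples, count=7):
    """Keep only the latest `count` samples (samples are oldest first)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return samples[-count:]


for n in (3, 7, 15):
    print(n, round(chance_next_sample_in_range(n), 3), round(chance_median_in_range(n), 4))
```

With seven samples, the next one lands inside the observed range three times out of four, and the median is almost certainly inside it, which is why adding much older data buys little while raising the risk of staleness.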

Why do we do probabilistic forecasting? Because we want better than coin-toss odds of our forecast coming true. How do we get that? We start off by doing math on ranges of estimates, and then we move to using real data as we get it. Be really, really aware that you're going to have to sell people on the idea that there isn't one answer. There are multiple answers, and we're going to keep on top of it over time to make sure those outliers don't actually affect us, and that we understand the model well enough that our actual data is trending the way our model says. And if it doesn't, when it doesn't, we understand why.

So you've got to balance recency with sample size. That's probably one of the biggest errors. Starting late is number one; stale data is the number two reason I see forecasts fail, with any method, whether you're averaging or doing good probabilistic sampling.

Again, the slides are there, but we have to end our slide decks with what I need help with: what I see as the two biggest threats to predictability facing our industry. Here's number one. The guy on the bicycle, despite a slower average speed and higher energy expenditure, is easier to predict than the travel time for the traffic on your right. This is important because we often run our systems at high utilization, and without flow, we really can't tell where things are going to be, and time of day matters, and so on.

Here's my commute in New Zealand. You see the point, right? You don't want to be stuck around in this traffic, and it doesn't matter if you have 10X teams. If you're overloading all those 10X teams, they will not be able to use the power that they have because they will be constantly impeded by a constraint somewhere else in the system.

So this is a very important business problem to solve: helping our industry understand that high utilization makes things impossible to predict. What's happening is that we're trying to predict on a very steep curve, where lead time changes dramatically, by 10 to 100 times, with just a very small change in utilization, like unplanned work, a drive-by, or an outage.
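To see how steep that curve is, here is the textbook M/M/1 queueing result, used purely as an illustration: it assumes random (Poisson) arrivals and exponentially distributed service times, which real teams only loosely resemble. Average time in the system is the bare service time divided by (1 - utilization), so going from 90% to 99% busy takes lead time from about 10 times to about 100 times the hands-on time. The function name is hypothetical.

```python
def lead_time_multiplier(utilization):
    """Average time in system relative to bare service time in an M/M/1 queue.

    W = S / (1 - rho), so the multiplier is 1 / (1 - rho).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"{rho:.0%} busy -> lead time x{lead_time_multiplier(rho):.0f}")
```

The point is the shape, not the exact numbers: near full utilization, a small nudge from unplanned work moves lead time enormously.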

The teams you don't want to be on, or be forced to predict, are the teams with all the experts in the organization, because they're the ones who get pulled off. You want a team of really average developers like me, because we never get asked questions. We never get pulled away from what we're doing. On the expert team, it only takes one person being absent to push you onto that steep part of the curve.

The second biggest problem we need to solve, and help people understand, is dependencies. If four people had to arrive before a restaurant would take you to your table, and each had an even chance of being on time, the chance of actually being seated on time is one in 16. Of the 16 equally likely combinations, there's only one where you all arrive on time. In every other case, 15 of them, at least one person is late.

When we're delivering something, it's very similar. We can't deliver until all four of these sequential steps are done. So our odds are very, very low, and very lopsided. If you've got a team structure like this, with seven levels, and we found a story that had to travel through seven team dependencies to get to the top, what's the chance of any feature forecast you make delivering on time? The formula is one chance in two to the power of the number of dependencies: with seven, that's one in 128, so 127 times out of 128 something is late. That's your chance of delivering on time. The odds become incredibly stacked against you.
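The dependency arithmetic is easy to check, both directly and by simulation. This sketch assumes, as in the restaurant example, that each dependency is independent and has an even (50%) chance of being on time; with better odds per dependency the numbers improve, but they still multiply. Function names are hypothetical.

```python
import random


def on_time_probability(dependencies, p_each=0.5):
    """Chance that every one of `dependencies` independent parts is on time."""
    return p_each ** dependencies


def simulate_on_time(dependencies, p_each=0.5, trials=100_000, seed=11):
    """Monte Carlo check: fraction of trials in which every part is on time."""
    rng = random.Random(seed)
    on_time = sum(
        all(rng.random() < p_each for _ in range(dependencies))
        for _ in range(trials)
    )
    return on_time / trials


for d in (4, 5, 7):
    print(d, "dependencies -> 1 in", round(1 / on_time_probability(d)))
```

Each dependency removed doubles the chance of delivering on time, which is the argument for bringing the needed skills into fewer teams.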

So if you're playing the probability game, and we're here in Vegas, so we should be, every dependency you remove doubles your chance of delivering on time. While we're in this trend of making nice small pizza teams of three, four, or seven people, if we could find a way of bringing the groups and skills needed together so that a feature depends on only five teams, that's one in 32, and we have a much greater chance of delivering on time.

With that, thank you. Just remember, manage utilization, help others understand it, help others understand the impact, and help your companies understand the impact of dependencies and how it really puts you against the odds of delivering on time.

Again, there are cards with the links to all the spreadsheets and so on at the exits, on the AV desk and by the water cooler. You can get all the slides at bit.ly/ForecastingDoes, capital F, capital D. I'm able to stick around for questions if you want, but I know you're hungry because we're running 10 minutes late.