San Francisco 2017

The Pursuit of Success & Averting Drift into Failure

The 'human factor' has long been seen as a weak link in otherwise well-functioning systems, and control through compliance and bureaucracy has often been relied on as a solution. But research over the past decades, from a number of industries, shows something very different. People are critical to the discovery and development of pathways to success—despite organizational, managerial and operational obstacles, goal conflicts and resource constraints.


Investigating why things go well, rather than hunting and tabulating individual errors, is proving to be a much better predictor of both failure and success. It can help avert organizational drift into failure by making visible the little and larger sacrifices people make every day to get stuff done. It can also inspire organizations to offer their people autonomy, mastery and purpose in their roles, allowing creativity and innovation to blossom.


Sidney Dekker, Professor, Griffith University

Full transcript

Sidney Dekker

So I'm going to show you my career. This is what I did. This is not all my fault, okay? But I was on the back end of a lot of this, and I know what chaos looks like.

This is chaos: pain, hurt, suffering, dead people, lots of dead people. And so what do I do about it? Lots of dev. Lots of dev. I write a lot of books. I make a film. And then I decided, you know what? That's all dev and no ops.

And I was made professor quite early in life. You produce that much stuff, they make you professor early. You go, "Oh, my word, I have to do this for another 34 years." It created kind of an existential issue, and so I went all ops.

This is all ops. Somebody develops the airplane, like 50 years ago. Somebody maintains it, throws it over the wall: "Go fly." All right.

There's a little bit of training, but it's the saddest excuse for training I've ever had in my life. Because you don't want to pay for your airfare, and so we don't have money to train. You know what the answer is? "You'll learn it on the line."

But it's very safe. It's very safe, except when you have a professor in the copilot seat.

I learned a couple of things here which are quite... Not here, not from you, no, but from there. From doing this stuff. Which is actually confirmed by a lot of the research.

And that is, the first is anything, and I'm going to try to be serious now, anything that puts downward pressure in your organization on honesty, disclosure, openness, and learning is bad for your business. You're going to ask for trouble.

The second one is that any field of practice has a sweet spot when it comes to rules and standardization. This... Oh, okay. There was a cockpit. This field has long overshot it, and we'll talk about that in a second. Where is that sweet spot, and where is it for you? Where are you relative to it?

And the third one is, and that is really quite an important one, this fascination with counting and tabulating little negative events as if they are predictive of a big, bad event over the horizon is an illusion. We should be doing something quite different if we want to understand how your complex system is going to collapse and fail.

Now, when it comes to driving this jet, one of the things that I've become sensitive to, being ops in this, is that you can throw stuff over the wall, but if you're ops, you own the problem. As a pilot, you have the privilege of arriving as the first person at the scene of the accident. It focuses the mind somewhat.

All right, so who's got this hanging off the wall? Well, who does C++ anymore? This is a really bad idea. But then I go into a warehouse and I see this nonsense. So somebody's counting this stuff, right? And you go, "Oh, that's really good because we pursue excellence and we hold our people accountable for excellent outcomes." And excellent outcomes is the same as zero errors and zero screw-ups and zero order-missed days, or whatever they are, whatever your KPI might be.

This is an invitation to a big blowup just around the horizon, and I've got lots of data to show that that's the case.

Here's one example. So these guys actually kill people. Well, you do at some point as well. But so this is a DuPont company. DuPont, very, very safe, supposedly. Right?

And these guys killed four people in a gas release in La Porte, Texas, not that long ago. Two of them are brothers. Little town, so you got two families, sorry, three families, but one family's got to bury two sons in the same week. That's not cool.

Now, the issue is, how have they been managing their safety, their errors? It's in the picture for you to see. If you look around, you will see that what they're concerned about is little bugs.

"Please take care." Sorry, "Please take extra precaution when driving and walking."

So who got killed driving and walking? Right. Somebody got killed in a gas release. And so this focus, this obsession on C++ mishap-free days doesn't predict disaster at all.

Great example: Macondo. Did you know that these guys had more than six years, six years, of entirely incident- and injury-free performance on this boat? Because it's a boat. It actually floats. Six years injury- and incident-free.

Now, I don't know about you, but I've got three kids. All right? If you don't have kids, you have just solved significant logistical issues in your life, okay? And you'll be richer than I.

So three kids, two boys and a girl, and you know what? It's like 132 guys, and I have to say guys, it is gendered, mostly guys on this boat. For six years, no injuries and incidents. People, I cannot keep my house injury- and incident-free for a week.

Now, there is a difference, though. There's a nuance to this. So if you have boys and girls, you'll know this. When the boy does something dumb and you go, "Why did you do that?" and the boy says, "I don't know," he really doesn't know.

The girl does something dumb, and you go to the girl and say, "Why did you do that?" And she says, "I don't know." She knows. Okay?

Boys are dumb, girls are mean. Okay? If you don't have teenagers yet, you'll know what I mean very soon, okay? That'll be the take-home. The rest is all noise around the edges, okay?

So there we go. And then you get this, right? I'll just let that out there for a moment.

So these are very productive people. If you're employing them, it's your printer that they're using, right? And your paper, and your little Post-it note. You're paying for this. I'm just saying.

All right. This is lots of statistics from MIT. They want to be very intelligent people, or come across as, and so they make their statistics very complicated. But it shows something really interesting, which is the airline that seems to be less safe actually won't kill you.

So the airline that reports more incidents has a lower passenger mortality risk. Now, what's fascinating, so we replicated this data across various domains, construction, retail, various other domains. And we see that there is this inverse correlation between the number of incidents reported, the honesty, the willingness to take on that conversation about what might go wrong, and things actually going wrong.

So if you're going to fly back from San Francisco, find the airline that's got the most incidents, and you get on the other end alive, okay? That's the lesson.

All right. Now, let me take you to another really dangerous world. So you think you're sort of wonky, right? Don't go to a hospital.

So this is a hospital where 1 in 13 patients that walks in the door comes out sicker than they went in, if they come out at all. I mean, walking rather than on a stretcher. You get harmed by seeking care. 1 in 13, that's about 7%.

Okay, do you have to go to the hospital? We can wait. Is he important? He must be. He doesn't think this is funny at all. Probably doesn't know we're talking about him. What is his name?

1 in 13 goes wrong, 7% sort of, right? So now the question is, what do you do with that one? Well, you investigate it, and this is what people do. You investigate what goes wrong, and what do people find? Well, we'll look at that in a second.

All the resources of this hospital were focused on this one going wrong. Why is this one going wrong? Well, what did they find? Typical: human errors, guidelines not followed, communication failures, miscalculations, procedural violations.

And you go, "Okay, fair enough. If that is how you get hurt, then let's try not to do that," right? Very simple. This is a great postmortem, right?

And so we declare, some colleagues, a war on error. Clearly, error is a dangerous thing. We need to declare the war on error, which is both cute and riling and vexing at many levels. Or we send in the auditors, right?

But she actually does this for a living. This is not made up, right? She works in Toronto.

But then we asked, "Do you know why the other 12 go right?" Because it's nice to find out why the one goes wrong, but why did the other 12 go right? Do you know that?

And the answer was, "Nope. We have no idea."

Probably because there are no human errors and no communication failures, and people follow all the procedures, and all the guidelines are followed, and they don't miscalculate. The decimal point goes exactly where it needs to be. That's why we get 12 good outcomes. Right.

So we started studying this, and there is no substitute. You have to go native. You have to go ops, right? And this is what we did. We went ops in this hospital. And you know what we found? In the 12 that go right, it doesn't make a difference. There is no difference.

Human errors, guidelines, the same stuff shows up, and yet you survive. What's the difference? Luck? Yeah, no, there was other stuff going on.
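The reasoning here can be made concrete with Bayes' rule. The sketch below is only an illustration: the 1-in-13 base rate comes from the talk, while the function name `posterior_bad` and the 4/5 and 1/5 rates at which a factor shows up are hypothetical numbers, not findings from the hospital study. It shows that a factor appearing equally often in good and bad outcomes leaves the risk of a bad outcome exactly where it started.

```python
from fractions import Fraction


def posterior_bad(prior_bad, rate_in_bad, rate_in_good):
    """P(bad outcome | factor present), computed with Bayes' rule."""
    joint_bad = prior_bad * rate_in_bad
    joint_good = (1 - prior_bad) * rate_in_good
    return joint_bad / (joint_bad + joint_good)


# Base rate from the talk: 1 in 13 patients is harmed.
prior = Fraction(1, 13)

# Hypothetical: the factor (say, a guideline not followed) shows up in
# 4 out of 5 cases regardless of how the case ends.
same_rate = posterior_bad(prior, Fraction(4, 5), Fraction(4, 5))

# Hypothetical contrast: the factor is four times as common in bad cases.
different_rate = posterior_bad(prior, Fraction(4, 5), Fraction(1, 5))

print(same_rate)       # unchanged from the prior
print(different_rate)  # higher than the prior
```

When the two rates are equal, finding the factor in the one bad case tells you nothing about why it went bad, which is why counting those factors has no predictive value.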

And so this is interesting, but this cuts across domains. In the work that Dave Woods and his team, of which I was part in the early '90s as a PhD student, were doing already, we were discovering these patterns and working with them. And it's not in what people aren't doing. It is what in they... Let me do that again. It is in what they are doing.

The distinction between screwing something up and not screwing it up is in the presence of positive capacities, not in the absence of negatives. I think I got to say that again because that sounded important. I'm making this up as I go, but how did it go?

So the difference between things going wrong and not going wrong, or you screwing it up and not screwing it up, isn't in the absence of negatives. It is in the presence of positive capacities. That's the distinction.

Are there people who say, "This is not a good idea. Stop," even in the face of acute production pressures? Is the team taking past success as a guarantee of today's success? In a dynamic, complex system, that is an extraordinarily bad idea, right? Because past success is not predictive of success today.

How diverse is the team? Is there a willingness to accommodate dissent? People say, "I disagree." Is there an ability to listen to that?

And so the problem with these things, with keeping a discussion about risk alive, and I think that very much goes for the dev community and for ops to some extent, too, is this. Having sat in that cockpit, I have experienced what happens when you want to speak up about a concern: "Captain, aren't we supposed to fly 156 knots instead of 235,000 or something?" The reward for not speaking up is much greater than the reward for speaking up.

The reward for speaking up is namely uncertain and delayed. I may not even know that speaking up saved my own life. The reward for shutting up is immediate, direct, and certain. As in, I don't get in social and reputational trouble with my captain.

And I think this is very true. I think you have baked this into the structure of how you guys operate, employ yourself, or are employed. Many times, and I've had a bunch of discussions with lots of colleagues here, many times I get the sense that if you're part of a problem that you feel you want to speak up about, in a sense you feel, well, but I probably won't have to own it because I won't be here. I'll probably be employed somewhere else. I may be on the East Coast. And so I may be in New Zealand, which needs you as well. It's a lovely country.

And so if you don't have to own the problem, then the reward for speaking up is not going to be there at all, right? And so you are probably not the first at the scene of the accident. You've got structurally baked into your employment and the way you work the easy possibility to not speak up, to not be that dissenting voice, to not raise that concern.

But remember, the airline that gets you home safe is the one that does talk about their things going wrong.

But this is not the only thing we need to talk about. Let's talk about rules and regulations and go to Jake's country. So Jake mic'd me up, and he's from there. They drive on the wrong side of the road, and they like signs and rules a lot. Okay?

Now, as I said, there is a sweet spot when it comes to rules, right? So there's a guy from my country, in Drachten in the Netherlands, that put up that. No. This is what happens when you go to Asia, right?

No, this is a square that used to have all that, and it's all gone. And a fascinating thing happens. So they had pretty bad accidents, about 10 a year. All right? And some pretty bad outcomes.

Now, traffic engineer says, "Let's take out all the rules and all the signs and all the lights and everything." It's all gone. And now they're down to one accident. This has become a system in which people actually try to divine each other's intention. Are you going first? Is it me?

And spontaneously, they all slow down to the slowest common denominator on the square. No rules, because nobody's telling them to do that. There's no sign: you shall slow down to the slowest common denominator on the square at this moment. No. And in what language would that be in Europe? 15 or something. Well, it wouldn't have to be in English very soon, but...

Thank you. You're with me. Good.

So, spontaneous, right? This is literally horizontal coordination, the beautiful things that we see in a complex system.

So I just wanted to show you those pictures to think about the sweet spot, right? Full autonomy like here or completely clogged with rules. You're in a good space. Well, we can debate that probably, whether you are in a good space. But remember, there is a sweet spot. Aviation has long overshot that sweet spot, right, clogging itself with more rules.

All right. Final message, and that's this one. If we want to understand in complex systems how things really are going to go badly wrong, what we shouldn't do is try to glean predictive capacity just from the little bugs and incidents and error counts that you do. No. We need to understand how success is created, and I'm going to try to explain how that works.

So Erik Hollnagel, a good colleague of ours in the trade, tries to paint it like this. He says, "You know, much more goes right than goes wrong." And this is probably true for all the work that we do. Much more goes right, and then when things go wrong, then we do postmortems. Then we send in the hordes to try to find out what happened.

But we can learn from what goes right. Now, my claim is going to be that not only should we be doing that because there's lots of data to learn how things go right, but also because for us to know how things will really go wrong, we need to understand how they go right, and here's how.

So let's first talk about Abraham Wald. Abraham Wald. Ever heard of Abraham Wald? So sort of the father of operations research, right? He's sort of the proto-John Willis. And so, not contemporary with Deming, however.

So Abraham Wald was born in 1900, Austro-Hungarian Empire, which for a Jewish boy is not a good place to be to try to go to university in the 1930s. '20s, '30s. And so he emigrated, got to this country. Armed forces understood that he was very good at maths, in particular statistics. They send him back to Europe, to England, soon not Europe, to England, to solve the following problem.

Bombers are coming back from Germany. All right? And let's not discuss the ethical implications of what it is that they were doing there. But really, the Dresden Cathedral? Never mind.

And so back, right? And they've got holes in them. Let's call them bugs. All right? So these bombers come back with lots of bugs. If you're a pilot, that's not cool. Right? Well, it's cool. It's literally cool, but it's not what you want. And so if it's in your lifting surfaces, it's not good.

And so what they wanted was some predictive capacity for where to put some extra armor on these airplanes. Now, armor and airplanes are not really good bedfellows, right? Not good. It's not payload, it's not fuel, and so it's just dead weight, literally. So you want to be very judicious with where you put it.

So the question to Abraham was, "Where should we put the armor?"

"Well," he says, "let me get the data."

And so operations research, right? Lots of data. He measures and he calculates. So months go by. People get very impatient.

"But we need to start installing the armor on these airplanes. Where do we do this?"

"I have the solution."

"Oh, good. So where do we put the armor? Of course, we need to put the armor where the bugs, the holes, are most likely to show up, right? That's where we need to put in extra armor." Like you would do extra training or reminders or double-checking, right? Where do we put the extra armor? Where the bugs are most likely to show up.

"Nein. We need to put the armor where there are no holes, because those are the ones that don't make it back."

Is that good? It's such an important lesson, colleagues. Put the armor where there are no holes, because those are the ones that don't make it back. Those are the ones where the server will not come back, right?
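Wald's point, usually called survivorship bias, can be illustrated with a small simulation. This is a sketch under invented assumptions, not Wald's data or method: the section names, the per-hit loss probabilities in `LOSS_PROBABILITY`, and the number of hits per plane are all hypothetical. Hits land uniformly across the aircraft, but only the planes that return get inspected.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each section downs the aircraft.
# These numbers are made up for illustration only.
LOSS_PROBABILITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.1}


def simulate(num_planes=10_000, hits_per_plane=3, seed=0):
    """Return hole counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(num_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        if survived:
            survivor_hits.update(hits)
    return all_hits, survivor_hits


all_hits, survivor_hits = simulate()
fewest_holes = min(survivor_hits, key=survivor_hits.get)
print("holes on all planes:      ", dict(all_hits))
print("holes on returning planes:", dict(survivor_hits))
print("fewest holes among survivors:", fewest_holes)
```

Under these assumptions the section with the fewest holes on returning planes is the one where a hit is most often fatal, even though every section was hit about equally often. The holes you can count are the ones the system survived.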

So then I go out on the Darling Downs, and this is somewhere in Australia, outback. This country is as large as yours, but it's got 24 million people in it, which means it's very empty. It also has nothing to fall out of the sky. I have no idea why he's wearing a hard hat.

His name is Nick, and he's from Nottingham, because Aussies don't want to work in the outback, right? "Nah, mate. No, way too hot, mate." So they get Midlanders who lost their job in the steel industry to work in the outback.

I think he wears a hard hat because the old Norse myth is actually still true. The sky will fall one day. They believe in this. The sky will fall.

Now, there's lots of stickers on his hard hat, all right? And stickers on a hard hat denote status. We all have this. Your domains have this, right? The more stickers, the higher your status, because it allows you to do all kinds of things. I can do first aid. I can do CPR. I can go here, and you can't. Na, na, na, na, na. Right? And so that's what these stickers say.

But then there's a little sticker on the back of his hard hat, which is difficult to read because I was sort of subversively taking this picture, and he doesn't know I took it. So he's got flies on his back, too. But that's not unusual in Australia, by the way.

It says GSD on the back of his hard hat. And so I'm walking to another guy from the Midlands and I say, "So what's this GSD? I can do all the other ones, but what's this GSD thing?"

And he says, "Oh, no. That just means get stuff done." And he didn't say stuff.

And so when I want to understand where the next fatality in that world is going to come from, you think that I'm going to look at the incidents and the errors and the bugs? No. I'm going to look at the place where there's no bugs and holes, because that might be the one that won't make it back.

I want to understand how Nick creates success, because that is where failure is going to hide. Death will hide in his successes. I need to understand how Nick gets stuff done under resource constraints, goal conflicts, limited time, pressures, supervisory pressures, things that he needs to manage and control every day. How does he still get it done? Because something has to give, and what is that?

I want to understand how Nick creates success.

Now, let me take you to a country that people wish was not in Europe. It's afloat on German euros still, right? If there's any Greeks in the room, you are now offended, but it's okay. You're in San Francisco now, so it's all right. But this is Greece.

This is sort of the jet I used to fly, though mine have longer wings, better glider, so we should take out the traffic light, okay? Boom.

Now, this probably violates like 1,500 rules of the European Union to begin with, but it doesn't matter. The runway, this is beautiful, the runway is actually where the bus is, and it sort of goes into the speaker's room there, right? And you know idle thrust blows over a man. Full power or takeoff power probably can topple the bus. But the Greeks: "Mm, no, it doesn't matter." The sun still shines, right?

But the question is, how does this airport look? So let me show you the diagram. What do you think the big black thing down the middle might be? Yeah, really. So the runway. Okay, so there's a runway.

Now that little white bar across the top near B, the Bravo, that's not a pedestrian overpass, okay? That's not a pedestrian overpass. It's actually what we call the displaced threshold. You're not supposed to land before that thing.

Why not? Because there is lots of geology north of the runway. Rock, right? And if you choose between paper, scissors, rock, or in an airplane, aluminum, rock, choose rock, okay? So it's harder. So lots of rock, which you need to overfly in order to make it to the runway safe.

But in order to get you to overfly the rock, they want you to land beyond that white line. Am I making sense? All right, then I'll stop with it.

Well, there's one more technical detail. However, you can take off from the very beginning and blast that bus any way you want. That's fine. That's not an issue.

However, and here's the issue, you want to pull off at Charlie, at that taxiway, because you want to go to the terminal building, because there's always pressure on turnaround times. Airplanes on the ground lose money. Airplanes in the air make money. All right?

And so you want to get them up. But taxi time is the worst thing in the world in order to get airplanes to turn around quickly, and you know this from all airports you fly.

This is the name of the taxiway. So you have to taxi all the way down to the little turning circle at the end of the runway, all the way back up, and then into the apron, into the ramp. Now, that takes five, six, seven minutes right there. If you have to turn around a 737 in 30 minutes, 25 minutes, that's going to hurt so bad.
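To put numbers on that pressure, here is a quick calculation using only the figures mentioned above (five to seven minutes of taxiing, a 25- to 30-minute turnaround). The helper name `taxi_share` is made up for this sketch.

```python
def taxi_share(taxi_minutes, turnaround_minutes):
    """Fraction of the turnaround window spent on the extra taxi."""
    return taxi_minutes / turnaround_minutes


best_case = taxi_share(5, 30)   # shortest taxi, longest turnaround
worst_case = taxi_share(7, 25)  # longest taxi, shortest turnaround
print(f"{best_case:.0%} to {worst_case:.0%} of the turnaround")
```

Somewhere between roughly a sixth and more than a quarter of the time on the ground goes to taxiing, before anyone has unloaded a bag.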

Okay, so what happens? You design the system like this, you put that pressure on, and you know what behavior you get? This is not Photoshopped.

And the other beautiful thing is the traffic light is not controlled by the tower. It is pure chance. All right? Would that the Greeks were that well coordinated, let me put it that way.

There's one beautifully coordinated system in Greece. It's the family. Philotimo. Where the man thinks he's the head, but the woman is the neck, right? And she gets to turn the head and make sure he looks in the right place.

So I'm not Greek. I married a Dutch girl, so.

But the traffic, so what's really cute is the little warning light on top of the traffic light, right? Oh, no, we're compliant. Yeah, it's broken. But it's a nice gesture.

But just imagine people from San Francisco going to this Greek island. "Oh, lovely Aegean Sea, and oh, honey, it was such a lovely honeymoon." Right? And here's the, "Oh, here's the airport. Oh my God." Right? Because there's an airplane, dunk, leaving some prints on the bus.

So do you think that this will be reported as an incident by the people up front? No. This is normal work. This is GSD. They might have that imprinted on the back of their little caps. "I'm a GSD pilot. I get stuff done."

However, if I want to understand how people are going to die, I am not going to look at the bugs. I'm going to listen to Abe Wald and not look at the holes and the bugs and the little incident reports about this, that, or the other irrelevant thing.

I need to study success. I need to understand how stuff gets done. And that goes for you as well, both in dev and in ops. How does stuff get done despite the constraints? How can this be safe for years? What is it that these people are doing to make it work despite the constraints and obstacles?

That's the question. Understand how success is created, and it will take you to where failure is going to come from.

I hope that makes sense. My name is Sidney Dekker. I really appreciate your attention. Thank you very much. Thank you.