Data and DevOps: Breaking Down the Silos


San Francisco 2017


DevOps brought developers and ops together. In too many organizations, data specialists are still in a silo. This talk examines the pitfalls of not integrating apps, ops, and data all together, and suggests ways of applying core DevOps principles to encompass data.

Full transcript

Elisabeth Hendrickson

In 2001, I co-authored a paper with Cem Kaner and Jennifer Smith-Brock about the ratio of testers to developers, and that was the paper that he found. That paper came out of the period when I had the worst day of my professional career. It still has not been topped.

That day was the day that I realized that the better I got at my job as a director of quality engineering, the worse I served my company.

First of all, that is a counterintuitive realization. But I realized that the stronger our independent quality engineering team got, the stronger our testers got, the more the rest of the organization abdicated its responsibility for quality. And therefore, we were testing more and releasing worse stuff. And I published that as “Better Testing, Worse Quality.”

So, that's not what this talk is about, though.

This talk is about data, and I want to start with a story from my consulting days.

For a number of years, I ran my own small consulting company, and my job was basically to get on airplanes and go help organizations with their agile transitions. In doing so, I incorporated a lot of what I had learned about how to build quality in, and I helped a lot with the question of how to integrate QA. So, that was a big part of my job. But in general, I was there to help with their agile transitions.

And one of the things that I did as a consultant was to run these agile transformation simulation workshops, where I would get a bunch of people in a room, with the testers over here, the developers over here, and the product managers over here. And when I was feeling really evil, I would put them in separate rooms.

It was always hilarious when they would really hole up in their separate rooms because they loved that. But in any case, this was part of helping them with their agile transformation.

There was one customer that I got to work with who I just loved working with because they were small enough that I was able to work with every single one of their development teams, but large enough that they had several of them. And so, I got to really get to know the organization as a whole, and they had some interesting and complex problems, and it was just such a joy to work with them on those problems.

Now, about midway through the engagement, I was about to run this agile transformation simulation workshop thing for one of the groups. And my executive sponsor, the one who had brought me in as a consultant, warned me, “You're going to have a rough time with this group in particular.”

And I said, “Oh, okay. Anything I should know up front?”

And he said, “No, just, I'm sorry.”

And I said, “Okay.”

And I ran into somebody else in the hall who had already gone through the workshop and who we had kind of gotten to know each other. And he said, “Oh, I hear you're working with that group today. Yeah. Good luck with that.”

So, I knew walking into this workshop on that day that this was going to be a really challenging one. And I've faced massive challenges with this workshop. It's organized into four iterations. In iteration one, they're all separate, and nobody ever ships. I make them ship software, I pay them for features, and I play the role of the customer. It's a fairly typical kind of simulation that you would run for an agile transformation.

And then in subsequent rounds, they own their process, and they're responsible for optimizing for the outcomes that they want. And in owning their process, they inevitably end up reinventing various practices from agile. And so, it's a great way to help them learn because they've just basically reinvented agile. So, this is kind of fun.

But I've had cases where it went off the rails, man. I've had cases where the group didn't ship or make any revenue until minute 16 of a 15-minute round four. And then I basically cheated, because if they're so hung up on failure, they won't learn anything. So I give them a little taste of success, and then we can retro on what they could have thought about differently.

By the way, the most fun one of those was with a group of agile coaches who almost didn't ship.

Okay, but I digress.

So anyway, back to this group. I'm expecting one of those. I'm expecting one of those days where I'm running around as a facilitator and thinking, “How do I do a small, invisible intervention to make this not suck?” So, that's what I am prepping for.

And instead, in round one, I set them up in their separate spaces, and I wasn't evil to them because I thought they were already going to be challenged. And the same things happened in round one that always happen in round one, because when you constrain communication and intentionally create silos, people can't get anything done.

So fine, we got through round one, and they were clearly chafing at the rules in round one. And I said, “Great, now you own your process. What do you want to do differently next time?”

And they had the shortest retro I had seen to date, and then they were ready to go.

And we ran round two, and they made money in round two.

Something is fishy. What is going on?

So, we got through the entire day, and they made more money, fake money, than anybody else who had gone through the simulation up until that point. I had run the simulation 150 times by then. They made more money than any other group had ever made.

So, after we finished all of the workshop-y stuff, I took the leader of the group aside and I said, “Look, I have a question. It's a nosy question, and I'm sorry to ask it this way, but I was warned coming in.”

And he said, “Yeah, I bet you were.”

I said, “I was told that you might be difficult to work with?”

And he said, “Yeah. You were also probably told that we're really slow and we can't ship features to save our lives, right?”

“Maybe.”

He said, “Yeah. Let me tell you what we do. We're the reporting group. We're at the tail end, and the business comes to us and they say to us, ‘We need this feature,’ and it's based on data that was never collected. And it's really hard to build a feature in a report based on data that you don't have.”

Yeah.

So anyway, that was a great engagement.

Let's fast-forward.

For the last five years, I have been an employee at Pivotal, which I considered coming home, because I learned how to do agile at Pivotal Labs back in the day. For the first two years of that, I was on the Cloud Foundry side of our universe, and I got really immersed in DevOps, and developers loving operators. It's been great.

And of course, then we expand that to the balanced team to include design and security and QA and test and the business, and it's just been one great big love fest. It's been fabulous.

Now, that was my first two years, and then I moved over... Pivotal has data products as well, and I moved over to the data side of our universe, where I'm now responsible for GemFire, an in-memory data grid, and for Greenplum, a massively parallel data warehouse. So I was on the data side of things.

And after a few months, I had the flashback to that day running my workshop simulation for that set of people who were in the reporting group. Because the more I talked to our customers, our field, our salespeople, the more I realized that data is still in a silo.

We've done the great big love fest of DevOps, DevSecOps, Dev-design-SecOps. The heart has a lot of things around it right now. And in some companies, it is true that data is now part of that, but not always.

In fact, it's probably not just one silo. In a lot of organizations, it's three silos. You've got your DBAs who are responsible for keeping the data fabric up and happy and running and not falling over itself. And then you've got the data engineers that are probably responsible for the ETL pipelines or keeping Kafka up or whatever it is that they're doing in order to get data streaming through the system in various ways. And then you have the data scientists.

And I was hearing stories like the data scientists were having trouble. They had to file a ticket. Does this sound familiar? They had to file a ticket to get access to the data that they were supposed to be exploring.

And I flashed back to that reporting group telling me it's really hard to write a report on data that was never collected.

This kills me because I know that we can do better. What we're seeing is that, if you look at the system as a whole, the application developers, the software professionals, see one part of it, and the data people see another part. They're seeing different parts of the whole.

So you can DevOps all you want on the application side, on the stateless side, on your 12-factor apps, but the data side, you're going to have a point of friction if you're not allowed to change your schemas, for example.

And part of the reason this kills me is that data has never been more important. We're in the age of the Internet of Things, with AI, ML, and deep learning blowing up all over the place. We're in an age when a massive share of our purchasing decisions, and of what drives business, runs through advanced analytics, through recommendation engines, through being able to tell you that your Uber is going to arrive in two minutes.

So much of our world is now controlled by getting value out of the data that we have, not just to make business decisions, which of course is also important, but even to personalize our experiences.

So from a business point of view, it's never been more important for us to have good practices around our data. And by good practices, I mean the DevOps practices.

But let's be honest, DevOps and data, it's hard.

One of the brightest people I get to work with, someone I really enjoy working with, had an amazing track record in DevOps before he joined Pivotal. I talked to him about the data problem. I'm like, “Yeah, I want to talk to you about data. I want to pick your brain about it.”

And here's his reaction. He literally backed up and held up his hands and he said, “Look, the DevOps stuff I do is hard enough. I believe data is an order of magnitude harder. I don't want to go there.”

Really, man? Really?

But let's be honest, it is hard. First of all, from a risk standpoint, if you lose your data, that's a really, really bad, bad, bad thing. If you have a data breach, just ask Equifax, right? Poor Equifax.

Okay.

So let's talk about some of the reasons why it's hard besides the risk. But note that the risk is what makes us all twitch. Everybody's got a twitch, right? The thing that you do unconsciously in response to a stimulus. The thing that you're going to twitch towards when you have risk is typically control. Try to control the risk, control the change.

So what do we do with database schemas? We have a committee of people who come together to decide whether or not a given DDL change is okay. And why do we do that? Probably because it is the hidden, undocumented API between a bunch of systems that are all working against that same database.
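The “hidden, undocumented API” point can be made concrete. Below is a minimal Python sketch using an in-memory SQLite database; the `customers` table, the proposed schema, and the two consumer queries are hypothetical stand-ins for separate systems that read the same database. Running every known consumer's query against a proposed schema shows which systems a DDL change would break, which is roughly the question a schema review committee is trying to answer by hand.

```python
import sqlite3

# Hypothetical current and proposed schemas for a shared table.
CURRENT_SCHEMA = (
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, region TEXT);"
)
PROPOSED_SCHEMA = (
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, region TEXT);"
)

# Hypothetical queries issued by systems owned by different teams.
# Nothing in the database itself records that these dependencies exist.
CONSUMER_QUERIES = {
    "billing_report": "SELECT full_name FROM customers",
    "marketing_export": "SELECT id, region FROM customers",
}


def broken_consumers(schema_sql, consumer_queries):
    """Return the names of consumers whose queries fail against schema_sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        broken = []
        for name, query in consumer_queries.items():
            try:
                conn.execute(query).fetchall()
            except sqlite3.OperationalError:
                # e.g. "no such column: full_name"
                broken.append(name)
        return sorted(broken)
    finally:
        conn.close()


print(broken_consumers(CURRENT_SCHEMA, CONSUMER_QUERIES))   # []
print(broken_consumers(PROPOSED_SCHEMA, CONSUMER_QUERIES))  # ['billing_report']
```

The catch, and the reason the API stays hidden, is that a list like `CONSUMER_QUERIES` only helps if someone keeps it complete; a shared database does nothing to force that.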

Governance makes it hard. And I don't just mean governance in terms of not leaking your data; regulations like HIPAA and GDPR also control what you're allowed to do with the data.

Even just backstage, I was overhearing, I hope they don't mind that I say this, but I was overhearing someone say, “Yeah, if we have anybody in Europe, then we have to keep the data in Europe,” et cetera. So even just the locality of data becomes a super complicated thing that you have to manage from a governance standpoint.
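A rule like that can at least be written down as code and checked continuously. A minimal Python sketch: the jurisdiction codes, region names, and the `RESIDENCY_RULES` table are hypothetical, and a real policy would come from your own compliance requirements rather than from this mapping.

```python
from dataclasses import dataclass

# Hypothetical policy: the storage region allowed to hold data for users in each jurisdiction.
RESIDENCY_RULES = {"EU": "eu-central", "US": "us-east"}


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    jurisdiction: str


def allowed_region(record):
    """Return the region this record may be stored in, or raise if no rule exists."""
    try:
        return RESIDENCY_RULES[record.jurisdiction]
    except KeyError:
        # Fail closed: an unknown jurisdiction is a policy gap, not a default.
        raise ValueError(f"no residency rule for {record.jurisdiction!r}") from None


def residency_violations(placements):
    """placements: iterable of (UserRecord, region_where_stored) pairs.

    Returns the user_ids stored outside their allowed region.
    """
    return [rec.user_id for rec, region in placements if allowed_region(rec) != region]


placements = [
    (UserRecord("u1", "EU"), "eu-central"),
    (UserRecord("u2", "EU"), "us-east"),  # EU user stored outside Europe
    (UserRecord("u3", "US"), "us-east"),
]
print(residency_violations(placements))  # ['u2']
```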

So this makes things hard.

Tuning. We ship an enterprise-class, massively parallel data warehouse. It requires some tuning to get it just right for your workload: hours and hours of running experiments to figure out how to set the various parameters to optimize the performance of this thing for your specific workload.

So once you get that right, you want to treat that thing like a pet. By the way, that is my dog. Cute, right?

So you're going to love it, and pet it, and hug it, and you are definitely not going to treat it like cattle, right?

And for that matter, even just alerting. It is super easy to alert based on the existence or non-existence of a given running process. Try detecting that the cache isn't being refreshed, or that you've got a split-brain condition in your cluster. That is much harder to alert on.
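The gap between the two kinds of alert shows up clearly in code. In this minimal Python sketch, the process check needs only a yes/no answer, while the freshness check needs a signal about the data itself; the recorded last-success timestamp and the staleness limit are assumptions, not features of any particular product.

```python
from typing import Optional


def process_alert(process_running: bool) -> Optional[str]:
    """The easy case: alert if the refresh process is not running."""
    return None if process_running else "refresh process is not running"


def freshness_alert(
    last_success_epoch: float, now_epoch: float, max_staleness_s: float
) -> Optional[str]:
    """The harder case: alert if the cache has not been refreshed recently.

    Assumes the refresh job records the time of its last *successful*
    refresh somewhere the monitor can read it; a running process by
    itself says nothing about whether fresh data is arriving.
    """
    age = now_epoch - last_success_epoch
    if age > max_staleness_s:
        return (
            f"cache stale: last successful refresh {age:.0f}s ago "
            f"(limit {max_staleness_s:.0f}s)"
        )
    return None


# The refresh process can be up while the data is still stale.
print(process_alert(True))                     # None
print(freshness_alert(1000.0, 1900.0, 600.0))  # cache stale: last successful refresh 900s ago (limit 600s)
```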

So these are some of the reasons why data is hard. It also is hard to move things around. Data is sticky.

So data is hard, but at the same time, I look to DevOps and Agile and cloud architectures, and I say we actually have some solutions. So they are hard, but configuration is code. We all know this. We're at a DevOps conference.

So take all of those tuning parameters, version them, and apply them automatically. Instead of going in and tweaking a setting by hand, I check in the tweak, something automatically reconfigures my cluster, and then I run my performance experiment.

If we have the discipline to do that, then we will be able to always recreate that configuration for that cluster. So configuration still is just code, even if it's massively complicated configuration that took weeks to figure out.
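A minimal Python sketch of that loop, using PostgreSQL's `ALTER SYSTEM SET` and `pg_reload_conf()` as the target (a Greenplum cluster would typically use its own `gpconfig` utility instead). The tuning file and its values are hypothetical; the `current` values are assumed to come from `SHOW` or `current_setting()`, which report settings with units such as `4MB`. The script only produces a plan, so the change can be reviewed like any other code before anything touches the cluster.

```python
import json

# Hypothetical tuning file checked into version control next to the app code.
DESIRED_JSON = """
{
  "work_mem": "64MB",
  "maintenance_work_mem": "1GB",
  "random_page_cost": "1.1"
}
"""


def plan_config_changes(desired: dict, current: dict) -> list:
    """Return ALTER SYSTEM statements for parameters whose current value differs."""
    statements = []
    for name in sorted(desired):
        value = str(desired[name])
        if current.get(name) != value:
            escaped = value.replace("'", "''")  # escape quotes in the SQL literal
            statements.append(f"ALTER SYSTEM SET {name} = '{escaped}';")
    if statements:
        # Reload so that parameters which don't need a restart take effect;
        # some (e.g. shared_buffers) still require a server restart.
        statements.append("SELECT pg_reload_conf();")
    return statements


desired = json.loads(DESIRED_JSON)
current = {"work_mem": "4MB", "maintenance_work_mem": "1GB", "random_page_cost": "4"}
for stmt in plan_config_changes(desired, current):
    print(stmt)
```

Because the plan only contains parameters that differ, re-running it after a successful apply yields an empty plan, as long as the checked-in values are written the way the server reports them.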

And for that matter, that whole API problem with the schemas, cloud-native architectures help us solve that problem. So if you truly adopt a microservices-based architecture, you now have an API in front of that schema, and you no longer have to worry about change. You can change the schema, so it gives you a level of flexibility.
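A minimal Python sketch of the idea: two hypothetical versions of a customer service, backed by different SQLite schemas, that return the same response shape. Plain method calls stand in for the network API a real microservice would expose, and migrating existing rows from one schema to the other is left out.

```python
import sqlite3


class CustomerServiceV1:
    """Schema v1: a single full_name column."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT)")

    def add(self, first, last):
        cur = self.conn.execute(
            "INSERT INTO customers (full_name) VALUES (?)", (f"{first} {last}",)
        )
        return cur.lastrowid

    def get_customer(self, customer_id):
        row = self.conn.execute(
            "SELECT id, full_name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class CustomerServiceV2:
    """Schema v2: the name is split into two columns; the response shape is unchanged."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
        )

    def add(self, first, last):
        cur = self.conn.execute(
            "INSERT INTO customers (first_name, last_name) VALUES (?, ?)", (first, last)
        )
        return cur.lastrowid

    def get_customer(self, customer_id):
        row = self.conn.execute(
            "SELECT id, first_name, last_name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": f"{row[1]} {row[2]}"}


# A caller written against the API works unchanged with either schema.
for service in (CustomerServiceV1(), CustomerServiceV2()):
    cid = service.add("Grace", "Hopper")
    print(service.get_customer(cid))  # {'id': 1, 'name': 'Grace Hopper'}
```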

These things are still hard, though. I want to acknowledge that what I'm telling you is neither obvious from first principles nor easy. It's still hard. But that doesn't mean it's not doable.

You can build data-specific monitoring and alerting. So you can build something that can detect a split-brain condition. It's not necessarily going to be easy. You can build things that can detect when your cache is not being updated. So that's certainly possible.
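One way to make split-brain detection concrete, sketched in Python: ask every member which node it believes is the coordinator, and alert when the answers disagree. How each member's view is collected is an assumption here (a hypothetical per-node status query), and the sketch only sees members the monitor can reach, so a real check would also compare against the expected membership. In keeping with leaving recovery to a human, the output is meant to page someone rather than trigger a fix.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def detect_split_brain(reports: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Detect a split-brain condition from each member's view of the cluster.

    reports maps each member name to the coordinator that member currently
    believes in (None if it could not say). If members disagree about who
    the coordinator is, return {coordinator: [members following it]};
    otherwise return an empty dict.
    """
    followers = defaultdict(list)
    for member, coordinator in reports.items():
        if coordinator is not None:
            followers[coordinator].append(member)
    if len(followers) > 1:
        return {c: sorted(ms) for c, ms in sorted(followers.items())}
    return {}


healthy = {"a": "a", "b": "a", "c": "a", "d": "a"}
split = {"a": "a", "b": "a", "c": "c", "d": "c"}
print(detect_split_brain(healthy))  # {}
print(detect_split_brain(split))    # {'a': ['a', 'b'], 'c': ['c', 'd']}
```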

And for that matter, you can automate a lot of things. Now, there are some things you might not want to automate, like recovering from that split brain without a human involved. But you can at least alert on it, and you can absolutely automate all of the routine maintenance.

If you're running Postgres, you can automate those vacuums. You can automate the backup. You probably don't want to automate the restore. You can automate routine user provisioning, so that you get to a self-service world: the data scientist who used to file a ticket and wait six weeks for access to the data warehouse can instead get self-service access to a dataset that has been scrubbed, cleaned, and made available on the data platform.
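As a minimal Python sketch of treating that routine work as code: one function turns PostgreSQL table statistics (the `n_live_tup` and `n_dead_tup` columns of the `pg_stat_user_tables` view) into a vacuum plan, and another generates read-only grants on a curated schema. The thresholds, the `data_science` role, and the `curated` schema are hypothetical, and PostgreSQL's built-in autovacuum already handles much routine vacuuming; a plan like this is for explicit, scheduled runs. Executing the statements, and wiring the grants to a self-service request, is left to whatever automation you already have.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    # Mirrors columns of PostgreSQL's pg_stat_user_tables view.
    schemaname: str
    relname: str
    n_live_tup: int
    n_dead_tup: int


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_plan(stats: List[TableStats], dead_ratio: float = 0.2, min_dead: int = 1000) -> List[str]:
    """Return VACUUM statements for tables with many dead rows (hypothetical thresholds)."""
    plan = []
    for s in stats:
        total = s.n_live_tup + s.n_dead_tup
        if s.n_dead_tup >= min_dead and total > 0 and s.n_dead_tup / total >= dead_ratio:
            plan.append(f"VACUUM (ANALYZE) {quote_ident(s.schemaname)}.{quote_ident(s.relname)};")
    return plan


def self_service_grants(role: str, schema: str) -> List[str]:
    """Read-only access to a curated schema, so access doesn't wait on a ticket."""
    r, sc = quote_ident(role), quote_ident(schema)
    return [
        f"GRANT USAGE ON SCHEMA {sc} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {sc} TO {r};",
    ]


stats = [
    TableStats("public", "orders", n_live_tup=10_000, n_dead_tup=5_000),
    TableStats("public", "users", n_live_tup=100_000, n_dead_tup=500),
]
for stmt in vacuum_plan(stats) + self_service_grants("data_science", "curated"):
    print(stmt)
```

Note that `GRANT SELECT ON ALL TABLES IN SCHEMA` covers only tables that exist when it runs; tables added later need `ALTER DEFAULT PRIVILEGES` or a re-run of the grant.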

This is all certainly possible.

Now, if it's possible, I have to wonder why more people aren't doing it. Granted, there are people doing it. How many of you are doing all of those things in your organizations, where data is part of your DevOps?

One. Hi, we should talk.

Okay. So maybe there were two; the lights make it hard to see.

So I think the reason that we're not doing it is because of the way we're drawing lines. Now, when you think about it, one of the hardest problems in our field is how do we draw lines?

If you're a developer, is that one class or two classes? Should that be one big method, or should you refactor out some private methods as helpers? That's an example of drawing lines. It's hard.

If you're looking at your team structure, should this be one team or two teams? What should those teams be responsible for? That's drawing lines. That's hard.

If you're looking at an MVP, where do we draw the line?

Now, historically, I have sat in enough rooms where we've said, “You know, that super awesome personalization feature or super awesome data analysis feature, recommender, whatever, the thing that would've prompted us to bring the data scientists in, let's just call that phase two. We'll cut the MVP here. That reporting feature, we'll just cut the MVP here. We'll worry about that later.”

But that's exactly what happened to the reporting group at that company that I talked about. They weren't ever brought in at the beginning to say, “Oh, if you want a report on this thing, let's build a walking skeleton.” And in building that walking skeleton, we know that the application is going to have to be collecting this data in order to be able to write this report.

So when we look at drawing the lines of what's in and out of our MVPs, let's think about bringing some of those data features in so that we can build the data part incrementally.
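A walking skeleton with a data slice can be very thin. In this minimal Python sketch (SQLite in memory; the `events` table, event names, and report are hypothetical), the application records events from the first iteration and a bare-bones report reads them, so the data the business will eventually ask about is already being collected.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical event table the app writes to from the very first iteration.
SCHEMA = "CREATE TABLE events (occurred_at TEXT, user_id TEXT, event_type TEXT)"


def record_event(conn, user_id: str, event_type: str, occurred_at: Optional[str] = None) -> None:
    """Application side: capture the data the report will need."""
    ts = occurred_at or datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", (ts, user_id, event_type))


def signups_per_day(conn):
    """Reporting side: the thinnest possible version of the report."""
    return conn.execute(
        "SELECT substr(occurred_at, 1, 10) AS day, COUNT(*) "
        "FROM events WHERE event_type = 'signup' "
        "GROUP BY day ORDER BY day"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_event(conn, "u1", "signup", "2024-01-01T09:00:00+00:00")
record_event(conn, "u2", "signup", "2024-01-01T17:30:00+00:00")
record_event(conn, "u3", "signup", "2024-01-02T08:15:00+00:00")
record_event(conn, "u1", "login", "2024-01-02T09:00:00+00:00")
print(signups_per_day(conn))  # [('2024-01-01', 2), ('2024-01-02', 1)]
```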

Whoops, I just hit the wrong button. I hope nothing blew up. There we go.

One of the things I've noticed about silos with dependencies is that they create a “for” relationship. If I'm responsible for a thing over here, and you over here depend on me, and we're in silos, then we have a “for” relationship: I do things for you.

And the problem with “for” relationships is that I am doing my interpretation of what I think that you should want. And in the most perverse of those cases, I am doing what I think you should have asked for instead of what you actually asked for.

So part of what we have to get to is the “with” relationship, which says even if we roll up to different managers, even if we have different responsibilities, I'm going to do this with you. I'm going to be part of that team that works with you.

Which is why my ask, the thing that I would like from all of you, is to bring data in. To bring data into the MVPs. To bring your data scientists, your data engineers, your DBAs, any other data specialists you have, bring them into your balanced teams. Because I'm pretty sure some amazing things will result.

That's my ask.

Thanks.