Challenges to Implementing Database DevOps

Las Vegas 2020

Service Reliability Engineer · Jack Henry

Challenges to Implementing Database DevOps: Getting going in DevOps can be difficult at the best of times. Throw in the need to manage data on top of that, and the challenges grow. Using a case study, this session will provide you with tips and hints on how best to deal with your own database DevOps implementation. Whether your team is deploying locally to on-premises databases or implementing cloud-based databases and DevOps technologies, the knowledge you gain from this session will help you deal with data in your DevOps. The challenges that data and databases present to DevOps are real, but they are also surmountable. Find out how by joining this session.

This session is presented by Redgate.

Full transcript

Grant Fritchey and Stuart Ainsworth

Grant Fritchey: Hello, and welcome to "Challenges to Implementing Database DevOps." My name is Grant Fritchey. I work for Redgate Software, and this is my contact information. If you want to get in touch with me after today, I'd love to help you out if I could. With me today is Stuart Ainsworth. Stuart, can you introduce yourself, please?

Stuart Ainsworth: Hi, I'm Stuart Ainsworth. I'm a senior manager of service reliability engineering for Gladiator Technology, a ProfitStars solution, a Jack Henry company. Just to give a little bit of background, I'm a former database guy, but I've started doing a lot more with networking and infrastructure and application support in this role. And so I'm excited to talk to you guys about DevOps today.

Grant Fritchey: There we go. All right, so we're all up and running. So Stuart, why don't you tell me about your DevOps implementation? Let's start with that.

Stuart Ainsworth: Sure. Gladiator Technology, the business unit I work for, has been around since 2002, so we're getting close to 20 years now, and we've got a lot of legacy applications and products. We do firewall monitoring and network IPS monitoring for financial institutions, so we're in the security space, the managed security provider space. We get a lot of data flowing through this application, which we've grown organically over the last 18 to 20 years or so.

What we started noticing, probably over the last five years, is that as this application has grown, it's gotten a lot harder to get new features out the door while still managing the technical debt we had built up inside the application. So a few years ago we really began to focus on figuring out the deployment pipeline. We started looking at tools that could help us with that, and so we began the DevOps journey. I think a lot of people know that once you start that journey, you never really finish it, right? It just keeps moving.

I tell people that when you're in the DevOps movement, you're always at a point of pain, right? It gets easier in some ways, but the beauty of this whole philosophy, this whole movement, is that you keep discovering new challenges as you go along. So it's not for the faint of heart. Right now, about 50% of our data footprint is managed in a continuous integration fashion, and there are still some improvements we're going to make there. The other 50% probably is not. To put that 50% in context, we manage about 23 to 24 terabytes of data. It rotates out pretty frequently: that's a 90-day window of data. It's a lot of logs going through, a lot of turnover, a lot of transient data.

We continue to roll out new features frequently, and we've really tried to accelerate that over the last 18 months or so, with improvements for our customer bases and so on. It's been interesting. Part of the struggle is that you want to talk about what you're doing in an exciting and interesting way, but there's so much going on that at times it's hard to say, "This is the one thing I can pick out and tell you about."

Grant Fritchey: Well, the topic is challenges, right? What's hard about doing databases? When we talk about DevOps, it can get weird, and people get odd about the term and what it means. But fundamentally, I think a decent summary is process in support of people, using tools and automation. So why are databases lagging behind? Why is it so hard to implement this with databases?

Stuart Ainsworth: I think the biggest problem, obviously, is state. Applications are stateless in the sense that they are what they are: you can generate the code, and it takes the moment of now and runs with it. Databases have history. Because your schema is so intertwined with the data it represents, there's a risk associated with every change you make. If a change breaks something, you've got to figure out how to migrate from where you were to where you're going.

It's also, I think, a tooling and cultural issue. As an industry, database management pulled away from a lot of the other development and administration efforts, because we became so focused on the concept of data that we consolidated both of those efforts into one. What we've seen, particularly in the development space, is that the tools and technologies have really accelerated, while we've pretty much stayed focused on writing better SQL, query tuning, performance management, and understanding the relationships there.

Grant Fritchey: Well, and to be fair, HADR, high availability and disaster recovery, is pretty important. Any good paranoid database professional has a good DR system set up, and probably a good high-availability system too. So we've spent a lot of time there. But now we're finding that we're lagging behind on infrastructure as code and all the rest of it.

Stuart Ainsworth: Right. And I think the challenge when it comes to actually spinning this sort of stuff up is that mindset of, "I'm here to protect the data and protect the integrity of the data," right? So we tend not to look outside of what the application is actually doing with the data. Particularly in the database space, I think we sometimes get so focused on optimizing that we don't even stop to ask, "Do we really need this?" and "Does the application really need this?" It's that stepping back and taking a holistic view of the entire application-to-server pipeline that database folks like us have struggled to grasp, because we're so focused on making sure that what we're doing is being done right. That's important. I'm not saying those skills and that focus aren't important, but you've got to be able to put them in the context of the bigger picture: the reason you're protecting the data and its integrity is that whole value stream.

So I think there are challenges on the tool side, challenges in dealing with state, and challenges in dealing with culture, mindset, and people. Addressing all of those is a struggle, and I don't think anybody can take them on all at once. That's the other piece of it. When we talk about DevOps, there's so much fantastic literature out there right now, and so many great things going on, that the temptation is to say, "Hey, I'm going to sit down and do this, and I'm going to do all of it." Then when you fail, you're like, "Man, this just stinks. I'm done with it. I'm going to throw it all away and stop what I'm doing." In reality, failure is part of the learning experience, and you have to try. I always talk about success as incremental. Start small, keep focusing on one thing at a time, and move the ball forward a little bit at a time. If that works, great; if it doesn't, you can go back and tweak just that one thing. Don't try to do the whole big picture at once.

Grant Fritchey: Well, that brings up my question. Obviously you're not going to start with, "Well, we've got all of our SQL Server and Oracle instances inside of containers, and we've got Kubernetes, and now we're able to simultaneously spin these things up and spin them down and deploy from an automated process." Nobody starts there. Whether or not you end there, cool, but you don't start there. Where do you start? Because as you said, the challenges are real. The technology is different. Persistence and state matter. You can't simply throw the database away. So where did you guys start in implementing your process?

Stuart Ainsworth: The first step was to pick a couple of pilot databases we were interested in, the projects we knew we were going to be accelerating and moving forward with. It was a mixture of legacy and new greenfield development: we were spinning up a new application, so we had some new databases we were going to build from scratch. That gave us some exposure to both areas and to the challenges of each.

The first thing we really did was focus on source control. I think that is so fundamentally important, and it's one of those areas where I've seen much broader adoption within the database community in the last five years or so. But I still think it belongs on the checklist: Are you doing source control? Are you source controlling your schema? Are you source controlling your lookup tables, those static values you have to have to move forward? Is that all part of it? For us, that was the first fundamental step.
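Stuart's question about source controlling lookup tables, "those static values," can be made concrete. One common pattern, which may or may not be what Gladiator used, is to keep the reference rows in source control as data and generate an idempotent T-SQL `MERGE` from them, so every deployment leaves the table matching what is checked in. This is a minimal sketch in Python; the table `dbo.AlertSeverity`, its columns, its rows, and the helper names are all hypothetical.

```python
"""Generate an idempotent T-SQL MERGE for a source-controlled lookup table.

Hypothetical example: dbo.AlertSeverity and its rows are illustrative only.
"""
from __future__ import annotations

# Static rows as they might be checked into source control (hypothetical data).
ALERT_SEVERITY = [
    (1, "Low"),
    (2, "Medium"),
    (3, "High"),
    (4, "Critical"),
]


def sql_literal(value: int | str) -> str:
    """Render an int or str as a T-SQL literal; quotes inside strings are doubled."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "N'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_merge(table: str, columns: list[str], key: str, rows: list[tuple]) -> str:
    """Return a MERGE that inserts, updates, and deletes so `table` matches `rows`."""
    if not rows:
        raise ValueError("at least one row is required")
    values = ",\n    ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    col_list = ", ".join(columns)
    src_cols = ", ".join(f"src.{c}" for c in columns)
    non_key = [c for c in columns if c != key]

    lines = [
        f"MERGE {table} AS tgt",
        f"USING (VALUES\n    {values}\n) AS src ({col_list})",
        f"ON tgt.{key} = src.{key}",
    ]
    if non_key:  # only emit an UPDATE clause when there is something to update
        updates = ", ".join(f"{c} = src.{c}" for c in non_key)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    lines.append(f"WHEN NOT MATCHED BY TARGET THEN INSERT ({col_list}) VALUES ({src_cols})")
    # Rows removed from source control are removed from the database too.
    lines.append("WHEN NOT MATCHED BY SOURCE THEN DELETE;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_merge("dbo.AlertSeverity", ["SeverityId", "Name"], "SeverityId", ALERT_SEVERITY))
```

Because the statement is idempotent, running it on every deployment is safe. Whether rows missing from source control should be deleted depends on the table, so treat the `WHEN NOT MATCHED BY SOURCE THEN DELETE` clause as a design choice rather than a default.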

Grant Fritchey: Were there any unique challenges in just that first step?

Stuart Ainsworth: Yes. Our successes really came from getting the people most familiar with source control tools, and with the source control mentality and model, together with the people whose strengths are performance tuning and writing good T-SQL. Getting those two perspectives to the same table, so that you could build a project inside Visual Studio just like any other application project and then have it run well once it landed on a database server, was probably one of the first big heartaches.

Then you have to talk about things like index maintenance. Do you consider indexes part of operations, or part of your development pattern? For us, it took a lot of juggling back and forth to figure that out. One of the problems with databases that scale far beyond anything your development environment can handle (say, 100,000 rows in development versus millions or billions of rows in production) is that your indexing strategy is affected by that scale, and it's very tough to test those kinds of performance differences.

What we really had to figure out was a way to get the index ideas and suggestions we had optimized and designed for in production back into the development pipeline. For a long time, our DBA would make a change and then tell the developers, and they would include it in their scripts so it wouldn't get thrown away and rewritten over and over again. That's an okay model. What we've gotten really good at lately is making the change in production to test it, rolling it back immediately, and then handing it over to dev to make it part of an official release through the pipeline. That's much easier for us, because it ensures that any change we make in production gets communicated back to dev and locked into source control.

I think the first mental hurdle DBAs have to get past is that the database is not the truth. Source control is the truth. That's the big one: if it's not in source control, it doesn't exist. That's really hard for DBAs, because it's like, "No, no, no, it's in the database. It's right there. I can see it." You have to get past that and get people to understand that if we want to make a change, we'll make it, through source control.

We had some setbacks early on because a DBA would build an index and forget to notify dev, then dev would push out a release, and the index would be gone. All of a sudden, we were troubleshooting the same thing over and over again. Really, the only way through that is to keep pounding on it: checking, and finding ways to detect differences and who made a change when. We started looking at various auditing tools to see when changes were made. That worked okay, particularly for the smaller databases. On the larger databases, though, auditing itself becomes heavy because of the high change rate.

So it really is a matter of repeating the process over and over: whatever change you make has got to be communicated and put back into source control, or else it'll go away. I think that's a pretty common experience for people on both the operations side and the development side.
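The index problem Stuart describes (a DBA adds an index in production, and the next release from source control silently drops it) is a form of schema drift. A minimal sketch of detecting it, assuming index definitions have already been pulled from both source control and the live database (for SQL Server, for example, from the `sys.indexes` and `sys.index_columns` catalog views). The `IndexDef` type and `detect_index_drift` function are hypothetical, not part of any specific tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical, simplified index definition used only for comparison."""
    table: str
    key_columns: tuple[str, ...]
    included_columns: tuple[str, ...] = ()


@dataclass
class DriftReport:
    missing_in_db: list[str]       # checked in, but absent from the live database
    not_in_source: list[str]       # present in the live database, never checked in
    definition_changed: list[str]  # same name, different definition

    @property
    def has_drift(self) -> bool:
        return bool(self.missing_in_db or self.not_in_source or self.definition_changed)


def detect_index_drift(source: dict[str, IndexDef], live: dict[str, IndexDef]) -> DriftReport:
    """Compare index definitions keyed by index name."""
    src_names, live_names = set(source), set(live)
    return DriftReport(
        missing_in_db=sorted(src_names - live_names),
        not_in_source=sorted(live_names - src_names),
        definition_changed=sorted(
            name for name in src_names & live_names if source[name] != live[name]
        ),
    )
```

Run before each release, a non-empty `not_in_source` list is exactly the case where a production-only index is about to disappear. The fix, as Stuart says, is to get that change into source control, not to skip the check.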

The other struggle we faced is that because our databases are so large and grew so organically over the last 18 years or so, there was a lot of interplay between applications. Applications use multiple databases, and the databases themselves have cross-database queries and talk to each other. For some of the databases we had chosen for the pilot program, we said, "We're going to keep this in source control," and started using SQL Change Automation to deploy them. Other databases were still being deployed by scripts. What we noticed was conflicts coming out of development. In a couple of instances, the teams weren't having that final merge conversation: "A developer changed this stored procedure, and by the way, it has evolved since then, and the new version is stored in source control." So things were getting overwritten back and forth: we'd deploy a change one way, and then deploy it another way.

Ultimately, because I'm on the operations side of the house, I finally had to just come back to dev and say, "Look, the only way we're going to make this work is if we rally around one approach." I don't really care which approach you use. I just need it to be consistent. I think they got it. I think there's still some resistance because at times the tooling is still uncomfortable for folks.

But I started making the case: if we can get source control done, have those merge conversations, and trust that the database is in source control, and if everything we do comes out of source control and everything that comes to us arrives in a consistent fashion, then the next step in the pipeline is to make this much more push-button and automated, and begin to deploy it not only to production but also, with continuous integration, into a QA environment or another test environment.

Grant Fritchey: Right. So you'd say that was step one, source control. Step two, CI, or continuous integration?

Stuart Ainsworth: For us, yes. And probably step 1A would be using the build process. We've moved into Azure DevOps as our source control workflow management system. That tool, like many Microsoft tools, is getting so big and so broad, and there's so much cool stuff in it, that I don't even really know what to call it anymore. It just does what you need it to do.

We had to get people accustomed to doing builds on a daily basis. You do source control, and you start building on a daily basis: at the end of the day, you check in your code, and it automatically builds for you. It passes or fails, just like an application build. Once you do that, you can either get the output as a file you pass around ("Here's a NuGet package"), or you can publish to a feed that other tools consume. That takes one more step out of the process: I don't have to pass files around. All I have to do is point a deployment tool like Octopus at the feed, and it says, "Oh, there's a new item in the feed. Would you like to build a release?" And yes, we would. Then you can begin to push it.

There is so much power in that, because of all the things I used to spend a day doing: "Okay, I've got to go prep the file. I have to build a test database in production so that I can compare the schemas. I have to copy, deploy, and run all these scripts. Sometimes it's 30 scripts, sometimes it's 10, sometimes it's one or two." All of that has been condensed. It was probably a day's worth of prep work just to get ready for a deployment, and then you'd actually push the button and pray.

Now I've got all of that prep work down to this: you've done a daily build, I can go get that version, push a button, and it deploys from the build automatically. It does all the comparisons for me. I can see whether it's something I want to approve, and if I approve it, boom, away we go.
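The "new item in the feed" step can be pictured as a comparison between the versions the daily build has published and the version last deployed. Octopus does this itself; the sketch below is not its API, only an illustration of the idea, assuming plain dotted numeric version strings with the same number of parts and no prerelease labels. `parse_version` and `releasable_versions` are hypothetical names.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '1.4.20' into (1, 4, 20)."""
    return tuple(int(part) for part in version.split("."))


def releasable_versions(feed: list[str], last_deployed: str | None) -> list[str]:
    """Return feed versions newer than the last deployed one, oldest first.

    Numeric comparison matters: as plain strings, '1.4.10' would sort before '1.4.9'.
    """
    ordered = sorted(set(feed), key=parse_version)
    if last_deployed is None:  # nothing deployed yet: everything is a candidate
        return ordered
    floor = parse_version(last_deployed)
    return [v for v in ordered if parse_version(v) > floor]
```

The approval step Stuart describes sits after this: the tool offers the newest candidate, and a person decides whether to push it.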

There's so much power in that. Again, it's about starting incrementally. Start with source control, get used to the builds, start looking at a CI/CD tool, and work through the challenges of getting from one to the other. But one important thing you should not overlook in all of this is making sure your backup and recovery strategy is just as fast as your deployment strategy. That's something we've really had to think through, particularly in those first early steps. Once you've done that first deployment using NuGet or some other strategy that pulls directly from source control, what do you do if it goes wrong? You've got to keep some of that old-fashioned method: "Okay, I've got a side-by-side database installation. I can see where the changes in the schema were. Let's quickly build something and push it forward that way." You need to stop and think through all of those strategies as well. I think that's another big challenge in just getting started.

Grant Fritchey: Well, yeah. It's easy to say "step one, source control," but there's so much in it. Let me ask you another question, a total sidetrack, because we've talked technology for a bit. From a personnel standpoint: Redgate does a survey every year on the state of database DevOps. The one thing that shocked me is that we always say the challenge is culture, the challenge is culture. But the number one challenge reported in the survey was not culture; that was number three. The number one challenge was technology, and specifically training and turnover: people who don't know how to script PowerShell, or don't know Octopus, or don't know AWS CodeBuild, or whatever tool we're talking about. It's not so much the tools themselves as the training and knowledge aspect of them. How did you deal with that?

Stuart Ainsworth: I'll admit that it's still a struggle for us a lot. For us, I think when you say turnover, I immediately think in terms of turnover of human resources. But it's the turnover of the technology itself. Everything's accelerating so fast that it's tough to get a grasp around it.

Grant Fritchey: Oh, it's all of the above. It's not just people quitting your company in droves, and hopefully they're not. You move from one project to another, and the second project is doing something different. Even subtle differences matter: "We're doing exactly the same thing you do, but we're not using Octopus, we're using Azure," and suddenly I've got to learn new stuff.

Stuart Ainsworth: For us, we're still pretty heavily siloed: we have a development department, a QA department, and my team, the SREs. For better or for worse, I'm really proud of the work my team has done. I'm not sure other teams in our company do this, but we focus pretty heavily on skill exposure and on building in slack time for people to play with ideas and projects and get their grasp around new technologies, even if those go nowhere. The goal, I think, is that when you activate your brain learning something new, it translates no matter what the tool is. You've got to keep that continuous learning going for yourself personally, and for your engineers to keep doing it, you've got to give them time and encouragement, because otherwise there's always plenty of work to do.

Obviously we're at the DevOps Enterprise Summit, and I read a lot of Gene Kim's work. He focuses a lot on the concept of improving work: you need to spend as much time figuring out how to improve your work as you do actually doing it. I'm a big believer in that. I have found, though, that people are wired differently in how they embrace it. I have people who are thinkers and people who are doers. I'm not saying that's the only category you fit in; I just think we tend to start with one or the other. Either we start by analyzing the situation, or we pick it up and start running with it.

Grant Fritchey: Yeah. It's either "Let me understand every single thing, and then I'll start coding," or, like me, "I don't understand a darn thing I'm doing, but I'm doing it."

Stuart Ainsworth: Right. "I'm in. Let's make a change. Why isn't this working?" Both are fine. To get harmony in your team, you have to play to people's strengths while still encouraging them to invest that time and think about how to rearrange their workload to make it easier for themselves. I have people on my team who are great with checklists. You give them a checklist, and they go boom, boom, boom, boom, boom, and they're done for the day. I'm fine with that. But if they're doing the same checklist over and over again, they're not growing as people or as team members. So I have to say, "Okay, one of the items on your checklist is to take an hour and learn something new. Go code something." Make it an actual checklist item.

Then I have other people that are much better abstract thinkers, and that's what they do out of the box naturally. The little to-do items are hard for them. Those folks I love because that's me. Those are my people. I'm great at sitting down and thinking through all these great ideas. But when it comes to actually getting them done on a day-to-day basis, it's hard for me. I think lazy people are good at that because we're immediately going to figure out, how can I do this once and be done with it? Sometimes that's to our detriment.

But when it comes to learning new tools and learning new technologies and things like that, I think you have to get everybody to the table. You have to make it part of everybody's job to find the time to do it. But you also have to work within the constraints of what makes them happy and productive. If you try to force a person that is geared to doing checklists to say, "Okay, go learn this," and you give them no direction whatsoever, it will fail every time because they have no schema for wrapping that up and trying to figure that out. On the other hand, if you take an abstract thinker and say, "Here, go push this button, push this button, push this button," they're going to hate it and they're not going to learn it. So you have to figure out, particularly in a management role, how you balance that.

I'm very proud of how my team does it. On other teams, though, we've had struggle points, because we're so eager to do something new and exciting that it's sometimes tough to get people to come along with us. So we've had to go back and frame the conversation in terms of what this makes easier for them.

Grant Fritchey: Well, okay. So that brings up a great question. Here's this person sitting in the audience right now. They're at DOES, and they're going like, "I'm in." But now they've got to go back to their organization and convince people. We've got about two minutes left. Tell them everything they need to know to convince the org to adopt this.

Stuart Ainsworth: First thing: start with a pilot project. Pick a small database; you decide which. I really advocate the two-prong approach of saying, "This is a legacy one, this is a new one," because there are challenges with both, and you're only going to learn them if you do both. You've got to do both the brownfield and the greenfield development. But start with a pilot project. Start small. Start learning how to get your code in and out of source control, and show people how to do it. You've got to make it repeatable.

That's the other problem. Once you can do it, you've got to figure out how to teach somebody else. Just because you've mastered the skill doesn't mean someone else is going to master it the same way you did. We started holding monthly forums where we get together with other teams and show, "These are the pilot projects we're working on. What are you working on? Oh, that's interesting. Let's go back and bounce those ideas around." Once you get into source control, do the builds, start looking at continuous integration tools, and start talking about how you push the button and make it go.

I will tell you that you do it one time, and that rush of adrenaline is so awesome. You're like, "Man, this is great." Then you automatically start calling up your buddies and going, "Hey, can you come look at this and watch?" You push the button, and it does it again. It's a fantastic feeling, but it does take a lot of energy and a lot of time to get it set up. Start small, focus on learning one step at a time, share as you go, and then it becomes much easier to begin having these conversations with people and say, "Now we need to start investing in tooling to make this happen."

Grant Fritchey: I mainly wanted you to talk, but I'll add one thing on that topic: measure your pain. I always talk about that. By that I mean measure the downtime on deployments, measure how long it takes to do a deployment, and measure the times you've horrifically messed up a deployment and had to go into recovery. Measure your pain, and then you've got a metric to say, "Hey, look, we've reduced our pain."

Stuart Ainsworth: I think that's great, because a lot of us have all these anecdotal stories about how bad things have been. The one thing I've learned is that the C-level, the upper management, don't have time for a story. They want a summary. So finding a metric, a way to say, "This is where we were a month ago, this is where we are now," works at both the good and the bad level. If you show that things are getting worse over time, they're going to say, "We need to intervene. What are your plans for intervening?" And that's where you get to tell the story. If you show that things are getting better over time, they're going to be excited: "Wow, that's great." Then you get to tell them the story. But you've got to have those metrics, that repeatable measurement of pain, and I think that's a great expression.
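Grant's "measure your pain" and Stuart's "where we were a month ago, where we are now" translate directly into a before/after comparison. A minimal sketch, assuming each deployment is logged with its duration, the downtime it caused, and whether it needed a recovery; the record shape and metric names are hypothetical, and the inputs should come from your own deployment history.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Deployment:
    """One deployment as it might be logged (hypothetical record shape)."""
    duration_min: float    # wall-clock time from start to finish
    downtime_min: float    # user-visible outage caused by the deployment
    needed_recovery: bool  # had to restore or roll back afterwards


def pain_summary(deploys: list[Deployment]) -> dict[str, float]:
    """Summarize one period's deployments into a few pain metrics."""
    if not deploys:
        raise ValueError("no deployments in this period")
    return {
        "deployments": len(deploys),
        "avg_duration_min": mean(d.duration_min for d in deploys),
        "total_downtime_min": sum(d.downtime_min for d in deploys),
        "recovery_rate": sum(d.needed_recovery for d in deploys) / len(deploys),
    }


def pain_change(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-metric difference (after - before); negative means less pain, except for the count."""
    return {metric: after[metric] - before[metric] for metric in before}
```

Keeping the measurement repeatable, with the same fields captured the same way every period, is what makes the month-over-month summary meaningful to the audience Stuart describes.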

Grant Fritchey: Okay, cool. Well, thanks. We've got to wrap it up; we are at our time. So thank you, Stuart. It's been great. That was Stuart Ainsworth, and my name's Grant Fritchey. We appreciate everything you're doing. Hope DOES is going well for you, and that's it.

Stuart Ainsworth: Awesome. Thanks, y'all.