A Guiding Map for DevOps

US 2021

We're getting good at DevOps, with work on maintaining our code in version control, writing good tests, running intelligent CI pipelines, and being able to deploy regularly meaning that the gulf between Dev and Ops has never been narrower. But is this enough? Does doing this well and building competencies in this area shine a light on other problems, and does doing DevOps well always help the whole business improve? I'll review where we've got to today, how it relates to the rest of the business, and then explore the link between DevOps and Value Streams. Finally, we'll explore some ideas of how it all fits together and what to do next.

This session is presented by Adaptavist.

Full transcript

The complete talk, organized by section.

Matt Saunders

Hello there. My name is Matt Saunders. I'm with Adaptavist. Thank you for joining us on this presentation, where I'm going to talk about our guiding map for DevOps. I'm the head of DevOps at Adaptavist, but I'll spare you talking about Adaptavist for now. I'll talk about them at the end.

What are we going to go through in this session? There's a fairly well-established pattern for how to do DevOps well. We've been in the DevOps era for over a decade now, and there's some pretty good information out there, lots of sources on how to do DevOps well. I'm going to go through some of these because that's a prerequisite for what I want to talk about, the bread and butter of the presentation, a bit later on.

We're going to go through DevOps, what doing DevOps well looks like, and how we measure that, how we measure succeeding at DevOps. Then I'm going to take a slightly different direction and go off and look at how we find the value of what we're contributing when we do DevOps. Then we're going to talk about value streams, Wardley mapping, and mapping the streams. Right at the end, I'll talk about how to put it all together in one big coherent whole, hopefully.

Let's start by going into what do we mean by doing DevOps well? I think it comes down to basically three things here. Firstly, having good source code management and continuous integration. The second thing is having good access to environments. The third thing is around getting changes flowing and getting changes flowing smoothly through your organization. Let's dig into those in a little bit more detail.

Good source code management and continuous integration: this should be the ideal that everyone is aiming for, and we think a lot of organizations have reached it. In my adventures as a consultant with Adaptavist and also working internally within Adaptavist, we like to think we're doing this reasonably well, and we see this done fairly well across many organizations out there.

The three tenets of good SCM and CI: firstly, keeping everything in version control systems, because it's never too soon to have a peer review of your data. We can tie that sort of stuff back to sprint planning. If everything is in the version control system, we can work out exactly what we're going to do up front with our sprint planning, then go and do it and see the results in our SCM.

That lets us then build continuously. We like to do continuous builds in a DevOps world because smaller incremental changes are easier to manage. We're getting away from that waterfall environment where lots and lots of changes would go into a single release, and because nobody knew how those changes would interoperate, when the build inevitably broke, we wouldn't really know what caused it. We can find and fix divergences very early on by doing continuous integration.

The third thing here is about being able to deploy at will, so we can test and check our changes as soon as we possibly can after they've been written, after they've been integrated, so that we can tighten those feedback loops. If we go back to Gene Kim's three ways of DevOps, this is one of the key things here: being able to test these things, being able to get that software out there either in front of users, or in front of a developer who's seeing the results of their work, or a tester. Let's tighten those feedback loops so that we can see if everything went well, and if not, we can adjust and go again.

Access to environments is the second tenet here of good DevOps. We want to allow devs access to environments. By that, what I mean is an environment that looks a bit like, or a lot like, your production environments. Gone are the days where having a production environment alone and a developer's laptop where changes were tested was good enough.

What we want to be doing here, and what we're seeing more and more of, is having pre-built or on-demand environments that mimic production, maybe not quite on the same scale, but good enough that a dev can use an environment dedicated to their cause, with everything they're not directly working on mocked out.

We don't want to make our devs queue either. Waiting for environments harms flow. Anti-patterns that we see here are things like a single dev environment with multiple developers waiting in line to test their changes out. Spinning these environments up on demand from CI is now something that people should be doing.

Production-like: it's difficult, but it's necessary to make environments production-like. What I mean by that is that often we find that production environments are built with a combination of automation and some other stuff, some hand-coded stuff over here, some things that were clicked on in an Amazon Web Services console over there. Those sort of things make it difficult to replicate production environments because we want to be doing everything as code. If everything is code, then we can replicate those things.

Other issues we find are with scale. Perhaps our production environment has got 100 million entries in the production database. If we want to be testing our systems out, then we possibly don't need all those 100 million entries. Maybe there's personal information in them as well, so you have to look at data masking so that personal information isn't replicated further than it needs to be. Getting those environments of an appropriate size and scale is really important.
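As a rough illustration of the sampling and data masking idea above, here is a minimal Python sketch. The row layout, the field names, the `MASKING_KEY` value, and the helper names `mask_email` and `build_test_dataset` are all assumptions made up for this example; a real setup would more likely use a dedicated masking tool and would keep the key outside the codebase.

```python
import hashlib
import hmac
import random

# Hypothetical secret used only for this sketch; never hard-code a real key.
MASKING_KEY = b"example-only-key"


def mask_email(email: str) -> str:
    """Replace an email with a stable pseudonym so the same person always maps to the same value."""
    digest = hmac.new(MASKING_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.invalid"


def build_test_dataset(rows, sample_size, seed=0):
    """Take a reproducible random sample of rows and mask the personal fields."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return [
        {**row, "name": f"Customer {row['id']}", "email": mask_email(row["email"])}
        for row in sample
    ]


# Invented stand-in for a much larger production table.
production_rows = [
    {"id": i, "name": f"Real Person {i}", "email": f"person{i}@corp.example",
     "plan": "pro" if i % 3 == 0 else "free"}
    for i in range(1000)
]

test_rows = build_test_dataset(production_rows, sample_size=50)
print(len(test_rows), "rows; first row:", test_rows[0])
```

The non-personal fields (here `plan`) pass through untouched, so the smaller dataset still behaves like production for testing purposes while names and emails go no further than they need to.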

A key thing we want to be getting to with all of these pieces of automation and code is to be able to get our change flow well-regulated and well-controlled. We want to be able to make it easy to get through any gates that the organization has. We see things like change advisory boards mentioned a lot in regards to deploying things to production. But again, it's 2021 now, and we're figuring out ways of allaying people's concerns so that we don't have to have a checkbox or lots of people ticking off on a spreadsheet that a release is good to go.

With automated testing, browser-based testing, for example, for web apps, which we can do in an increasingly sophisticated manner these days, we can test real-world scenarios. You've got a web app: can users still log into it? We can automate those sort of tests, just as an example.

Finally, on this point, decoupling comes in here. We often talk about microservices and monoliths. The key takeaway here is that if you reduce the dependencies that individual components have, and if you make them independently deployable, then you can deploy them more easily. There's less friction, there are fewer things to go wrong, fewer interfaces, fewer moving parts, and that all helps us go faster and more reliably. It's DevOps heaven.

Let's talk about how we measure that. How do we actually measure the success of DevOps? This is relatively straightforward in this day and age. We've had the State of DevOps Report coming out of DORA for probably the last six or seven years now. It's a well-established means of finding metrics that we can use to measure our success.

The key thing here now is that this report and the science behind it is now just evolutionary, not revolutionary. The metrics that the State of DevOps Report highlights as being ones that indicate that your organization is high-performing are nicely bedded in now. There's not really any need to change those, even though they're constantly revisited.

Let's just revise what those measures of DevOps success are, as defined by DORA and the Google Cloud people. There are four things to measure.

Deployment frequency, because this shows whether your deployment processes are mature and if the organization is willing enough to push change through. The more frequently you're deploying, the greater the indicator of a successful DevOps-style organization.

Lead time for changes: this is all about seeing how long it takes from a change being decided on (a product owner, for example, coming up with some changes that they want to get onto a website or into an application) to those changes getting out there into production. The reason that's a key indicator of DevOps success is that it shows whether teams are able to deploy changes without getting held up and without any of the red tape that we traditionally associate with bureaucratic organizations.

The third thing is the change failure rate. Of all these changes that we're now pushing through the system, how many of them fail? We used to "move fast and break things", the motto attributed to Mark Zuckerberg at Facebook. We're maybe no longer quite that cavalier, but we're still going to have failures. If we're not having any failures, then we're probably going too slowly. But the percentage of failures shouldn't be that great. If we're deploying lots and lots and we have the occasional failure, then that's probably all right. This shows whether there's maturity in testing ability, because every time we make a change, we should be testing that change. The tests should have good enough coverage to reveal whether the change is going to be successful, and flag that before it gets anywhere near production.

The final metric of DevOps success is the time to restore services. If something has gone wrong, if our aforementioned change has failed, then if we're able to respond quickly and get that service back online, either by rolling forward to a new software release, by fixing a bug, or rolling back even to a previous iteration, then that gives us an understanding of the organization's ability to swarm on problems and solve them. They're inevitably going to happen. Measuring those, and seeing how well an organization does there, is a key metric.
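To make the four measures concrete, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record, its field names, and the sample data are assumptions invented for this example, and lead time is measured from the moment a change was requested, as described above; DORA's own definition typically starts the clock at code commit, so adjust the start timestamp to whichever definition you use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    requested_at: datetime                  # when the change was decided on
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four measures; assumes at least one deployment in the period."""
    failures = [d for d in deployments if d.failed]
    restore_hours = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(hours(d.deployed_at - d.requested_at) for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore_hours": median(restore_hours) if restore_hours else None,
    }


# Invented sample: four deployments in one week, one of which failed for an hour.
base = datetime(2021, 10, 4, 9, 0)
deployments = [
    Deployment(base, base + timedelta(hours=4)),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=6)),
    Deployment(base + timedelta(days=2), base + timedelta(days=2, hours=2),
               failed=True, restored_at=base + timedelta(days=2, hours=3)),
    Deployment(base + timedelta(days=3), base + timedelta(days=3, hours=8)),
]

metrics = four_key_metrics(deployments, period_days=7)
for name, value in metrics.items():
    print(name, value)
```

Medians are used rather than means so that one unusually slow change or one long outage doesn't dominate the picture; that is a design choice for this sketch, not something the report prescribes here.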

With all those things, again, I'm talking somewhat retrospectively, in that a lot of organizations have been there and done that: yeah, we're cool, we're doing this stuff, no problem. Where we see some problems start to creep in is when you scale up a bit. These ideas work well in small teams, perhaps the two-pizza teams that we often talk about when we're looking at team topologies and scaling teams. What is the ideal size of a team? One that can be fed with two pizzas, or name your junk food of choice.

However, modern organizations generally need to scale up. Maybe it's a factor of success. So the two-pizza team thing doesn't quite apply anymore. We end up with teams forming, perhaps DevOps teams, perhaps that's a team that you have within your organization now, which traditionalists would say is an anti-pattern. There ain't no such thing as a DevOps team. It's only a team that actually does DevOps as part of everything else that they do.

The final thing I want to mention here is platform teams. We sometimes find that if you have an organization that has many things to deploy and they all look kind of similar and they all run on similar kit, then there are a lot of reasons to go off and make a platform team that can host all those things for the organization. You get the best common practice, you get economies of scale out of platform teams. But we can find problems starting to creep in.

The rest of my talk is largely aimed at organizations where you've got this sort of organizational complexity and you've got multiple teams trying to do DevOps well. Here are some of the problems that creep back in that we recognize from when we moved from waterfall into Agile and DevOps.

Silos: isolated departments or teams fulfilling specific functions, where delays start to creep in, often between the new silos. Unseen constraints: splitting work across teams can mask constraints that nobody can see from inside a single team. Don't repeat yourself: attempts to share work and code can cause over-generalization, which we see in code and also in processes, tooling, and so forth. Conflicting objectives: teams can be measured on aspects other than delivering quality software quickly. Queues and service desks: well-intentioned processes, streamlined to capture the correct data, that still leave work waiting in a queue.

What that adds up to is reduced flow and increased friction. Those things are things that are going to harm our DevOps effort. We're talking about doing DevOps well, but when we scale up, we can find that some of those problems start to creep in. How do we get around this? How do we avoid them?

I want to talk about finding the value. There are a few things that we can consider here. Focus on flow. Flow is one of the key things that we're thinking about when we do DevOps, getting work through from inception, through development, testing, production, and so on. The scope is bigger than you think. If you are in a platform team, it's not necessarily just your team's work that you need to think about. It's the work that you're enabling for other teams as well. And situational awareness: understanding where you are in the organization, where the organization is, and what value you are providing.

The next step is to make a map. Value stream mapping is a good place to start. Value streams have been used for many decades in lean manufacturing and related fields. They're a way of drawing out all the steps that happen between a customer need and the value being delivered. In a DevOps context, that might be a defect report coming in, being triaged, assigned, fixed, reviewed, tested, and deployed.

Here's an example for a defect. We go from on the left, from where a customer opens a defect report, which maybe takes five minutes. On the other side, we can get a fix deployed out to production, which again only probably takes a few minutes. But the timescales are much larger in the middle. Triaging problems, assigning severity, adding into a queue for someone to do some work and to fix it: these things all take time.

Not only do the things themselves take time, but the handoffs between them can take time, and the queues of work can take time. Here, for example, the big takeaway is the value in the yellow oval, which is that there's a week's delay between triage of a problem and a fix being written.

It's an old cliche that we wonder why on earth it takes so long to fix problems when just writing the code for the fix only takes five minutes. In this example, let's say the fix includes putting in tests, making sure the fix is correct, peer review, all those good things, and maybe that takes two days. But the key thing here is the delay of a week. We can see here that it takes about two weeks to get this thing all the way from start to finish.
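The arithmetic behind a value stream map like this can be sketched in a few lines of Python. Only the five-minute report, the week's wait before the fix, the two days of fixing, and the few minutes of deployment come from the example above; every other step name and duration is invented for illustration, as are the `Step` record and the `summarize` helper.

```python
from dataclasses import dataclass
from typing import List

DAY = 24 * 60  # minutes in a day


@dataclass
class Step:
    name: str
    wait_minutes: int  # time the work sits in a queue before this step starts
    work_minutes: int  # time someone is actively working on it


def summarize(steps: List[Step]) -> dict:
    """Total up waiting and working time and find the biggest queue."""
    work = sum(s.work_minutes for s in steps)
    wait = sum(s.wait_minutes for s in steps)
    biggest = max(steps, key=lambda s: s.wait_minutes)
    return {
        "lead_time_minutes": work + wait,
        "lead_time_days": round((work + wait) / DAY, 1),
        "flow_efficiency": work / (work + wait),  # share of elapsed time spent working
        "biggest_wait": biggest.name,
    }


steps = [
    Step("Open defect report", 0, 5),
    Step("Triage", 1 * DAY, 60),                          # invented durations
    Step("Write, test and review fix", 7 * DAY, 2 * DAY),  # a week's wait, two days' work
    Step("Approve release", 3 * DAY, 60),                 # invented durations
    Step("Deploy to production", 2 * DAY, 5),             # invented wait
]

summary = summarize(steps)
print(summary)
```

With these illustrative numbers the end-to-end time comes out at roughly two weeks, of which only a small fraction is hands-on work, and the step with the longest queue in front of it is the obvious first place to look.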

Here's another value map. This is from the Lean Enterprise book by Jez Humble et al. I'm not going to go into the whole detail of this, but it shows just how much more detail you can add to a value map in order to find out how long things actually take. You end up focusing in on not necessarily how long it takes developers to write code or how long it takes a product owner to write a spec for something, but the handoffs, the gaps between the silos in your organization. You start to see where these delays are actually coming in by making a map.

What we've found, and this comes from Steve Pereira from Visible, is that you can get improvements just by creating the map. It's somewhat anecdotal, though there's some science behind it from Steve: you write the map out and straight away you can get perhaps a 20% improvement, just from seeing the consequences of all those little delays that add up, or indeed the big ones. Once you're aware of them, you can start working on fixing them.

I want to go a little bit deeper into maps for a few minutes by talking about Wardley maps. Wardley maps are a kind of map for business strategy. They were conceived by Simon Wardley back when he ran Fotango, a photo processing website. They're all about situational awareness: mapping out which things people should be working on, which things are commoditized and therefore should just be bought in rather than worked on, and working out where we are and where we want to get to.

There's a whole load of science behind it, which is absolutely brilliant, and I recommend you go and look that up if you're at all interested in understanding how your own value streams within a technical organization map out to those of the business.

In our context, in a DevOps context, Wardley maps can help us with the direction and trajectory of what we're doing as a supplier. We're supplying DevOps to the rest of the company, hopefully in a fairly integrated fashion. Understanding where we sit in the global business, and also considering the functions that we perform as a business service, is really, really valuable, because you can look at where you've come from, where you want to get to, and what you need to change to get there, whether that's through developing new stuff or going and buying some stuff off the shelf.

A simple example here is: would you design and build a CI system when you can go and buy one in from any number of different vendors? You probably shouldn't, unless you're doing something really, really special with CI, of course. A map, and especially a Wardley map, can help you figure those sort of things out.
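A toy version of that build-or-buy reasoning can be sketched in Python. The four stages are the evolution axis used in Wardley mapping (genesis, custom-built, product, commodity); the `suggest` rule, the `differentiator` flag, and the placement of each component are simplifying assumptions for illustration, not part of Wardley's method, and a real map carries far more context than one function can.

```python
from enum import IntEnum


class Evolution(IntEnum):
    """The four stages along the evolution axis of a Wardley map."""
    GENESIS = 1
    CUSTOM_BUILT = 2
    PRODUCT = 3
    COMMODITY = 4


def suggest(stage: Evolution, differentiator: bool = False) -> str:
    """Toy heuristic: buy what is already a product or commodity, unless it sets you apart."""
    if stage >= Evolution.PRODUCT and not differentiator:
        return "buy"
    return "build"


# Hypothetical components and where one team might place them on its map.
components = {
    "CI system": (Evolution.COMMODITY, False),
    "CI system with very special needs": (Evolution.COMMODITY, True),
    "Novel recommendation engine": (Evolution.GENESIS, True),
}

for name, (stage, differentiator) in components.items():
    print(f"{name}: {suggest(stage, differentiator)}")
```

The point of the sketch is only that the decision depends on where a component sits on the map and whether it differentiates the business, which is the question the map helps a team answer together.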

Here's an example Wardley map from Learn Wardley Mapping. You can also look at some great talks from an event called Map Camp, which is organized by Simon Wardley and his team and runs in London every couple of years. The map shows the situation of the business compared to where it wants to get to: what they should be buying in, what they should be developing, and what they need to focus on. It's great stuff.

The results of doing a mapping exercise should help you work out where you can add measurable value. It's very easy for platform teams, for DevOps teams, to do really cool stuff and innovate with new tools. There's lots of technology and tools out there which we all love to use and which we love to make the best of. But where is it actually adding value? I'm not saying we're not adding value, but if we can measure that we're adding value to the rest of the business and helping other people get their stuff done, that's surely a better position to be in.

Again, we look at what we can buy in. There are commoditized solutions for many of the technical challenges that we solve, or need to solve, for the rest of our organizations. We should buy them in where it doesn't make sense to build them. Maybe there's a differentiator: if we have some particularly different needs, or we're building something that we ourselves want to sell on, then that may be the case for building. But often we want to buy, and mapping these things out can help us decide what to buy.

Just to finish off on this slide: what are we building for others to consume? Let's be absolutely clear on what those things are and that they aren't just a technical shopping list.

How are we going to put it all together? Let's join all this sort of stuff up. Here are some takeaways, things that I think we need to be doing that summarize what I've talked about for the last few minutes.

What we're here to do is fundamentally to help people work together. It seems like an obvious thing, but when people's needs and objectives are different, those things can be skewed. Maybe a developer is incentivized by the amount of features that they add to a website or to an application. Maybe some ops people are incentivized by keeping those systems up. We've all heard that classic conundrum of don't change anything because it might go down and affect our numbers, but we have to change something in order to put new features out. Our job in DevOps, or one of them, is to make sure that those sort of frictions don't come to a head and that we can all sing from the same hymn sheet there.

We have to understand where we are. Mapping is key here: situational awareness. Where does our team sit within the business as a whole? Where does it sit within the other similar departments in other organizations? We need to understand what we're doing, why we're doing it, and how that's different to others.

We often see models in play that we adopt from other organizations, and blindly adopting those models, either in Agile development or in DevOps, cloud engineering, et cetera, without our own situational awareness, is probably going to lead us to failure or doing the wrong things.

Once we're aware of where we are, we can then work out where we need to be. Again, it sounds obvious. We're here, we want to be there, and these are the things we need to get from A to B. Those are generally going to be things that we can do, things that we can change, technologies we can implement, processes we can change, in order to get the most out of that flow of value, because we're focusing in on the value that we're delivering to customers. Those customers could be internal or external. But the broader you can think at a strategic level, the better that's going to be.

Slim down what we're doing. Let's not go and write an alternative to Terraform. Terraform will do the job, it will do it brilliantly. Fantastic. If it doesn't, then maybe we can buy in a product that will do that for us. We're not in the business of writing Terraform plans. We're in the business of delivering value to our customers. Don't work on things that don't add real value or that we can buy in.

External touch points are crucial. That means talking to our customers, either the people who pay us money externally or the people within the organization to whom we're kind of, sort of cross-charging technical and process implementations, because making sure that we're still on top of their needs will always set us straight.

Above all, let's not forget the key tenets of DevOps: the feedback loops, great automation, making the best of our people. Those are the things that we need to keep on doing within this wider context of value streams, in order to help our organization succeed.

As I said earlier, I work for Adaptavist. Here's some Adaptavists working happily away, and here's some more of them. There we are, back in the pre-COVID days. I remember them. Hopefully we'll be back to them soon.

We've got a lot of stuff going on around digital transformation. For example, we have a program of helping organizations to develop an Agile organization: what does it take to develop a truly Agile organization? Sorry, we're getting a bit sales pitchy here, but this is the sort of stuff that we do, and we do really, really well. We help organizations become more Agile.

Decision Sprints is something that we work a lot with through our sister company, Brew Digital, who have developed an awesome way of relieving analysis paralysis and bringing teams together so that you can make decisions on things that are maybe dragging on, maybe we don't have consensus on something and haven't had for a long time. Some awesome work we can help you with in Decision Sprints.

Finally, I'm prerecording this talk in September, but I think it's going to go out at around the same time as our DevOps value stream management with GitLab webinar on the 7th of October, where Jobin from our professional services team talks about how to do value stream management with GitLab. I recommend you sign up for that webinar on our website or, if this goes out after the webinar, watch the recording. I'm expecting great things from it, and there's a lot of value to be derived from it.

That's about it, really. My name is Matt Saunders from Adaptavist. As I said, we do all this great stuff. We're also an Atlassian Platinum Partner, so we work heavily with Jira, Confluence, those sort of tools. We also partner with companies like GitLab, Sonatype, et cetera, to help organizations absolutely get the most out of the tools, processes, and people, those three key DevOps tenets. That's it. Thank you for listening. Hope you enjoyed it. Goodbye.