Couples Therapy for DevOps and ITIL

London 2018

Jayne Groll is co-founder and CEO of the DevOps Institute (DOI). Jayne holds many IT credentials, including ITIL Expert™, Certified ScrumMaster, Certified Agile Service Manager, DevOps Foundation, and Certified Process Design Engineer (CPDE)™.

Her career spans more than 25 years of senior IT management roles across a wide range of industries. Jayne is very active in the DevOps, ITSM and Agile communities and is the author of the Agile Service Management Guide. She is a frequent presenter at local, national and virtual events.

Full transcript

Jayne Groll

I am Jayne Groll. I'm your therapist for the next 30 minutes, hopefully bringing some kind of therapeutic approach to DevOps and ITIL.

I am CEO, and I'm one of the co-founders of the DevOps Institute. I was also one of the co-founders of ITSM Academy for the last 15 years. I'm not in that role anymore, but I'm an ITIL Expert. I have a lot of experience in the ITIL, IT service management space.

Prior to that, I was an IT director for a lot longer than I want to admit to. Let's just say the screens were black and green when I got started in the Unix world a long time ago. I've been a trainer. I'm a Scrum Master. I've got a bunch of certifications.

I'm most proud of the fact, other than being CEO of the DevOps Institute, I'm the author of the Agile Service Management Guide. So if you're interested in learning anything about agile service management, if you go up to the DevOps Institute website, it's a free download. You're welcome to download the e-book from there.

Let me tell you a little bit about DevOps Institute. We stood up about four years ago when we saw that DevOps was really crossing the chasm into the enterprise stage.

We are a professional association. We accredit certifications around DevOps, starting with the DevOps Foundation. If you come from the ITSM space, you're probably familiar with ITIL Foundation; it's a similar approach to that. And then, unlike the ITIL space, our advanced courses, which we call the DevOps Practitioner Series, are very skills-based. So DevSecOps engineering, DevOps test engineering, DevOps leadership and such. Again, if you're interested in learning more about that, please go to our website.

Our vision is really to facilitate a learning community, and hopefully a little bit of what I bring to you today is learning about the relationship between DevOps and ITIL, and more importantly, how to align those two sets of practices.

I can't say frameworks. There's a lot of debate about what DevOps is and what it isn't. It doesn't matter, right? The reality is ITIL is a framework. DevOps is a set of practices of which IT service management is one. So keep that in mind as we go forward.

So I have to start by reading you a love letter.

Dear ITIL,

Some think that since I've been flirting with this new thing called DevOps, I'm no longer invested in best practices for IT service management.

Those same people think you've outlived your usefulness and you're a throwback to another age. They tell me it's time to move on, commit myself to a younger, more modern approach.

Nothing could be further from the truth.

I still value our relationship and the guidance you give me, but times change, and we must change with them in order to stay relevant.

So let's work this out and create a new unified approach to accelerating the value stream.

Love, IT.

Come on.

The tech world has changed a lot since the last ITIL update. So many of you... Any ITIL Experts in the group? Anyone that's gone through the whole scheme? A few, right?

So in reality, the last major update to ITIL was in 2007. There was a refresh in 2011. Now, that doesn't make it irrelevant, and nothing I'm going to tell you today takes away the relevancy of ITIL.

But the reality is that during the last updates, Waterfall software development was the norm, right? And at the same time, software developers were investing in exploring Agile in the form of Scrum through the Scrum Guide and starting to operate in a much more incremental and iterative way, whereas ITIL at that point was really very much based on governance, risk, and compliance.

So we immediately had two silos that were operating in parallel, but not necessarily at the same speed and not necessarily with the same objectives. Right? Agile software development was incremental and iterative. We were really trying to get product out faster, and ITIL was asking for requests for change two weeks in advance. Right? So inadvertently, we had a conflict right from the very beginning.

And then somewhere around 2012, enter DevOps. Right? Started out in what we call the unicorn space, suddenly started to emerge in the enterprise space, and now we have Agile, IT service management, DevOps, and they're not operating in sync. Right?

But the reality is, are you interested right now in command and control or speed and agility? And the answer is probably a little bit of both, right? Governance, risk, and compliance didn't go away. Right? You're still subject to audits. You still have regulatory compliance, but you have to go faster, and you have to be more agile.

And for those of you in the ITSM space, that's a challenge, right? It's a huge challenge. And so if we start to look at the IT system of systems, right? Isn't that what we are? We are a system of systems that supposedly brings this value stream together.

You can see IT service management very prevalent in all of the stages of the value stream. You can see service strategy inside of agile software development. You can see some of service design inside of agile software development.

In reality, the reason the developers found Agile is that in ITIL, we sent them off the island. How many of you read the service design book? Some of you, right?

In service design, there's a couple of paragraphs, and it says, "Go find the software development lifecycle approach of your choice." And the developers did. Right? They found Agile. So we addressed the design, we didn't address the development.

Now, there is no secret magic one-size-fits-all framework, and ITIL was never meant to be that. But we didn't do this in parallel. So IT became a system of systems, even though the service management processes and stages of what we call the service lifecycle really underpin what was happening there.

You can see that continuous delivery and continuous deployment very much marry to service design and service transition, with a healthy dose of automation. Right?

Anyone here an old ITIL Service Manager? Couple of you, right? If you remember when you took your Service Manager exam, if you had a release management question, you almost certainly failed. It was the one process in ITIL that we didn't understand. We didn't understand: was it change management's stepchild? Right? We didn't understand it. It was really hard to explain.

Well, continuous delivery explains it for us. Continuous delivery really focuses on release automation and release management.

Anyone that knew anything about ISO 20000? Most of the processes in ISO 20000 had about a half a page. Release management had 11 pages in the original version of ISO 20000. Tells you something about the release process.

So along come continuous delivery and continuous deployment, again with a very healthy dose of automation, and suddenly release management, service transition, and change management are really in the spotlight. With a shift-left approach, some of the things that in ITIL we might have said should go downstream (build, test, deploy) are now suddenly shifted left, such as security and testing. Building occurs somewhere between agile software development and continuous delivery.

And then deployment happens, and I'm really happy to tell you that while DevOps has until recently really ignored what happens post-production, suddenly modern operations, suddenly site reliability engineering, Jason Cox called it this morning systems reliability engineering. I love that.

Suddenly reliability, post-production, incident management, error budgets are starting to emerge around site reliability engineering. If you're not familiar with that, I'll tell you a little bit about that later. But those same processes, incident, problem, service level management, knowledge management, all the service operation, service transition processes are very much there. They just don't look the same. All right?

So service management is still pretty relevant. And I think most importantly, there is not one of these systems, one of these frameworks, that does not place an emphasis on continual improvement. But things have changed. The digital age is in front of us, and being able to put a request for change in two weeks in advance is no longer sustainable. Two minutes in advance, in many cases, is not sustainable.

So where's the disconnect? Well, I think the biggest disconnect that we have is that between Agile and ITIL and DevOps, we have completely different directives. We don't share our metrics. We don't share our objectives. We are not all going for the same result.

And that's a shame because we are all IT, right? We are all IT. We succeed together or we fail together.

We definitely have a different taxonomy. I'm going to address that in just a minute. I say one thing, you say another thing, we're not saying the same thing. Visiting different units within IT is like going to different countries in Europe. We don't speak the same language. We don't use the same tools. We really are not aligned in any way, shape, or form.

Certainly, as I said, we have different metrics. For developers, it's what comes out of the sprint. I pass it downstream or I go into continuous integration. For incident management, problem management, it's completely different. So we have not aligned our metrics in any way, shape, or form.

We're using different tools. So the cool thing about DevOps is we've got all these really great open source and enterprise tools, but we haven't aligned them. We haven't really aligned them with CMDBs or incident management. And it's getting better.

ServiceNow, some of those other products are really starting to build APIs into some of the tools within the continuous delivery pipeline. But truthfully, until recently, that wasn't the case.

I say ticket, you say Jira. I say ticket, you say ServiceNow, Remedy, pick any of the above, right? So we're not even using the same term and saying the same thing or putting a ticket in the same tool or somehow being able to populate one tool with the other's data. So we operated at a different pace.

At the end of the day, we operate at different speeds, and so naturally there's going to be conflict. So like any good relationship, when you have people that have grown apart, that have different friends, that have a different language, naturally you're not going to be aligned.

Now, in the early days of DevOps, there were a lot of discussions, debates, articles about whether ITIL was dead. It's not dead. And it doesn't matter if it's ITIL. It's IT service management. Call it what you want. Read the books from Axelos. Don't read the books from Axelos. Look at the new things, VeriSM. Look at site reliability engineering. Services still need to be managed.

And for everyone in this room that has some relationship with DevOps, obviously you do because you're here at the DevOps Enterprise Summit, we need to be able to ensure that service management lives. Long live service management.

So we can work IT out. Like all relationships, when you look at the root cause of problems, it almost always comes down to communication.

So I say CI, you say? Somebody say it. I say CI, you say? Configuration item. Configuration item, right? Is that what you said? Oh, you said continuous integration. Okay.

I say CI. Where's my ITIL people? I just gave it to you. You say? Configuration item. Configuration item.

So we need some CIs. What do we do now, right?

I say configuration management, you say, if you're DevOps: Puppet, Chef, Ansible, any of the other pre-production configuration management tools. If you're ITIL, you say CMDB.

If I say service level management, if you're an SRE... How many of you have heard of site reliability engineering or SRE? A few. A few of you. Okay, I get the pleasure of telling you more about it today. That's great. They very much focus on service level objectives. In the ITSM space, we want a signature on a signed SLA. We want to make sure that we have SLAs in place, and so on and so forth.

I mentioned Jira versus incident records. They say mean time to repair, we say mean time to restore service. Is it the same thing? It's close.

They groan when we say change management. We say request for change or the CAB. So you can see there's a communication problem here. What we have here is a failure to communicate.

So the first thing is we have to learn to speak the same language, and the truth is, it doesn't matter what you choose, as long as everybody in your organization is saying the same thing and meaning the same thing. You want to call a configuration item something else? Fantastic. You want to call continuous integration something else? Fantastic.

But you have to work together to agree on a common taxonomy so that you are speaking and meaning the same thing, and that you're starting to align that as well. So first step is to institutionalize a common taxonomy. Create your vocabulary. Whatever that vocabulary is, don't be loyal.
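
As a minimal sketch of what an institutionalized vocabulary could look like, the Python below keeps one glossary in which each ambiguous term records what each community means by it, plus the single meaning the organization has agreed on. The `GLOSSARY` structure, the function names, and the "agreed" choices are illustrative assumptions, not something prescribed in the talk.

```python
# Hypothetical shared glossary. Every name and "agreed" value here is illustrative.
GLOSSARY = {
    "CI": {
        "devops": "continuous integration",
        "itsm": "configuration item",
        "agreed": None,  # still ambiguous: no single meaning chosen yet
    },
    "configuration management": {
        "devops": "pre-production tooling such as Puppet, Chef, or Ansible",
        "itsm": "the CMDB",
        "agreed": None,
    },
    "ticket": {
        "devops": "Jira issue",
        "itsm": "incident record",
        "agreed": "work item",
    },
}


def unresolved_terms(glossary):
    """Return the terms that still mean different things to different teams."""
    return sorted(term for term, entry in glossary.items() if entry["agreed"] is None)


def lookup(glossary, term, community):
    """Return the agreed meaning if one exists, else the community's own meaning."""
    entry = glossary[term]
    return entry["agreed"] or entry[community]


if __name__ == "__main__":
    print(unresolved_terms(GLOSSARY))
    print(lookup(GLOSSARY, "CI", "itsm"))
```

The list of unresolved terms is the agenda for the conversation: it does not matter which meaning wins, only that the "agreed" slot gets filled in together.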

How many of you have ever watched Game of Thrones? A few Game of Thrones people. It's like the Seven Kingdoms where everybody is really trying to compete for that Iron Throne. Step away from that. It's really not a necessary thing within your organization.

The next step in healing your relationship is to cut down on over-processing. It's kind of like overeating. So when we start to look at how we originally built a lot of the processes, we built them with flowcharts that were if/thens. If this, then do that, and then do that, and we tried to really kind of make it a one-size-fits-all. We'll talk about that in a second.

But in many ways, because we were aligned with Waterfall, we also built very waterfall processes, and they were pretty complex. So the next step is you have to go on a process diet. You really need to start to look at how much is just enough process.

Does every change really need to submit a request for change two weeks in advance? Does every incident need to do that?

I mentioned agile service management. One of the key concepts within agile service management is identifying how much is just enough. If you know anything about Scrum, that is one of the parts of the definition of Scrum, is to identify just enough rules, just enough structure. Same thing with your process.

Microservices need microprocesses, and so that's a very key aspect of being able to scale down. Go on the diet together. Work together to identify how much is just enough process. And again, that's going to depend on your vertical market, it's going to depend on your regulatory requirements, but if you work on that together, there's really healthy discussions. You look at your value stream, there's some ways to be able to pull that apart.

Stop taking a one-size-fits-all strategy. One of the key pieces of guidance inside of ITIL talks about models. Anybody familiar with models? Anybody have models in place? Not too many.

So going way back to version two, ITIL suggested that you build models for different types of incidents or changes or requests, problems. And the model would be based on the level of rigor that was associated with that particular event, whether it's an incident, problem, change, or request.

So you can build a model, we'll look at this in a second, but you can build a model for these changes, releases, incidents, problems, that talks about how much resource, how much automation, how much rigor, how much review is associated with each of these individual events so that you understand how to respond and also how to go faster.
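
One way to picture such models is as a small lookup table from risk level to the rigor that level receives. The sketch below is only an illustration under assumed names: the `ChangeModel` fields, the three risk levels, and the approval values are hypothetical, and real models would depend on your vertical market and regulatory requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeModel:
    """Hypothetical model: how much rigor one category of change receives."""
    name: str
    approval: str           # who authorizes the change
    needs_rfc: bool         # is a request for change filed in advance?
    automated_record: bool  # is the change record generated by tooling?


# Illustrative models only. Every change still gets a change record;
# what varies is the approval path and how the record is produced.
MODELS = {
    "low": ChangeModel("standard", "pre-approved", needs_rfc=False, automated_record=True),
    "medium": ChangeModel("peer-reviewed", "peer or local CAB", needs_rfc=True, automated_record=True),
    "high": ChangeModel("enterprise", "CAB", needs_rfc=True, automated_record=False),
}


def select_model(risk: str) -> ChangeModel:
    """Pick the model for a risk level; unknown levels get the most rigor."""
    return MODELS.get(risk, MODELS["high"])


if __name__ == "__main__":
    for level in ("low", "medium", "high"):
        print(level, select_model(level))
```

Falling back to the most rigorous model for an unrecognized risk level is a deliberate design choice in this sketch: anything nobody has classified yet is treated as not yet understood.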

When we start to look at change management, we have to start looking at making change management a much more agile process. We really have to start looking at agility in change management.

Now, I want to take a few extra minutes to talk about this. So we have a pretty healthy ITSM representation in here. How many of you really struggle with change management in a DevOps environment?

It's the number one question I get asked: how do I make change management work? I get asked by the developers, I get asked by the DevOps space, how do we overcome the constraints of change management and still be able to bring DevOps into our environment? How do we convince people that we don't necessarily need a CAB for everything?

Again, what we say in this room stays in this room. How many of you put every request for change through your CAB?

Thank you for being honest, because they all do too. Right?

The CAB should certainly review enterprise impact. Anything that's a big, ugly, high-risk, high-impact, going to affect a lot of people, affect a critical business application, absolutely.

But with the movement towards microservices, now changes are much more flexible because we're making changes at a different level. Peer-to-peer decision authorities, very common in true DevOps environments. Being able to have local CABs. Being able to automate the approval process.

From a governance, risk, and compliance perspective, you actually may get better governance through the automation, because the automation is going to be consistent if you've authored it well. So the evidence of compliance may come out of your tool, as opposed to out of a manual process from an individual.

There's no decision-making in most of the tools, unless you've built some kind of machine learning or artificial intelligence into it. You really need to look at making change management more agile, more standard changes.

Everybody know what a standard change is? Right. Standard change, low risk, low record. You don't need to put a request for change in. You do need to have a change record.

You know the difference between the two, right? Request for change is an intention. I intend to make that change. Change record shows activity. So one records the activity for that change, the other actually records the intent to make the change.

Request for change could be automated, and it could be automated about 30 seconds before the change is made. We need to understand the activity of the change. We need to be able to answer the question, what changed, for a variety of different reasons. We don't necessarily always need to request it. Really big difference.

And if you look at some of the very innovative environments that have introduced change management into their organization successfully and married it to DevOps, they may use code as their change record. They may not have actually filled out a traditional change record in a product like ServiceNow or Remedy or any of the other tools that you're using.

They may actually just attach the code with the comments so that we know exactly what's changed. They may invite production people to go to GitHub and look at the source repository and be a participant in that. There's a lot of innovative ways you could do that.
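
To make the idea of code as the change record concrete, here is a minimal sketch that derives a change record from commit metadata. It assumes the commit data has already been exported from the source repository; the `Commit` class, the `change_record_from_commit` function, and every field name are hypothetical, and no real GitHub, ServiceNow, or Remedy API is being modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    """Hypothetical commit metadata, e.g. as exported from a source repository."""
    sha: str
    author: str
    message: str
    files: list
    committed_at: datetime


def change_record_from_commit(commit: Commit, service: str) -> dict:
    """Build a change record (what changed, by whom, when) from commit data."""
    first_line = (commit.message.splitlines() or [""])[0]
    return {
        "service": service,
        "summary": first_line,
        "changed_by": commit.author,
        "changed_at": commit.committed_at.isoformat(),
        "reference": commit.sha,  # points back to the code and its comments
        "items_changed": sorted(commit.files),
    }


if __name__ == "__main__":
    example = Commit(
        sha="abc123",
        author="barry",
        message="Fix timeout in checkout\n\nRaise the retry limit.",
        files=["checkout/retry.py", "checkout/config.yaml"],
        committed_at=datetime(2018, 6, 25, 12, 0, tzinfo=timezone.utc),
    )
    print(change_record_from_commit(example, service="checkout"))
```

The record answers "what changed" after the fact without anyone filling in a form, which is the distinction drawn above between recording activity and requesting permission.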

So definitely more standard changes. Low risk means you understand it. It's not necessarily break/fix. Low risk means it's gone through due diligence, it has remediation, the developers have a track record. There's a lot of definitions for low risk.

Definitely want to look at, as I said, alternative change records. Maybe you can auto-populate, say, your change database with data that's coming from Jira or Puppet or Chef or any of the other tools.

You want to have broader decision authorities. That's what I said before. Peer-to-peer is okay. In the old days, we wouldn't have liked that because we would have said that the auditors wouldn't have liked that our peers are reviewing our code. Today, we actually support that because our metrics, our accountabilities are very, very different.

And then the level of rigor really has to be based on the level of risk. And big, ugly, complex changes may have less risk than little changes, and little changes may have more risk than the big complicated ones, too.

How many of you are moving to microservices? More and more of you. Microservices need microprocesses, which means you need more of these models, more of these standard changes in place. You can't treat a microservice with a monolithic process. That's a very, very big opportunity for disconnect.

So like all good relationships, you have to share your toys. And so again, a lot of this was done in parallel. A lot of this was done without collaboration.

Do you know the difference between communication and collaboration? Somebody tell me, what's the difference between communication and collaboration?

In communication, I what? Sorry, sir? Yeah. I tell Barry here, I say, "Barry, blah, blah, blah." And then Barry communicates back to me, and he says, "Blah, blah, blah." And so we tell each other things.

What's the difference with collaboration? We discuss, right? And there's a really important difference. Anybody know? Share knowledge. We share. I ask his opinion. I actually ask him what he thinks.

And whether it's about sharing our toys or about our taxonomy or creating models, I actually say, "Hey, Barry, what do you think about that? Bring me your insight. Bring me your experience. Bring me..."

So when we in DevOps, when we talk about communication and collaboration, don't forget that sharing your toys isn't just, "Hey, I'll give you a user ID into, pick any of these. You can go into GitHub now." That's not collaboration. It's how do we make this work for everybody involved across the entire value stream?

So share your toys. Now, if you look at some of these toys, again, good representation. It is not a full representation. If you go out to the exhibit floor, certainly you're going to see a lot more. But there is really great opportunity for each side of the equation, each side of the relationship to learn something, and to be able to optimize some of the capabilities or features of a product that maybe you were unaware of, because they don't all do the same thing. So definitely focus on sharing your toys.

And then I think one of the big gaps is when we look at the continuous delivery pipeline, and we look at all the APIs that have been built. I've seen organizations where they have 90 APIs in their pipeline. Maybe you do. Of the 90 APIs, how many do you think are really into the ITSM tools? Not a whole lot.

So we have to make an intentional effort to either make it part of the pipeline or auto-populate some of the tools so we don't have redundant work. It's in a configuration product prior to production, and then we port it into a CMDB or a CMS afterwards. And so which one's right?

In ITIL, you've probably heard this, you know the old Danish clock? If I have one watch, I know what time it is. If I have two, I'm not so sure.

So this is exactly what we're talking about. If you have two tools holding the same data, which one's right? Aligning your tools is very important.
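
A small reconciliation check is one way to find out where two tools disagree before deciding which one is right. The sketch below assumes both inventories have already been exported as plain dictionaries keyed by item name, and it assumes the pre-production configuration tool is treated as the reference; both are illustrative choices, and the `reconcile` function is hypothetical.

```python
def reconcile(reference: dict, other: dict) -> dict:
    """Compare two inventories keyed by item name and report where they disagree."""
    ref_names, other_names = set(reference), set(other)
    return {
        # in the reference tool but absent from the other tool
        "missing": sorted(ref_names - other_names),
        # in the other tool but unknown to the reference tool
        "unexpected": sorted(other_names - ref_names),
        # present in both, but with different attributes
        "conflicting": sorted(
            name for name in ref_names & other_names
            if reference[name] != other[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical exports from a configuration tool and a CMDB.
    config_tool = {"web-01": {"version": "1.4"}, "db-01": {"version": "9.6"}}
    cmdb = {"web-01": {"version": "1.3"}, "cache-01": {"version": "4.0"}}
    print(reconcile(config_tool, cmdb))
```

An empty report means the two watches agree; anything else is a candidate for auto-populating one tool from the other rather than maintaining both by hand.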

Learn something new together. So this is an opportunity for me to take the next few minutes and tell you a little bit about site reliability engineering. And if you've not heard about it today, remember today, because it'll be the first time you've heard about it, but certainly not the last.

So several people from Google wrote a book called Site Reliability Engineering, and it's about practices they use to keep their environment stable and reliable. And it launched a role, first and foremost, but also a series of practices.

So this is a very hireable role. So if you are interested in operations as an engineering practice, you can get the book for free online. You can buy it from Amazon for, I don't know how much money, but it's not really very expensive.

But it is a set of practices that this role, the site reliability engineer... I'm going to say that word "role" a lot, because in ITIL and IT service management, we really didn't define a lot of roles.

This is a very desirable role for many organizations, and many organizations are starting SRE practices. Again, if you heard Jason from Disney, they talked about that as well.

So the SRE has some skills that perhaps a systems administrator or post-production engineer may not have had. So certainly coding. Most can code in Python. Most have an engineering approach. So we're engineering operations, so there is very much an automation focus.

But the coolest part about site reliability engineering is this role takes on a lot of the responsibility for the ITSM processes. This role is responsible for service-level objectives, and it's how the role is measured. Oh, thank you. Meeting service-level objectives.

It is responsible for managing incidents, for problems, for working with change management. It is a very interesting role. It's given error budgets, and the goal is to reduce toil, the amount of manual work that's required to achieve reliability and stability.
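
For readers new to error budgets, the arithmetic is simple: the budget is whatever the service-level objective leaves over. The sketch below assumes an availability SLO expressed as a fraction and a rolling window measured in days; the function names and the 30-day default are illustrative assumptions rather than anything stated in the talk.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Minutes of budget left; a negative value means the objective was missed."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    print(f"Budget for a 99.9% SLO over 30 days: {budget:.1f} minutes")
    print(f"Remaining after 30 minutes of downtime: {budget_remaining(0.999, 30):.1f} minutes")
```

A 99.9% objective over 30 days leaves roughly 43.2 minutes of budget. While budget remains, the team can afford to ship changes faster; once it is spent, the emphasis moves back to stability.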

But the coolest part of this role is this role is encouraged to spend 50% of the time being reactive and 50% of the time being proactive, looking at the future in terms of how do we make operations better?

Some environments that have moved to a squad model have embedded this role inside of the squad, so squads will have an SRE. Some organizations have a one to whatever ratio of SREs to teams or squads.

Could this be the bridge? Could this role, could this set of practices be the next evolution of IT service management?

Now, I'll give you a little caution about this book. So the interesting thing about this book, in today's day, the average book is what, about 40 pages? They had a lot to say: 525 pages. So it's a pretty big book.

But more importantly, there are derivatives that are coming out about SRE, because you may not be Google. So your reliability needs may not be the same as Google's reliability needs. And so there's a couple of good books coming out.

Pay attention to this if you are anywhere near a post-production responsibility. Aside from the fact that it's a role, organizations are absolutely starting to look at SRE and DevOps as that bridge between post-production, pre-production, I feel like I went off, and IT service management.

So while it doesn't necessarily call out ITIL specifically, this is a really incredible opportunity to take service management to the next level.

So we're all part of the same ecosystem. People, process, and technology has not gone away. It's been a mantra within IT for a very long time. Whether you are invested in Agile or IT service management, ITIL or DevOps, it's more important than ever because the common thread is people. The common thread is organizational change management. The common thread is cultural transformation.

And until people start to think and act a different way, they're not necessarily going to be able to get as far as they thought they could.

Relationship goals. So as a good therapist, I'm going to give you some homework to work on with the other side of your relationship.

Rely on ITIL for your underpinning processes. They're good processes. They need to be scaled back in many situations, but they're really good processes, and we need process in order to go faster.

And ITIL should rely on DevOps for speed and agility, so that DevOps gives us ideas about how we can make these processes more agile, how we can meet the speed and agility needs.

But we also need assimilation and automation, and we have to stop being loyal to a particular framework. Whether it's Agile or ITIL or Scrum or DevOps or Lean or anything else, we have to stop being loyal to a particular one, and we have to realize that we are one IT.

So on that note, my friend in the back is going to give us some closing music, right? Come on. Dramatic pause.

Thank you.