ITIL and DevOps Can Be Friends
ING is a worldwide financial institution based in the Netherlands. Its IT department in the Netherlands manages a mix of off-the-shelf applications and in-house-built software.
Traditionally, development was governed by CMMI and IT service management by ITIL processes. Three years ago, the developers started working in Agile/Scrum teams, dropping CMMI. The next step was to involve Operations as well and transform into a DevOps organisation, striving for Continuous Delivery.
In a lot of Agile organisations, ITIL is considered the evil, soul-sucking epitome of bureaucracy. But is it really?
If we look at the tasks you perform in the ITIL processes Incident management, Problem management and Change management, you will find that you still need to perform a lot of them in an Agile/Scrum way of working, and that there is actually a lot of value in agreeing on some rules for how teams interact in these processes. We may call the tasks something different than we did in ITIL, and we may choose different tools to handle parts of the process. We call this adaptation of ITIL Agile ITSM.
This talk focuses on the adaptations we have made to our ITSM processes to accommodate the requirements of an Agile/Scrum way of working. It shows that there is still value in a lot of the things we used to do in ITIL, and that there is no real conflict between Agile and ITIL.
- About Jan-Joost:
Jan-Joost stumbled into IT almost 20 years ago, starting with a five-day temp job in the voice response industry that lasted seven years. He started as a tester but quickly became one of the leading dialogue designers in the Netherlands. He switched to IT Service Management when he joined ING, riding the wave of ITIL implementation to become Change Manager and finally Process Owner Service Management for the IT department of the Retail bank in the Netherlands. In that last capacity he has been one of the DevOps evangelists within ING, running an internal community dedicated to sharing knowledge across hundreds of teams, trying to help his co-workers make the transition to DevOps and Continuous Delivery and have fun with it at the same time! He tries to support his DevOps colleagues with a more Agile approach to the traditional ITIL Service Management processes, while automating as much as possible.
In his spare time he enjoys traveling the world to watch birds, or cooking, but rarely at the same time.
- About Ingrid:
Ingrid Algra is an effective team leader who has built, guided and developed high-performing teams that achieve objectives ahead of schedule and under budget. She has more than 15 years of experience in IT environments, with a focus on maintenance and IT operations, delivery of new functionality, the DevOps way of working, project management, change management, outsourcing, and supplier management. Ingrid is an effective communicator, able to develop strong relationships with internal and external stakeholders and to build consensus across multiple organizational levels. She is always looking for opportunities to continuously improve business processes with IT solutions.
Full transcript
Jan-Joost Bouwman
Welcome, everybody.
In the next 25 minutes, Ingrid and I are going to explain how ITIL and DevOps not only can be friends, but should be friends, and actually are friends. And hopefully by the end of our talk, you'll agree with us.
Oh. It's not working. Clicking. Yeah. Be careful.
No, still not clicking. Oh, it's failing in real time.
Well, the first slide was our introduction. Hopefully sometime in the near future, the clicker will...
Ah, no, that's not it. No.
Am I going the wrong way?
Ingrid Algra
No, you are going the right way. No? I think it was just switching the monitors. Okay.
Jan-Joost Bouwman
I was going to introduce Ingrid. Part of it was already done.
Ingrid is one of the team leaders within ING. We've worked together for a long time. She used to be my team manager, but luckily she got another team, because, well, frankly, I'm unmanageable.
She's one of the few women in IT within ING. She's dedicated to building successful teams and developing her engineers. And in that capacity, I think she's an inspiration for all of us at ING.
Ingrid Algra
Thank you. I will introduce Jan-Joost.
You've already heard a lot about him, but I can tell you that whenever you want to know something about the change process, you go to Jan-Joost, because he's always able to help you. So whatever problem you have, you go to Jan-Joost.
And we've been working together for six years now, and we're still getting along, so that's a good sign, I would say.
And the slides are still not working, so now it's getting difficult. Now it's getting awkward, because I want to introduce you to ING. Well...
Ah. Yes. Let's go ahead. The green one. Yes. Okay. Well, thank you.
ING is a global financial institution in over 40 countries. We have more than 52,000 employees offering retail and commercial banking services to our customers. We also used to have banking operations in the U.S., but we were forced to sell those to Capital One.
The talk will focus on what we are doing in the Netherlands. We are the largest retail bank in the Netherlands, and we have over 500 DevOps teams here.
I will tell you about ING's IT strategy. What you see here is that IT at ING has changed from a supporting service into a strategic driver. Our bank is becoming a real IT organization, which is maybe something you can hardly imagine.
And to make that possible, we have four pillars of our strategy.
The first one is providing clear, easy-to-understand information quickly and with easy access, because that's important for our customers: for example, giving customers a fast offer on their loan application, in terms they can actually understand. Because in banking language, we can write very difficult sentences.
The second one: anytime, anywhere. So whenever and wherever our customers are, and whatever device they use, we have to be able to provide them with the financial information they need.
Empower people. With our customer data, predictive analytics, and the information we have from other sources, we can provide our clients with the information they need to make informed decisions. That's very important for us.
And last but not least, agile development with a short cycle and a feedback loop is a great way to improve our products and make them work very well.
To be able to reach those four pillars of our strategy, we did a transformation within our own organization. Oh, this is nice to see.
In 2011, we started with Scrum, the Agile way of working. We used to work with CMMI. We got rid of that, and we said we had to go Agile for faster time to market and improved software quality.
After two years, we added Ops to the teams to form DevOps teams, because we wanted to get rid of the handover from the Dev department to the Ops department. Agile Scrum had only been implemented within the development department and not in the Ops department, and that was an obstacle to realizing the Agile promise. So in 2013, we changed that.
We started in one department with 150 teams. And remember, this still did not include the InfraOps teams; only the application Ops engineers were part of those teams. And when you look at ING Netherlands now, as I already told you, we have 500 DevOps teams, so that's important.
And then in May 2015, so this year, we adopted the Spotify model with squads, chapters, and tribes. And we even go beyond the Spotify model, because we have put engineers and business people together in a squad. We talked about how that transition is going and how it's working at Velocity last week, so whenever you want to know how that works, you can look it up.
Jan-Joost Bouwman
So for the agile transformation, we started with a period of discovery for the teams. For about a year, teams were allowed to find out which tools worked for them and how to organize themselves. And that was good for building the teams' confidence.
But it also had a side effect. The DevOps teams said, "Well, now we have product backlogs. Why should we do ITIL? We don't want to do the same administration in two tools. We'd rather use our own backlog, and that's it."
On the other hand, risk management says, "No, we still have to do ITIL because that's how we prove to our regulators," remember, we're a bank, regulators are a big thing for banks. "We have to prove to our regulators that we're in control, and we use ITIL for that. So you still have to use ITIL."
Well, that's a bit of a struggle. However, there might be a way to restore the peace in the Netherlands.
As a process owner, I got together with my fellow process owners. We listened to what the DevOps teams said, we listened to what risk management said, and we thought, "How can we bring these two parties together?" And we decided to change the ITIL implementation we had within ING to better match the speed of the DevOps, agile way of working.
The first thing to do was to eliminate duplicate administration and make the ITIL processes as lean as possible. We call that agile ITSM. I'm not sure we can claim that we were the first ones to introduce this term, but I'm sticking to it: we were the first to introduce this term.
So in the next few slides, I'm going to walk through the different processes and explain how we adapted our ITIL processes to match DevOps. And Ingrid's going to talk about how it really works in practice, because I'm a process owner, a theorist, and she's the practitioner.
Incident management. Incident management is all about bandwidth, because solving incidents should not affect the team's predictability. It's very important for a team to be predictable in what it delivers in a sprint.
So how do you solve that if you have incidents coming in? Well, quite easy. Reserve a certain capacity of that sprint for operational tasks. That is solving incidents, but it's also other tasks, like cleaning your log files and your file systems, doing service requests, doing problem analysis, and other stuff.
We recommend using 30% of your sprint capacity for that. It may be more, it may be less; usually, it's less. If historical data on your technical debt and on the number of incidents you normally get after a release supports it, you can do with less than 30%. We recommend that teams start with 30%.
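The 30% reservation is simple arithmetic. Here is a minimal sketch, assuming a hypothetical team of seven, a 10-day sprint and six focus hours per engineer per day. None of these figures come from ING; only the 30% default comes from the talk. The function `operational_bandwidth` is an illustrative name.

```python
def operational_bandwidth(team_size: int, sprint_days: int,
                          focus_hours_per_day: float = 6.0,
                          ops_share: float = 0.30) -> float:
    """Hours of one sprint reserved for operational tasks.

    Operational tasks cover incidents, service requests, housekeeping of log
    files and file systems, and problem analysis. The 30% default follows the
    recommendation in the talk; focus_hours_per_day is an assumption.
    """
    if not 0.0 <= ops_share <= 1.0:
        raise ValueError("ops_share must be between 0 and 1")
    total_hours = team_size * sprint_days * focus_hours_per_day
    return total_hours * ops_share


if __name__ == "__main__":
    total = 7 * 10 * 6.0
    ops = operational_bandwidth(team_size=7, sprint_days=10)
    print(f"Operations: {ops:.0f} h, sprint work: {total - ops:.0f} h")
```

With these assumed numbers, it prints 126 h for operations and 294 h for sprint work. A team with good historical data would lower `ops_share` rather than change the other inputs.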
Priority one and two incidents have to be solved immediately, regardless of bandwidth. And if there is an engineer needed for that who is working on a sprint task, he has to stop working on that sprint task and first solve the incident.
Lower-priority incidents are handled within the bandwidth. And if you have already used up that 30% of your sprint, then you should talk to your product owner about whether to continue with the sprint tasks or solve that incident.
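The routing rules above can be written down as a small decision function. This is a sketch only. It assumes a priority scale of 1 (highest) to 4, and the names `Action` and `triage` are hypothetical. It is not taken from ING's service management tool.

```python
from enum import Enum


class Action(Enum):
    SOLVE_NOW = "stop sprint work and solve immediately"
    USE_BANDWIDTH = "solve within the reserved operational bandwidth"
    ASK_PRODUCT_OWNER = "bandwidth used up: agree with the product owner"


def triage(priority: int, bandwidth_hours_left: float) -> Action:
    """Route an incident following the rules described in the talk.

    priority: 1 (highest) .. 4 (lowest); the scale is an assumption.
    bandwidth_hours_left: operational hours still unused in this sprint.
    """
    if priority in (1, 2):
        # P1/P2 incidents are solved immediately, regardless of bandwidth.
        return Action.SOLVE_NOW
    if bandwidth_hours_left > 0:
        return Action.USE_BANDWIDTH
    return Action.ASK_PRODUCT_OWNER


if __name__ == "__main__":
    print(triage(priority=3, bandwidth_hours_left=0).value)
```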
How does that work in practice?
Ingrid Algra
I will tell you. The product owner thinks he's in charge of deciding what's most important, but actually the team is. Because when the team really wants to solve some incidents, they will make the product owner an offer he cannot refuse. So the team is really in charge of which incidents get solved today, tomorrow, or maybe next week.
We also introduced the engineer of the day, which is a practical way to manage the bandwidth. The engineer of the day is responsible for monitoring the queues and for solving incidents, and can call on whichever team members he needs to solve the problems.
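The talk does not say how the engineer of the day is chosen, so the following is only an assumed scheme: a plain round-robin over working days. The team members, start date and function name are hypothetical. The Q&A mentions that engineers often pair up; this sketch picks a single person for simplicity.

```python
from datetime import date, timedelta


def engineer_of_the_day(team: list[str], rotation_start: date, day: date) -> str:
    """Pick the engineer of the day by round-robin over working days (Mon-Fri).

    rotation_start is the first working day of the rotation; weekends are skipped.
    """
    if not team:
        raise ValueError("team must not be empty")
    if day < rotation_start:
        raise ValueError("day is before the start of the rotation")
    if day.weekday() >= 5:
        raise ValueError("this sketch has no rotation at weekends")
    working_days = 0
    current = rotation_start
    while current < day:
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            working_days += 1
        current += timedelta(days=1)
    return team[working_days % len(team)]


if __name__ == "__main__":
    team = ["Anna", "Bram", "Cees"]  # hypothetical team members
    start = date(2015, 11, 2)         # a Monday
    for offset in range(7):
        d = start + timedelta(days=offset)
        if d.weekday() < 5:
            print(d.isoformat(), engineer_of_the_day(team, start, d))
```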
Jan-Joost Bouwman
To help the teams work on incidents, we created a small dashboard. It's really quite simple: it shows the incidents of all incident queues within ING. There are maybe 500 of them, and every team can put the dashboard on their own screen and show their own incidents. But they can also look at their neighbors' incidents to see whether they're doing better or not.
And we tell them to strive for "today in, today out": an incident that's registered today, you should try to solve today as well.
Why is that? We believe it creates a competitive mindset, because teams don't want to be outdone by their neighbors, but it also helps you go home with a clean slate. You go home satisfied.
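A dashboard tile for "today in, today out" could show, per queue, the share of incidents registered on a given day that were also resolved that day. The sketch below assumes that metric definition, plus a hypothetical `Incident` record and queue names. It is not the actual ING dashboard.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Incident:
    queue: str
    opened: date
    resolved: Optional[date] = None  # None while the incident is still open


def today_in_today_out(incidents: list[Incident], day: date) -> dict[str, float]:
    """Per queue: the share of incidents opened on `day` that were resolved the same day."""
    opened = Counter(i.queue for i in incidents if i.opened == day)
    solved = Counter(i.queue for i in incidents
                     if i.opened == day and i.resolved == day)
    return {queue: solved[queue] / count for queue, count in opened.items()}


if __name__ == "__main__":
    d = date(2015, 11, 2)
    sample = [  # hypothetical queues and incidents
        Incident("payments", d, d),
        Incident("payments", d),
        Incident("mobile-app", d, d),
        Incident("mobile-app", date(2015, 10, 30)),  # opened earlier: not counted today
    ]
    print(today_in_today_out(sample, d))
```

Showing a ratio per queue, rather than raw counts, keeps busy and quiet queues comparable when teams look at their neighbors' numbers.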
Problem management, then. A lot of tasks we know from problem management, you could simply put them on the backlog as user stories, no problem. But we made some additional changes.
First, workarounds. Officially, you should put your workarounds in your service management tool. Unfortunately, our tool is a bit ancient, and it's very difficult to get any knowledge out of it when you need it. So we finally said, "Stop doing that. Put it somewhere where you can actually find it, like on a Confluence site."
Secondly, the known error record. The flow that we have in our service management tool requires that you create a known error record. Nobody ever understood why we had to do it, and it didn't add any value because most known error records only had a dot in it because you had to fill in something. So we said, "Let's stop doing that," and we took it out of the flow.
You do still have to register a problem in the service management tool, but with minimal registration: a short description and a status.
Why is that important? First of all, you have to be able to reassign a problem to another team, because most applications operate in a value chain. You might think that the problem is in your team, but it might actually be in somebody else's. And it's not easy to assign something to somebody else's backlog, because we might have 20 different backlog tools, but we only have one service management tool, which makes it easy to transfer problems to another team.
Secondly, management reports and dashboards. Not very popular with the engineers, but very popular with management.
And finally, to be able to link incident records to problems so that you can see how often an incident recurs. Because naturally, if a problem occurs more often, it should have a higher priority and should be solved sooner.
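The minimal problem record described above can be modelled in a few fields: a short description and a status, plus the owning team and the linked incidents that make reassignment and recurrence counting possible. This is a hypothetical model for illustration, not the schema of ING's service management tool. The IDs and team names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Minimal problem registration: short description and status, plus the
    owning team and the incidents linked to it."""
    problem_id: str
    short_description: str
    status: str = "open"
    team: str = ""
    linked_incidents: list[str] = field(default_factory=list)

    def reassign(self, other_team: str) -> None:
        # With one shared tool, handing a problem to another team is a field update.
        self.team = other_team

    def link_incident(self, incident_id: str) -> None:
        if incident_id not in self.linked_incidents:
            self.linked_incidents.append(incident_id)


def by_recurrence(problems: list[Problem]) -> list[Problem]:
    """Problems with more linked incidents first (ties broken by ID)."""
    return sorted(problems, key=lambda p: (-len(p.linked_incidents), p.problem_id))


if __name__ == "__main__":
    p1 = Problem("PRB001", "Timeouts on balance lookup", team="accounts")
    p2 = Problem("PRB002", "Login page slow after release", team="mobile-app")
    for inc in ("INC10", "INC11", "INC12"):
        p2.link_incident(inc)
    p1.link_incident("INC13")
    p1.reassign("core-banking")  # the root cause turned out to be upstream
    print([(p.problem_id, len(p.linked_incidents), p.team) for p in by_recurrence([p1, p2])])
```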
How does that work?
Ingrid Algra
Well, as you can understand, we are very happy with it.
When it comes to solving problems, we had, I should say, a very long list of technical debt. As you know, in ITIL you may have, I don't know the exact number of days, but let's say 100 or 150 days to solve a problem. But that's not realistic: within two or three months, you've already forgotten what exactly the issue was. So we changed that.
What we did is we said, "Whenever there's a problem, the rule is: this sprint in, next sprint out." That means you don't build up a long list of problems where you don't know which ones to prioritize first, so you're able to really solve the problems.
And just like "today in, today out" for incidents, "this sprint in, next sprint out" makes people really eager to do their best to solve those problems, because they want to get rid of the old problems and have a clean backlog. It creates a little bit of competition within the teams, and that's nice to do.
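"This sprint in, next sprint out" translates into a simple check. A problem registered in sprint n should be solved by the end of sprint n + 1. The sketch assumes sprints are numbered consecutively. `ProblemStatus` and `overdue` are hypothetical names.

```python
from typing import NamedTuple


class ProblemStatus(NamedTuple):
    problem_id: str
    sprint_registered: int  # sprint number in which the problem was registered
    closed: bool


def overdue(problems: list[ProblemStatus], current_sprint: int) -> list[str]:
    """IDs of open problems that missed 'this sprint in, next sprint out'.

    A problem registered in sprint n should be solved by the end of sprint n + 1,
    so it is overdue once the team is in sprint n + 2 or later.
    """
    return [p.problem_id for p in problems
            if not p.closed and current_sprint > p.sprint_registered + 1]


if __name__ == "__main__":
    backlog = [
        ProblemStatus("PRB001", 10, False),
        ProblemStatus("PRB002", 11, False),
        ProblemStatus("PRB003", 9, True),
    ]
    print(overdue(backlog, current_sprint=12))  # ['PRB001']
```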
Jan-Joost Bouwman
Now, change management. Of course, this is my subject, so I'm particularly proud of what we did in change management.
We can't have an ITIL talk without a flowchart.
In the middle, you see the traditional ITIL change management process. We took a close look at it and decided that a lot of the tasks concerning registering changes, deciding which changes to develop, and the development cycle itself are already handled in a backlog. So let's leave them there.
So what we decided is that the ITIL change management process should only be about the risks of implementation.
So when you have a finished product from your sprint that you want to put in production, you then put in a change record to ask permission. You check which teams you affect with your change, both by downtime and by functionality, and ask for their permission. And depending on the risk value of that item, based on previous records of that team and also the content of the change, you get permission either by change manager, change advisory board, or senior management.
And your flow for change management is suddenly a lot shorter and a lot easier.
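The shortened flow comes down to two checks. First, every affected team has given permission. Second, the risk value decides who approves. The sketch below assumes a 0 to 100 risk scale with hypothetical cut-offs at 40 and 70. The talk does not give ING's thresholds, and the function names are illustrative.

```python
from enum import Enum


class Approver(Enum):
    CHANGE_MANAGER = "change manager"
    CAB = "change advisory board"
    SENIOR_MANAGEMENT = "senior management"


# Hypothetical cut-offs on a 0-100 risk scale; the talk does not give ING's values.
CAB_FROM = 40
SENIOR_MANAGEMENT_FROM = 70


def required_approver(risk_score: int) -> Approver:
    """Map a change's risk score to the level that has to grant permission."""
    if risk_score >= SENIOR_MANAGEMENT_FROM:
        return Approver.SENIOR_MANAGEMENT
    if risk_score >= CAB_FROM:
        return Approver.CAB
    return Approver.CHANGE_MANAGER


def ready_for_approval(affected_teams: set[str], consenting_teams: set[str]) -> bool:
    """Every team affected by downtime or functionality must have given permission."""
    return affected_teams <= consenting_teams


if __name__ == "__main__":
    affected = {"payments", "mobile-app"}
    if ready_for_approval(affected, consenting_teams={"payments", "mobile-app"}):
        print(required_approver(55).value)  # change advisory board
```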
However, before you get permission, you still have to fulfill the generic definition of done, as we call it. That's the basic set of acceptance criteria that all changes have to adhere to, and it is the basic set that should get you through an audit.
It's not automated yet. We want to automate it in a continuous delivery pipeline.
Ingrid Algra
Yes.
Jan-Joost Bouwman
So for now, it's still a bit of a hassle to fill in the form, but that's just it.
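Automating the generic definition of done in a continuous delivery pipeline could look like the gate below. The talk does not list the actual criteria, so the four names in `GENERIC_DEFINITION_OF_DONE` are placeholders. How the evidence is collected from earlier pipeline stages is also an assumption.

```python
# Hypothetical criteria: the talk does not list what the generic definition of done contains.
GENERIC_DEFINITION_OF_DONE = (
    "automated_tests_passed",
    "code_reviewed",
    "security_scan_clean",
    "rollback_plan_documented",
)


def unmet_criteria(evidence: dict[str, bool]) -> list[str]:
    """Criteria without positive evidence; a missing entry counts as not met."""
    return [c for c in GENERIC_DEFINITION_OF_DONE if not evidence.get(c, False)]


def definition_of_done_gate(evidence: dict[str, bool]) -> int:
    """Return a process exit code: 0 lets the change proceed, 1 blocks it."""
    missing = unmet_criteria(evidence)
    if missing:
        print("Definition of done not met:", ", ".join(missing))
        return 1
    print("Definition of done met; the change can be submitted for approval")
    return 0


if __name__ == "__main__":
    # In a real pipeline this evidence would come from earlier stages, and the
    # stage script would finish with sys.exit(definition_of_done_gate(evidence)).
    evidence = {"automated_tests_passed": True, "code_reviewed": True,
                "security_scan_clean": True}
    definition_of_done_gate(evidence)
```

Treating a missing entry as "not met" keeps the gate conservative. A criterion nobody reported on blocks the change instead of silently passing.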
Ingrid Algra
And after change management comes, as always, configuration management.
And we will be honest with you, we still have to do some improvements over here because we paid so much attention to the change process. You understand why.
What you see is that whenever we make a change, we have to update CIs, and some things we still have to do manually. And when we talk about manual work, you know what will happen: some people forget, and you don't have everything in your system.
And that really matters, because whenever something goes wrong, say an incident, and you want to find the root cause, it's very important that everything is registered properly. So that's one of the things we have to work on.
Possible improvements are automated data discovery, or automated generation of CIs in the continuous delivery pipeline. And in the new way of working, configuration management is not only about the Ops side but also about the development side. We are working on including builds with the actual configurations of the applications in the stack, because that's really important, and on keeping those in version control, which is also very important.
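Generating CIs in the pipeline could start with a step that compares what was just deployed with what the CMDB holds, and emits only the records that changed. In this sketch, the CMDB is modelled as an in-memory dict keyed by (application, environment, host). All field names, versions and commit hashes are hypothetical. A real CMDB would be updated through its own API, which is not shown here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConfigurationItem:
    application: str
    version: str
    environment: str
    host: str
    config_commit: str  # version-control commit of the deployed configuration


def ci_updates(deployed: list[ConfigurationItem],
               cmdb: dict[tuple[str, str, str], ConfigurationItem]) -> list[ConfigurationItem]:
    """CIs from a deployment that are missing from, or differ from, the CMDB.

    The CMDB is modelled as a dict keyed by (application, environment, host).
    """
    updates = []
    for ci in deployed:
        key = (ci.application, ci.environment, ci.host)
        if cmdb.get(key) != ci:
            updates.append(ci)
    return updates


if __name__ == "__main__":
    old = ConfigurationItem("mobile-api", "1.4.0", "prod", "app01", "3f2a9c1")
    cmdb = {("mobile-api", "prod", "app01"): old}
    deployed = [
        ConfigurationItem("mobile-api", "1.5.0", "prod", "app01", "8be41d0"),
        ConfigurationItem("mobile-api", "1.5.0", "prod", "app02", "8be41d0"),
    ]
    # JSON payload a pipeline step could send to the CMDB (the endpoint is not shown).
    print(json.dumps([asdict(ci) for ci in ci_updates(deployed, cmdb)], indent=2))
```

Recording the configuration commit alongside each CI links the CMDB back to version control. That is the kind of registration that helps when hunting for a root cause.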
Oh, you have to...
Jan-Joost Bouwman
Yep.
Ingrid Algra
Sorry.
So what can Agile Scrum learn from ITIL? I think that's what you're all here for.
What you see is that bringing reliability to every change is really, really important. We learned that when a change affects our customers, it's not something they like. So I think that this is something ITIL can teach Agile Scrum.
Another thing is feedback, because the feedback loop from incidents and problems provides a lot of extra information that teams can use to perform better and not make the same mistakes again.
Another thing is that within ITIL, we have one tool and one process for all of ING, and all 500 teams use it. When you look at the stage we're at in the Agile transformation, we don't have the same configuration management tool in all 500 teams yet. So that's something we need to improve.
And last, it's about discipline. In ITIL, the roles and tasks are really clear, and I think that's very important for team members, so that they know where they stand.
Jan-Joost Bouwman
But ITIL can also learn things from Agile Scrum.
First of all, the need for speed. The short cycle of Agile really helps ITIL because, as Ingrid already said, a lot of people think, "I have 10 weeks to solve this problem. Oh, I'll start in week nine."
For a problem that's one thing, but if it's an incident, and they have eight hours and they start after seven and a half, it's painful.
Secondly, customer focus. As you know, ITIL also has customer focus, but the other way around. Here, it's about adding value. And that's something that Ops engineers, the typical ITIL people, really have to learn: it might actually add more value to the product to bring in new functionality than to solve that priority-three incident. And that really is a different mindset.
Also important: work in progress. If you have 100 problems open, where do you start? It's impossible to tackle them all, and Agile Scrum really helps people limit their work in progress. So that is a good thing.
And finally, the feedback loop. What do customers really want? That's not only which items in the application they actually use, but also the feedback they give us in, for instance, the App Store. We have a mobile banking app in the App Store, and we get a lot of feedback. People say, "I like this item. I don't like this item. Can you change this? Can you add this?" And it really helps teams to use that input.
So, to wrap up: I hope you agree with me that there is no real conflict.
ITIL still has value in a DevOps way of working. It does help if you make the ITIL processes as lean as possible to eliminate waste. ITIL and Agile Scrum, DevOps, they can work together. It does require understanding and acceptance of each other's expertise.
As you see in the picture, some of the people holding paddles are not even human. I'm not saying that they're Dev or Ops. I'm just saying they're together in a boat, they have the shared direction, and if you work together and understand each other, you can even make a stone boat like this float on the sea.
Thank you.
Q&A
Jan-Joost Bouwman: We have some time for questions. Yes?
Q: The engineer of the day, what happens when their issue of the day ends up turning into a long weekend effort?
Jan-Joost Bouwman: Then they become engineer of the week or engineer of the quarter and...
Ingrid Algra: Yeah. What we do is we normally team up in pairs, so we divide the work. And when they need extra help, they can ask the team for it to prevent this. But sometimes it takes more than one day, and because the next day they are no longer engineer of the day, they can then work on that item.
Jan-Joost Bouwman: More questions? One or two.
Q: Hi. Sorry. Hello. I may have missed this one. So do you get first-line support people using Jira for incident management or something similar?
Jan-Joost Bouwman: Not for incident management, because for incident management, we use service management tooling.
Q: I see. So how does that get into the backlog, then?
Jan-Joost Bouwman: So if it's something that needs to be solved by a code change, then it gets put in a backlog.
Q: Yeah. Okay.
Jan-Joost Bouwman: If it's just a restart, then it doesn't make sense to put it in the backlog. Other questions?
Q: So between your agile team and your, I would call legacy or ITIL team, what is your timeframe for approvals at the change management level?
Jan-Joost Bouwman: I want to make one thing clear. We only have one type of team.
Q: Ah, okay. If you have one type of team, what is your lead time for change management approval?
Jan-Joost Bouwman: Most teams now have daily CAB meetings, so anything that is decided today, you can put in immediately after the CAB.
Q: So you still have timelines.
Jan-Joost Bouwman: It depends on the risk value of a change. If it's something small, you can decide it on the spot, as long as somebody else in the team approves it, and then you can go ahead.
Q: And those risk factors are based on customer impact or...
Jan-Joost Bouwman: On impact, on the track record of the team, on the lead time of the change, on whether it solves incidents or not. In total, there are 14 questions you have to answer, and a number rolls out. We had those 14 questions, or 140 questions, too. And hopefully, in the next tool, we'll be able to automate that.
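The "14 questions, and a number rolls out" approach is essentially a weighted questionnaire. The sketch below uses only four of the factors named in the answer above. The weights, the 0 to 1 answer scale, and the direction of the incident-fix factor are all assumptions; the actual 14 questions and their scoring are not given in the talk.

```python
# Hypothetical weights for four of the factors mentioned in the talk; the real
# questionnaire has 14 questions, which are not listed.
WEIGHTS = {
    "customer_impact": 40,           # 0.0 = no customers affected .. 1.0 = all customers
    "team_change_failure_rate": 30,  # share of the team's recent changes that failed
    "short_lead_time": 20,           # 1.0 if the change was requested at short notice
    "not_an_incident_fix": 10,       # 1.0 if the change does not resolve an incident
}


def risk_score(answers: dict[str, float]) -> int:
    """Weighted sum of normalised answers, giving a score between 0 and 100."""
    missing = WEIGHTS.keys() - answers.keys()
    if missing:
        raise ValueError(f"unanswered questions: {sorted(missing)}")
    for question, value in answers.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{question} must be between 0 and 1")
    return round(sum(weight * answers[q] for q, weight in WEIGHTS.items()))


if __name__ == "__main__":
    print(risk_score({"customer_impact": 0.5, "team_change_failure_rate": 0.1,
                      "short_lead_time": 0.0, "not_an_incident_fix": 1.0}))
```

Because the weights sum to 100, the score stays on a 0 to 100 scale. That makes it easy to map onto approval levels and, as hoped for in the answer, to automate in a tool.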
Q: Yeah. So our problem right now is that we're required to have a 72-hour lead time, and a lot of our other teams that are in the company are not agile. They're not as agile as we are. And so having to go to a board when we're on a monthly sprint cycle and we're within 24 hours of releasing...
Jan-Joost Bouwman: I totally understand you. We had the same period. Now that everybody's doing DevOps, it's a lot easier because we don't have the waiting time anymore.
What I said to the teams that were already doing Agile was: if you want to get the other teams on board, make sure you talk to them early on and take them along in your process so that they know what's coming. Then it shouldn't be a problem to get approval.
Q: I'd love to talk to you more about that part.
Jan-Joost Bouwman: Okay. Got one more question here.
Q: Do you have any SLAs or OLAs that you report on?
Jan-Joost Bouwman: To be honest, we got rid of those SLAs, because...
Ingrid Algra: The moment we...
Jan-Joost Bouwman: We don't use them, but we do make...
Ingrid Algra: We say, "Okay, what are the KPIs?" We have availability reports.
Jan-Joost Bouwman: Yes. But we don't report on SLAs anymore because once the product owner from the business was part of the team, we said, well...
Ingrid Algra: He knows.
Jan-Joost Bouwman: It doesn't make sense for a team that already includes the business to report to the business on how we're doing.
Q: Yeah. How about if you introduce third parties into the team, like contractors? Wouldn't SLAs be effective there?
Ingrid Algra: Yeah, well, we have. So what we see is that we make them part of the DevOps team, and we try to make the same cycle for solving incidents and problems. We do it together with our contractors. And it's not easy. Some contractors take a longer time to convert to Agile and DevOps, but we actually made some...
Jan-Joost Bouwman: Yeah, we had quite a lot of converts.
Ingrid Algra: Yeah. So it takes some time. It's the same within your own teams: it takes time to get them up to speed. We do the same with the contractors, and they really like to join the team, so it's working out.
Jan-Joost Bouwman: Okay. Well, thank you very much.
Ingrid Algra: Thank you very much. If you have any questions, feel free to talk to us later on.
Jan-Joost Bouwman: Yeah. Thank you.