DevOps Enterprise Summit US 2021

DevOps SRE or ITIL – Know Before You Leap!

In an era of Continuous Integration, Continuous Delivery and Automation, implementing a solid IT Service Management strategy is important for organizations that want to succeed at Digital Transformation. Several IT Service Management frameworks are available today, and the possibilities and processes stemming from each framework are often overwhelming. While all the Service Management methodologies are closely connected, we will discuss DevOps, SRE and the latest ITIL 4 service management framework and how they compare with each other: the vision and values governing each framework, and the guidance each provides when embarking on this journey.


DevOps is an umbrella concept that advocates a collaborative working relationship between Development and Operations. It aims to deliver software and services at the velocity the line of business needs (i.e. high deploy rates) while simultaneously increasing the reliability, stability, resilience and security of the production environment.


SRE, or Site Reliability Engineering, is Google’s approach to service management and emphasizes the development of systems and software that increase the reliability and performance of applications and services.


ITIL 4 is the latest evolution of the well-known service management framework from Axelos. By adding the new service value system to ITIL's core guiding principles, it emphasizes service quality and consistency and aims to improve stakeholder satisfaction by ensuring value from the stakeholders' perspective.


We will discuss how an organization can decide which service management methodology to adopt to best enable it to deliver business value and to ensure a successful transformation powered by operational excellence.


All three methodologies can coexist; however, adopting DevOps, SRE or ITIL is as much a cultural and behavioral transformation for the organization and its people as it is a technological and process change. Organizations need to continuously adapt and adopt, upskill and upscale to keep pace with the continuously evolving digital world.


Full transcript


Meenal Meenaakshi

Hello, and a warm welcome to all of you at DevOps Enterprise Summit US. Let me briefly introduce myself. I am Meenal Meenaakshi, product landscape owner at SAP Labs in India. I have close to two decades of experience in service and product delivery management, where I have led several digital transformation projects within my organization.

I love to follow digital transformation journeys across industries and organizations, and to speak at forums like this one about the information and experiences I gain along the way. One topic that has more often than not emerged as being of paramount importance is the implementation of a solid IT service management strategy within organizations for a successful digital transformation.

So today, I would like to take this opportunity to give a short overview of the evolution of IT service management frameworks and to take a deeper look at the three most well-known IT service management methodologies widely adopted across organizations: DevOps, SRE, and ITIL. What vision, values, and guiding principles do they offer? How do they compare with each other?

What do they have in common, and where are the key differences? And finally, how can an organization decide which methodology best fits its requirements and aligns with its strategic goals and objectives? And why is there a need to continuously adapt and adopt?

Now, traditional IT was seen more as a service and support organization delivering technology solutions. But in the current industrial revolution, IT no longer delivers only technology solutions to the business; it is the business itself.

IT and business are fast converging. The digital services being offered today are all customer-oriented and value-driven, and they need to be managed so that they not only support the business in fulfilling its requirements but also support its growth.

Your product has value for customers and for the business only when it fulfills the expected services and provides the outcome the business expects. So how do we know whether our services deliver that business value to the customer?

What is business value? Business value is the differentiated experience that customers get when consuming our products and services. As an organization, it is extremely important to continuously check whether we are delivering value to the customer.

Value could be cost value, where you offer a product or service at a much lower cost to the business or the consumer. It could be experience value, where you provide such an experience that consumers are ready to pay extra just for that wow experience.

Or it could be platform value, where the platform we offer is so vast and robust that a customer would not want to switch to any other platform or service.

But how do we know this? How do we know whether our products and services add value for the customer? How do we know what the customer's future demands and needs will be? This requires data and analysis, because only with data can we make informed decisions.

This is where we need the support of digital tools and technologies: AI, ML, and the many reports, dashboards, and metrics available. Armed with this data, we need to think and decide: are we doing the right thing? We need to be hyper-aware and able to make informed decisions so that we can execute on them quickly.

We really need to demonstrate digital agility, because the customer's current need may no longer be valid tomorrow. We have to continuously adapt and adopt according to business requirements, because if we don't, our competitors will, and that could mean the end of the business for us.

And when we know that, yes, we are on the right path, we need to ask: are we doing things right? Will we be able to deliver what we decided on, based on the information we collected, execute fast, and deliver it to customers and the business?

This is where the guiding principles, processes, and technologies of IT service management, and the different frameworks available, guide and support us in delivering our services faster and with quality. Whichever IT service management framework we use today, be it Lean, DevOps, SRE, Agile, or ITIL, it all boils down to delivering value to the customer. So with this background, let us go one step deeper to understand how the DevOps, ITIL, and SRE frameworks compare with each other and what they have to offer.

Now, DevOps is an umbrella concept that advocates bringing different teams together to work as one system. It involves people from across the organization and from different teams: development, design, UI, security, documentation, testing, and operations. All the teams that contribute to the value stream come together and work as one common team.

This spans planning, building, continuous integration, deployment, operation, and continuous feedback, which flows back into your development pipeline. DevOps suggests three ways to achieve this, well known as the three ways of DevOps.

The first way talks about thinking of IT as a system, where work should flow as fast as possible from left to right: from development through testing, QA, regression, security, and the other stages of the development life cycle to operations, and finally to the customer, because that is where value is created.

Value is created, or seen, only for a finished product; work in process adds no value. Hence, it is extremely important to keep pieces of work small so that they move through the development pipeline quickly. Small chunks of work that are built and shipped to the customer ensure a faster flow and keep the development life cycle continuous and moving.
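The talk gives no formula for this, but one common way to reason about why limiting work in process speeds up flow is Little's Law from queueing theory. The sketch below is an illustration under stated assumptions, not part of the three ways themselves: it assumes a stable pipeline whose throughput stays constant as WIP changes, and the team figures are hypothetical.

```python
# Little's Law (queueing theory), used here only as an illustration:
#   average lead time = average work in process / average throughput
# Assumes a stable system whose throughput does not change with WIP.


def average_lead_time_days(wip_items: float, throughput_per_day: float) -> float:
    """Average time an item spends in the pipeline, in days."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team that finishes 4 work items per day.
THROUGHPUT = 4.0
for wip in (40, 20, 8):
    days = average_lead_time_days(wip, THROUGHPUT)
    print(f"WIP={wip:>2} items -> average lead time {days:.1f} days")
```

With the same hypothetical throughput, cutting WIP from 40 items to 8 cuts the average lead time from 10 days to 2. Real throughput is rarely independent of WIP, so treat this as intuition rather than a prediction.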

This is what increases the flow from left to right until value is created at the customer's end. The second way talks about amplifying feedback loops: when you increase the flow of work, you also have to ensure fast feedback flowing from right to left, coming not only from customers at the extreme right but from every stage of your development life cycle.

This means not only more feedback loops across your development pipeline, but also receiving feedback more frequently and continuously improving on it. The third way talks about creating a culture of continuous experimentation and learning.

Fail often and fail early, because failure should be considered an opportunity to improve and innovate. If we inculcate this culture of continuous experimentation and learning, we not only enable risk-taking but also build a lot of confidence within our teams.

Our teams, and the system, then become a melting pot of new ideas and innovations, which further increases the velocity and quality of the work and services being delivered. So DevOps can be considered a culture where people from different disciplines work together to design, develop, deploy, and run a system.

So if an organization is implementing DevOps as part of its digital transformation journey, what guiding principles does DevOps offer that should be considered?

Collaboration is key, because we are bringing together teams that used to work in silos and were responsible only for their own area of work. As all these teams come together, it is extremely important to build a culture of collaboration where every team works together in sync.

Every team has to take end-to-end responsibility and accountability for the entire work delivered to the customer. Every team should focus on automating everything possible by treating almost everything as code.

This not only improves flow but also enables faster CI/CD. It speeds up the resolution of issues, since all the teams are working together for a common purpose, and it helps ensure a stable environment. It therefore adds a lot of technical value within the organization and provides business value for customers, because it increases both the quality and the velocity of work. You are able not only to fix issues detected or raised by customers, but also to spend more time on new features, thus reducing the overall TCO.

Knowledge upskilling: invest in your people and in sharing knowledge. This is extremely important because in DevOps, different teams come together and every team is responsible and accountable for the entire work, the entire system. So rather than only raising specialists in one particular domain, we need well-rounded people with cross-functional skills. Fail to learn: as we already discussed, fail early and fail often, because failure should be considered an opportunity to improve, enhance, and optimize our processes and system.

And continuous improvement: based on the feedback we receive from the feedback loops set up at every stage, with more loops and higher frequency, we continuously improve our service and delivery.

All of this brings a lot of cultural value to your organization: this people-first approach drives innovation, motivation, and efficiency, leading towards a successful digital transformation. Now let us look at how Google addresses these customer values.

Site reliability engineering is Google's take on delivering value to customers. It is about building end-to-end reliability for the site. Site reliability engineering is more a post-production set of processes and activities for systems at scale, and it operates on the principles of prevent, recover, and optimize.

Do whatever you can to prevent an issue from reaching production. This can start with providing input to the architecture and continue with ensuring resilient development and resilient testing, intelligent alerting mechanisms, and proper health-check monitoring, so that we can identify and fix issues even before they reach production.

But once an error occurs, use intelligent alerting and self-healing mechanisms so that your system is robust enough to minimize the mean time to restore and the mean time to recover, and you can recover as fast as possible.
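To make "mean time to restore" concrete, here is a minimal Python sketch that averages restore durations over a list of incidents. The incident data is hypothetical, and the sketch assumes each duration runs from detection to restoration of service; organizations differ on where they start the clock (for example, at the start of customer impact).

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (detected_at, restored_at).
incidents = [
    (datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 10, 45)),
    (datetime(2021, 3, 9, 22, 15), datetime(2021, 3, 9, 23, 0)),
    (datetime(2021, 3, 20, 4, 30), datetime(2021, 3, 20, 5, 0)),
]


def mean_time_to_restore(records) -> timedelta:
    """Average of (restored - detected) over all incidents."""
    if not records:
        raise ValueError("no incidents to average")
    total = sum((restored - detected for detected, restored in records), timedelta())
    return total / len(records)


print(mean_time_to_restore(incidents))
```

The three sample outages last 45, 45, and 30 minutes, so the sketch reports a mean of 40 minutes. Tracking this number over time is one way to see whether alerting and self-healing work is actually shortening recovery.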

Once you have recovered, focus on optimization. Do a postmortem of the issue, perform root cause analysis, share the learnings, and make sure the cause is completely removed from the system so that it does not happen again, once more feeding input back to development and architecture.

One of the biggest contributions of optimization is eliminating toil. Toil is any work that consists of continuous, repetitive, or manual actions and activities. This is the engineering part of site reliability engineering, where we continuously engineer and re-engineer the system to make it more robust, more optimized, and more stable.
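One way to keep toil visible is to log where time goes and track the toil fraction. The sketch below is illustrative: the work-log entries are hypothetical, and it simplifies toil to work that is both manual and repetitive (Google's own definition lists more traits). The 50% cap reflects the goal stated in Google's SRE book of keeping toil below half of each SRE's time.

```python
from dataclasses import dataclass


@dataclass
class WorkLog:
    task: str
    hours: float
    manual: bool      # done by hand rather than by automation
    repetitive: bool  # recurs without leaving a lasting improvement


# Goal stated in Google's SRE book: toil below 50% of each SRE's time.
TOIL_CAP = 0.5


def toil_fraction(logs) -> float:
    """Share of logged hours spent on work that is both manual and repetitive."""
    total = sum(entry.hours for entry in logs)
    toil = sum(entry.hours for entry in logs if entry.manual and entry.repetitive)
    return toil / total if total else 0.0


# Hypothetical week of work for one engineer.
week = [
    WorkLog("restart stuck batch job", 6, manual=True, repetitive=True),
    WorkLog("rotate certificates by hand", 4, manual=True, repetitive=True),
    WorkLog("write auto-remediation script", 12, manual=False, repetitive=False),
    WorkLog("post-incident review", 3, manual=True, repetitive=False),
]

fraction = toil_fraction(week)
print(f"toil = {fraction:.0%}, over cap: {fraction > TOIL_CAP}")
```

Here 10 of 25 hours count as toil (40%), under the cap. The auto-remediation script is the engineering work that, once finished, should shrink the first entry in future weeks.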

This can happen only when you take a software engineering approach and apply it to infrastructure and operations problems. Site reliability engineering is considered a discipline that incorporates software engineering concepts and applies them to infrastructure and operations problems.

So what guiding principles and values does site reliability engineering offer? As in DevOps, collaboration is key here too. Site reliability engineers have to collaborate with other engineers, product owners, customers, and other stakeholders to agree on a service level objective for the service being delivered.

SRE encourages you to have defined and aligned objectives for each level of service you offer to customers, because this sets a benchmark and pushes you to focus and bring in the flow needed to reach it.

Once you have a defined SLO in place, plan an error budget: how much change can be delivered, and at what frequency? And stop once you have used up the error budget. Automation is key. It is key anyway when you plan to deliver continuously to customers, but automation is also key to maintaining the resiliency and robustness of the IT environment and systems you support.
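The talk does not prescribe a formula, but a common way to turn an SLO into an error budget is to treat 1 - SLO as the share of requests (or of time) allowed to fail. The sketch below assumes a request-based availability SLO over a single window; the class, its fields, and the "stop releasing once the budget is spent" policy are hypothetical illustrations of the rule described above, not a standard API.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request-based SLO over one measurement window (e.g. 30 days)."""
    slo: float              # target success ratio, e.g. 0.999 for 99.9%
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """Measured success ratio (the service level indicator)."""
        return 1 - self.failed_requests / self.total_requests

    @property
    def error_budget(self) -> int:
        """Number of failed requests the SLO tolerates in this window."""
        return round(self.total_requests * (1 - self.slo))

    @property
    def budget_remaining(self) -> int:
        return self.error_budget - self.failed_requests

    def can_release(self) -> bool:
        """Hypothetical policy: keep shipping while budget remains, stop once spent."""
        return self.budget_remaining > 0


def allowed_downtime_minutes(slo: float, days: int = 30) -> float:
    """Time-based view of the same budget: minutes of full outage the SLO allows."""
    return days * 24 * 60 * (1 - slo)


healthy = SloWindow(slo=0.999, total_requests=1_000_000, failed_requests=400)
spent = SloWindow(slo=0.999, total_requests=1_000_000, failed_requests=1_200)
print(healthy.error_budget, healthy.budget_remaining, healthy.can_release())  # 1000 600 True
print(spent.budget_remaining, spent.can_release())                            # -200 False
print(round(allowed_downtime_minutes(0.999), 1))                              # 43.2
```

A 99.9% SLO over a million requests tolerates 1,000 failures, or roughly 43 minutes of full outage in 30 days. Whether the remaining budget is then spent on feature releases or on stability work is the balancing decision that follows.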

Balance self-regulated control over the development of new features against the stability of the system. Decide where to stop and what to deliver to the customer, and what will have an impact on the overall stability of the system. And finally, fail to learn, because a failure in the production system is not really caused by an individual or a team; it is a failure of the entire system, and every team is responsible for it.

And SRE really encourages blameless postmortems: a failure is not one person's responsibility; the entire system is at fault and needs to be corrected. So follow and adopt these guiding principles when you set out to establish site reliability engineering within your organization, because, as I mentioned, it's a separate team that needs to be set up. This is what brings technical, business, and cultural value to your organization by ensuring a robust system that is resilient and reliable. You are automating operations, which further improves reliability, and this mainly helps reduce customer churn, because you are able to improve the service you provide, thus reducing the overall TCO and bringing a lot of cultural value to the organization.

Now let us look at how ITIL addresses delivering value to the customer. ITIL is by far one of the most widely known, accepted, and adopted IT service management frameworks.

ITIL 4 is the latest evolution of the framework from Axelos, and it talks about the co-creation of value. ITIL has always kept pace with the ever-changing demands of the industry: from a more process-centric approach in ITIL v2, to a service life cycle built around these processes in ITIL v3, to the service value system in ITIL 4, which embeds the service life cycle within its value chain.

ITIL 4 encourages effective collaboration with the business and with customers at every phase of development, so that a demand we believe will add value to the customer is also perceived as valuable from the customer's perspective.

So we start together with the demand, the change that needs to be shipped to the customer, and design, build, deliver, and support it together with the customer until it really adds value for them. And we gather continuous feedback from customers so that we can then work on the next input and the next demand.

So ITIL 4 can be considered a digital operating model built on the co-creation of value for IT-supported products and services, where you co-create value together with the customer and the business. ITIL 4 also provides a very robust set of guiding principles that should be taken into consideration when embarking on this journey.

Start where you are: look within your organization for the area that most needs improvement, and start from there. Collaboration, as with any other IT service management framework, is really key, because you have to ensure that everyone in your organization is with you on this journey and agrees that, yes, this is the change that needs to be delivered.

This is ensured by putting effective communication and collaboration mechanisms in place within the organization. Keep it simple and practical: always break requirements down into small, deliverable changes. Progress iteratively: once a small change has been delivered to the customer, gather feedback and then start on the next change.

You really need to work with a holistic view so that your change does not disrupt the stability of the environment or break something that was working fine before. You need a complete, holistic view of the entire system.

And finally, you always have to automate and keep optimizing your system to ensure that you deliver value that the customer also perceives as valuable in the end. When the customer is in focus, and the focus is on delivering value to them from the very beginning until your product or service has been delivered, you are able to improve customer satisfaction and the quality of the service you provide.

This helps deliver business value to the customer, and it also improves technical and cultural value within your organization. Now, if you look at the three IT service management frameworks we just discussed, you see that they are more or less aligned.

Each of them is ultimately focused on delivering value to the customer, which is the final goal of any IT service management framework. Each talks about increasing flow, which means increasing the speed of execution and delivering faster to the business and to our customers. Each talks about continuous improvement through a really solid feedback pipeline and feedback mechanism.

Each encourages giving and receiving proactive feedback, which helps improve the quality of the software and services being developed. Each places an ever-increasing focus on automation, because it improves not only speed but also quality.

And above all, what we see as most important is the people-first approach: encouraging experimentation and learning, giving individuals and teams the opportunity to make informed decisions, learn from failures, spread knowledge, and continuously upskill themselves, and improving motivation, efficiency, and overall productivity within the organization.

Having said this, let us also take a close look at the key differences. Where do these three IT service management frameworks, DevOps, ITIL, and SRE, part ways? How would you, as an organization, decide which methodology to adopt?

It is equally important to understand the key differences. If you look at the overall architecture, at how DevOps, SRE, and ITIL function and the guiding principles they offer, you see that the final goal of each is slightly different.

DevOps focuses on the speed and quality of delivery, site reliability engineering focuses more on scaling, uptime, and the robustness of the system, and ITIL focuses more on delivering service with quality and consistency. This also means that change is managed differently in each of these methodologies.

DevOps focuses on delivering gradual changes via continuous integration and continuous delivery, while site reliability engineering focuses on delivering quick changes governed by an error budget: as long as you are within the error budget, keep delivering your changes.

In ITIL, change is delivered through a well-defined governance model. This also means that error handling differs across DevOps, SRE, and ITIL. For DevOps, error handling happens at a pre-failure stage, where we try to remove the error even before it reaches the customer.

For site reliability engineering, it is a post-failure activity: you do an RCA after the failure has occurred so that it never occurs again. For ITIL, error handling is part of problem management in the development life cycle.

The operating models of DevOps, SRE, and ITIL also differ. Take the team topology each one recommends. As we discussed, DevOps is about bringing together multidisciplinary teams that earlier worked in silos, so it means breaking down the silos and bringing all the teams together. SRE, on the other hand, is a defined, separate team with the dedicated role of site reliability engineer: basically software engineers applying their concepts to infrastructure and operations problems.

And then there is ITIL, which focuses on establishing a symbiotic relationship between IT, the stakeholders, and the business, and does not require setting up any new or separate team. Looking at the entire value chain, DevOps starts right from development and moves into production, trying to identify and remove issues and errors that occur during the life cycle before they even reach production.

Site reliability engineering starts from production, once a failure has occurred, and brings the corrections and improvements back into the development life cycle and the value chain. ITIL, in turn, is wrapped around the service value chain.

The way you measure the success of DevOps, SRE, or ITIL, the metrics, is also quite different. For DevOps, it depends more on deployment frequency: since DevOps is about continuous integration and continuous delivery, you measure how frequently you deploy successfully.

So: deployment frequency, lead time, and change failure percentage. For SRE, it is more about how well you meet your SLOs, SLIs, and SLAs, what your mean time to recovery is, and so on. And for ITIL, it mostly depends on the SLAs in place, the change success rate, the ticket volume, and the overall cost of running the service.
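A minimal sketch of how the DevOps metrics named above could be computed from deployment records, with an ITIL-style change success rate derived from the same data. The record fields and sample data are hypothetical; the sketch assumes lead time runs from first commit to production and simplifies "change success" to "did not cause a failure". Real definitions vary between organizations and tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime   # first commit of the change (assumed start of lead time)
    deployed_at: datetime    # when the change reached production
    caused_failure: bool     # needed a rollback, hotfix, or incident response


def delivery_metrics(deploys: list[Deployment], period_days: int) -> dict:
    if not deploys:
        raise ValueError("no deployments in the period")
    failures = sum(d.caused_failure for d in deploys)
    failure_pct = 100 * failures / len(deploys)
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_pct": failure_pct,
        "change_success_pct": 100 - failure_pct,  # simplified ITIL-style view
    }


# Hypothetical 10-day period with five deployments.
base = datetime(2021, 5, 3, 9, 0)
hours = [4, 20, 6, 30, 8]
failed = [False, False, True, False, False]
deploys = [Deployment(base, base + timedelta(hours=h), f) for h, f in zip(hours, failed)]
print(delivery_metrics(deploys, period_days=10))
```

For this sample the sketch reports 0.5 deployments per day, a median lead time of 8 hours, a 20% change failure rate, and an 80% change success rate. The same records serve DevOps-style and ITIL-style reporting; what differs is which numbers each framework puts first.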

Now, even after gaining this understanding of how ITIL, SRE, and DevOps operate, how would you as an organization decide which way is the right way? Or is there really a right way or a wrong way for an organization?

It is not really an either/or situation. It depends entirely on your organization's specific needs and requirements. What target does your organization want to reach? Digital transformation is not only about implementing new technologies and tools; it has to align completely with your organization's objectives.

So we talk about the five whats, the power of purpose. When the purpose is clear, when you know the target you want to achieve, that is what will help you decide which methodology or methodologies to adopt.

It could be one of them, or a combination of them, within your organization. You really need to identify the problem you are trying to address; once you have that, you can find the right hammer for your nail.

You need to know what your nail is before you can find the right hammer for it. So always ask these five whats before adopting a methodology or an IT service management framework. What is the problem you are trying to solve?

There can be different solutions to different problems. Is your entire organization in sync and aligned with you that, yes, this is the problem and it needs to be solved? And if there are several problems, in what order should they be solved?

What is the scope of the problem? You need a bird's-eye view, a helicopter view, of your organization to understand the scope and scale of the transformation it will have to go through, because digital transformation is not only about implementing new tools and technology; it is just as much about the behavioral and cultural changes that need to happen within the organization.

What is the solution, and why? There can be different solutions to different problems, but it is equally important to understand what is already running well in your organization. Whatever is running into issues certainly needs to be improved and addressed, but you do not want to disturb what is already working well.

This information also helps you determine which way is the right way. What should be your starting point? Look for the parts of your IT organization that really need improvement and where adopting these transformations can have a massive impact, and start from there. Start small.

And always have defined and aligned KPIs and success factors so you can measure the success of the implementation on your digital transformation journey. When you have these five whats defined and decided, then you know you have the nail.

You have found the right hammer for it, and once you have nailed it, you are ready to scale it. It is extremely important to be continuously ready to adopt and adapt on this transformational journey. Always remember, digital transformation cannot happen in a day. It is a continuous journey, and it sits right at the heart of cultural and behavioral transformation, because it is as much about human and behavioral transformation as it is about implementing new tools and technology. You really need the right mindset in place to bring about that transformation. Always remember, IT service management is only a means to an end, not the end itself.

So finally, whichever methodology you plan to adopt, be it ITIL, SRE, or DevOps, it will only be successful if you have the right mindset and are able to nail its cultural and human aspects, where people really agree to change and to adopt and adapt to the new situation and environment.

The future of IT service management is definitely bright, whether you look at it through an IT lens or a business lens. The digital agility that IT organizations have always shown, and need to keep showing, is extremely important for succeeding on this digital transformation journey.

With that, thank you very much for listening. I hope you have been able to take some useful information from this talk, and that it helps you decide which path to take and which direction to go in as you embark on your digital transformation journey.

Thanks once again, and many thanks to DevOps Enterprise Summit for this opportunity to share my thoughts. Thank you.