DevOps and Agile: AppDynamics in Continuous Integration Environments

London 2017

Director of Technology Strategy · AppDynamics

Industry evidence suggests that around 80% of all business service outages are caused by release, change, and configuration processes. The business is challenging IT to streamline processes and drive innovation.

How do we enable speed and rapid change while simultaneously balancing quality and performance concerns?

This deep-dive session will provide insight into AppDynamics in a Continuous Integration Environment.

Full transcript

The complete talk, organized by section.

John Rakowski

So my name's John Rakowski. I'm Director of Technology Strategy at AppDynamics, the application intelligence company. And I'm joined today by Andy Jackson, who's a senior sales engineer based over in the UK also.

So a little bit about my background. I've been at AppDynamics for two and a half years now, and prior to that, I was at Forrester Research. So I was Forrester's lead analyst for application performance management and also IT operational analytics. And I had the unique privilege, I suppose, in that time, I was there from 2011 to 2015, of seeing the interest in DevOps starting to grow within organizations.

But really, in the last couple of years, I've really seen how DevOps within the enterprise has started to really gain traction. Probably in every conversation which I have now with a prospect, with any enterprise organization, they're looking at DevOps. Now, whether they can agree on what DevOps really stands for and what it really is is a different matter. But everyone is now looking at DevOps.

So setting up separate teams, developing new applications in a new way. Sometimes these teams are referred to as DevOps teams, and sometimes I see these teams referred to as digital teams.

So what we're going to look into today is really how, at AppDynamics, we approach DevOps. So more specifically, release and change processes and Agile, and how we can support these within the enterprise.

And we're going to cover three main areas. So firstly, we're going to have a look at the state of DevOps within the enterprise today. So the push for agility. We're going to look at the way we define DevOps, the way we see DevOps within the enterprise.

Then we're going to look into how, at AppDynamics, we enable organizational collaboration. One of the biggest things about DevOps adoption is, as you've probably seen from all the sessions you've been to already and other conferences, it's about changes in culture. So we'll look at how do we change the way people work? How do we change the way we monitor in the enterprise?

Then finally, Andy will give a demo, showing some of our capabilities in AppDynamics, and then we'll have time for questions at the end.

So the industry push for agility. Now, like I said, there's different definitions, and I've seen many different definitions to what DevOps really is about. So for the purpose of this session and the way we look at DevOps, DevOps quite simply is the fact that the way we do business in the enterprise is starting to change. It is applications which are really at the forefront.

There's not one interaction with an external customer today which is not supported by software and applications. Internally, there's not one business process which is not supported by applications and software.

And so for us at AppDynamics, DevOps really is this: if software drives the way customers buy and the way businesses sell, for success, you have to deliver applications at a speed your customers expect, faster than the competition, while enhancing quality.

And there's really three key points in regards to that definition.

Number one, you have to deliver applications at a speed your customers expect. DevOps is not just about speeding up release and change processes. There's no point speeding up your release and change processes if it's not in line with user or customer expectations. It just doesn't make any sense.

The second important point about this is that you have to operate faster than the competition. Everyone today is competing with software and applications. There's not one of you in this room who's not doing DevOps already. You have to. It's by default. DevOps is an IT transformation which is already occurring in every single industry, in every single organization.

But the third and most important aspect about this is that you have to do it with quality. And sometimes speed and quality, those two areas kind of rub together. But it's essential. Because if you're delivering new applications, new features, new releases, new changes without quality, well, if applications and software today are your business, then any application problem, especially in production, impacts your business dramatically.

And when we talk about business impact, it's not just about the financial impact. It's about the brand and reputation impact. So quality really trumps all else. It's higher and more important than speed or any of these other aspects. Okay?

So I'm sure you've all seen this framework, the CALMS framework. All experienced with it. I'm getting a number of nodding heads.

So to me, having looked at DevOps now for the past decade, the reality is the best definition I can find or the best framework which I can find for DevOps is the CALMS framework. Because DevOps is about transformation. It's about changing your culture, transforming your culture within your organization.

And if you're looking for a transformation of culture, it's about changing people. Probably one of the hardest things to do in any enterprise.

And what culture are we after? Well, I've heard about culture of fail fast, fail forward, but that's still with quality. It's a culture of experimentation, not being afraid, not fearing changes, releases into a production environment.

It's about automation. Automation is key for speed. And automation solutions have been around on the market now for as long as I've been in IT, so 20 years. The reality is automation, again, is a people-focused area. In order to get automation right, you have to change and transform the way people see automation, not to fear automation.

You've also got Lean as well. From a monitoring perspective, this means using just the right amount of solutions or tools. Many enterprises I work with have 10 or more monitoring tools. The problem with that is that multiple monitoring tools equals multiple sources of confusion and multiple sources of data.

And then you've got measurement, probably one of the most important aspects of any DevOps adoption, second only to culture. Because the right metrics guide you to the right destination. Metrics are so important.

Then finally, sharing, the third most important aspect of DevOps adoption. Because sharing is all about collaboration, making sure that your teams have the right information in context.

And so where are we today with DevOps adoption in the enterprise? Well, probably the best barometer of adoption comes from Puppet, actually, the State of DevOps Report. Has everyone seen this report?

A couple. Okay. If you've not seen it, I encourage you to have a look at this report. It's downloadable from Puppet, and it is probably the best DevOps survey I've seen for understanding where organizations are in their DevOps maturity. And I've been following this survey and report now since it started.

And last year was the first time, when looking at the results of this survey and the results in the report, where you can actually see that organizations are starting to move forward now with DevOps. And particularly in the enterprise, you're starting to see some kind of real benefits.

I mean, the top-level benefits for high-performing DevOps organizations, so mature in their adoption of DevOps. So you can see on the screen here, 200 times more frequent deployments, 24 times faster recovery from failures.

But really the more interesting result for me was higher employee engagement. They're asking questions such as, within your organization and with your adoption of DevOps, how has this helped employee satisfaction within the IT department, within the IT function? And actually, what they're finding is that those organizations who are moving forward with a DevOps approach are starting to have more satisfied employees.

And that's really the key here, because I said that DevOps is really about culture. It's about the people. So those organizations who are doing it well are starting to see more satisfied employees.

But the other thing which this survey was addressing is when it comes to DevOps, it is about release. It is about change. It is about frequent release and about frequent change.

And they categorized organizations into three levels of maturity, so high performers, medium performers, and low performers. So from a deployment frequency perspective, high performers were deploying multiple times a day.

Is anyone in the room actually deploying multiple times a day at the moment? Hands up.

One person. Two. Okay, cool. So you can characterize yourself as high maturity on the DevOps scale, then. Would you class yourself as high maturity?

Maybe. Yeah.

How about yourself? Yeah. Yeah. Okay.

What they also found as well is, if you're looking at release, it's about quality of release. So from an MTTR perspective, MTTR for those high-performing organizations was less than one hour. So for any Sev1 incident, they could recover from a Sev1 incident or find the root cause before it becomes a business problem in less than one hour.

Anyone here? Okay. I've got to talk to you afterwards. It could be a case study. It's a good one.

How about yourselves? Recover in less than one hour for high MTTR?

Yeah, around about that.

For most organizations, and what's typical actually when I talk to enterprise organizations, is that they can recover from a Sev1 incident in really one business day.
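As a rough illustration of the MTTR measure being discussed, here is a minimal Python sketch that averages recovery time across a few incidents and checks it against the "less than one hour" bar quoted for high performers. The incident timestamps are hypothetical and are not data from the talk or the report.

```python
from datetime import datetime, timedelta

# Hypothetical Sev1 incidents as (detected, resolved) pairs; not data from the talk.
INCIDENTS = [
    (datetime(2017, 6, 1, 9, 0), datetime(2017, 6, 1, 9, 40)),
    (datetime(2017, 6, 8, 14, 0), datetime(2017, 6, 8, 15, 10)),
    (datetime(2017, 6, 20, 23, 30), datetime(2017, 6, 21, 0, 0)),
]

# "Less than one hour" is the high-performer figure quoted in the talk.
HIGH_PERFORMER_THRESHOLD = timedelta(hours=1)


def mean_time_to_recover(incidents):
    """Average of (resolved - detected) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


mttr = mean_time_to_recover(INCIDENTS)
meets_high_performer_bar = mttr < HIGH_PERFORMER_THRESHOLD
```

Note that one incident in this sample took longer than an hour, yet the average still falls under the bar, which is one reason a mean on its own can hide slow recoveries.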

But how much does it cost to have incidents, to have outages over one business day?

Well, in the last couple of weeks, there's been a number of notable examples of this. But before I come onto those costs, the most dramatic impact for fast release actually ends up with production environments. So this image on the screen could be considered as a support engineer in production because as soon as you start talking about fast release, continuous delivery, then it's kind of a look of this, especially when you're in the enterprise.

Nobody likes to do fast releases. Why? Because our production environments today are like sleeping babies, and nobody wants to wake a sleeping baby. What happens when you wake a sleeping baby? I've got two kids. One's coming up to two years old, and one's four. And my two-year-old, if you wake her up from a nap when she's not ready, she starts crying. She tells you about it.

Now, what does the 80 stand for here? So for a number of years, we've known that the majority of production outages come from two processes: change and release. And 80 was a stat. So 80% of production outages come from change and release processes. This was a survey done way back in 2011, and it still stands as common today.

So for any enterprise looking at DevOps, as soon as you see a stat like this, there's an element of fear. It's always the age-old question when there's an outage: well, what's changed? And usually the next question is, well, speak to the network engineers, it must be a network problem. But the reality is making change, fast releases, is extremely difficult.

And from a financial perspective, there's great cost. There was a report done by IDC a couple of years ago, which was DevOps and the Cost of Downtime. Again, a great report to have a read and look over. And they found that the average cost of unplanned application downtime is between $1.25 billion and $2.5 billion annually. The average cost of infrastructure failure is $100K per hour. The average cost of a critical application failure is $500K to $1 million per hour.
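To put the hourly figure next to the "one business day" recovery time mentioned earlier, here is a small worked calculation in Python. It is only a sketch: it reads the quoted range as US dollars, and it assumes a business day of 8 hours, which the talk does not specify.

```python
# Hourly cost range for a critical application failure, as quoted in the talk
# (read here as US dollars: 500,000 to 1,000,000 per hour).
COST_PER_HOUR_LOW = 500_000
COST_PER_HOUR_HIGH = 1_000_000


def outage_cost_range(hours_down):
    """Return (low, high) estimated cost for an outage of the given length."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    return hours_down * COST_PER_HOUR_LOW, hours_down * COST_PER_HOUR_HIGH


# Assumption: "one business day" is taken as 8 hours.
one_hour = outage_cost_range(1)
one_business_day = outage_cost_range(8)
```

Under those assumptions, recovering within an hour costs between 500,000 and 1 million, while a full business day costs between 4 million and 8 million.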

So I suppose all of you in the room are aware of a very high-profile production outage with a major airline in the last couple of weeks. Yeah? So you can probably start to understand the costs which were involved with that outage.

But here's the thing: this is just the financial impact. In today's world, applications really are the business. So outages when it comes to an application, when it comes to a service, this is the equivalent of hanging up the closed-for-business sign in the shop window. And the most dramatic effect is actually on the brand of the business.

So any outage here changes the perception of the way your customers see your business. So you can understand now that if you're looking at these stats trying to explain DevOps to any C-level colleague within the business, and you're saying, "Okay, well, yeah, we're going to go faster release and change the way we do things," it's extremely difficult.

And in the report as well, they also had a look at, well, what were the major barriers to DevOps adoption. And again, it goes back to this people-orientated approach about the transformation of people. So you've got cultural inhibitors and fragmented processes. Both of those go hand in hand.

And especially within the enterprise, when we look at fragmented processes, there's major problems here because the way enterprise IT has evolved in the last 20 years, it's evolved as a siloed format. You've got network teams, you've got application teams, you've got architects, you've got developers, and all of these operate in silos.

And so the one thing which we're seeing in AppDynamics right now is that a number of enterprise organizations are looking at, well, do we have the tools really to promote more cultural collaboration? Do we have the tools to unify our processes? And there's a big focus now on solutions in that space.

So how do we do this at AppDynamics? Well, we focus on application performance management. We also focus on elevating monitoring up to the business level. And you'll see this through the demo as we go through.

But within the organization, within the enterprise, we see these silos, and this is from a top level. So you've got silos of business, business analysis or business colleagues. They're looking to really understand how applications are performing, not from a technical perspective, but how they're driving business outcomes.

From an architecture perspective, the backend architecture of applications today is becoming increasingly more complex. We can talk about microservices architecture. We can talk about non-relational data stores. We can talk about organizations now who are moving towards a multi-cloud strategy. So different clouds for different application profiles. Not just doing migration once, but doing migration twice.

So architects want an accurate view of their applications as they change.

You've got development and test professionals who want to develop and work on new features, but are having their time taken up by working on production outages, leading to developer backlog. They're looking at different ways of actually testing and making sure that their applications are working and functionally providing the value which they need before going into production.

And finally, production support. Those people who had that fear on their face. What they want to do is make sure that they can detect any problems in production before they become major business issues.

And so what we provide is a single pane of glass across each of these areas. So from a business perspective, we provide correlation between application health and the business health.

From an architecture perspective, we provide instant mapping of an application in any environment. We watch and monitor every single transaction and baseline all of its metrics.

From a dev, test, and release, we're able to spot problems in any environment very quickly. And because our approach is on unified monitoring, we're able to delve down, right down to the database layer, to understand where issues are occurring.

And finally, from an ops perspective, we elevate the importance of monitoring. We don't just monitor from a technical perspective. We provide visibility to the user journeys going through that application. We monitor the concepts of business transactions, those user interactions which are of critical importance to user experience and to driving business outcomes.

Which means that anyone can look at the AppDynamics console and see the health of an application, not just from a technical perspective, but from a business perspective. So we provide one single pane of glass for communicating and driving collaboration across each one of these functional areas.

But the best way to show this is actually through a demo. So I'm going to hand it over to Andy to show us the AppDynamics console to go through some of these concepts.

Andy Jackson

Thanks, John.

So I'll take you through the scenario that we're going to run through as part of the demo. So we've got an e-commerce application, and the marketing team have decided that we've got a brand-new offer that we're going to give to a certain percentage of our customers, and hopefully that'll help us drive some new revenue.

So AppDynamics is already in situ, it's already monitoring all these flows, and we've built a dashboard to show the marketing team exactly what's going on with its new offer.

So across the top here, you can see how many users are actually hitting our website. So in this time period, just over 5,000 unique visitors have hit my e-commerce store. We know that that's around about normal as well, so we can test with AppDynamics. Is that normal for this time of the day? Is it normal for this day of the week?

And we can see out of all those 5,000 sessions, our new offer has popped up on 16% of those users' desktops as they're going through, or mobile sites. Out of those, only 4.8% have actually converted that offer, so only 4.8% uptake. And the marketing team are not particularly happy with that. They were expecting a much better conversion rate.

So we know we've got something to look at there, and that's coming in from a business perspective.

We can also calculate how many people have taken up the offer. So 188 people have taken up the offer, and actually around about £27,000 in revenue coming from this. So you can see that if we were actually converting another 10%, our revenues would be much higher than that. So we know there's a dollar value cost to this problem.

So what I want to be able to do now is work out, is that a technology problem, or is it just a really rubbish offer and no one wants to take it up? Okay.

So we're going to take a look down at the user conversions, and we can see over time, the conversions are fairly standard. We can see that the user experience, again, is fairly standard and around about normal. There are no sudden peaks or problems in response times.

But when you get to the conversion funnel here, this is really telling as to where the problem is. So with AppDynamics, we monitor 100% of every single transaction as they go through the system, and we're able to tie them together into user sessions.

We can then start to say how many users started at the top of this funnel on our authentication page, how many made it through to the account profile, how many received the offer, and how many converted. So we're able to thread those four steps together and work out where are people dropping off.

So around about 20% of the people don't actually go to their account profile, so they never see this offer. 80% of people are not making it from their account profile to the received offer page. So that tells me there's something stopping them.
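The funnel arithmetic here can be sketched in a few lines of Python. This is not AppDynamics code: the four step names come from the demo, but the session counts are hypothetical, chosen only so that the first two drop-offs come out at 20% and 80%.

```python
# Funnel steps named in the demo; the session counts are hypothetical.
FUNNEL = [
    ("authentication", 5000),
    ("account profile", 4000),
    ("received offer", 800),
    ("converted", 400),
]


def drop_off_rates(funnel):
    """For each consecutive pair of steps, the fraction of sessions lost."""
    rates = []
    for (name_a, count_a), (name_b, count_b) in zip(funnel, funnel[1:]):
        lost = (count_a - count_b) / count_a if count_a else 0.0
        rates.append((f"{name_a} -> {name_b}", lost))
    return rates


rates = drop_off_rates(FUNNEL)
# The step with the largest loss is the one worth investigating first.
worst_step = max(rates, key=lambda item: item[1])[0]
```

With these counts, the largest loss is between the account profile and the received offer, which is the step the demo goes on to investigate.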

Over on the left-hand side, this is my technology side of the world, and you can see there's a big red flag here, and you can see that it's taking about 2.8 seconds to load that page. So potentially what's happening is everything's going really quick. They get to their account profile page, and it takes so long to pop up this offer that they're not clicking it.

So what we want to be able to do now is confirm that that's the case and fix that problem.

So what I'm going to do is drill into the account profile here, and it's taking me to my application flow map. Some of you may have seen AppDynamics before. Some of you may not. What this is, is an automatically generated view of your application in real time.

I like to think of it as a live Visio flow map, because if you've got a Visio map of your application, my guess is it's at least six months out of date, and it doesn't have everything on it. And as we're talking about releasing every day, you would need full-time armies of people to update your Visio documents to make sure you've got a valid representation of your application.

So with AppDynamics, we do away with that. This is what your application looks like. This contains every call from every server to every other outside server.

So these on the left-hand side, these are actually external services, things that I don't manage. These three servers here, these are all Docker images. You can see I've got 30 instances of my web tier. I've only got one instance of my data services tier. And that changes over time. So if I was to spin up another three data services, you would see that go up to four.

So I can actually see what did my application look like in real time, but also go back and look, last Wednesday, how many of these servers did I have? Do I need to spool up some more of them now?

What I want to be able to do is work out why is the technology failing. So what I'm going to do is turn on our baseline comparison. And what AppDynamics does is it takes every single one of these numbers and compares it to what's normal for this time of the day and this day of the week.

So we're taking out the fact that you might have no users at 3:00 a.m., and you might be really heavy with users at lunch hour. At lunch hour, you'd expect the application response time to be slightly slower. We take that into consideration, and we tell you how many were slow for the particular time of day you're looking at.
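One way to picture a time-aware baseline is to bucket historical response times by day of week and hour of day, then judge a new measurement against its own bucket. The Python sketch below does that with a mean-plus-standard-deviations rule. The talk does not describe how AppDynamics actually computes baselines, so the rule, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_baseline(samples):
    """Group historical (timestamp, response_ms) samples by (weekday, hour)."""
    buckets = defaultdict(list)
    for ts, response_ms in samples:
        buckets[(ts.weekday(), ts.hour)].append(response_ms)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}


def is_slow(baseline, ts, response_ms, num_stdevs=2.0):
    """True if response_ms exceeds the bucket mean by more than num_stdevs deviations."""
    key = (ts.weekday(), ts.hour)
    if key not in baseline:
        return False  # no history for this slot, so nothing to compare against
    avg, spread = baseline[key]
    return response_ms > avg + num_stdevs * spread


# Hypothetical history: four Wednesdays, quiet at 03:00 and busier at 12:00.
history = []
for day in (7, 14, 21, 28):  # Wednesdays in June 2017
    history += [(datetime(2017, 6, day, 3), ms) for ms in (180, 200, 220)]
    history += [(datetime(2017, 6, day, 12), ms) for ms in (450, 500, 550)]

baseline = build_baseline(history)
```

With this history, a 400 ms response on a Wednesday at 03:00 is flagged as slow, while the same 400 ms at 12:00 is not, because each is compared only with what is normal for its own slot.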

So we can actually see here at this time, this particular portion of the communication down to the data services is running slow. We're getting a poor response from one of our microservices.

What we want to then be able to do is look at what's the impact of that. So I'm going to go to the Business Transactions tab.

Business transactions are, again, automatically discovered by AppDynamics. You don't have to keep updating them. And they're based on user interactions, essentially. So if a user comes and logs in, we get a login business transaction. If they're doing a checkout, we get a checkout business transaction.

And you can see here the health of our account lookup has errors. So along with the fact that we know there's a drop-off in our conversion rate, we will also have had a ticket raised for us in Jira, in ServiceNow, whatever your ticketing system is, and the ops team will already be looking at this. You'll already be trying to work out what's going wrong for the account lookup.

I'm not going to go through that today, but AppDynamics will allow you to get down to the method level of data and see that there's actually really slow responses coming from our database.

So we know we've got a problem with account lookup, and we know that's impacting our conversion. So the dev teams take this away, and they work out what they're going to do about it. And they come up with an idea that actually the database is maybe not the right structure for this data, and we go away and we design a new system that uses MongoDB as a back end. Maybe that's a better structure for the data we're storing.

What we now want to be able to do is, A, test that. So in our test environment, we're going to build out with MongoDB, and I want to see that the response times for the step that was broken are now faster. I want to put a heavy load through that to make sure it can stand up to the levels of load we're going to get in production.

And then I want to go through a kind of an A/B test, and that's where we're at now. So we've developed this new back end. We've got our data services V2 now spun up in Docker, and we're putting a small percentage of our users through version two as opposed to version one.

So we can see that's live updated my flow map. I've got my old server down here with its response times. I've got my new server up here with its response times. And I want to tie this all back together and say, "So what? Have I actually fixed anything for the business, or have I just made a nice new pretty MongoDB, and that makes me happy because I'm a technologist, but is it actually helping the business?"

So the last dashboard I've got here is now tying these two together. I've got version one, version two, running in parallel in production, and I'm currently looking at the last three hours worth of data, but you can expand this as you leave that to run in for a day.

And you can see how many users received version one versus version two. I can see my conversion rate on version two has gone up by about 10%. So that improvement I've made has led to a 10% improvement in conversion, and although the revenue's slightly lower, the revenue per active session is going to be much higher.

So we know we've fixed it, and just to validate again, we've fixed the right problem. This is my conversion funnel with that 80% drop-off for version one, and it's now down to 40% for version two.
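The version-one versus version-two comparison comes down to computing the same business metrics per version. Here is a minimal Python sketch under stated assumptions: the `Session` record, its fields, and all the numbers are hypothetical, and this is not how the AppDynamics dashboard is built.

```python
from dataclasses import dataclass


@dataclass
class Session:
    version: str      # which back end served the session, e.g. "v1" or "v2"
    converted: bool   # did the user take up the offer?
    revenue: float    # revenue attributed to the session


def summarize(sessions, version):
    """Conversion rate and revenue per session for one version."""
    subset = [s for s in sessions if s.version == version]
    if not subset:
        return {"sessions": 0, "conversion_rate": 0.0, "revenue_per_session": 0.0}
    return {
        "sessions": len(subset),
        "conversion_rate": sum(s.converted for s in subset) / len(subset),
        "revenue_per_session": sum(s.revenue for s in subset) / len(subset),
    }


# Hypothetical traffic: most users on v1, a small slice on v2.
sessions = (
    [Session("v1", True, 100.0)] * 5 + [Session("v1", False, 0.0)] * 95
    + [Session("v2", True, 100.0)] * 3 + [Session("v2", False, 0.0)] * 17
)

v1 = summarize(sessions, "v1")
v2 = summarize(sessions, "v2")
```

In this made-up data, version two earns less in total because it serves fewer sessions, but its conversion rate and revenue per session are both higher, which is the pattern described in the demo.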

So with AppDynamics, we've kind of taken this from a new initiative from the business. We want to do something new. We've helped identify problems. We've helped the dev team to generate new code, and we've validated that everything's working.

Q&A

Any questions on that?

Q: Is end user monitoring using conversion?

A: So actually, this is not using any user monitoring. Yeah. But we could do the same in end user. So this is actually all based off of the business transactions in the back end.

And I've done similar with other customers where maybe it's a UI change, and you change a button from green to blue. Does that improve conversion? And you can do the same thing. You can come up with this dashboard for whatever change it is you're making.

The fact that this is version one and version two is just a filter within my data set that says, "Only show me the things where this value is version one or this value is version two." Now, that could be any number of things.

So another customer, their storage vendor came in and said, "We've got this new storage. It's 10 times faster than what you've currently got." And they went, "Okay. Let's do a proof of value against that." So they put in some new storage. And it is 10 times faster. The IOPS prove it.

But when you look in AppDynamics, the percentage change of how long it takes to load a page is negligible because it wasn't the bottleneck. So actually, even when you're testing hardware changes in the back, adding more RAM, changing server types, migrating from on-premise to AWS, any of those types of scenarios, you can come up with this dashboard and say, top one is my data center, bottom one is AWS, third one is Azure. Let's do a bake-off. Work out which one's the quickest for the same code running in different places.
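The same idea generalizes: tag each measurement with whatever is being compared (storage type, hosting location, code version) and group by that tag. The Python sketch below shows a bake-off on median page-load time. The tag names, values, and timings are hypothetical and only illustrate the comparison; they are not taken from any customer data.

```python
from collections import defaultdict
from statistics import median


def median_by_tag(samples, tag):
    """Median page-load time (ms) per value of the chosen tag."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[tag]].append(sample["load_ms"])
    return {value: median(times) for value, times in groups.items()}


def percent_change(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100.0


# Hypothetical samples; the tag names and values are illustrative only.
samples = [
    {"storage": "old", "location": "datacenter", "load_ms": 1200},
    {"storage": "old", "location": "datacenter", "load_ms": 1240},
    {"storage": "old", "location": "aws", "load_ms": 1100},
    {"storage": "new", "location": "datacenter", "load_ms": 1190},
    {"storage": "new", "location": "aws", "load_ms": 1090},
    {"storage": "new", "location": "aws", "load_ms": 1210},
]

by_storage = median_by_tag(samples, "storage")
storage_change = percent_change(by_storage["old"], by_storage["new"])
by_location = median_by_tag(samples, "location")
fastest_location = min(by_location, key=by_location.get)
```

Here the new storage changes the median page load by less than 1%, echoing the point that faster hardware does not help if it was not the bottleneck, while grouping the same samples by location picks out the quickest place to run the code.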

Make sense?

John Rakowski

So basically what you're seeing on the screen is a combination of different elements of data. So you've got the technical data, so response times. You could pull up any kind of metrics which come from the application side.

But what we're doing on the back end is also correlating this through that notion of a business transaction to the business side, and even with user engagement data, we can pull that in. So we're actually seeing whether conversion rates are improving because ultimately, if applications are your business, it's not just the technical perspective of our application which matters. It's not just the technical performance.

If performance improves, the question is, so what? So what if we've improved performance? So what if we've released a new feature? The reality is the so what, how do you answer that, is how that new feature, that new change or release has improved the application from a business perspective.

So what Andy's gone through is actually a number of very powerful kind of concepts. We've tied together business performance with application performance and also user engagement. We've gone down to the application level, mapped out a modern application with a Docker back end. We've seen a change happening in the environment with MongoDB coming in. We've seen how that's now improved conversion.

So the dashboards which you're seeing up in front of you, these are custom-built dashboards. It's easy to produce dashboards like this within the AppDynamics console.

So in summary...

So I'll switch back to the slides.

Applications are key to success, so faster release is definitely needed to kind of stay ahead. But faster releases also increase risk, especially within the enterprise. And AppDynamics really does help to kind of break down silos in each one of those areas.

So with that, I'm going to open it up to any other further questions from what you've seen.

Nope? Stunned silence.

Andy Jackson

It can also mean there are many questions that may...

John Rakowski

Yeah.

Andy Jackson

So we've got a stand if you'd rather come and ask us one-on-one.

John Rakowski

Yeah. Please come over and speak to us. We're in the main hall.

Andy Jackson

We can give more specific demos down at our stand.

John Rakowski

If you want to grab one of these T-shirts, "I've got 99 problems, but an app isn't one," then by all means, come and see us at our stand, and we'll give you a further demo. Okay. Thank you very much.

Andy Jackson

Thank you.