DevOps: Breaking Traditional IT Paradigms
We’ve all heard DevOps can greatly accelerate velocity and efficiency. The challenge is how to transform a large-scale enterprise with established processes and systems.
Through the looking glass of a number of DevOps myths (are they really?), we will share how HP goes DevOps, brokering relationships among our business unit and infrastructure IT teams to make the move from organizational silos to integrated teams and continuous delivery pipelines; from physical systems and storage to cloud infrastructure and Docker containers; from templates and forms to infrastructure-as-code; and from change requests to change records.
Full transcript
The complete talk, organized by section.
Ashish Kuthiala
Good morning, everyone.
So when we found out that we were following Jason and Disney, we were kind of figuring out, what is it that we did to offend Gene? Because this is... right?
But then we started thinking, all right, how can we up our game? And so, can you play the video, please?
At the beginning of the 21st century, the Earth needed to find a new way to keep up with the data from over 30 billion connected devices.
Just 30 billion.
So a bold group of researchers and computer scientists in Silicon Valley had a breakthrough they called the Machine.
The Machine.
It changed the basic architecture of computing, putting a massive pool of memory at the center of everything. And by doing so, it changed the world. It's been a part of every new technology for the last 250 years.
Everything?
Everything.
This year, Hewlett Packard Enterprise will preview the Machine, and the future of technology will begin.
See Star Trek Beyond in theaters July 22nd.
Thank you. So, the future. How many of you grew up watching Star Trek, Star Wars, and wanting to actually live that future, right?
I think the future is here today. I think you saw the video. Companies like HPE, our mission is to make these new technologies possible, to make that future happen today. And when we help a lot of these huge enterprises with their technology solutions, it takes our own IT organization a massive effort to help our teams produce these solutions.
Next slide, please.
And so, just HPE IT by the numbers. Olivier, if you want to just highlight what we do behind the scenes to make this happen.
Olivier Jacques
Yes, thank you, Ashish. So this is our numbers slide, right? And just to highlight a few of them, you don't have to read through everything.
But in HPE IT, in 2015, we completed 900 projects, and one of them was a big one. It was the separation of HP, a company more than 75 years old, into two companies: Hewlett Packard Enterprise, focused on the enterprise, and HP Inc., focused on the consumer business.
And when you are the IT organization, making eggs out of an omelet is not something that's easy.
No separation or anything-related speech here.
The other aspect was, to run HP, we have 1,400 applications that we operate every day. And those are operated by 7,000 dev and ops people who are all there to make this work.
Rafael Garcia
Yeah, and it's not all about the numbers, right? So there's also complexity involved.
We have an infrastructure that's been built over years and years of acquisitions and mergers. We have processes and policies that have been embedded into the company forever. And it breeds this environment that's very, very difficult to change, right? And it's what you've heard a number of speakers talk about.
For us, we tried to figure out, how can we move faster in order to be able to support the business that, in turn, has to create that technology that you saw in the video? And we felt DevOps was the way that we could do that.
So about 18 months ago, we started the pilot to be able to try it out, see if it actually applied in our world, which is this enterprise, really stale and very entrenched kind of world. And we found it did.
So we'll talk a little bit about several of the different attributes and design patterns that we found that made us successful in applying DevOps into the enterprise. But let me make it just real with a couple of different applications.
These weren't greenfield, little isolated applications in the corner. These particular ones are, for instance, our myComp Mobile. It's the application that's used by all of our sales folks to be able to track their compensation. Absolutely critical, mission-critical to them, and mission-critical to our ability to be able to generate continuous revenue. They understand where they are real time on quota and things to that effect.
And then the other one is our support automation environment, which is, in this particular case, telemetry that is collected from all the systems that are out in customer sites. And the intent is to be able to collect this, do predictive analytics, and understand when parts are going to be needed so that we can get proactively out there and service customers.
Absolutely mission-critical. Very, very different architecture from a mobile application, and one where we found that, at the current pace that we were doing business, we were having 16-, 18-month release cycles. They managed, through normal Agile practices and things like that, to get it down to 12 months. But even at that 12 months, we had products coming out that were in the field for extended periods of time without sending telemetry back. And if you missed that cycle, you were a whole year out to be able to get things out.
That was the problem that they were trying to solve by introducing DevOps and trying to release much more rapidly.
And the whole point here isn't that these two applications were successful for us in DevOps. The point is that these are very mission-critical, absolutely impactful-to-our-revenue applications, and we were able to apply the same kind of principles that we had seen make a lot of the greenfield and unicorn applications successful in the past.
Olivier Jacques
So as we entered DevOps, and we hear lots of definitions about DevOps, we really wanted to come to one common shared understanding within our IT organization on what DevOps meant to us.
And if DevOps is the extension of Agile to operations, say, well, awesome, because Agile, we have the Agile Manifesto, and I'll be able to just put that on the walls, and everybody will be happy and my mission is done.
So we did not find any DevOps manifesto, and we understand why. It's because DevOps is kind of whatever you want it to be for you.
So this is what it is for us. This is our DevOps manifesto, an internal one. And a number of things there. And actually, throughout the presentation, we are going to highlight some of the principles there.
Just going to mention that it's many things that you already know, but we optimize for the system, so we try to make sure that we don't optimize for silos. The pipeline is also something that is extremely important. And the team is also something that obviously is extremely important.
Rafael Garcia
One thing I'll mention is that when we first started this pilot, we brought in a number of teams that were already pretty mature. They were doing continuous integration to a certain degree, doing some continuous delivery.
And we brought in all of the groups that were representative of their services. So not just dev and ops, or not just infrastructure, but also support, security, change control, et cetera. And what we found was, when we got all these people into the room, there were already multiple DevOps initiatives going on.
Dev had a DevOps initiative. Security had a DevOps initiative. Operations had a DevOps initiative. And they all were thinking that they were operating and moving forward to this vision that they had, but they weren't talking to the others.
So it really wasn't DevOps. It wasn't about breaking down the silos. So this manifesto, the whole point was: get a common language, a common understanding that crossed all those barriers.
Ashish Kuthiala
I think, Rafael, I would add to that, a lot of companies we interact with actually struggle with how to get started. And what we did, in addition to this, also was to get some really high-level executives in there as well.
In fact, our CIO from the HPE IT team spent a week in that room with us as we planned this out.
A week from a CIO is...
Rafael Garcia
Not just a week in the room, not just sitting on the side.
Ashish Kuthiala
Right.
Rafael Garcia
Actually interacting. And when things came up, like security would say, "Oh, there's no way you can give access into production from developers. You've got separation of duties to worry about," or any one of the natural, immediate reactions you get, he was able to say, "Look, we have to approach this in a different way. We are going to move much more rapidly. Now help me solve that without losing our security, without losing our compliance."
Ashish Kuthiala
So that was really important. A lot of you have questions: how do you scale this up? How do you take this across the organization? That was an important element for us.
Rafael Garcia
Agreed.
Ashish Kuthiala
Next slide.
So the first thing we paid attention to, among other things, was collaboration among the teams. So all of you are familiar with the idea that collaboration is important among the dev, ops, security, and production teams.
But honestly speaking, with the current structures we have in some of the large enterprise companies we work for, collaboration is really easier said than done. How do you take these different teams, focused on their own goals, focused in silos, and actually start to get them to collaborate?
So one of the earlier decisions that we took, and this is just a sample of what our organization looks like. It's the dev and QA team, infrastructure team, but even within these, there are sub-silos that work with each other. As you can imagine, teams focus on some particular tasks. They're functionally divided.
To the next slide.
So other than the manifesto that we talked about, where we had made it very clear that as we go down this journey, we are going to have empowered, self-organizing teams that buy into the manifesto but do the right things to take us there, we thought hard about, how do we actually break the silos without reorganizing the teams themselves?
Because as you all go through this, it's really quite hard to take teams and break down the structures and reorganize them, right?
So we looked at what kind of technology the different teams are using, what are they actually working with very comfortably today? And we found that a lot of our teams were actually using chat rooms within their own teams or across teams, but they were quite siloed.
So we looked at ChatOps and adopted it.
And so if you go to the next slide: what is ChatOps?
I'm assuming a lot of you are familiar with ChatOps, but the idea here is to have persistent chat rooms that are always on. And for teams that are siloed across different global geographies and across different organizations, this became a very effective tool where the teams could collaborate with each other.
They could come back and look at something if they were gone for the night. They could actually be very transparent with each other about what they were doing. And it's not just collaboration between people and teams. It's actually collaboration with systems, because you have bots running and interacting with the ChatOps rooms.
Rafael Garcia
In fact, that's the magic, right?
Ashish Kuthiala
That's the magic.
Rafael Garcia
So persistent chat rooms have been around for a while and everything else, right? But when you introduce the bots and the ability to interact directly with the ecosystem, and the environment tells you what's going on, and everyone across the group actually sees that at the same time, that's when we found the magic of these teams actually working real time together.
Ashish Kuthiala
Right. And the transparency that actually became very apparent between different teams was a really good value driver on how these teams collaborated. Nothing was hidden anymore. Nothing was, "This is how I need to do it," or, "This is how I do it, and you can't move forward without my knowledge, my team's knowledge."
Everything started to become open, more collaborative. So we forced collaboration through the tools that they were also using.
This is an example of how we integrated bots with our different monitoring systems, for example. So everybody knew what the problem was. You could run commands to get some event data or the status from machines. You could trigger actions from within the chat rooms for the machines to take actions.
Rafael Garcia
And so one other comment around that, though, is that we don't really prescribe what those monitors are or what those actions are. We do make available the bots to all of the various teams, but the teams have the ability to customize their environment as they see fit.
You don't want to force people to do something that isn't valuable to their particular environment, and you don't want to introduce into the chat environment information that's really not processed or used for that team. It then becomes noise, right?
So we leave the decision on what specifically they put into their ChatOps rooms to those particular teams.
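The bot pattern described here can be pictured with a minimal sketch. It assumes a hypothetical in-memory chat room rather than any real chat product: the names `ChatRoom`, `register`, and `handle`, the `!command` syntax, and the host data are all illustrative assumptions, not HPE's implementation. Each team registers only the commands it finds useful, and every reply lands in the shared room history so that everyone sees it at the same time.

```python
from typing import Callable, Dict, List

# A handler receives the command arguments and returns the bot's reply.
Handler = Callable[[List[str]], str]


class ChatRoom:
    """Hypothetical persistent chat room with a team-specific bot."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.commands: Dict[str, Handler] = {}
        self.history: List[str] = []  # persistent and visible to the whole room

    def register(self, command: str, handler: Handler) -> None:
        """Let the owning team opt in to a command; nothing is prescribed."""
        self.commands[command] = handler

    def handle(self, user: str, message: str) -> str:
        """Record the message, run a bot command if it is one, record the reply."""
        self.history.append(f"{user}: {message}")
        parts = message[1:].split() if message.startswith("!") else []
        if not parts:
            return ""  # ordinary conversation, the bot stays quiet
        name, args = parts[0], parts[1:]
        handler = self.commands.get(name)
        reply = handler(args) if handler else f"unknown command: {name}"
        self.history.append(f"bot: {reply}")
        return reply


def status(args: List[str]) -> str:
    # Stand-in for a query against a monitoring system (assumed data).
    known = {"web01": "healthy", "db01": "degraded"}
    host = args[0] if args else ""
    return f"{host}: {known.get(host, 'unknown host')}"


room = ChatRoom("support-automation")
room.register("status", status)
print(room.handle("rafael", "!status db01"))
```

A team that does not want a command in its room simply never registers it, which keeps the room free of output nobody there acts on.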
Ashish Kuthiala
Right. We were just having a conversation before this presentation where the question of compliance and regulations and audits came up. And we found that ChatOps actually really started helping a lot of our internal regulation and external regulation teams because it leaves a clear audit of who did what, where, when, and what the problem was, making it much easier for us to actually go and help those teams as they got involved in this process as well.
So the other thing we looked at carefully and we noticed is that people actually use chat rooms a lot to interact socially as well. So how could we take that and keep the fun in it, but actually add work aspects to it?
So as we adopted ChatOps, we found more and more people using it not just to interact on a work basis or solve a ticket, but also to be social and exchange conversations about current political events or sporting events and other things that they could share with their office colleagues.
So we found this is a tool that people have started to rely on more and more to interact with each other as well as to solve problems. So it's become ingrained as a way for the teams to be collaborative with each other.
Rafael Garcia
So we heard Jason talk about matrix, right, and embedded groups that try to, without changing the organizational structure completely, be able to work across the various silos, right? For us, this ChatOps environment became the mechanism of being able to do that without doing some kind of massive organizational change.
The other aspect was, as we had these teams initially working together, you brought together an infrastructure person, a support person, a dev person, and you told them, "Okay, act as a team. I want you to have a common objective. Go deliver this service."
Instead of immediately starting to act like a team, they started calling on each other to help them remove the red tape or to just do things the same way we were doing before, just with a personal contact, which isn't what we really wanted to do. We wanted to rethink how they work together.
So bringing them into the ChatOps environment and having them engage on a daily basis, real-time, constantly, and transparently, like Ashish was mentioning, and seeing the responses of the ecosystem, what's really going on in the environment and interacting with it, that's what started to get these guys actually working together as a team and thinking together and helping each other and identifying that, "Oh, wait a second, don't go do that because this is going to happen."
Those kind of interactions.
Olivier Jacques
And Rafael, I think there was also an unexpected effect that we saw with ChatOps, which is that all of a sudden you could interact with your automation much more easily through chat.
So, for example, running an automated test suite is not something that only that one guy knows, right? Or restarting a system, or load balancing 20% over there and 80% over there, it's not something that we have to talk to the specialist about. It's something you can trigger, if you have the appropriate privileges, through chat.
So what we saw as an unexpected effect is that the automation was all of a sudden much more useful to many more people. So the investment in the automation was not only about supporting quality and speed, it was also something that was more valuable and in the hands of more people.
So we raised the usage or the reuse of automation.
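One way to picture "if you have the appropriate privileges", together with the audit trail of who did what and when, is the sketch below. The role names, the action names `run_test_suite` and `shift_traffic`, and the shape of the log entry are hypothetical; the talk does not describe how permissions were actually modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class AuditEntry:
    when: datetime
    user: str
    action: str
    allowed: bool


@dataclass
class AutomationGateway:
    """Hypothetical gate between chat commands and the automation behind them."""

    # Maps an action name to the roles allowed to trigger it (assumed policy).
    permissions: Dict[str, Set[str]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def trigger(self, user: str, roles: Set[str], action: str) -> bool:
        """Allow the action if the user holds a permitted role; always log it."""
        allowed = bool(roles & self.permissions.get(action, set()))
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc), user, action, allowed)
        )
        return allowed


gateway = AutomationGateway(
    permissions={
        "run_test_suite": {"dev", "qa", "ops"},
        "shift_traffic": {"ops"},
    }
)
print(gateway.trigger("dana", {"dev"}, "run_test_suite"))  # role is permitted
print(gateway.trigger("dana", {"dev"}, "shift_traffic"))   # role is not permitted
```

Refused attempts are logged as well as successful ones, so the record stays complete for anyone reviewing it later.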
Ashish Kuthiala
Again, this slide illustrates we still have our silos. They haven't broken organizationally, but the team is working very seamlessly across these.
And I see it as a collaboration mechanism between man and machine, which is very interesting. Typically, you see collaboration between different teams, but with the ecosystem, it just becomes very seamless.
Rafael Garcia
And again, the thing to remember here, what worked for us is this was a mechanism from a technology perspective to make it happen. But that manifesto we talked about at the very beginning was the underlying key to this. We are going to work together as one team. You have your independence to do what you need, but as long as you're aligned towards that same goal, we will move forward.
I think that buy-in was really critical to make this collaboration piece work.
Olivier Jacques
Agree.
All right. So the other aspect that is extremely important for us is the pipelines, right? So the continuous delivery pipelines.
And very early on, when we started 18 months ago, we were wondering, okay... And we are originating from the tool side of the IT organization, so we were wondering if we should standardize on tools and create a pipeline that is standard and going to be the same for everyone.
Very clearly, we quickly came to the conclusion, same as John with Barclays, that one size does not fit all.
And so, just to recall a little bit what the goal of the pipeline is for us, very simply put, the continuous delivery pipeline is here to move good changes as automatically and as quickly as possible so that they can deliver business value, and reject bad changes as quickly and as automatically as possible.
That's as simple as that, right?
And there is a very important feedback loop that's there so that we can see the impact of the changes on our business outcomes.
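The goal stated above, moving good changes forward automatically and rejecting bad ones as early as possible, can be sketched as a fail-fast sequence of stages. This is a minimal illustration under assumed names: `run_pipeline`, `PipelineResult`, and the three stages are hypothetical, and a real pipeline would build, test, scan, and deploy rather than read flags from a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A stage inspects a change and returns True if the change may proceed.
Stage = Tuple[str, Callable[[dict], bool]]


@dataclass
class PipelineResult:
    accepted: bool
    failed_stage: str  # empty when the change passed every stage
    stages_run: List[str]


def run_pipeline(change: dict, stages: List[Stage]) -> PipelineResult:
    """Run stages in order and stop at the first failure (fail fast)."""
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check(change):
            return PipelineResult(False, name, ran)
    return PipelineResult(True, "", ran)


# Hypothetical stages standing in for real build, test, and scan steps.
stages: List[Stage] = [
    ("build", lambda c: c.get("compiles", False)),
    ("unit_tests", lambda c: c.get("tests_pass", False)),
    ("security_scan", lambda c: not c.get("has_known_vulnerability", False)),
]

good = run_pipeline({"compiles": True, "tests_pass": True}, stages)
bad = run_pipeline({"compiles": True, "tests_pass": False}, stages)
print(good.accepted, bad.failed_stage)
```

The result names the stage that rejected the change, which is a small version of the feedback loop: the author learns what failed without waiting for later stages to run.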
So how do we manage really having a pipeline that is strong, that has a lot of capabilities and features? And you see some of the tools. Actually, the logo slide will be huge, so we don't highlight everything. But how do we manage to have a pipeline that's there, that is solid and strong, without enforcing the same one for everybody?
So to do that, we have anchor points, and those are the little locks that you see there. Just two examples of anchor points, because we have many others. But the idea is that we don't force everything. We don't force the pipeline in its entirety. A mobile pipeline is going to look different than a backend kind of a pipeline.
However, we have some non-negotiable anchor points. So let me highlight two of them.
One big anchor point is the source code management. So as we go through the paradigms that everything is code, we used to consider only the application code. Now we have the environment, we have the infrastructure, the storage, the machines, the network. All of that is managed as code.
So the code management capabilities are absolutely critical. So we have a very strong emphasis on actually using one tool set for that: GitHub Enterprise. They are downstairs.
But one change amounts to one deploy, at least. Meaning that even if the change is not deployed to production, because not everybody is up to continuous deployment, at least it's deployed to a temporary, ephemeral environment so that this change can be tested and so that we can have an opinion about it before it reaches our customers.
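As a rough sketch of "one change, at least one deploy": every change gets a deployment target, and teams that are not doing continuous deployment get a throwaway environment named after the change. The function name `deploy_target` and the `ephemeral-` naming scheme are assumptions made for illustration only.

```python
import re


def deploy_target(change_id: str, continuous_deployment: bool) -> str:
    """Pick where a change is deployed; every change is deployed somewhere."""
    if continuous_deployment:
        return "production"
    # Derive a safe, unique name for a throwaway environment from the change id.
    slug = re.sub(r"[^a-z0-9]+", "-", change_id.lower()).strip("-")
    return f"ephemeral-{slug}"


print(deploy_target("PR-1234", continuous_deployment=False))
```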
We also have lightweight peer reviews, very important. And I would say those peer reviews are not only between developers. They're also between the silos, or between developers and testers, developers and security, developers and change management.
So the pipeline and the source code management capability become our way to codify our change management process and all of the processes that we use, or that we still have to fulfill. The continuous delivery pipeline is our way to codify our processes.
Rafael Garcia
And Olivier, one of the reasons that we chose one source code management system and actually prescribed that particular tool is that we also wanted a common plane of discovery across the entire organization, across the entire company.
We really want to build into the concept of reuse and making available all of these, whether it be a bot, or whether it be an actual snippet of code, or it be a compliance test. We want people to be able to reuse that as necessary.
So single plane of discovery is one of the easiest ways of making that happen.
Ashish Kuthiala
And I think transparency was another real strong consideration here.
Olivier Jacques
Absolutely.
Ashish Kuthiala
Because it was opened up to everyone, not just for reuse, but peer reviews, which was part of the promise of the transparency in the manifesto.
Olivier Jacques
So peer review is not only between developers, right?
The other anchor point that we have is the change management. So who has a CAB here, change approval board? You can, it's up. Okay. So we do too. It's okay.
But as we moved to DevOps, we found out that our CAB, our change approval board, was more tailored to manual work. It was really optimized to involve people and to do this task, and then this other task next, and this is where you have a lot of latency in our processes.
We found out that as we do a good job with the continuous delivery pipeline, we can also codify the change management process. But we need to make sure that we record all of the changes properly so that we can be audited, and also make sure that we have the controls in place so that changes can be approved and denied, et cetera.
So the change record service that you see there, sorry, it's very technical, it's an API. But basically, it's a way to connect our continuous delivery pipelines to a system of record that is going to make sure that all of the changes are recorded and can be audited, and can be also prevented before they happen if we don't agree with them or if there is some kind of a freeze period.
And we work with our auditors and our security teams for that, and when they saw that, this is really something that they loved.
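A change record service of this kind might look roughly like the sketch below: the pipeline submits a small record before a release, the service stores it, and a change that falls inside a freeze period is refused. Everything here is an assumption made for illustration, including the class names, the record fields, the sample values, and the in-memory storage. The talk only says that the service is an API that acts as a system of record.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical fields; the talk does not list the actual ones.
    application: str
    version: str
    commit: str
    requested_by: str
    deploy_date: date


@dataclass
class ChangeRecordService:
    """In-memory stand-in for a system of record reached through an API."""

    freeze_periods: List[Tuple[date, date]] = field(default_factory=list)
    records: List[Tuple[ChangeRecord, str]] = field(default_factory=list)

    def submit(self, record: ChangeRecord) -> bool:
        """Record the change and return whether it may proceed."""
        frozen = any(
            start <= record.deploy_date <= end
            for start, end in self.freeze_periods
        )
        decision = "denied" if frozen else "approved"
        # Denied attempts are kept too, so the history is complete for audit.
        self.records.append((record, decision))
        return not frozen


service = ChangeRecordService(
    freeze_periods=[(date(2016, 12, 20), date(2017, 1, 3))]
)
ok = service.submit(
    ChangeRecord("mycomp-mobile", "2.4.1", "abc123", "pipeline", date(2016, 11, 15))
)
blocked = service.submit(
    ChangeRecord("mycomp-mobile", "2.4.2", "def456", "pipeline", date(2016, 12, 24))
)
print(ok, blocked)
```

A pipeline would call `submit` as a gate just before the production deploy and stop when it returns `False`.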
Rafael Garcia
Yeah. So this is a change from a six-week, 40-artifact, manually created process with a CAB review board that doesn't really know the asset or the application, kind of checking down the list whether you submitted this or submitted that, over to an instant creation of an eight-artifact, or eight-piece-of-information file that gets pumped to a change record service, which becomes our system of record, essentially, for change, and automatically pushed every time that you have a release into production.
The other key is that the auditors can look at that information, and it's the very bare minimum; we worked with them to make sure that it's what they need to be able to know that the change happened and that the right testing happened and all that kind of stuff.
But then they can always reference back into the pipeline itself, to the source code management system, the builds, the tests, all the records, and get as much information as they could ever want. It's automatically generated, and therefore nobody gamed the system because they were trying to get something pushed into production or whatever it may be.
So the auditors actually love this approach for us.
Olivier Jacques
Yeah. And the big aspect that we continuously get is also telemetry.
So telemetry or monitoring was something that was more the ops privilege. And as we moved forward, we added a lot of telemetry everywhere. We are measuring a lot of things. Actually, probably more things than we could actually even look at. But it's a good thing, because any time we can have a look and see some business metrics.
And one of these focuses on business outcomes, and this is telemetry from the...
Rafael Garcia
User experience perspective.
Olivier Jacques
...the Sales Comp mobile application.
So this is for the mobile application. And you see here that we aggregate a score, which we call the FunDex, but it's an aggregation, a composite between UI performance on the mobile, stability of the app, does it crash or not, and also the usage of batteries and usage of network.
And this is really, again, having telemetry that is focused on a business outcome and delighting our customers, in that case, our salespeople.
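A composite score like this can be sketched as a weighted average of the four inputs named above. The weights, the 0 to 100 scale, and the function name `composite_score` are assumptions; the talk names the inputs but does not say how the FunDex combines them.

```python
from typing import Dict

# Hypothetical weights; the talk names the four inputs but not how they combine.
WEIGHTS: Dict[str, float] = {
    "ui_performance": 0.4,
    "stability": 0.3,
    "battery": 0.15,
    "network": 0.15,
}


def composite_score(metrics: Dict[str, float]) -> float:
    """Weighted average of metrics already normalized to the range 0..100."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= metrics[name] <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100")
    total_weight = sum(WEIGHTS.values())
    return sum(metrics[n] * w for n, w in WEIGHTS.items()) / total_weight


sample = {"ui_performance": 90.0, "stability": 70.0, "battery": 80.0, "network": 60.0}
print(round(composite_score(sample), 1))
```

Collapsing several signals into one number makes a trend easy to watch on a dashboard, though the individual inputs are still needed to see which one moved.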
Rafael Garcia
Sales force.
And so, again, we said at the beginning that we didn't think that we should prescribe a particular pipeline. But we understand that that also means that you can't just leave it to chaos and let the teams go do whatever the heck they want to do, right?
So we came up with the concept of these buoys, not boundaries. So for us, what we do is, for the teams, we establish a vetted pipeline, or a series of vetted pipelines. And these are very well-tested, very easy to consume. If a team doesn't know what they want to go off and do in order to be able to create a pipeline, they can leverage this very easily from a central service.
If they do want to explore, though, they're buoys. They're not solid boundaries. You don't have to stay between the lines. You can go explore. And that's really critical in the DevOps world because the world's changing constantly.
If you're worried about some kind of central team doing some kind of assessment that takes six months, nine months on new tooling, then by the time they're done assessing, the new tool's already out. There's something else already out there.
So you need to give your teams the ability to explore the edges. So again, this concept of buoys, not boundaries for our teams.
And it goes to the next one, which is the final item.
So a lot of what we talked about just now was technology, ChatOps, the pipelines, things like that. But at the core of everything is the people, right? And the real big change in DevOps is establishing a trust culture. At least for us it was.
So we traditionally are a command-and-control kind of model. So we establish a process or a policy to be able to drive a certain behavior and push that policy down and force everyone to do it that way.
And the concept behind that was almost to protect people from themselves. They can't possibly make the right decision for themselves, so build a process around it and enforce it through that.
So instead, we've shifted into this concept of integrated and empowered teams. So instead of establishing a process or policy to drive X behavior, what is it that you need to put in the hands of the teams for them to reach the right decision that takes into account the nuances of their particular situation?
And that is the right skills. So you integrate the team with the right combination of skills, whether it's DBAs or networking or support, integration, so forth.
It's the right visibility. So with ChatOps and some of these other mechanisms, we've shown the teams transparently what's going on in the environment. So you need to give them enough visibility to the rules, to the policies, but also to what's going on in their environment.
And then you have to give them the right permission to be able to make the changes that are necessary, the authority to enact that change.
And then what you're reaching is the point of giving people the ability to make the right choice for themselves. And therefore, they avoid some of these situations where they're following a policy or a process, but it's essentially a predefined one that doesn't take into account their scenario, and therefore they end up doing something totally counterproductive. Or they end up fighting the system for six months, hating their job, and then finally being able to push the thing in that they knew was right to begin with.
So, and this is built on this concept of a minimum viable process as well. So we really try to think in terms of not having that process, but empowering the teams. And the process is the last default, or at least a centrally defined one is the last default.
Ashish Kuthiala
Right.
So getting back to the manifesto, we touched upon two or three of these points and emphasized what our challenges were, how we actually approached them. And we're seeing a lot of these manifesto principles are still holding with our teams. We visit them occasionally. We make sure that they're still working, and they seem to be working.
Again, I think the emphasis here is, like Rafael and Olivier said, transparency, trust, letting people do the right things, and then codifying everything that you can.
Rafael Garcia
Give them the tools and the ability to do the right thing. That's right.
And then as far as where we are in our journey, just to close out here. So we started off with unicorns, with teams that were within the organization doing things and pushing the edges, right?
We structured them together into a pilot program to test whether these DevOps activities could really work in some of our real enterprise use cases. And you saw a couple of the applications that are very mission-critical, highly integrated into the environment, and that have a significant impact on revenue. And those are part of what we've tried to move with DevOps.
So it's not just the unicorn kind of things.
Once we established that these practices really do work, we've started the job of scaling it out to the enterprise. So how do we make these things systemic, not exception-based? How do we drive this knowledge to the broader teams that weren't in the original pilots or that haven't organically become aware of what's going on?
So one of the concepts that we actually borrowed from the Target group is a concept of DevOps dojos. So we really believe that DevOps is not something that you can learn from listening to people. It's not something that you can be taught. It's something you have to actually live in order to have that light bulb go off, that, "Ah, now I get it. This is what you actually meant."
Because it really is a mindset shift, not just a bunch of automation thrown in.
So these DevOps dojos are seven-to-ten-day internal experiences that we've created for our various teams, and it's tailored to their particular challenges, whatever that challenge is.
And then we embed a number of evangelists that already know the DevOps mindset with that team we're trying to introduce to DevOps, and help them overcome their particular real-life challenges they're having, but with DevOps-y kind of things associated to it.
So it might be throwing in ChatOps for collaboration. It might be creating a pipeline. It might be a number of different things that are specific out of this huge DevOps portfolio of mechanisms and tools and approaches that might apply to them.
And that way, you make it really personal. And that's what triggers that light bulb moment, "Ah, now I get you," and they can carry that forward.
Ashish Kuthiala
Right. So just a last final point to that. Other than the dojo, the other thing we are finding out is Hewlett Packard Enterprise, being really a collection of a lot of different business units, HP IT runs horizontally across it.
We have a lot of our own software product groups that create products and solutions for our customers who also have wanted to onboard, who've also wanted to build things with the DevOps methodology. So they also were doing a lot of DevOps adoption and principles, and we found out that there was a lot of value in these organic groups across the different teams.
So we also started this concept of meeting once a quarter virtually and sharing best practices. So we call them DevOps sharing days. We've seen that there's a lot of sharing. I mean, we have at any point when we do these days, 300 to 400 people signed on, listening, providing input, learning from each other.
So encouraging that community inside of your organization is also important.
Rafael Garcia
Yep, agreed.
Ashish Kuthiala
Yeah. I think...
Rafael Garcia
So I think our time is up. But basically, we also have a booth downstairs, so whenever we're over here, you guys are more than welcome to come by and see some of the solutions that we have out there.
Thank you.
Olivier Jacques
Thank you.
Ashish Kuthiala
Thank you.