London 2017

Sorry Mr(s) Ops, We Hadn’t Forgotten You

Hiscox are at an inflection point on their DevOps journey where Cloud is about to reshape the IT function (yet again).


In this session Jonathan will talk about some of the challenges and successes of trying to become a “Cloud first” organisation and how this relates to their previous work which has largely focused on the development & test teams.

Full transcript

The complete talk, organized by section.

Jonathan Fletcher

My name's Jonathan Fletcher. I am the interim group CTO for Hiscox. Interim, I guess, while my boss tries to work out whether I'm an idiot or not.

I was an architect until last year, and I think my promotion to CTO is less around me being intelligent or any good thing I do, but more around the transformative and business value that DevOps can bring companies. So I think that's a really exciting thing.

Hiscox: who are we? We are a specialist international insurer. We insure all kinds of random things like space rockets and certain footballers' left feet. We wrote about £3 billion worth of gross written premium last year, about two and a half thousand employees. So not massive, nowhere near sort of Jon Smart's Barclays, but big enough to start to become interesting.

Last year, we talked around a couple of things, and really it was around the journey of DevOps so far at Hiscox. That started with us moving from a very large, centralized IT team to federation to individual business units. So IT is now in each of the businesses rather than one central mass.

We talked about some of the efficiency gains through continuous delivery and Agile, but really that we aren't doing DevOps yet. We're doing continuous delivery, and that's something very different. That's half the battle, and the thrust of my talk is: where's the ops side to this equation?

And we talked about, very early, that it's the start of our move to the cloud, and hoping that if we can combine these tools, processes, cultural ideologies, that we can get towards this mythical thing called DevOps. That on the right-hand side, by the way, is a guy deploying an application from his Boris bike outside St. Paul's Cathedral, which is not that useful, but it's very good for conferences.

I stole this out of The DevOps Handbook, which Gene Kim wrote last year. It's from a presentation by Adrian Cockcroft. And just stepping back, really: the history of IT over the last couple of decades. He argues that each decade is characterized by a certain type of technology, and that as we move through time, the risk and cost profile of making change decreases.

So 1970s, big, heavy mainframes. We moved to client-server, and then we moved into sort of the 2000 to present, which is about commoditization of cloud. And that journey through, as I said, is about reducing risk, reducing costs to companies.

In 2004, I went to a presentation at TechEd, and a guy called David Chappell put this slide up, which really struck a chord with me. He argued that the three most important things of this decade are the Salesforce IPO, which showed that SaaS was a working model; the launch of AWS in 2006, which showed that the public cloud is a viable business solution; and the iPhone, so the first smartphone, which I thought was quite an interesting thing, actually.

So out of those three things, the one that really drew my mind was the AWS discussion and public cloud being a viable proposition.

In 2016, Puppet and the DORA guys came up with some analysis of marketplace and characterizing high-performance IT. And they said there's basically three buckets of companies: high performers, medium performers, and low performers. And they are categorized by the ability to deploy software, the lead time for changes, the total time it takes to recover from an outage, and the change failure rate. And you categorize each of those companies.
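As a rough illustration of those four measures, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record, its field names, the `four_measures` helper, and the sample data are all assumptions made up for this example; they are not taken from the Puppet/DORA analysis or from Hiscox, and the thresholds that separate high, medium, and low performers are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_measures(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise deploy frequency, lead time, time to restore and failure rate."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
        "change_failure_rate": len(failures) / len(deployments),
    }


# Made-up sample: four deployments over four days, one of which failed.
start = datetime(2017, 1, 2, 9, 0)
hour, day = timedelta(hours=1), timedelta(days=1)
sample = [
    Deployment(start, start + 2 * hour),
    Deployment(start + day, start + day + 4 * hour, failed=True,
               restored_at=start + day + 5 * hour),
    Deployment(start + 2 * day, start + 2 * day + 3 * hour),
    Deployment(start + 3 * day, start + 3 * day + 3 * hour),
]
measures = four_measures(sample, period_days=4)
```

For this sample the result is one deployment per day, a mean lead time of three hours, a one-hour time to restore, and a change failure rate of a quarter.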

I thought actually that's quite interesting, because when we come back to that slide, I think what we're missing is, first of all, the delivery methodology. So each era, yes, is dominated by a technology, but also by the delivery methodology that's inherent to deliver that technology.

So I think 1970s, big, heavyweight, big risk-based delivery methodologies. Today's high performance, which is more Agile, Lean, DevOps. And actually that cloud is such a fundamentally different thing from commoditization and virtualization, I'd like to break that out into something that's different. And again, it's another step change in the risk profile and pace of change and cost of implementing systems.

So I think Hiscox are kind of there-ish. I think we want the capability to be able to get there, and I say capability because it's about doing it where it makes sense. I think some of our legacy systems that are going to die on the vine in two years, it doesn't make sense to invest vast amounts of money increasing that change profile.

So talking about cloud at Hiscox, where we started. We originally had a look around the marketplace. We used Gartner quite heavily for our product selection. We looked at IBM SoftLayer, Google Cloud, Rackspace, and they were largely ruled out because the top two dominant companies were quite clearly AWS and Microsoft Azure.

Doesn't mean we can't alter that later. It just means we'd start somewhere. Sort of Agile thinking.

And generally, when I come to talk to the business about cloud adoption and the pushback we get, I think having that third-party validation from someone else goes a long way. Maybe they just don't believe what I say, but having an expensive vendor does make a lot of difference.

And after a set of proof of concepts, we selected Azure over AWS. We are traditionally a Microsoft house, so a lot closer alignment for us. And generally, we felt that they were more geared up towards the needs of an enterprise. So you can change your contract with Microsoft. I found that the support was better. It just felt a more natural fit for us.

And so far, it's been fantastic. We've had some really good results. Later in the year, we're moving out some really large systems. They'll be production systems, so it'll be running hundreds of millions of pounds worth of revenue through the Azure platform.

How are we doing it? So one thing that worries me a little bit is that until we decommission all of our on-premise infrastructure, we're likely to see the total cost of IT running will increase overall. So it's very easy to ramp up your cloud expenditure. It's very difficult to ramp down your on-premise infrastructure if you're trying to transition state.
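To make that cost bulge concrete, here is a small Python sketch of run cost during a migration. It assumes, purely for illustration, that cloud spend ramps linearly with the share of workload migrated and that on-premise cost stays fixed until the last workload has moved. The function names and the figures are hypothetical, not Hiscox numbers.

```python
def monthly_run_cost(month: int, migration_months: int,
                     on_prem_cost: float, cloud_cost_at_full: float) -> float:
    """Total run cost in a given month of the migration.

    Assumption: cloud spend grows linearly with the share migrated, while
    the on-premise estate costs the same until it is fully decommissioned.
    """
    migrated = min(month / migration_months, 1.0)
    cloud = cloud_cost_at_full * migrated
    on_prem = on_prem_cost if migrated < 1.0 else 0.0
    return cloud + on_prem


def transition_overspend(migration_months: int,
                         on_prem_cost: float, cloud_cost_at_full: float) -> float:
    """Extra spend above the on-premise baseline, summed over the overlap months."""
    return sum(
        monthly_run_cost(m, migration_months, on_prem_cost, cloud_cost_at_full)
        - on_prem_cost
        for m in range(1, migration_months)
    )


# Made-up figures: both estates cost 100 units a month when fully loaded.
slow = transition_overspend(12, on_prem_cost=100.0, cloud_cost_at_full=100.0)
fast = transition_overspend(6, on_prem_cost=100.0, cloud_cost_at_full=100.0)
```

With these inputs a twelve-month migration overspends by about 550 units and a six-month one by about 250, so under this simple model the double-running cost shrinks as the overlap gets shorter.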

Therefore, we need to get from one state to another as fast as possible. And as an architect, I was always against the idea of lift and shift and moving one data center effectively to another. And I always felt like all you're doing is shifting the problem, and you'll only get a subset of the benefits of cloud by doing that.

But now I'm a corporate sellout, and I've got a C in my job title, I'm having to be a bit more pragmatic. So if there's an ROI for moving an application, then absolutely do it. If you can move it to PaaS, then that's fantastic. Otherwise, our approach is we can lift and shift as much as we can, as quickly as we can, so we can turn off our data centers, move it, and then fix it.

We started small. We learned through some really small applications that if they fell over, no one was really going to lose their temper about it. But we're now at an inflection point for us. So three of our biggest systems go live in the production sense this year into the cloud, in half two. And as I said earlier, this is talking about hundreds and hundreds of millions of pounds worth of revenue going through these systems.

So it's a big bet for us, not only from a technology point of view, but from a business point of view. We do that with the thought process that actually the technology transformation, like all the things we talk about in these presentations, is only part of the problem. So what we also need to look at is the people, process, the technology, the culture. All of these things need to be addressed.

And actually, I think when you start talking about cloud adoption, you start peeling the covers back of all the different things you need to go and look at: the way it changes your financing, the types of people you employ. It really is a transformative move.

So a brief history. And I get in trouble when I show this slide because my colleagues think I'm comparing them to Neanderthal man, which is an unfortunate thing, but it's meant to show a set of evolution rather than me saying that they look like monkeys.

So sort of 2008, we started very waterfall-based, very ITIL-based. Our data center was in the basement of our main headquarters until it got flooded, and we thought maybe that's a bad thing to do.

And then as we go through time, we start to bring in Agile. We start to move to co-located data centers as the company's massively growing. And we're kind of at this point now where we're doing some stuff with Agile, we're doing stuff with continuous delivery, we're starting with cloud.

But I think really the future and where it holds is 2020. And I think this is where evolution becomes revolution: when we're cloud first, when we can align the tooling, the processes, all of the ideology that we've put into the dev world into the ops world. That's when I think DevOps will happen.

So I'm really excited about that and what that will mean to the business, what it will mean to the change cadence, our ability to respond to the market. And that's really going to change us as a company. That's really where IT becomes a competitive advantage rather than a subservient order taker.

So 2020 is about revolution, and we sort of end up looking like the Terminator.

I've coined this phrase, the IT-less IT team. I kind of, in the future, think that there won't be an IT department as such. It will be so merged with the business that you won't really understand what is the IT department.

Interestingly, I was talking to a guy last week, 23, just come out of university. He had got the cloud guys to give him a two-terabyte set of storage in Azure, and he was running really advanced R language analysis over this massive data set. And it kind of struck me that the millennials and the people coming out of university are going to have these skill sets. And actually, this divide between what is IT and what the business does will start to collapse.

So really what I want us to do is for Hiscox to become a competitive advantage, I think is the focus of IT for the next couple of years, and to be able to disrupt what the business is doing and to really add value.

Also as a rapidly growing company, again, this is part of my C-suite sellout, is about flattening our expense ratio. So yes, the company's profits are taking off, but if we grow our bottom-line expenses at the same rate, there's really no point. So really it's about doing more with the same, sort of to flatten that expense ratio.

And we will do this through renting stuff. So where we can, we're going to borrow services before we buy them. We're going to rent them before we buy them, and lastly, we're going to build stuff out. So we're really going to try and get away from building stuff and try and rent it instead.

IT will be brokers and integrators of technology capabilities rather than creators of them. And then, as I said earlier, hopefully all parts of IT are using the same processes, the same tool sets and ideology. And BizDevOps, so I think that business are ingraining IT as IT is ingraining the business, so that's quite exciting.

As part of this move, we've looked at our traditional IT operations and infrastructure teams. So that's called IT services in Hiscox. At the moment, we divide those teams up by technology silo. So we have a Citrix team, we have a storage team, we have a networks team.

Each of those is led by a tech lead that manages the team, manages their performance reviews, manages a bit of the road mapping going forward, and their everyday workload. Which is fine, except you get a lot of silo mentality behind the teams.

The tech leads are very overstretched because they're trying to do line management at the same time as trying to be that sort of technology strategist in each of those silos. And all those teams are spread out across multiple different regions. As you saw on the second or third slide, we're in, I don't know, 20 different countries. Trying to align all of these different people in all these different time zones for a project is nigh on impossible.

So what we're going to do is change this, and we're moving our traditional infrastructure teams to a more Agile approach. And what we're doing is rather than working on the horizontal, where you're looking at technology silo, we're going to do it the other way and go vertical.

So your vertical section will be in a particular office, and that will consist of a Citrix guy, a DBA, a networks person, a storage person, all in one team, rather than that technology silo.

To the left there you can see the platform teams. The platform team are the guys that are really looking after the long-term strategy of these individual technologies and trying to create some governance and consistency between each of these different teams.

If you've got two DBAs in each of the silos, then it's quite easy to imagine that they're not going to talk to each other, and actually, then it's going to be the Wild Wild West. So having a team that can help come up with some of this strategy and direction is going to really help us.

And we're in the middle of doing this. So we have two guys at the top. So one is the business service manager who's really looking at project-related work as getting stuff from the business. And on the right-hand side, we've got our operations manager who's really looking at stuff that comes through the ITSM route. So tickets and tech debt and sort of that BAU churn. And they can combine stuff at the very highest level into backlogs that these individual Agile teams are going to consume.

On the right-hand side is the guys that sit inside my team. So these are the technical design authorities. These are the platform guys that are responsible for the individual architectures for things like Active Directory and storage.

One thing, this is a blatant rip-off from Spotify, but one thing that we worried about was really that absence. It's great to have that technology and that delivery focus on the vertical, but on that horizontal, it's about creating that governance and that consistency across the whole of Hiscox.

So we created this thing called a tribe. A tribe is a knowledge-sharing capability, led by a TDA, to try and help people do things in a consistent and common way. And we'll see how that works out. We've just started on this journey, so results are not yet quite conclusive.

Another interesting thing I thought about, actually, when we started doing this is if we want to bring these two Dev and Ops worlds together, that actually they tend to use very different tooling.

So the devs, very focused on application lifecycle management. They do lots of stuff with things like Jira. They're very used to working in an Agile way. The ops guys have a very ticket-based approach, so they have a lot of stuff that comes through via IT service management tools. Two massive, great silos between the two.

This is a screenshot of our pretty awful ticket management system. And it struck me as I was looking at this that actually it's just a unit of work. A ticket is just a unit of work. It's got a priority. It's been assigned to someone. It's got some stuff in that ticket that needs to be done.

So I thought, well, actually, that's not that far away from a user story. It wouldn't take that much effort to take that and, if you were to turn it around to describe value with a bit of wording, you could probably put that into Jira, right? That could become a user story in Agile parlance.
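The idea that a ticket is just a unit of work can be sketched in a few lines of Python. The `Ticket` and `UserStory` shapes, the field names, and the `ticket_to_story` helper are invented for illustration; they do not describe any real ITSM tool or the Jira API. The one thing a ticket typically lacks is the statement of value, so this sketch assumes a person supplies it.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """A unit of work as an ITSM tool might hold it (hypothetical fields)."""
    ticket_id: str
    summary: str      # what needs to be done
    priority: int     # 1 = most urgent
    assignee: str
    requester: str    # who asked for it, phrased as a role


@dataclass
class UserStory:
    """The same unit of work, reworded to describe value (hypothetical fields)."""
    title: str
    description: str
    rank: int
    owner: str
    source_ticket: str


def ticket_to_story(ticket: Ticket, benefit: str) -> UserStory:
    """Turn a ticket into a user story; the benefit must be supplied by a person."""
    description = f"As {ticket.requester}, I want {ticket.summary} so that {benefit}."
    return UserStory(
        title=ticket.summary,
        description=description,
        rank=ticket.priority,
        owner=ticket.assignee,
        source_ticket=ticket.ticket_id,
    )


# Made-up example ticket.
sample_ticket = Ticket(
    ticket_id="REQ-1042",
    summary="more disk space on the reporting server",
    priority=2,
    assignee="storage-team",
    requester="a finance analyst",
)
story = ticket_to_story(sample_ticket, "month-end reports stop failing")
```

Priority, assignee, and the work itself map across directly; only the "so that" clause is new information, which is the "bit of wording" that describes value.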

In an ideal world, we'd have one tool, right? So if our IT operations infrastructure staff are consuming tickets and they work in an Agile way, could we turn those into user stories and they could work in an Agile way? There doesn't seem to be a tool that allows us to do that.

So I think there's still a place for IT service management and having change management databases and self-service and all this kind of stuff. And I think it's pretty unrealistic, if you want Visio installed on your machine, that you're going to raise a user story and you're going to submit that into Jira. So we need some way of matching end user up with workload.

Not entirely sure how we're going to do that yet. We're trying to fudge our way through it. But it does seem counterintuitive, right, that you have two different systems for two sets of users when ultimately the aim is just to manage and prioritize the activities of that team.

DevOps is really about breaking down silos, not reinforcing them through technology. So I hate the idea that these two teams have got, "That's your tool, that's our tool, and never the twain shall meet."

And I just have a fundamental problem with tickets. Tickets perpetuate a, "It's not my problem" kind of attitude. You know, "I've done my bit of the ticket, now I'm going to assign it to someone else. It's your job to go and look after that ticket now." That doesn't seem like a DevOps thing. "It's not my job" is a very anti-DevOps pattern.

They tend to sit in queues. So if you've got an SLA of two hours for a ticket, probably you're going to wait two hours for that ticket to start to get looked at. So that's slowing down that time of resolution. And I think when tickets are moving around, it's sometimes harder to see where they're at, what's happening. They're not highly visible, I think, like a lot of stuff in Agile. So you won't have the morning stand-ups, and you don't see it progressing through a Kanban board, and yeah, tickets bad.

So we're off-piste now to find a new ITSM tool to help try and bring these worlds together, and whether it's integration or whether it's one tool, I don't know. But this is certainly a challenge that's close to my heart.

I wanted to share some common challenges with you that I've heard or have been addressed to me when we've moved to the cloud, and what I've done to try and argue the point.

So originally, back in 2015, I think it is, when we started, our original drivers for cloud adoption were these nine things. The orange tick says there's some benefit we get from it, but it's not a key driver for us. Green is that really is important to us.

So increased scalability. Hiscox isn't massive, 2,500 people. Scale isn't a massive challenge for us. We don't need to spin up 10,000 servers overnight in response to demand. It'd be a good problem to have, but it's not something we have. We have it in parts, so some of our modeling systems, we need to really have big compute power.

Elasticity, the ability to spin stuff up, spin stuff down, and pay for what you use rather than having a fixed overhead. Yes, we take advantage of that.

The cost one is really interesting. So lots of people talk about, is cloud cheaper? Is it more expensive? It's not really a key driver for us. If we can do it, then great, but really, there are other values to be gained from cloud rather than just cost. I'm talking about resilience. I'm talking about pace of change. I'm talking about some of the other points a bit later.

So the first one, people say cloud is expensive. It can be, but our move isn't primarily motivated by cost. I think we can get some benefits from it. So if you can re-architect your applications to work with platform as a service, then we work out that it's about three times cheaper from a total cost of ownership.

TCO's really hard to calculate. So it's the person in the data center, it's the patching, it's the finance team having to go out and buy a server. It's all those different things.

Infrastructure as a service is roughly equivalent. We found maybe a little bit more, unless you can pare it down.

That said, it's very difficult to compare the two. It's very difficult to compare cloud and on-premise infrastructure because they are different. It's not just about the running cost of that infrastructure. How much does it cost to spin it up in the first place? What's the cost of waiting to provision? What's the opportunity cost of the people plugging servers in? How much does that benefit the business?
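One way to see why the comparison is hard is to write the total cost of ownership down as a sum that includes the cost of waiting, not just the running cost. The Python sketch below does that. The `HostingOption` fields and every number in it are made up for illustration and are not Hiscox figures; a real model would need many more terms.

```python
from dataclasses import dataclass


@dataclass
class HostingOption:
    """Yearly cost inputs for one way of hosting a system (hypothetical model)."""
    name: str
    run_cost: float                 # hosting or hardware cost per year
    staff_cost: float               # people patching, racking, procuring per year
    provisioning_weeks: float       # wait before a new environment is usable
    cost_of_delay_per_week: float   # value the business loses per week of waiting
    new_environments_per_year: int

    def cost_of_waiting(self) -> float:
        """Opportunity cost of provisioning delays across the year."""
        return (self.provisioning_weeks
                * self.cost_of_delay_per_week
                * self.new_environments_per_year)

    def total_cost_of_ownership(self) -> float:
        """Running cost plus staffing plus the cost of waiting."""
        return self.run_cost + self.staff_cost + self.cost_of_waiting()


# Made-up inputs: the cloud option has the higher running cost.
on_prem = HostingOption("on-premise", run_cost=100_000, staff_cost=60_000,
                        provisioning_weeks=6, cost_of_delay_per_week=2_000,
                        new_environments_per_year=5)
iaas = HostingOption("IaaS", run_cost=110_000, staff_cost=40_000,
                     provisioning_weeks=0.5, cost_of_delay_per_week=2_000,
                     new_environments_per_year=5)
```

In this example the cloud option loses on running cost alone (110,000 against 100,000) but comes out ahead once staffing and waiting are counted (155,000 against 220,000). The answer depends entirely on the terms you choose to include, which is the difficulty being described.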

Internally, we'd never build some of these capabilities. So we've now got SQL servers in multiple different geographic regions around the world, but we'd never do that in old world. So how can I compare the cost of it? Because it's fundamentally different.

We ran an IT strategy a couple of years ago that actually talked about an increased need for pace of change and reliability. It's not just about controlling costs. And the move to cloud might incur lots of other costs: automation, monitoring, staffing costs.

I found that actually to get started with the cloud, there's quite a long lead time to do the automation before you get any of that ROI back. So you might have to sink in quite a bit of money, time, and effort before you can start to make the payback from that automation.

The other criticism was cloud is okay to put your photographs. It's not really ready for the enterprise. And I kind of turn that lens back to ourselves, and I look at some of the incidents we've had over the last couple of years. And these are some examples of some outages we've had at the company.

Quite a few of them have been sort of SAN. I'm not going to call them SAN. They're array problems over the last couple of years, where the cost on the right-hand side is our internal lost time. So actually, the total cost to Hiscox is substantially more because our external websites are going down. We can't write business. But our internal recharge is that.

So 2014, we lost a million pounds twice during that year, and I think as the company continues to grow, next year, we could be looking at the next outage that could cost £5 million.

So if I look at ourselves, the resilience we've got, how effective is that? Yes, we have a fairly good day-to-day working resilience, but when the proverbial really hits the fan, what are we going to do? And I think the cloud is going to offer us so many more capabilities.

I look at the data centers in Azure. I think there's 13 in Dublin alone, and we've got stuff in Amsterdam. And it's just the level of resilience that they invest. Each data center is $1.5 billion. We're never going to be able to compete with that sort of level of investment.

Cloud is a value proposition. I don't think it's a cost one. And back to this point about creating competitive advantage. So you get competitive advantage by disrupting away from your current state. So if your guys can look at doing things in a more innovative way, reducing cost or scalability or speed to market or understanding the business better, that's how you gain competitive advantage, not by undifferentiated heavy lifting and maintaining the status quo. You're not going to get to where you want to get to by just staying still.

You guys have probably seen this slide 100 times in various things, which is all the different cloud service types. And again, as I look through this, if you look at the bits in gray, the bits that go away when you move to the cloud, well, what happens to those people that used to manage that stuff, right? What happens to the guys that used to look after networking and storage, and all this stuff is kind of moving away, right?

So the real benefit to the business is more in the application and data bit. That's where the business value is. All the bit beneath there is just heavy lifting.

So for us, it's about freeing up resource to work on high-value tasks, and this is about increasing competitive advantage. It doesn't mean losing people. It means changing what they do and how they bring value to the organization. So rather than employing people to stack shelves with milk, we're going to build the machines that stack them for us.

Geographic reach. All of Hiscox IT is served out of a data center in London, but we're expanding globally. We have offices in Singapore and multiple ones in the US. For us, the shop door is always open. It makes sense to have our data near the end user. Therefore, serving everyone out of London doesn't really make sense to us.

And because the door's open, we have no time to do maintenance, really, because if I have to take a system down, someone's going to be affected around the world. So cloud really helps us locate data in the right geographic region, trying to move everything out of London.

As a platform for the future: the cloud vendors are spending billions on some of these new capabilities. A lot of them aren't relevant to us now, I don't think, but it's about what happens in the future. What are we going to need in the future? We can't predict the future, but we can help respond to it.

So this is about getting on the front foot and really opening those capabilities up. We can never do this stuff ourselves. We're never going to have the scale to compete with an AWS or an Azure.

This is one of my favorite slides. The move to the cloud, lots of people are going to tell you it's insecure, you can't do it, there are loads of regulatory reasons why you can't do it. And I put a slide up that shows our Hiscox security standards. And I put this slide up, which shows the Azure ones, which kind of looks a little bit like this.

And I think really, that's game, set, and match to me, because you can't argue with that. I think once people see that, they're like, "Oh, right. Yep. I don't think... Yeah, we can't compete with that."

So the security thing is a massive challenge. I think it's too easy as a fob-off to say cloud is insecure. I think if you do it properly, the difficulty is that it's a shared responsibility. Whereas your own data center, you're looking after firewalls, you're looking after permissions. Cloud is now a shared responsibility, and I think there is a cultural challenge to sharing that security with a vendor.

A little bit on our progress. So we started in 2015. We've had four phases. We started really right back at the beginning, as a response to some of those storage issues that I showed you: is there a better way of doing stuff? Can we get some better level of availability in the cloud?

We took a load of infrastructure guys. We took a load of guys from my platform services team that are sort of DevOps specialists. We put them together and we said, "Look, can we make this work? Are we going to get any benefit from it?"

We did some really good stuff around that. We moved on to do a proof of concept with infrastructure as code. So is this going to get us any benefits? And really to start to train some of the team that had never touched this stuff in some of the infrastructure as code concepts.

And now we're more into the delivery side. We're establishing a base capability. We're moving some apps over, and this year is really around scaling what we've done.

We've got three sources of stuff that we're going to move to the cloud, and I guess this is probably fairly generic. So we've done a cloud readiness assessment of our top number of applications. Is there any benefit of moving them? How difficult is it? How risky is it to do it? If there's a clear ROI, that makes sense to move them out to Azure.

Stuff that's refreshing. So every year, the hardware might be going out of date, the operating system might need upgrading. At that point, maybe that's the right time to move that workload into the cloud. And then for anything new, that's a straight to our cloud platform decision.
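The three sources described above amount to a simple triage rule, sketched here in Python. The `Application` fields, the `Decision` names, and the order in which the checks run are assumptions made for illustration. In particular, the "stay" outcome is not one of the three sources; it is added only so that every application gets an answer.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    STRAIGHT_TO_CLOUD = "new system: build on the cloud platform"
    MOVE_NOW = "readiness assessment shows a clear ROI: move it"
    MOVE_AT_REFRESH = "hardware or OS refresh is due: move it then"
    STAY = "no trigger yet: leave where it is (assumed default)"


@dataclass
class Application:
    """What the triage needs to know about a system (hypothetical fields)."""
    name: str
    is_new: bool = False       # not built yet
    clear_roi: bool = False    # outcome of the cloud readiness assessment
    refresh_due: bool = False  # hardware out of date or OS needs upgrading


def triage(app: Application) -> Decision:
    """Pick a migration route using the three sources, checked in order."""
    if app.is_new:
        return Decision.STRAIGHT_TO_CLOUD
    if app.clear_roi:
        return Decision.MOVE_NOW
    if app.refresh_due:
        return Decision.MOVE_AT_REFRESH
    return Decision.STAY


# Made-up portfolio.
portfolio = [
    Application("new quoting service", is_new=True),
    Application("policy admin", clear_roi=True),
    Application("document archive", refresh_due=True),
    Application("legacy ledger"),
]
plan = {app.name: triage(app) for app in portfolio}
```

A real assessment would weigh difficulty and risk as well as ROI; this sketch collapses that into a single `clear_roi` flag to keep the rule readable.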

This is where I get really interested, is about the skills that are needed to move to the cloud. And I think this is really interesting for me. So the use of cloud, infrastructure as code, really changes the core skills that are needed in the IT department.

And for me, it's about attitude. So our toolset for moving stuff into cloud is so vast, not everyone can learn everything in that pipeline. So it's really, if you've got the right attitude, you've got the right capability to go and learn stuff, that's much more important than technology skills.

And some of the infrastructure guys are almost overwhelmed by the amount of stuff they have to learn. If they've never touched Git before, they don't even know what version control is. It's going right back to the beginning, right?

And it's like asking a cat to become a dog, right? You can't just say, "Right. Off. You're all programmers. Get on with Ruby and TDD and BDD and CI and CD." They're like, "Argh."

So really, it's less about roles and more about T-shaped people. Sorry. Someone thought it was funny. But it is people who work in a really broad area with a specialism in one. So T-shaped people are really important to us. That's not Mr. T.

I've kind of led this transformation through skunkworks. We've taken a load of guys. We've moved them out of their regular day job. We've created a little Agile team off to the corner. But really, how do we now scale that to the rest of the organization?

How do we weaponize all of this infrastructure as code capability and all this Agile capability across the rest of the organization? How do we move it into the rest of the IT services organization?

So we're looking at changing our recruiting policy. We're looking at creating an automation academy. I hate that term, but how do we train up a couple of hundred guys in all of these different skills? Because just those common development skills take years to hone, and you can't just go and tell someone to go and become a developer overnight.

So what? For me, I think if we don't transform our organization, then our competitors are going to steal a march on us. But if we do, then we're going to steal a march on them. So that's really important to me, and I think it'll be really important to the company.

We've made a good start with cloud, Agile, and DevOps, but it's been largely opportunistic. We've taken things as they've come along and tried to make them better. But now it's really about weaponizing this. How do we formally digitize our business?

And we're putting the plans in now to do that. I'm going to need a massive check to go and do this. I'm going to need executive sponsorship, and I think that goes all the way up to the CEO, whether he's on board to do this. But I'm very passionate this is the right thing to do.

This isn't an incremental update to IT. I think this is a new IT department. So it's not IT 1.1. This is IT 2.0.

So thank you. I hope you found that of interest.

Just to finish, for me, how do we turn our cats into dogs is the help I need. So how do I train people en masse to move from traditional infrastructure skills into Agile, infrastructure as code, and all these new capabilities? If any of you have ideas, experience, I'd love to hear it.

So thank you very much.