Succeeding with Cloud in the Enterprise

London 2019

Head of Cloud Center of Excellence · Maersk

Cloud is in many ways a paradigm shift, and if we do not change the way we work, we will not get the full value out of a move to Public Cloud Computing.

Rasmus helps great engineers build awesome cloud stuff for Maersk using modern DevOps practices, public cloud and hands-free automation.

Full transcript

Rasmus Hald

Thank you so much for joining this session. I really appreciate your attention for the next 30 minutes. I put together a bit of a story from my job, and I appreciate you all listening in because, to be honest, my family is tired of hearing about it, but that's a different problem.

So, my name is Rasmus. You might hear it as a Viking name. I'm Danish, but we left the violence behind 1,000 years ago, so we're quite peaceful. And Denmark, if you didn't know, we are the home of the Lego. Everyone loves the Lego, right? We're also home of the largest container logistics company in the world, right? So we have things with stuff that stacks. So that's kind of part of the culture. But that's me: I love things that stack up.

I would call myself a bit of a cloud hugger. And as you see, the story today is about cloud, but not as a technology, as leverage to get stuff done, but also a bit of a DevOps whisperer. Now, that is not an official job title, sadly, but if you have a job opening, I might be interested.

So with that, my day-to-day job is, of course, as a technologist, but very much with a focus on ways of working. Because with the technology available today, it becomes really clear what is not working in our organizations, and that's kind of the story I put together for you today.

So let me just start by setting the stage. So Maersk, a container logistics company. We are the market leader with 20% of the market in container logistics, so integrated container logistics. So that is quite a lot of containers. We have about four and a half million containers, and I would say they're all running in production. However, only about 500,000 have an IP address. So that's the container joke of today. Might not have landed it, but I'm going to stop there. We love Kubernetes.

So we are in a bit of a challenge. Like any other industry, we are seeing this world of digital and are quite excited about the possibility. But I wanted to paint a picture of the situation that we are in as the market leader within this space.

So we are, of course, looking for growth scenarios, but with a market that is full of providers, we are now seeing that it's becoming harder and harder to really utilize our network, so our routes of ships, and stacking up the ships and filling them up with containers. So the market is getting fuller, so our growth opportunities do not lie in adding new container terminals or adding new container vessels, but could be on this digital overlay, right? Just as you've seen in the story this morning, we are looking towards that.

But we're also being disrupted by small startups called digital freight forwarders that are facing our customers, removing the complexity of our industry with smooth digital solutions about end-to-end logistics, right? So they are smoothing away the pains of dealing with a big company like ours. At the same time, we see huge companies like Uber, who recently launched an app in the US, where now truck drivers can get routes on their smartphones, and they're completely smoothing away the process for a truck driver. And we see Amazon, and the big fear, of course, is big container vessels with the Amazon logo on them, right? That's really scaring our board of directors. So it's not happening yet. Thank you so much for giving us a break.

I would say the last thing I want to mention today is the political disruption that's going on. So I don't know about you, but I care a lot about our planet, our species, and the sustainability of the human race. And we are a bit in a challenge, right? Just to put it mildly. So we are seeing now new disruptors coming, not from the politicians themselves, they do the legislation, but from our partners in the market wanting a more sustainable product.

And I'm going to do a confession now. And please, it's not the best quote of the day, but our industry, my industry, where I work, accounts for 2 to 3% of the CO2 emissions of the planet. 2 to 3%. So we are 20% of that as a market leader. So we are quite polluting as a company. However, I see this as an opportunity because we can really impact the CO2 emissions of global trade, right? That is a huge opportunity, and I'm quite excited too that we last week announced the first CO2-emission-free product, we call it, partnering up with H&M, the clothing giant, on moving their products from the Far East to the United States to Europe on a CO2-free offering using biofuels over fossil fuels. So that is quite impressive, but this is also a huge challenge for us. How do we address these environmental challenges?

So with that, that is the pain we are in as a company. And I will get to the story in a second. We need to change. We need to gear up, and we need to be able to put in the market new products faster, and that puts up a new set of requirements for our technology organization. So with that, we have the scene. We're here.

I have a bit of a story to tell you. So first, I want to mention a few of the drivers we have on our cloud journey, and why we see ourselves as a cloud-first company. Now, something became apparent when we did our first cloud implementation. And here, I'm going to introduce you to our concepts for self-service and the guardrails that followed with this. I'm also going to have a boring topic: finance. And I'm going to have that as a cliffhanger. So something boring as a cliffhanger. But it's quite elemental to our success that we get this right, and we get it early. And last, the question of the day, and that's probably more for you: how do we scale these initiatives? Because you're going to see a lot of trends here. But basically, this is the story of today: how we have used cloud to change how we work.

Great. So at Maersk, we have what we call a cloud-first strategy. So when we build anything new, we put it at a hyperscaler public cloud provider first, unless we have a really, really, really, really good reason not to. Okay? So cloud first. But we don't stop there. We actually have a software-as-a-service-first strategy, and let me try to explain that.

It's quite elemental. Because you've probably seen this stack before. So you have infrastructure as a service, where you take care of managing everything from the hypervisor up. You patch the OS, you patch the applications. On a platform as a service, you take care of your code and your data. But the cloud provider takes care of the rest of the technology stack. And then you have software as a service where you basically configure and consume.

Now, I don't know about you guys, but we feel that our lives are too short to be running other people's software. Why in the world would I, unless I have constraints, of course, run an email system in the year of 2019? You might have constraints in your organization, legal requirements, et cetera. But if you don't, why would you have the hassle of running an email system when you can buy great services off the internet that probably can give you a better service than you would ever be able to provide yourself?

Right. So how about we buy other people's software as a service, and then focus our effort on building great software for our users, for our customers? So this is the philosophy: software as a service first. If we are developing new, let's use platform as a service. We are not the specialists of running Windows or running Linux, so why don't we just consume the best of the cloud platform at scale as a service? So we see tremendous value in not having to build from the OS up.

And when we talk about constraints, one of our largest constraints in our industry, and I hope I will see nodding heads now, is getting people, getting what we call warm hands on the keyboard, engineers. Where do we find enough engineers? And if we spend our engineers' time on patching OSs, on running email systems, et cetera, I'm not sure we're spending it where it differentiates us in the marketplace.

So basically, our cloud-first strategy is a software-as-a-service-first strategy, then platform as a service, and then you can have a VM if the two others didn't work for you. So that's number one cloud driver: the highest possible abstraction.
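
The preference order described here (SaaS, then PaaS, then a VM) can be written down as a tiny decision rule. This is only an illustrative sketch, not Maersk's tooling: the `ServiceModel` enum and `choose_service_model` function are hypothetical names, and the set of "viable" options is assumed to come from whoever has assessed the workload's constraints.

```python
from enum import IntEnum


class ServiceModel(IntEnum):
    # Lower value = higher abstraction = tried first.
    SAAS = 1
    PAAS = 2
    IAAS = 3  # "you can have a VM if the two others didn't work for you"


def choose_service_model(viable):
    """Return the highest-abstraction model among the viable options."""
    if not viable:
        raise ValueError("no viable service model for this workload")
    return min(viable)


# Email can be bought as a service, so SaaS wins over running it on VMs.
email_choice = choose_service_model({ServiceModel.SAAS, ServiceModel.IAAS})

# Newly developed software cannot be bought, so PaaS comes before a VM.
new_app_choice = choose_service_model({ServiceModel.PAAS, ServiceModel.IAAS})
```

The point of encoding it as an ordering is that a VM is never chosen by default; it is only what remains when the higher abstractions have been ruled out.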

Next, we talk about co-innovation in cloud. And here it's really about, again, not trying to reinvent the wheel. So we are big on IoT. We have some of our coolest digital solutions run on IoT, where we can fuel optimize our vessels, so we spend the minimum amount of fuel based on machine-learning algorithms, by collecting data from IoT sensors, IoT-enabled sensors.

Now, this is a great potential for us. We are an asset business. We live by our vessels, our containers, our terminals. These are physical assets, and we will never leave that world. But IoT enabling is important. But building an IoT stack is not differentiating. So why not consume IoT as a service, so these platform components? And we really strongly believe that your cloud providers shouldn't be vendors. Your hyperscaler cloud providers should be business partners. And this is a paradigm shift for many people in technology, especially procurement. I'm just saying, that's a bit of a challenge.

Great. Number three. So we discuss internally the move away from the monolithic infrastructure. So we have discussed for many, many years monolithic applications and why they are painful to maintain and improve. But have you ever considered that you also are running a monolithic infrastructure?

So, if we go to Wiki, and we look up the definition of a monolithic application, what it says is something like, and I'm paraphrasing now, it's a system that is more complex than one single person can comprehend. That's sort of the definition, the rule of thumb. If it's more complex than one person can comprehend, then it's a monolith. Now consider your internal infrastructure, your Windows network, your Linux network, where your users connect to. So the corporate network of any big industry or any big enterprise. Those are basically monolithic, and they are the source of many of the constraints we have within our organization.

So transforming that into a loosely coupled or decoupled infrastructure, where each little app, each service gets its own infrastructure that's decoupled from the rest of the infrastructure, like a little cloud in the sky, not connected via what we call layer two connectivity or IP connectivity or internal IP connectivity to the rest of the network. And we have been practicing this for the last almost two years now, that anything greenfield does not go on the corporate network. Now, the reason this relates to cloud is cloud makes this super easy. We're not running our own switches. We're not running our own VNet. We are just consuming them as a service.

Self-service is, of course, the next topic. But self-service gives us a higher level of agility, and I'm going to treat this topic in a moment. But basically, what we have is instant environment provisioning. So are you building something new? You can get your environment within minutes, and that's quite powerful. If you consider the traditional lead times in enterprises to get environments, you're going to like this.

Then we talk about consumption-based and elastic infrastructure. Again, a topic for later. But the pattern here is not a capital investment in hardware, but we can consume what we need when we need it. So it's elastic. When we're developing and not in production, we only need dev environments. When we're in production, we can just spin it up and scale it automatically. So it's absolutely a driver, and it enables us to do what we call a direct cost model. And the direct cost model opens up for even further agility.

So with that, I hope I drew a picture of the drivers we have, the trajectory we have to cloud, and what we see it enables for us. But I wanted to dwell on the three topics of self-service, guardrails, and cost because they are quite important to succeed in accelerating deliveries here.

Self-service. Using self-service to reduce lead time. So lead time being the mean time to get an environment. How much time does it take to get a new dev environment, a new production environment? Basically, what it's about is reducing the handovers. Yeah, we need great tooling, but it's probably more about processes.

So I just pulled some fresh numbers from our statistics. And we look at our cloud journey as being the third iteration, and I'm going to walk you through all three of our iterations. But currently, we have a fairly long lead time even for cloud environments. But let me try to build a context here.

So traditionally, Maersk was a functional organization with a functional technology organization. And we have this definition of plan, build, run: we plan for a while, then we build for a while, and then we run for a while. So you have clear handovers between the three functions in this functional organization. And it took about a year with each of them. We plan for a year, we build for a year, and then we run forever as technical debt.

Now, what we did in our first iteration of cloud was basically the same thing. We planned for a while, we built for a while, and then we ran for a while. Now, if we're going back to instant environment provisioning, the cloud enables you to just spin up an infrastructure when you need it. I think we all agree with that. Now, when we first moved to the cloud, we reduced the lead time to environment from 100 days, in these classic scenarios, to, as of today, 85 days. So even with an elastic infrastructure, it still takes 85 days to get an environment. That's quite impressive, isn't it? This is what I call self-inflicted lead time, because this is not a physical constraint in your hardware capacity. These are processes that are keeping you from moving faster.

So we also have an average of 13 people involved in a classic delivery, which is not bad. And we figured it cost us around €40,000 to provision an environment in work time, in handovers, in meetings, in whatnot. So not in consultancy or cloud cost, but just in man-hours. So it's quite impressive, but I would say we can do better than that. So in this scenario, we had classic data centers not differing a lot from cloud deliveries.

And two years ago, we started up with what Gartner advised us to do, which was bimodal IT. Have you heard of that? It's quite amazing. So it's basically just let the old stuff be as it is, and then introduce, yeah, the two-speed model where you have a classic delivery that's slow. It's good for maintaining or sustaining. And then we have a modern product delivery on the other side, where they do plan, build, run in whatever increments you want, right? So you basically have a startup mentality in one end, and then you have corporate IT in the other end. So it's quite normal to do. I know a lot of enterprises that do something like this. But it comes at a price. There's no reuse of capabilities here, right? So all the capabilities you've built in your functional IT do not apply to your fast product delivery. So this two-speed IT was our second attempt at cloud.

What we are introducing here, in this calendar year, is what we call the-- We don't call it the third, we call it the right one, until we figure out it's wrong. But basically, the future model that we are advocating today is called the unified delivery model because it's inclusive. It covers everything we want to do from the developers producing valuable code into production and reusing the capabilities of the organization.

So here we talk about having three delivery options. You have three speeds of delivery. Basically, you only have two, but I'll explain that. The first one is sustain, which is for existing workloads. If you need to introduce a change to something that we already built and it doesn't have a dev crew, right? It doesn't have a product team. You can do change. We have a normal ITIL-based change process, and changes get delivered. Nothing is fast. It's normal, it's functional, and it's stable.

On the other hand, we introduced DevOps, and I come from a company where if you mentioned DevOps two years ago, people would just ignore you because you would be crazy. Where today, this is actually a legitimate way of delivering your capabilities. So DevOps teams do the plan, build, run at the cadence they decide. They put together the technology they need, and they adhere to our guardrails, and I'm going to come back to them in a second.

Then we have our continuous delivery, which is kind of the compromise between the two, where we have a dev crew that builds, and then we have a run crew that runs. Over time, we would like to see this transition into becoming site reliability engineering that focuses on keeping and maturing these built applications. And then we have the definition of IT core services, consumable via API and as an as-a-service function. So IT is no longer a handover. IT operations provide monitoring as a service, security operations as a service, et cetera. So everything becomes frictionless and, hopefully, with fewer handovers.

So with that, I just wanted to give you a peek into our world of self-service. And what I have here for you today is the Admiral Portal. It's something we developed, targeting our lead engineers, engineers, and our product owners. So Admiral is where we orchestrate our fleet of awesome digital products. And basically a self-service portal, a thin overlay over the cloud providers that we use, where you can go and self-manage, self-service your environments.

And here, as a product owner, who could be technical or non-technical, I can go find my product. Let's take an innocent one, the Maersk Cloud Services. And here I would be registered as an owner, I would register my cost center, et cetera, so I can manage all the housekeeping stuff. I can also go here and provision new environments. So here, the ability to create environments is at the fingertips of whoever is designated to manage this.

Here, a fun story. Now you're thinking, why this UI? Why wouldn't you do this via automation? We actually started with an API, and every other engineer would ask us, "So yeah, yeah, that's fine, but where's the portal?" So we had to reverse engineer a portal for the APIs. But basically, this is a simple portal where you can go and self-service your provisioning of environments as long as you have a cost center, right? That's all we require, that you pay for what you use.
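
The one hard requirement mentioned here, a cost center, is easy to picture as a validation step in front of a provisioning API. The sketch below is hypothetical: `EnvironmentRequest`, `validate_request`, and the cost center identifier are invented for illustration and are not the Admiral API, whose actual shape is not described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentRequest:
    product: str
    environment: str  # e.g. "dev" or "prod" (assumed naming)
    cost_center: str  # who pays for what the environment consumes


def validate_request(request):
    """Return a list of problems; an empty list means provisioning may proceed."""
    problems = []
    if not request.product.strip():
        problems.append("product is required")
    if not request.cost_center.strip():
        problems.append("cost center is required: you pay for what you use")
    return problems


# A request with a (made-up) cost center passes; one without it is rejected.
accepted = validate_request(
    EnvironmentRequest("Maersk Cloud Services", "dev", "CC-0000")
)
rejected = validate_request(
    EnvironmentRequest("Maersk Cloud Services", "dev", "")
)
```

Keeping the check in the API rather than in the portal means the portal and any pipeline calling the API directly are held to the same rule.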

So I'm going to come back to the portal in a second because we need to discuss the guardrails. So the guardrails are quite critical to what we do here. In order for us to succeed with self-service, we need these guardrails in place. And the guardrail concept comes out of LeanKit. It's quite common, and I also linked it in the slide deck here so you can go and grab the background here. But basically, it's a set of good practices and principles that you can use to support the self-servicing team. They are internally open sourced. Anyone can make a suggestion if they disagree with what's there. We continuously evolve these as we get smarter in our cloud journey. We document and update them. They're owned by our team, and we enforce some of our principles, our guardrails, and other principles are more soft, right? We report on them.

So I brought some examples here. As I said before, software as a service first and then PaaS services. We try to avoid the VMs. We can't really do it, but they are a principle in itself. No secrets in code is a good example. Yeah, we need to write it down, apparently. Encryption at rest. Enforce the use of pipelines. So we don't actually directly enforce the use of pipelines. What we say is no human access to production environment. And then, of course, the isolated architecture.

So I wanted to highlight this one about access governance. So what we basically do is we ask engineers to categorize the environment for production, which can hold production data, and then non-production, which can't hold production data. We manage the access for production environments by saying no human intervention in production. So if you can't automate it, you're not getting it in production. And this automatically enforces or supports the use of pipelines and the support of infrastructure as code and automation. So with this simple principle, we are actually able to accelerate how we do this.
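
A minimal sketch of this access guardrail, under the assumption that every environment carries one of two categories and every principal is flagged as human or service account. The names `Principal`, `may_change`, and `may_hold_production_data` are hypothetical; the talk does not describe how the rule is actually enforced.

```python
from dataclasses import dataclass

PRODUCTION = "production"
NON_PRODUCTION = "non-production"


@dataclass(frozen=True)
class Principal:
    name: str
    is_service_account: bool


def may_hold_production_data(category):
    """Only environments categorized as production may hold production data."""
    return category == PRODUCTION


def may_change(category, principal):
    """No human intervention in production: only service accounts may change it."""
    if category == PRODUCTION:
        return principal.is_service_account
    if category == NON_PRODUCTION:
        return True
    raise ValueError(f"unknown environment category: {category!r}")


engineer = Principal("some-engineer", is_service_account=False)
pipeline = Principal("release-pipeline", is_service_account=True)

human_in_prod = may_change(PRODUCTION, engineer)      # denied
pipeline_in_prod = may_change(PRODUCTION, pipeline)   # allowed
human_in_dev = may_change(NON_PRODUCTION, engineer)   # allowed
```

Because the only route into production is a service account, anything that cannot be expressed as a pipeline step simply cannot be deployed, which is how one rule ends up promoting infrastructure as code.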

So I just briefly wanted to show you that in the portal here, we manage the access governance. So we can go and see who the owners of this service are, what the role-based access model is, who has the ability to change production. You see no humans, only service accounts. And the teams can also go and set up their service accounts, so the credentials in the pipelines, for themselves. Again, self-servicing them into accessing production from the pipelines. So we do all the wiring automatically and via APIs or portals so that the engineers are not constrained from moving forward as long as they stick within the guardrails.

The last topic I have here is about cloud finance and cloud financial management. Now, the reason why this is important is because it's a huge paradigm shift for IT organizations. So I wanted to bring this as well because we have been struggling in this topic internally at Maersk.

So what cloud enables is elastic consumption of infrastructure. In a classic IT scenario, you would have the cost ownership in IT operations. But if you take that paradigm into the cloud, you're not really thinking that you have a consumption-based model, and you will not promote engineers that right-size their solutions. Right? You will not promote the shutdown of dev environments when they're not used. However, if you anchor the cost at a direct cost model within the product teams, so product teams pay for what they use, then you would start seeing a change in behavior where product teams will right-size, spend time, spend effort in right-sizing solutions, automating shutdowns of dev environments in the weekends when they're not working anyway. Because maybe that's the trick to affording another colleague, right? Or a better Christmas party or whatever, right? So anchoring the cost where it can be influenced is quite important to get a lot of value out of cloud.
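
To put a rough number on the weekend shutdown: a week has 168 hours and a weekend 48, so switching a dev environment off from Saturday to Sunday removes 48/168, about 29%, of its running hours. The sketch below assumes a purely hourly consumption price and a made-up rate; real cloud pricing has components (storage, reserved capacity) that do not stop when compute does.

```python
HOURS_PER_WEEK = 7 * 24     # 168
WEEKEND_HOURS = 2 * 24      # 48


def weekly_cost(hourly_rate, hours_off=0):
    """Cost of one environment for a week, billed only for hours it runs."""
    return hourly_rate * (HOURS_PER_WEEK - hours_off)


def weekend_shutdown_saving(hourly_rate):
    """Weekly amount saved by shutting the environment down over the weekend."""
    return weekly_cost(hourly_rate) - weekly_cost(hourly_rate, WEEKEND_HOURS)


# Hypothetical dev environment at 2.0 currency units per hour.
always_on = weekly_cost(2.0)                 # 336.0
saving = weekend_shutdown_saving(2.0)        # 96.0
saving_fraction = saving / always_on         # about 0.29
```

Under a direct cost model that saving lands in the product team's own budget, which is what creates the incentive to automate the shutdown in the first place.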

So that is the wiring internal mechanics. But what you also need is great tooling. If you can't show back, then you leave your product owners in darkness if they only see an invoice at the end of the month, right? Then they can't react. So you need great tooling in place. And here is quite a learning we did, because what you see from cloud providers is they provide quite a nice UI in their portals. So if you go to Google Cloud, you can see the cost of your running services. You go to Microsoft Cloud, you can see the same.

But the thing is, product owners are often business people, right? They are good at managing ships or maintaining vessels. They don't know what a cloud hyperscaler portal is. They don't even have an account to access that. So what we did was we pulled out that data into some Power BI, and we're now surfacing this, both here where we can see what each environment costs. So this is a function app, so it doesn't really cost anything. But we also compare the cost of production over development. We highlight development cost and suggest optimization. So this is just a simple dashboard. You can drill down in all this data if you are the product owner. But again, just putting that data out there in the face of the people who can influence it is the trick here. Don't hide the cost here. Make it real-time and make it valuable for the people that need it.
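
The roll-up behind such a dashboard can be sketched in a few lines, assuming the billing data has already been exported as records with an environment label and a cost. The record layout, the figures, and the function names are all hypothetical; the talk only says the data is pulled into Power BI.

```python
from collections import defaultdict


def cost_by_environment(records):
    """Sum cost per environment so each one can be shown back to its owner."""
    totals = defaultdict(float)
    for record in records:
        totals[record["environment"]] += record["cost"]
    return dict(totals)


def dev_to_prod_ratio(totals):
    """Development cost relative to production cost, or None without production."""
    production = totals.get("production", 0.0)
    if production == 0:
        return None
    return totals.get("development", 0.0) / production


# Made-up billing lines for one product.
records = [
    {"environment": "production", "service": "database", "cost": 300.0},
    {"environment": "production", "service": "web", "cost": 100.0},
    {"environment": "development", "service": "database", "cost": 150.0},
]

totals = cost_by_environment(records)   # production 400.0, development 150.0
ratio = dev_to_prod_ratio(totals)       # 0.375
```

A high ratio is the kind of signal a dashboard could highlight as a candidate for optimization, for example a dev database sized like production.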

So with that, out of time, my ask here is, or the last statement here is, we're really leveraging cloud to change how we work, right? Cloud makes a lot of technology constraints go away. But if we don't change how we work, we will not realize the full value of cloud. So scaling this cloud journey is really, really hard. And what we're trying to do here is really use the rituals from Agile in scaling the journey and becoming more inclusive as a company.

So with that, I would love to take questions. I'm going to be around here for a second. I would love to hear your feedback and ideas on my thinking and the story I shared here. So I'm going to be at the speakers' corner when it opens up this afternoon, and I would really appreciate your feedback. But with that, I would like to thank you so much for listening in. I hope you enjoyed this talk. Thank you.