Large Scale DevSecOps Transformation Including IBM Z

Las Vegas 2022

Rosalind Radcliffe

IBM Fellow, CIO DevSecOps CTO · IBM

Thomas Lawless

IBM STSM, CIO Developer Experience · IBM

Full transcript

The complete talk, organized by section.

Rosalind Radcliffe

My name is Rosalind Radcliffe. We'll get into a little bit more about that in a minute, but I want to say I'm so glad to be here in person again, telling the stories, meeting with all of you, and talking to people in person. It is absolutely wonderful to be back, and we're going to be telling the story about our large-scale transformation.

First we'll introduce ourselves. I've changed roles since the last time I spoke in person. I'm now an IBM Fellow in the IBM CIO's organization, responsible for our DevSecOps transformation as its CTO.

Thomas Lawless

Hi, I'm Tom Lawless. First-time speaker here at DevOps Enterprise Summit, so I'm super excited to be here. I'm a senior technical staff member, an STSM at IBM. I work really closely with Rosalind. In the beginning of this year I took on a new role as part of our developer experience organization within the CIO, and some of what we're going to be talking about is going to touch upon that.

Rosalind Radcliffe

So what is the IBM CIO's organization? It's the IT organization that runs IBM. IBM is a relatively large company. I don't know, 250,000 people. We build hardware and software, provide cloud, provide consulting services. We split out our outsourcing business into a separate company, so that's now Kyndryl, and that will come into play in this discussion a little later.

When we look at the organization, it is responsible for providing the IT for IBM. We provide payroll. We provide manufacturing support. We provide quote-to-cash. We provide close the books. IBM has IBM Financing, so we provide a financing organization, a bank. We have a lot of IT infrastructure, a lot of responsibility to the organization, and a very large organization within the organization.

Thomas Lawless

The CIO organization is 12,000 people strong. About half of them are developers or are still in some sort of development role: data scientists, data engineering, SRE, and general-purpose application developers.

We have a really large portfolio of applications that run IBM IT, around 6,000, that span everything you could imagine. We have everything from mainframe to container. We run on-prem in a private cloud. We run in the IBM public cloud. We have Power and AIX. We have vendor solutions. We have SaaS solutions. We have Mac, iOS, Windows, programming languages, databases. Seriously, name it, we probably have it somewhere in our organization. It's huge.

Because of the size of our application portfolio, we also have a very large on-prem GitHub Enterprise deployment. We have the largest on-prem GitHub Enterprise deployment in the world. Just within our organization, we have somewhere in the neighborhood of 70,000 GitHub repositories. That's a big number, and think about the amount of effort that our teams do across those repositories for very simple maintenance tasks: configuring CI/CD, hunting down vulnerabilities, everything that your developers probably do.

Because of who we are and because of the types of applications we build, we take a very basic stance that all of our repositories in GitHub should be private. Even though we have an on-prem deployment of GitHub, our repository access is still need-to-know within the company. We do that to protect intellectual property. We do it to protect customers and clients. It also introduces a lot of challenges, because it really limits our visibility into the repositories at our scale.

Rosalind Radcliffe

So we have a small problem. I think everybody had a small problem in December last year. We have a lot of applications. And this isn't just the CIO: if I look at all of IBM, we build a lot of products, we have the CIO applications, and Log4j hits, and it's Christmas. A lot of people haven't taken a lot of their vacation, so they've left early.

We have no way of knowing what applications are impacted by Log4j. Every single application team had to report whether or not they were impacted by Log4j. This is a problem. We should have said, okay, if you're running COBOL, you don't have this problem, but they didn't start that way. Every single application, because we have no inventory. We have no clear understanding of every application we have.

Think about our private repository setup. We can't even use GitHub Enterprise search to search for Log4j, because unless you have access to those repositories that are private, they don't come up in the results. So we had a really large problem.
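
To illustrate why a central inventory matters here, the following is a minimal sketch, not IBM's tooling. It assumes a hypothetical inventory in which every application declares its dependencies; the `Application` class, the `affected_by` function, and all sample data are made up for illustration. With such a record, "who is affected by Log4j?" becomes a single query instead of a survey of every team.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """One entry in a hypothetical central application inventory."""
    name: str
    language: str
    # Declared dependencies as (library, version) pairs.
    dependencies: list[tuple[str, str]] = field(default_factory=list)


def affected_by(inventory: list[Application], library: str) -> list[str]:
    """Return the sorted names of applications that depend on `library`."""
    return sorted(
        app.name
        for app in inventory
        if any(dep_name == library for dep_name, _ in app.dependencies)
    )


# Illustrative data only; these applications and versions are invented.
INVENTORY = [
    Application("payroll-batch", "COBOL"),
    Application("quote-service", "Java", [("log4j-core", "2.14.1")]),
    Application("hr-portal", "Java", [("logback-classic", "1.2.3")]),
]

if __name__ == "__main__":
    print(affected_by(INVENTORY, "log4j-core"))
```

In this sketch the COBOL application has no Java dependencies, so it is excluded without its team having to report anything.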

One of the reasons I came into the CIO organization, and Log4j just happens to be the easiest way to explain the problem, was to help with this transformation. What we're trying to do is truly be a showcase for hybrid cloud, to really show what's possible with today's technology.

I want to give a little more background on our systems. We've only been around 110-plus years; I don't remember how many years anymore. We have applications that were written in the 1970s running on our Z systems that are still running on our Z systems, and they're still providing real value to the organization. We have SAP. We have everything you can imagine running our systems in our environment. And as you would expect, some are somewhat siloed. Different teams work differently. We're still using some legacy library managers for our COBOL and PL/I, which didn't make me happy when I showed up and noticed that. But here we are.

We have a whole bunch of different processes. I was brought in to help facilitate this transformation, to help change the culture of the organization. One thing we realized is that we really do have everything, so we need to do this as a platform approach. We want to provide a platform to run our applications and a platform to be able to get into the application: a developer platform, and Tom's going to talk more about that in a minute. I want to focus on the operations platform, the runtime platform.

If you look at this picture, it's somewhat complex. But at the core, it basically says it doesn't matter. You can run your application on Z, on z/VM, on IBM i, on AIX, containers using OpenShift on any hardware platform. You can run your application where it makes the most sense. But the problem with this is, why should a developer have to tell or care? Okay, if they're writing COBOL or PL/I, they're going to run on Z; I got that part. But other than that, if they're writing a Java application, if they're building something, why should the developer actually care?

Our platform is being built such that they don't have to. We're going to focus on this intelligent workload placement so that the applications can run where they belong based on the qualities of service of the application, abstracting that away from the developer having to worry about that. We can ensure that we provide the qualities of service required for that application. If it's financially significant, it's going to have the right DR. It's going to have the right capability.
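
The placement idea can be sketched as a small rule function. This is an illustration under stated assumptions, not the platform's actual logic: the `Workload` fields, the target names, and the disaster-recovery tiers are hypothetical, and the rules encode only what the talk states (COBOL and PL/I run on Z, and financially significant applications get stronger DR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Traits a developer might declare; all field names are hypothetical."""
    name: str
    language: str                  # e.g. "cobol", "pli", "java"
    financially_significant: bool
    containerized: bool


def place(workload: Workload) -> dict[str, str]:
    """Choose a runtime target and DR tier from workload traits."""
    if workload.language in {"cobol", "pli"}:
        target = "z/OS"            # COBOL and PL/I run on Z
    elif workload.containerized:
        target = "OpenShift"       # containers can land on any hardware
    else:
        target = "virtual machine"
    # Financially significant workloads get the stronger (assumed) DR tier.
    dr = "high" if workload.financially_significant else "standard"
    return {"target": target, "disaster_recovery": dr}


if __name__ == "__main__":
    print(place(Workload("ledger", "cobol", True, False)))
    print(place(Workload("quote-api", "java", False, True)))
```

The developer describes the workload; the platform, not the developer, decides where it runs.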

The other thing we're doing with this picture is bringing z/OS into the modern world with infrastructure as code. When Topo was talking in the other sessions this morning about the fact that we really have to change the ways of working from an infrastructure standpoint, we want one way of working: infrastructure as code, even for z/OS.

I made the comment that we separated out Kyndryl this last year. Understand that with our outsourcing organization, they provided services and systems to companies including IBM. We had, with Kyndryl, data centers all over the world. We still have data centers all over the world, but we had to split those data centers. There's a set that's IBM and there's a set that's Kyndryl. I bet our applications are not necessarily running in the right data center now. Do we have applications running in places they probably shouldn't anymore? These two years are a process of splitting all of the assets that we have into our data centers instead of Kyndryl data centers. We have to do this entire split, so it's a perfect opportunity to change.

The other thing, if you happen to know Z very well, is that one of the things I discovered when I got here is we have over 350 LPARs. I don't know why. I can't figure out why, but I know we don't need 350 LPARs for our 615 applications running. So we're going to streamline this using infrastructure as code, creating a z/OS image that I can deploy with a pipeline. This entire back end is done via pipeline with no user access to production. We're removing the ability for users to get into production. The pipeline is going to make all those changes. It's going to be configuration as code, even for z/OS, in order to simplify this environment for everyone working in the environment.

Thomas Lawless

Now, from the developer experience side, I think you already touched on it a little bit with the pipeline aspect. As we build this out, there are a couple of exciting things here from my point of view. First is containers everywhere. Rosalind talked a little bit about the traditional mainframe apps, but we're also looking at running containers not only on x86 but also on Z and Power.

Most of our developers, especially the ones who have joined us in the last five years or so, are mostly cloud-native developers. We do have a strong set of mainframe developers as well, but most of our younger, less experienced developers are going to be creating simpler applications, microservices, container-based applications.

This is where we want to make sure that our pipeline and our platform let them just write code. Ultimately, we want them just to write code that solves their unique business problem. We don't want them to reinvent the wheel with different CI/CD pipelines or figure out how to be secure and compliant in our enterprise setting. The pipeline, and by extension the developer platform and developer experience, is focused on providing the automation and the intelligence so that they can just write code. The platform handles security. The platform handles all the compliance checks, the audit readiness, the evidence that we need to prove that. Based on the characteristics of the workload, we'll place it where it needs to be, where it will run the best.

If we look at it from a different point of view, this slide is a little more of a cultural transformation that we need to drive to be successful here. The diagram in the middle is just a different way of looking at the platform. We have our core hybrid cloud platform in the middle. Around that, in order to start to deliver these higher-level features, we're looking at how to build a unified control plane across all of those feature sets, as well as an enterprise data fabric.

The combination of the control plane and the data fabric will enable what I like to call our enterprise capabilities. That's the outermost ring. It's our ability to do dynamic workload management to make decisions about where our workload should run. It's our ability to do DevSecOps at scale. It's our ability to infuse machine learning into what we do, as well as manage our portfolio. We have a large portfolio. Like we talked about inventory, we may or may not have more than one inventory system today. We do.

This is a way for us to unify that experience, even into our security and compliance. We can look at this as a transformation of culture, starting with application and data modernization, process modernization, and platform modernization. We have a lot of teams that run their own quote-unquote infrastructure today. We're finding that it's expensive. It's difficult to keep updated. It's difficult to keep secure. So we want to start to consolidate all of that into our one platform.

From a developer experience point of view, this is part of our developer advocacy program. It's to go out to our developers, maybe mainframe developers who aren't as familiar with GitHub as some of our cloud-native developers are, or aren't as familiar with end-to-end CI/CD pipelines, and start to lay the foundation of education and skill building to start to push this modernization into the platform.

Our key roles -- business stakeholders, product development teams, SRE teams, cybersecurity experts -- revolve around our enterprise capabilities to deliver products. Coming out of this, what we hope to see is improved developer productivity, higher quality and more secure applications, faster time to value, and lower IT costs. Faster time to value is a big one for us. As we sat down and started to talk about this from a strategy point of view, our ultimate goal is to be able to go from idea to deployment within a week or two: really short, business-led outcomes and objectives. Finally, lower IT costs in general. At our scale, there's an economy of bringing all these things together, reducing developer waste, reducing redundant effort, and that will lower overall IT costs.

Rosalind Radcliffe

As part of changing the culture, we've set up a bunch of guilds. We need to help teams transform. We need to help them understand, and so we have guilds focused on being a community of practice, a set of experts to share with the rest of the organization: this is how we want to work, this is the change we want to make.

We have an architecture guild. I'll admit IBM is pretty freewheeling. That's probably the best way of putting it: you can do what you want. I have always enjoyed that as a developer. I can do what I want. As someone who has to deal with this mess now, I like that less. So we are trying to come up with more standardization to say, unless you have a reason to not do it this way, let's do it the standard way. Let's have commonality for business value. That doesn't mean everything's the same. Obviously we run on all sorts of platforms and we do lots of different things. There are business value reasons that we make decisions about how things should be built. But that's got to be a business value decision, not a cutesy technology-of-the-week decision.

The DevSecOps guild, the architecture guild, the security guild, the modernization guild: all of these guilds help bring those like-minded people together to then share that knowledge out into the environment.

Thomas Lawless

One of the other major initiatives we have going on is inner source. We're building a developer platform or developer experience. Inner source is a mechanism by which we can make sure developers have a voice in what we're building. They can have an ownership stake in what we're building, because all of our code and all of our automation is open to our internal developers, and we encourage them to codify their expertise.

Like Rosalind and I have been saying, we have a huge platform that we work with, a ton of technology that's available to us. We have subject matter experts all over the place, but they work in silos today, which doesn't help us as an organization. Inner source is a mechanism for them to take their expertise, put it into code, and share it across the organization.

Rosalind Radcliffe

It's really important. IBM has always been into open source. We've been an open source company from early days. We want to drive inner source as well, to get that collaboration not just in the open, but to help our internal teams where it's appropriate.

Now inner source is a challenge in IBM. When we said we have private repos, we have lots of private repos. We have organizations that will not share. The reasons for that are heritage and legal reasons, security reasons, all sorts of different reasons why we block access to things. Because of that, we have a whole bunch of silos. We have a whole bunch of challenges. It's hard for developers to understand: how are they supposed to do things?

We also have a very strong security-by-design practice. You have to follow security by design. Well, what tool am I supposed to use? How am I supposed to understand? So we survey the developers and ask them what's going on, ask them what their challenges are, ask them about their problems. We get that feedback on a sort of quarterly basis. Actually, it's three times a year, not four, because of vacations and holidays. We have developers all over the world, and parts of Europe go on vacation during the summer, so we have to deal with all these things. It's not quite quarterly, but we get feedback from the developers about how they are working today and what their challenges are.

In this developer experience, we're trying to improve that capability. Reality is, we're starting with the CIO organization. Our developer experience survey is actually IBM-wide. We are starting with the developer platform within the CIO's organization, but logically the goal is to spread this to all hundred-thousand-plus developers in IBM.

Thomas Lawless

That 70,000 repository number we threw out in the beginning is just the CIO. Across IBM it's closer to a million repositories. Think about that: a million GitHub repositories just internal to IBM. Gigantic. And by the way, not all of our source is in GitHub. We have lots of teams in RTC, SCLM, CMVC -- every SCM you could think of.

We know not all million repositories are meaningful repositories. The best estimate right now, and I say estimate as part of a problem that we have to solve, is somewhere in the 250,000 range, which is still a legit gigantic number. So as we look to make this better for our developers within the CIO, we also have an eye on what we could potentially do for our enterprise across the company.

The feedback that we've gotten from developers has really acted as a guidepost for us as we're getting up and running, and it aligns with the features that we're building. Centralized common CI/CD platform pipelines: it'll be the only way to deploy to the platform. You have to go through a CI/CD pipeline, but it's also the lowest-hanging fruit for inner source contribution. Everyone has CI/CD automation today -- almost everyone, but everyone should have CI/CD today. Contribute it, get it into a shared common catalog where everyone else can benefit from it.

The developer data lake: we just heard one of the keynote speakers talk about collecting evidence, collecting all this data during pipeline execution. That is what our developer data lake is doing as our pipelines run. As they execute steps, not only do we capture basic log output, but we also capture evidence about the security and compliance results that were executed during the pipeline.
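
As a rough illustration of capturing evidence while a pipeline runs, here is a minimal sketch. It is not the actual data lake: the `Evidence` record, the `run_step` helper, and the use of an in-memory list as the sink are all assumptions made for the example. The point is only that each step emits a structured record of its result alongside its log output.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Evidence:
    """One structured record per pipeline step (hypothetical schema)."""
    pipeline: str
    step: str
    passed: bool
    log: str
    recorded_at: str


def run_step(
    pipeline: str,
    step: str,
    check: Callable[[], tuple[bool, str]],
    sink: list[str],
) -> bool:
    """Run one check, append its evidence to `sink` as JSON, return pass/fail."""
    passed, log = check()
    record = Evidence(
        pipeline=pipeline,
        step=step,
        passed=passed,
        log=log,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    # A real system would ship this to durable storage; a list stands in here.
    sink.append(json.dumps(asdict(record)))
    return passed


if __name__ == "__main__":
    data_lake: list[str] = []
    run_step("demo-app", "vulnerability-scan", lambda: (True, "0 findings"), data_lake)
    run_step("demo-app", "license-check", lambda: (False, "1 unapproved license"), data_lake)
    for line in data_lake:
        print(line)
```

Because the evidence is recorded as a side effect of running the step, teams do not have to assemble it by hand for an audit.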

Finally, one of the big pieces of feedback we got from developers was context switching. Sure, I have a lot of tools available to me. I have tools that do vulnerability scanning. I have tools that do code quality. Name it, I have a tool for it. But I'm constantly jumping from tool to tool to understand my application and understand where I need to make improvements. It'd be great if you could pull all that data into one place. The combination of the developer data lake and the developer portal is our first pass at that.
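
Pulling tool results into one place can be sketched as a simple grouping step. This is an assumption-laden illustration, not the developer portal itself: the record fields (`app`, `tool`, `open_findings`) and the function name `build_portal_view` are hypothetical.

```python
from collections import defaultdict


def build_portal_view(records: list[dict]) -> dict[str, dict[str, int]]:
    """Group open-finding counts by application, then by tool."""
    view: dict[str, dict[str, int]] = defaultdict(dict)
    for record in records:
        view[record["app"]][record["tool"]] = record["open_findings"]
    return dict(view)


# Illustrative records, as several separate tools might report them.
RECORDS = [
    {"app": "quote-service", "tool": "vulnerability-scan", "open_findings": 2},
    {"app": "quote-service", "tool": "code-quality", "open_findings": 5},
    {"app": "hr-portal", "tool": "vulnerability-scan", "open_findings": 0},
]

if __name__ == "__main__":
    for app, tools in build_portal_view(RECORDS).items():
        print(app, tools)
```

A developer then reads one per-application summary instead of visiting each tool in turn.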

If we take a look at that feedback again, it also gives us an idea of priority of delivery. We're in phase one right now. We're really just getting started. I would say we're probably approaching the end of phase one, but as we get into phases two and three, we have the opportunity to drive more customization and more intelligence into the feedback that we're giving to developers, even to an individual level as we get towards the end of phase three.

Rosalind Radcliffe

If we look at this developer experience, the reason we're driving it in the way we are is we want to get that early success. We want to see the CIO office successful in this inner source model of building out the pipeline to get contributions, so that we can get the value out of inner source and reduce the effort. You can imagine in just the CIO's organization, we have a ton of pipelines. Each development team that's managing their own pipeline is spending a set of resources to do that. We have a ton of resources working on security and compliance. By simplifying and standardizing, we hope to reduce the cost of this overall effort and allow developers to focus on the things that they actually want to focus on.

I know we're running out of time, so I want to make sure that we get the help that we need. We're trying to do this centralized CI/CD process, and so I want to talk to organizations that are working toward centralized developer experience. We want to understand how you share the best practices across your organization. How do you provide this visibility in the organization? I'm always interested in talking to people about how they're doing their z/OS environment, because we're doing it as infrastructure as code. By the way, we will share that as soon as we get it. Thank you all.