Test Automation For Mainframe Applications
Mainframes: what are they today, how do they fit into DevOps, and how do you deal with automated testing for traditional mainframe applications?
Rosalind Radcliffe
Hello there. So hopefully you will stay and listen to a fun set of stories about the mainframe and test automation for the mainframe, and how DevOps really does apply in the mainframe space. If you think about what I'm going to talk about, it's more than the mainframe. It's the system of record. And in the mainframe case, it's a very particular example of a system of record, but it is a system of record.
Let's see if I can move slides. Next slide. Yeah.
So why am I here talking? Gene asked me to come and present, actually at the last conference and now at this conference, about mainframe, and it's probably because of my background. I've only spent 29 years in the mainframe space.
I am an IBM Distinguished Engineer, and so I have spent a lot of time, a lot of effort, and a lot of energy around this space. And I am now our Chief Architect, or have been our Chief Architect, for DevOps for enterprise systems. I started in IBM in ISPF development, so I know a little bit about the z Systems and the requirements around Z.
One of the things I wanted to do was start with a few fun facts about Z, because when I'm in a DevOps conference, a lot of times I talk about the mainframe, and users talk about that old system.
So I want to start with a few of the fun facts that we have. About 92 of the top 100 worldwide banks use it. Not many. Okay. Only nine of the world's 10 largest insurers use it. Eighty-plus percent of the world's structured data is sitting on the mainframe today. Think about it. It's there, it's in the world, and it's what we're running on. But it's our backend systems, and it's important that they play and participate in this environment.
If we look at a few more facts, you can come up with lots of them: 30 billion transactions, percent of the data. There's a lot of fun things around the system. A lot of companies use it for their backend transaction systems. And why would you? Because it's reliable, it's scalable, it's there, and it's worked the same, in many ways, for quite a long time.
In fact, one of the strengths of the mainframe is probably also one of its weaknesses. As I was talking earlier today, someone commented that they probably have some code that they wrote back in 1959 that's still running on the system. And at least from the 1970s or '80s, you may not have recompiled that code, and it's still running. It's still providing business value.
So that's a great advantage and a disadvantage, because it just works. Why change the process? Why change the way you interact with a system? And if we think about it, yes, it works, but it needs to be faster. It needs to work at the speed required by business.
One of my most favorite stories, most recently, has to do with Walmart. And you can Google Walmart and their Z story, and so you can go hear them give this story. But it's all about their need for a caching service. So the distributed applications needed a caching service.
They worked really hard to build a distributed solution, to buy a distributed solution, to work with open source, take your choice. They couldn't get a caching service that worked well. So the Z guys thought about it and said, "Well, we've got a large system. Why don't we try this?"
So they built a caching service that's a REST-based interface for their distributed teams to call. And they gave it to a team and said, "Why don't you try this?" And it worked. And so that team shared it with other teams, and then shared it with other teams.
And so the Z guys came back and said, "Hmm, you're using it a little bit more than you said you were. We're at a million transactions a day," or something like that. "What's going on here?"
"We've shared it with a few of our friends."
Okay. So what they did was they made it easier. They've got a simple web interface to allow you to ask for a caching service. You can have one. Any team can have their own. They dynamically provision this caching service. It runs on z/OS. And for those of you that know about z/OS, it's running in a very large sysplex. It's written with Assembler and COBOL and CICS.
So I wouldn't call it the most modern code, but it's very efficient. It works very well. It's a REST-based interface, so the distributed teams don't care where it runs. And, oh, by the way, it runs at one-fifth the cost of the distributed solution that didn't work.
And it can handle 100 Cyber Mondays in a day. They tested it out. Billions of transactions have run through this thing without failure.
That's an example of what people are doing with the modern mainframe today. It's the backend system that scales the way you need it without question. It provides the value for those transactions that you need the transactional reliability for.
The latest machine, built for today's mobile workloads, allows you to interface to the system, provides the scalability, the reliability, the security, the encryption capability to allow companies to do what they need to do. And you'll notice the line at the bottom: only 10 terabytes of memory on a machine.
One of the things in Z, we've always worried about memory. Well, we solved the problem. We just put more on the box.
The other thing that's important to point out is that when you build systems, when you interact with a system, you're building that system for a particular set of capability. And you can do a lot of work and take unreliable systems and make them reliable, or you can just use a reliable system in the first place.
Now, it is important to remember that you don't need a reliable system behind everything. If you look at Google, for example, it doesn't actually matter if one of your Google transactions doesn't work, because they're sending it out to 1,000 machines. They get the results. They bring them back together. Does it matter? No.
However, if you do that ATM transaction, you really actually care that the money only comes out once, right? Okay. So there's a different set of transactions. There's a different set of capabilities. And the point here is target the platform that makes sense for what you're doing.
However, if we think about mainframe, usually you don't think about mobile. Usually you don't think about REST. Usually you don't think about fast. You think about slow. You think about it's that other team over there that's been doing things the same way forever. You think about, oh, they use that green screen thing. Take your choice of different things. It's too costly.
Well, it's not, but these are the problems. It works, but it's worked the same way for the last 30 years.
One thing that amazes me, sort of, is when I walk into a shop, the development environment looks exactly the same as it did 29 years ago when I started development. Same tools. Same process. Same ca--wait. What? Thirty years? What else on the planet is the same as 30 years ago?
Yes, it worked 30 years ago, but not today. So let's take all those lessons we learned in development and in operations and apply them to the mainframe as well. There's no reason it needs to be different. Yes, it's a shared environment. Yes, it's a reliable system, but that doesn't mean you have to do things an old way.
So what do we need to do? The first thing we need to do, and the first thing that I need everyone to do, is this. As Gene says, we all have an ask. My ask is that you treat the mainframe as just another platform.
Don't go back and say, "Well, it can't do it." It's as modern a platform as any other platform. Challenge your system of record to say you can move as fast. You can be as agile. You can do all the same kinds of practices that you do in the distributed space. There is no excuse.
When we go through here, none of this is going to sound new. It's not new to do automated deployments. It's not new to do automated testing. It's just new for the z space for some reason.
So we're going to go through each one of these, and the first one might actually be new for some of you. In the distributed space, many organizations have dynamic provisioning of their test environments. And realistically, with Z, you could have done that forever using Z hardware and using emulated systems, using z/VM to give yourself additional LPARs. But nobody really wants to do that.
So we have something called Rational Development and Test Environment for z Systems (RD&T), which is really z/OS running on Intel Linux. So now I can dynamically provision as many test environments as I wish, and I don't have to worry about automated testing, or MIPS charges, or any of those other things. And I can make my development process work exactly the same as it does for the distributed systems.
No difference. No different way of working. No different thought process around test environments or test data.
I need to think about doing automated testing. I need to think about having developers build unit tests. You can build unit tests. Why don't you? You do in the distributed space. You build JUnits. Why not build ZUnits? There's no reason not to.
Run code coverage. We've got it. Capability's there. Run it. Understand what you're doing. Use interface testing in order to test your existing applications so that you understand how they work.
Most people don't use a green screen today for their actual application. Most applications have front ends. You've got a mobile front end. You've got a browser front end. So you're calling your backend system through an API. Why not use that as a testing interface and easily test and do test automation? It's the easiest way to test.
Don't just do happy path testing. Well, this comment applies to the distributed systems too. Remember, negative testing is actually a really good idea. And automate it as part of the build. Do things like code rules as part of checking in the code.
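To make the happy-path versus negative-testing point concrete, here is a minimal sketch in Python. The `withdraw` function and its rules are hypothetical stand-ins for a business transaction (they echo the ATM example from earlier in the talk), not code from any real system. The same idea applies to whatever unit test framework the platform offers.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the available balance."""


def withdraw(balance_cents: int, amount_cents: int) -> int:
    """Return the new balance after a withdrawal (hypothetical business rule)."""
    if amount_cents <= 0:
        raise ValueError("withdrawal amount must be positive")
    if amount_cents > balance_cents:
        raise InsufficientFunds("amount exceeds balance")
    return balance_cents - amount_cents


def run_checks() -> list[str]:
    """Run one happy-path check and two negative checks.

    Returns the names of the checks that passed, in order.
    """
    passed = []

    # Happy path: a valid withdrawal reduces the balance.
    if withdraw(10_000, 2_500) == 7_500:
        passed.append("happy_path")

    # Negative: overdrawing must be rejected, not silently allowed.
    try:
        withdraw(10_000, 20_000)
    except InsufficientFunds:
        passed.append("rejects_overdraft")

    # Negative: a zero or negative amount must be rejected.
    try:
        withdraw(10_000, -5)
    except ValueError:
        passed.append("rejects_negative_amount")

    return passed
```

Running `run_checks()` as part of every build is the point: the negative cases are what catch a change that quietly removes a guard.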
Now, this is really hard to do with 30-year-old technology. Existing host-based SCMs and build systems don't allow you to do this easily, so you need to modernize your tooling. But you can. It exists. It's possible. So modernize the tooling, modernize the practices, and do development the same way that you do on other systems.
I already mentioned this, but interface testing is critical. It allows you to build an interface test, and it allows you to create a virtual service. You can create a virtual service of your mainframe, basically, and allow your teams to work independently, and allow them to test based on that same interface.
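As a rough illustration of a virtual service, the sketch below stands up a small local HTTP stub that answers the way a backend account service is assumed to answer, and then exercises client code against it. The `/accounts/<id>` path, the JSON fields, and the canned data are all made up for this example. A real virtual service would be recorded from, or modeled on, the actual interface.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned responses standing in for the real backend transaction (made-up data).
CANNED_ACCOUNTS = {"1001": {"account": "1001", "balance_cents": 7500}}


class VirtualAccountService(BaseHTTPRequestHandler):
    """Answers GET /accounts/<id> the way the real service is assumed to."""

    def do_GET(self):
        account_id = self.path.rsplit("/", 1)[-1]
        record = CANNED_ACCOUNTS.get(account_id)
        status, body = (200, record) if record else (404, {"error": "not found"})
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_virtual_service() -> HTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), VirtualAccountService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_balance(base_url: str, account_id: str) -> int:
    """Client code under test: it only knows the interface, not where it runs."""
    with urllib.request.urlopen(f"{base_url}/accounts/{account_id}") as resp:
        return json.load(resp)["balance_cents"]
```

A test starts the stub, builds the base URL from `server.server_address`, calls `fetch_balance`, and shuts the stub down with `server.shutdown()`. Because the client only depends on the interface, the same test can later point at a provisioned test system instead of the stub.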
By using a z/OS system, self-provisioned, you can actually have a system that has the application you're using, but you can use virtual services to isolate it from the rest of the environment. Use concepts like database virtualization, the same you use in the distributed space, to allow you to do this testing early in the environment.
And when we think about systems of record, what should we be doing? Those backend systems really should be providing a set of APIs that are being called from some other system: my mobile environment, my web environment, whatever it is. I want to be able to compose my services together.
So most existing mainframe applications probably don't currently look like something you can call as a REST service. They've probably built up over years. But if you look at the core, it's a transaction system. So down in the core of that, there's a set of transactions that are providing your value.
You can expose those as services, and now you can more easily call them and provide them to the rest of the organization.
I heard one of the earlier sessions talk about making payment systems or clearing systems available to other parts of an organization, about a bank having to provide its APIs externally. Well, if you think about that, that service probably exists today sitting on a mainframe. Just expose it as a REST-based interface, and now it can be accessed running the same existing code, just with a new way of accessing the function.
Now, if we think about modern tools, modern practices, exposing things as services, I can now think about breaking up these existing monolithic applications into smaller applications and make it easier to manage over time.
Making sure I have the right integrated monitoring, making sure I'm getting the right feedback about the system early on in development phases.
One thing that amazes me in this space: on a z System, in general, you're running the same monitoring tools on every LPAR. Why not take that data and let developers see it? Get the information so they can see the data earlier, they can understand what's going on, and they can improve the system.
You can also look at that system from the standpoint of which modules are used most. Look at those for optimization. Improve those in your processes.
And the biggest problem we have, the biggest problem we have in the mainframe space, is the fact that people are using tools that were built in the 1960s. That's when they started. They haven't changed much since the 1960s. And we're expecting people to use these tools to do their development.
What's the likelihood a 20-something kid is going to come in and want to use those tools? I hope everyone in the room understands that's somewhere near zero.
No kid wants to come in and use a green screen with no modern SCM, with no modern practices to do their development. I get it. I understand. But it's just a language. So if you're writing COBOL, you're writing PL/I, you're writing Assembler, it doesn't matter. That's just the language.
Let them use modern tools, modern capabilities, modern practices, and write it in that same language. Doesn't matter. Or write Java on Z. That works just the same. Any modern language, you can run it on Z too. The point is, it's the modern tools and modern practices.
Why don't you have an automated test bucket?
Now, this is actually one area that surprises me even more, because when I started back in ISPF development all those years ago, I actually had a regression bucket. I actually had a complete regression bucket. So whenever I made a change to the product, I ran my regression bucket.
So I'd make a change during the day and I'd go home, back when we could go home at night and not work. Okay. In the morning when I'd come in, I'd go look at the results of the regression bucket. I knew for sure that I hadn't broken the existing code. Now, I had to add a new test case for my new code, but I made sure I didn't break anything.
And in a system of record, the most important thing when people think about them is, don't break. So if I've got a really good regression test bucket, if I've got the capability for automated testing, I can run that test efficiently and make sure that the changes I'm making are not going to cause a problem.
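A regression bucket of the kind described here can be sketched in a few lines of Python: a set of recorded inputs with their known-good outputs, replayed against the current code after every change. The `quote_premium` rule and the baseline cases are invented for illustration. In practice the baseline would be captured from the existing, trusted version of the application.

```python
from typing import Callable

# Each case pairs an input with the output recorded from the known-good version.
# Both the cases and the premium rule below are made up for this sketch.
BASELINE = [
    ({"age": 30, "smoker": False}, 100),
    ({"age": 30, "smoker": True}, 150),
    ({"age": 70, "smoker": False}, 200),
]


def quote_premium(applicant: dict) -> int:
    """Hypothetical business rule under test."""
    base = 200 if applicant["age"] >= 65 else 100
    return base * 3 // 2 if applicant["smoker"] else base


def run_bucket(func: Callable[[dict], int], cases: list) -> list[str]:
    """Replay every baseline case and describe each mismatch.

    An empty list means the change did not break existing behavior.
    """
    failures = []
    for index, (inputs, expected) in enumerate(cases):
        actual = func(inputs)
        if actual != expected:
            failures.append(
                f"case {index}: input={inputs} expected={expected} actual={actual}"
            )
    return failures
```

New function gets new cases added to the baseline, and the whole bucket runs unattended, which is what makes the overnight run described above possible.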
If we think about ITIL, and we think about the CAB processes and all these change processes, they have built up over the years to help, as we say, reduce risk by reducing the number of changes.
Well, realistically, we're increasing the risk because we're grouping up a whole bunch of unrelated changes and praying as we put them in. Are they all going to work together? Please? Okay.
If instead, we have an automated test bucket, and we make good use of test to make sure we're testing the application, we can now make these changes into the environment on a more frequent basis, and we can take advantage of the same practices and procedures.
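The risk argument for batching can be put into rough numbers. The sketch below assumes each change fails independently with the same probability, which is a simplification introduced here and not a figure from the talk. If anything it is generous to large batches, since it ignores the interactions between unrelated changes that the talk is worried about.

```python
def batch_failure_probability(p_single: float, n_changes: int) -> float:
    """Probability that at least one of n independent changes fails.

    Uses 1 - (1 - p)^n, assuming every change has the same failure
    probability p and failures are independent.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_changes
```

With an assumed 2% failure chance per change, a single change carries a 2% risk, while `batch_failure_probability(0.02, 25)` comes to roughly 40% for a release that bundles 25 changes. When the bundled release fails, it is also harder to tell which change was responsible.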
If we think about the z System, it's optimized to stay up. It's optimized with a sysplex environment to allow you to be up 24 by 7 by 365, et cetera. In fact, there's some systems and some organizations that say their systems have been up for over 14 years.
That's not a joke. That number is actually a little too low. I think it's probably 16 now. Literally, the system has been up that long. Now, they've changed out the hardware underneath that system. They've changed out the operating system underneath that system. So it really hasn't been running that long.
But the application has been running and live in an LPAR all of that time, because you can upgrade around it. You can upgrade part of the sysplex and bring it up. So it's had stay-up systems. It's had this thought process of keeping systems up forever.
So you can take advantage of that in your system updates to make sure your system stays up and running while you update it, while you update parts of it, so that you can keep your reliability, you can keep your scalability, and you can move efficiently.
One of the comments that I've heard from some of my clients is that they actually make changes faster in the mainframe space than in the distributed space, because it's easier to make changes there, because we've optimized for small changes.
And so those companies that have begun this transformation, that are on the path to DevOps, actually can make changes faster in the Z side than in the distributed side.
If you look at the bottom of this screen, I intentionally put those two pictures there. One is a nice traditional green screen, the thing that I started with all those years ago, and the thing that many mainframe developers still use today. And next to it is a modern development, Eclipse-based tool to allow you to do mainframe development.
If you were coming into this space, which one would you choose? Think about that, and think about what your environments have when you think about what should you be doing in this development space.
Now, I can talk about transformations, but let's think about real customers and real customer environments. I've already told you about the Walmart story and their ability to dynamically create a service on the fly. And it works for their distributed teams.
But there are many other organizations that are also doing such transformations. Some of them are very willing to allow me to use their name, and some of them are not.
One of the most used DevOps transformation stories is actually Nationwide Insurance. Nationwide Insurance is one that is willing to allow us to use their name, and they go out and talk as well. So if you go to YouTube and search for Nationwide Insurance and DevOps, you'll get to see lots of stories about their transformation.
They have literally moved people around their building to have them sit together, to do a complete transformation, to have distributed and Z working together. And they have significantly improved their process to, say, zero defects in production. Realistically, no found defects in production.
But they've now successfully delivered on time, on schedule, 90% of the time, instead of a much lower percentage before their transformation.
If we look at some of these other examples, we have examples of organizations that have literally taken testing that could take weeks down to hours through using automated testing procedures and automated deployment procedures.
Now, in the Z space, you probably already have automated deployment and build, because that's what the existing host-based SCMs do. The only problem is they don't do the new stuff alongside the old stuff. So if I have a new service interface I'm plopping in front of COBOL, I need a single environment that allows me to deploy those. And so this company moved to a common deployment technology, allowing them to deploy the distributed and Z together to do the automated testing.
We have another example of a company that has transformed its entire development process to modern tools, and it is in the process of transforming its entire distributed side to the same tool.
So there was a comment this morning in one of the sessions about the fact that you should have a common SCM across the environment, so you have a common source of truth. Well, in this case, they have a common SCM across all of the environments: Z, distributed, configuration management, et cetera, so that they can manage and have visibility into the entire system.
You're not going to do that with host-based SCMs. You have to transform to modern tools and modern technologies.
One of the example companies, and we have a number of them, is using RD&T, the z/OS running on Intel Linux, to allow them to provision systems. This example says they can provision an entire new z/OS environment in 40 minutes to allow their teams to work.
Now, there are lots of reasons that that's as slow as it is. But when you think that it used to take "never" to get a new test system, or, if you worked really hard, maybe six months to get a new environment in z/OS, 40 minutes seemed wonderful.
Now the new target is actually three minutes, and I think we'll make the three-minute target. But the point is, when you can't do it, or you don't do it today, 40 minutes is great. This allows all of their teams to have their own individual environments to do work as you would expect.
And the last example: IBM itself is doing the transformation as well. You can imagine we do a little bit of z/OS development. Maybe a little bit. We might write CICS. We might write z/OS itself. We might write all the middleware that runs there.
So for our own internal teams, we're doing the same thing. We're transforming to modern development tools and modern development practices.
I still have teams in IBM that use a tool for development that was written when we wrote MVS the first time. That would be a little old. So they're moving to modern tools and modern practices to help this process.
Another example of: I can improve, I can evolve, and I can bring modern practices to do better across the board.
The key to this: we have to transform. Your system of record is key to your business. You're doing DevOps transformations for your distributed space. Why leave out your system of record? As long as you think about only doing one side of the picture, you're going to have a disconnect.
There is a discussion in the industry about bimodal IT, or two-speed IT, or take your term that you want to use. That's not real. That can't work. You can't say that I'm going to have one set of slow systems and one set of fast systems. It doesn't work.
We talk about multi-speed IT, because realistically, I believe that everything doesn't have to go at the same speed. Parts have to go at the speed required by business.
And if I build an API, that API might not need to change three times a day. It might not need to change even once a week. It's an API. It provides a set of value. It doesn't need to change as often.
But my mobile front end might need to change every hour because of something going on in the environment. Or it could be the exact opposite. My system of record needs a business change tomorrow because some legislation just hit.
Our governments--I can now laugh in the UK. Used to be I was always embarrassed by the U.S. and some of our silliness, but now I don't have to be quite as embarrassed. The rest of the world can be silly, too.
So if we think about this, legislation can hit, can require me to change my backend system now. If I need to do that, I need to be able to do it.
So all of the systems have to be able to move fast and efficiently. You have to have automated testing to do this. The key reason it works on the distributed side is that you have automated testing. If you've got an automated pipeline and can run your code through it, then just run your code through it.
The objections sound like, "I can't deploy to production because I haven't gone through all these change reviews and all of this," take your choice. If you have an automated process that has all the right quality gates in it and all the right automated testing, why not?
One of my goals in all of this transformation is to finally get rid of the emergency change process. If I've got a pipeline that can go fast enough, I don't need an emergency change process. I can just have it flow through.
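The idea of a pipeline with quality gates can be sketched as an ordered list of checks that every change must clear, urgent or not. The gate names, metric keys, and the 80% coverage threshold below are illustrative assumptions and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    passed: bool


# Each gate inspects the metrics collected for a change and returns True to
# let it through. Gate names and thresholds are illustrative assumptions.
GATES: list[tuple[str, Callable[[dict], bool]]] = [
    ("code_rules", lambda m: m["rule_violations"] == 0),
    ("unit_tests", lambda m: m["unit_failures"] == 0),
    ("code_coverage", lambda m: m["coverage_pct"] >= 80),
    ("regression_bucket", lambda m: m["regression_failures"] == 0),
]


def run_pipeline(metrics: dict) -> tuple[bool, list[GateResult]]:
    """Evaluate the gates in order and stop at the first failure."""
    results: list[GateResult] = []
    for name, check in GATES:
        passed = check(metrics)
        results.append(GateResult(name, passed))
        if not passed:
            return False, results  # change is held back; later gates never run
    return True, results  # every gate passed; the change may be promoted
```

Because an urgent fix goes through exactly the same gates as any other change, the speed of the pipeline, not a separate approval path, is what determines how quickly it reaches production.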
So that's what I wanted to talk about today. Hopefully, this has been helpful.
Now, I want to let you know that there is a session right after this downstairs. There's a panel being done by Sanjeev, and he's going to have Lloyds, Monitise, and myself giving additional real stories about DevOps transformations in the environment.
And in the expo tonight, some of us, namely myself and Sanjeev, will be doing book signings for the DevOps for Dummies books that we've written.
So I want to thank you. I appreciate your time, and I think next up is a break, if I'm correct.