London 2019

Monorepos, Mainframes, and Modus Operandi

We maintained a legacy application on which nine development teams had to build at the same time for a deploy to succeed. Our challenge was moving to a 'super build', in which we tried to combine all development into a 'monorepo', including the challenges posed by the legacy mainframe.


Phillip is a senior manager for American Airlines, the world's largest airline, located in Dallas Texas. Although he resides in New York City, he commutes each week. Originally, Phillip hails from Melbourne Australia, where he received his degree at Monash University, a BS in Computer Science.


After graduation, Phillip launched his career with General Motors as a programmer/analyst. Initially working on ERP systems and warehousing, he then moved into their e-Commerce space, where he successfully led the Holden Special Vehicles website to become the number one auto-manufacturing website in the Australian market. Soon afterward, Phillip's career led him to the State Government of Victoria, Australia, where he worked for 3 years and was responsible for overseeing the smooth operation of over 40 government websites.


Phillip's career shifted to the US when he moved in 2006. Phillip spent four years in consulting, working for various companies in several roles; he was especially involved in building several commercial software products, the biggest one being Econometrix for Manpower, and was also instrumental in architecting and designing several applications in the desktop, cloud and mobility space.


It was in this capacity that Phillip "landed" at US Airways in 2010 and is one of the key players in ensuring a smooth integration into the newly reborn American Airlines. While his roots are in application development and architecture, he oversees infrastructure, platform services, desktop support and network support in the technical operations division at American.

Full transcript

The complete talk, organized by section.

Philip Knezevich

Good afternoon, everyone. It's a little bit smaller a crowd than I thought. I guess being the last speaker on the last day does that. You are probably the ones with a lot of gas still left in your tank, so I appreciate you guys coming along.

Did you know Google stores two billion lines of code in their source repository? Two billion. To equate that, it's nine million files. It's 86 terabytes of hard drive space. It's 3.5 million commits. It's phenomenal. It's the stuff of myth and legend. They've had 15,000 developers over 20 years. It's the sort of thing that you think is impossible. And so, the idea came to me of the monorepo from an architect about 18 months ago, who suggested that we could try and do something internally at American. Facebook, Twitter, React, Babel are some of the companies that have adopted this, but they're all software companies, and we fly planes. So, I took the idea. The technologist in me was giddy, excited, but the reality of it was that this seemed like a massive undertaking. And whether or not we could do it, we didn't know at the time. So it just added to this itch I have in my DevOps scratch, and I'm sure a lot of you guys have itches in your DevOps scratches, and that's why you're here. So the question came out.

I've got three topics today. I have mainframes, monorepos, and modus operandi. And I can assure you, I could talk about any of these for 30 minutes on their own, but for us, they blended together quite a bit. We had the core, the modus operandi, the Latin for mode of operations, and then on top of that, exosphere, we're trying to do DevOps. And then there's so much stuff in between the subset work of DevOps that you guys have spent the last three days working on and understand. For me, particularly, the monorepo and the mainframe, they're on polar opposites that need to come together.

So just before I get into my talk, a little bit about myself. My name is Philip Knezevich. I'm Senior Manager of Architecture and Platform Services at American. Been there for nine years, almost 10. Originally from Australia, and unlike my compatriot who just walked in, Jennifer, I love coming back to local vernacular. I can drive on the left. I can say zed instead of zee. And when I go to a coffee shop and ask for a biscuit, I don't get this scone-looking thing. I get what they call a cookie. So glad to be here in London. I come from a developer background in the .NET space, but proficient in Java and C. And I love open source. And I don't know, am I the only one here that scratches my head as to how CLI came back into vogue? Let's be honest. There we go. I grew up with fancy GUI tools, progress bars, modal windows, and all of this, and suddenly, Homebrew, Node, npm, Yarn, they're all using CLIs again. And I can tell you internally, our mainframers are chuckling at this because they use two colors called the green screen, and they've started that way, and they intend to end that way.

So just a little bit about American Airlines. We are the world's largest airline by routes, size, destinations. We're 93 years old, founded April of 1926. We fly to 350 destinations across 50 countries and five continents. That's a very vast network. We don't fly to Antarctica, which I wish we did, but we do fly 6,700 flights a day. 6,700. Think about that number. We have to take off 6,700 times a day. We have to ensure the paramount in safety, security, and reliability. We have to cater for customers' needs, and as you all know, customers are very different, so their needs are very different. We have to dodge weather, bad weather. We have to land. We have to comply with FAA. That's a lot of moving pieces. So to run an operation like that, we have 130,000 employees, 27,000 flight attendants, and 15,000 pilots. Our fleet size is now over 950 and growing. This is a strategic investment by us. We're investing $25 billion into the airline just to enhance the customer experience of flying in new aircraft. And I can tell you from my own experience, it feels like you're in a new car, minus the new car smell. But it just feels great. It feels fresh being in a new aircraft, and my only plug for American Airlines, I promise, fly with us to see the difference.

So cynicism in IT is not a new thing, and these are some of the memes I've collected over time, and just seeing some of these smiling and nodding faces, I'm sure a lot of you have seen these type of memes being socialized around. But primarily underneath them is this underlying message that has come out, and the one that always gets me is time estimates. I'm using this, "Deadlines," where the poor developers are being asked to estimate on things, and they'll be like, "Well, with a solid requirements document and the infrastructure in play, and I don't get on anything else, maybe four months." And then, of course, the PMs and the people that want to know this so they can try and forecast will say, "Great." And they'll say, "They're probably overestimating. It's probably two months." Right? So that type of cynicism does tend to come out in memes.

So we've identified five problems that are happening right now in the IT space, not just at American. I think all of you can relate to these sort of problems. And the first is a disconnect between IT and the business, and I facilitated a lean coffee session yesterday where we had a really engaging talk about how to align IT values versus business values. And for me, being more in the usability side, I often think of it like this. There's a SME or a BA feeding requirements to a developer who typically has headphones on, two monitors, coffee, keyboard, typing away, hates meetings. Right? They have to then translate this back into a product which they have no visibility into what the end users are. So in my case, it's flight attendants, it's pilots, it's line maintenance mechanics, it's stores clerks. It's even more difficult in the commerce area because you're dealing with everyone. How do you connect business to vision? So we've thought about this a lot in the way we need to engage with our business a lot more. It's difficult when you're dealing with a line maintenance guy who is using this on different hardware at an airport with loud noises around, probably on a smaller monitor. It's dirty, it's noisy, there's planes landing, there's machinery going on. So it's trying to align those values a bit more.

And then the second one, which really resonated with me, is leadership often tracks activities and not results. So what do I mean by this? When I first started, I was very obsessive about activities, especially in the Agile space. I wanted to see scrum meetings, I wanted to see retrospectives, I wanted to see velocity, I wanted to see stand-ups. So I believed in the prescription that if we do Agile, the rest will just speak for itself. My thinking has changed on this now. Now I'm more interested in MTTD, MTTI, MTTR. I'm interested in change failure rates. These are things that mean more to me. And the reason for this is because DevOps is becoming more cultural. So those activities, like doing a daily stand-up, you may have coders that don't want to stand up at 7:00 in the morning because they're late-night coders. So I think a lot of those cultural stuff you have to learn to trust your teams to do with it and focus on the metrics around results.
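The result metrics named above can be computed from plain incident and deployment records. The following is a minimal sketch, not the speaker's tooling: the `Incident` record, its field names, and the sample data are all invented for illustration, and MTTR is assumed here to be measured from the start of the fault (some teams measure it from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a list of durations; zero if the list is empty."""
    if not deltas:
        return timedelta(0)
    return sum(deltas, timedelta(0)) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time to detect: fault start until detection."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolve: fault start until resolution (assumed definition)."""
    return mean_delta([i.resolved - i.started for i in incidents])


def change_failure_rate(total_deploys: int, failed_deploys: int) -> float:
    """Fraction of deployments that caused a failure in production."""
    if total_deploys == 0:
        return 0.0
    return failed_deploys / total_deploys


# Invented sample data, for illustration only.
incidents = [
    Incident(datetime(2019, 6, 1, 3, 0), datetime(2019, 6, 1, 3, 10), datetime(2019, 6, 1, 4, 0)),
    Incident(datetime(2019, 6, 8, 14, 0), datetime(2019, 6, 8, 14, 30), datetime(2019, 6, 8, 16, 0)),
]
```

With this sample data, detection takes 10 and 30 minutes and resolution takes 60 and 120 minutes, so the averages are 20 and 90 minutes. None of these numbers depends on whether the team held a stand-up, which is the point being made about results versus activities.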

The other big thing that always gets me, and this is a lot of my inner stuff coming out here, actually, is the project funding. So I'm not sure at your organizations how you guys do project funding, but we typically have groups, divisions, and they play it like a game. Finance needs to budget. They have a certain amount of money budgetary every year. And so these guys will throw as much of that as possible. And it kind of looks like this, priority one, priority one, priority one, priority one, priority one, priority 10. And then they hope, if they can throw enough at it, that they'll draw a line as much in there as possible. And it's frustrating because they know if they miss, the funding is gone for a year. Budgets are in play. So we've changed our thinking now in the investment of product lines and feature teams. We now want to invest into the product and keep a steady cash flow going through. It's a different mindset. It's a different thinking.

And then finally, on points four and five, this is one that I'm sure we all relate to. Again, at my coffee talk, the frustration around explaining technical debt to the business. How do you do it in a succinct way? It's frustrating because you have to tell them that you need time to refactor code, or you need to do end of service life, but the business gets no new features out of that, no new functions or enhancements. So it's very difficult to draw that line with them to explain where the balance is. And so subsequently, all of this leads to burnout. For those of you that saw Dr. Maslach's speech on Monday, it was great, talked a lot about burnout, and those are things I think we all empathize with.

So we had to change, and typically, in our culture, we had the hero developer, and I'm sure all of you guys have had the hero developer. The hero developer is the guy or girl who knows what's happening in the industry. They're seeing what's happening with DevOps and will come into the organization and say, "We need to do this. We need to do CI/CD. We need to validate unit tests. We need infrastructure as code. We need continuous deployment," and all that good stuff in DevOps. And they would often say this to non-DevOps people, to PMs, to middle management, to leadership, and everyone would just smile and nod and say, "I agree. Do it." And that was pretty much it. So we've changed that way of thinking now. We're not like that anymore. We're investing into a DevOps first culture. Our CTO, Maya Leibman, has really made this an emphasis within the organization, and now we have DevOps as part of our life cycle as we go through.

So I just want to show you a quick example of one of the challenges that we had moving from projects to products. So you'll see here on the far left is a typical user, so in our case, let's say a mechanic, and all they ever really see is a GUI, a web page. And then everything else that happens after that is magic. So, in our case, we have a SOA layer, and our SOA layer has changed over time. It used to be ESBs with high-level services, SOAP-based services, and then microservices, composite services built on top of that. And then we changed. We said, "No, we need to be faster with our delivery. Let's use Swagger and a RESTful API and deliver JSON." And now we're granularizing that even more. We're looking at GraphQL, and building resolvers in GraphQL. But either way, that layer there is separation of code, separation of concern stays in the SOA layer, and they typically have to then work with all these other groups. They may get their data from a BRE, from the mainframe, from another application database. They may just go to another endpoint, or they may have to go to MQ. So it becomes interesting because each one of these groups is managed by its own team. So these teams have their own managers, their own QA, their own DevOps process, and suddenly, to get from one point to the other is tedious. And what I've seen is there's often a lot of negotiating via email and a lot of elbow-rubbing and back-scratching and stuff like that. And it works. We do coordinate with each other through email, and we are able to deliver our products. So a typical route may be this, which is the one I'm going to lead into, which is the mainframe. And so imagine four different teams needing to work with each other. It's difficult. So we simplified it. We just got rid of that subject line of teams, and we said, "We're going to build product teams. We're now going to have resources from each of these groups come in and actually work as one cohesive team." So a lot of that time slippage, time waste, has disappeared, because now they are all accountable to each other.

So it gets interesting because as teams accumulate, we now build this DevOps pipe, and in the DevOps pipe, we have a UI team, which is typically Angular, Node, TypeScript, those types of technologies. A Java team, which manages SOA, and then we may have a .NET team, so each of them has its own different pipe.

Where this became interesting to us is when we hit the mainframe. And I say this fondly, they're not really the black sheep of the family. It's just very, very difficult to bring the notion of DevOps to developers that have been writing code since the '70s, and now in the last five years, we're telling them how to totally transform. It's an interesting experience, because they have a different plethora of knowledge. But we did say the problem we have is that the mainframe is where the centralized code is, and everything reacts from the mainframe. When the data changes there, it changes this service, which changes this UI, and changes everything else. So our DevOps pipe had to really react off the mainframe.

So we decided we're going to DevOpsify our mainframe stack, and kudos to Compuware. We've worked with them quite a bit on this, and I loved Chris O'Malley's talk on Monday. But we decided we would actually DevOpsify our mainframe. So the first thing you'll see is that we do use ISPW as our source control repository. We use Topaz Workbench as our IDE, and Topaz TotalTest for our testing. And, finally, we use ISPW to help promote to integration, and then ISPW across the line with XL Release, XebiaLabs' release tool, as our package management deliverer. So from Jenkins, we'll build the pipe, and then from XebiaLabs, we'll combine it with all the other builds that we have running at the same time. It's been an interesting experience, but I really wanted to talk a little bit about this slide here, and testing in the mainframe.

So before I get into this, I'm not sure how many of you here know about mainframes, but it's not an object-oriented language. You don't have classes, methods, function pointers, interfaces. It's very script-driven. It's triggered typically by a three-letter transaction code, which then hits something called an MFS, which is a message format service, and then from there, it'll trigger and call copy book code, which is COBOL code a lot of the times. And then there's a PSB to work with IMS. But it has its own architecture, and fundamentally, one of the challenges that we've had is that the mainframe is stateless, and it has this annoying thing called a conversational transaction. For those of you, I see a few nods over here, and the conversational transaction can give back many different routes a lot. So imagine trying to write a unit test for something that has a conversational transaction and has many different outcomes. It's difficult.

So I draw it akin to this image here, and for those of you that have children, you guys know children love to squeeze toothpaste in the middle, and they'll squeeze at it, and they'll squeeze at it until there's no toothpaste left, and then they assume it's empty, and they throw it in the bin. Drives us nuts, right? Because there's all this toothpaste at the back. So that's how we felt a bit with all of the unit testing we were writing. They worked fine in some use cases, but then when we tried to adopt them to other ones, they didn't work. So they assumed that it was done, but there's still all this paste in the tube. So we're going to try and automate it now, right?

So this is a sort of technique that was very similar. We can pull at it. We can actually try and get more toothpaste out of it, but still the same problem, right? It was unique, it was custom, and it was non-reusable. So we could have invested a lot into the automation in the mainframe. And I'm not sure whether any of you here are experts in load testing or performance testing, but setting up a test environment is difficult. You can't do it off QA or dev or stage because the tests are designed to bring down the system. So you have to use a lot of components, LoadRunner and all these other tools, to actually get testing done, and we felt the same way in the mainframe. I could very easily go to Amazon and purchase this $120 toothpaste dispenser, and I apologize if anyone here actually uses this and loves this. But think about it. It does what I want, but now it's harder to clean and maintain. It's limited to certain forms of tube, and now there are all these dependencies that you've got. Power supply, location. It's harder to move.

So in the end, we had to take a step back, and we needed to understand, where is our testing going? And so the biggest fear that I see in DevOps is the fear of failing because deployment is hard. I want to get it right so I don't have to deploy again. So what we needed to do is increase our time to release, and we had to trust in open source and inner source techniques. We needed our developers to speak more to each other. So I draw that akin to the $8.95 toothpaste glider, and I'm not sure if anyone here uses this, but it really does work at the back and work its way up, and that was the way we had to think about when we were doing testing.

So the stack is fairly two-way with us. We have our modernized apps using GitHub, Coverity, Jenkins. On the .NET side, we use TFS build and now Azure DevOps. And then, as I said before, we're trying to orchestrate through XebiaLabs. On the mainframe side, which is up the top, you'll see we use ISPW, Topaz, and then our ops is done through ZAdvisor. We use xMatters as our monitoring tool, and they're here, and I can't speak enough about xMatters. They buzz my phone at 3:00 a.m. I love them for it, and I hate them at the same time. But no, they're a great company, and we use them for a lot of our alerting.

So I'm going to switch for a moment, and I'm going to talk about plumbing of all things. So I don't know how many of you here know what the plumbing looks like in your house. I don't imagine too many of you do. Some of you might. But it typically goes like this. Water comes in through one pipe. It hits the hot water tank first, and then the hot water tank will dispatch water to wherever you need hot water. Kitchen, laundry. You may have a dishwashing line. Water goes wherever it needs to. And then cold water is typically alongside it, because you need cold water where you need hot water. But cold water can dispatch to other places as well, like a fridge line or out into your garden. But the point is, water comes in, water does, and then water goes out. And typically, exhaust of the water is one pipe. One pipe in, one pipe out. Some people have different septic systems, they may have multiple pipes out, but typically it's just the one pipe in and the one pipe out, especially in suburban homes. So we had to think this way when we moved to the monorepo. We had to think, what are the fundamentals behind software development? Let's forget about everything in DevOps, and let's forget about technology stacks and the people that are religiously passionate about Java and .NET. Where are our fundamentals in software development?

At the absolute root core, we write code, we build it, we test it, we deploy it, irrespective of whether it's Java, .NET, or what have you. So we were thinking to ourselves that this practice can feed into the monorepo. We can build software in its rawest form in the monorepo, and then from "Lord of the Rings," I've substituted the word ring with build, and you can see here that this is kind of an adage of what the monorepo says. One build to build everything together, and then with DevOps, bind it.

So coming back to my original starting comments, could we do this? I actually had this face when they brought the idea to me internally. I'm like, "No way. That's impossible."

So we thought we'd start small. We said we would experiment with the smallest subset of apps that do have some notoriety for software challenges, and we said that we would only focus on our UI first, so our Node.js apps and the like. We do have .NET stuff in there now, and then the Java stuff is coming in a bit later. We started with 15 apps, 41 shared packages, and these are internal packages. These are things written internally, like a flight web service or an aircraft web service. Could be an MQ, could be a compiled library. So it's just 41 shared internal packages. And then we imported 90 third-party dependencies, typically from npm. This is interesting because if you look at the top modules that are actually pulled from npm, I believe it's AngularJS is up there, React, body-parser, there are a few others. So we pulled down all these modules, and then there are sub-dependencies that these modules have. So even though it says 90, it's actually probably a lot more than that. But here's the big win for us. One CI/CD pipeline. Can we check in all the code into one branch, into one trunk, and can we release it all? So similar to my water analogy, could we do that?
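The remark that 90 declared dependencies are "probably a lot more" once sub-dependencies are counted can be made concrete with a transitive closure over a dependency graph. This is a minimal sketch under stated assumptions: the package names and the `DIRECT` graph are invented, and a real count would come from the package manager's lockfile rather than a hand-written dictionary.

```python
def transitive_dependencies(direct: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return every package reachable from `roots`, including the roots themselves."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also protects against cycles
        seen.add(name)
        stack.extend(direct.get(name, []))
    return seen


# Hypothetical graph: each key lists the packages it depends on directly.
DIRECT = {
    "pkg-a": ["pkg-c", "pkg-d"],
    "pkg-b": ["pkg-d"],
    "pkg-d": ["pkg-e"],
}

declared = ["pkg-a", "pkg-b"]
installed = transitive_dependencies(DIRECT, declared)
```

Here two declared packages pull in five installed ones. The same effect, at a larger scale, is why the number of third-party modules in the repository exceeds the number the teams chose deliberately.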

We also have 70 contributing developers, so this is not a small experiment. We said we're going to start high, and I can assure you there was contention with this because we had developers that did not want to contribute to the monorepo. "I've worked in this app for 20 years. I know it back to front. Why do I need to learn everything else?" We had that type of mindset, because it's a big deal now. You're suddenly inheriting the entire code base, and it is way too big for anyone to actually learn on their own. So how do we get around that?

This is a typical structure of a monorepo. I've actually simplified this. It's a lot more complicated, but I didn't want to bore you guys with a big screen of this. But typically, we said 15 apps, so I've stopped at three. But typically, at the top level, we'll have a shared folder. Node modules, build scripts, dist, libs, tools. These are all shared componentry folders. You may have an etc folder in there. You may have a cgi-bin folder, depending on what technology you have. But everything is shared at the master.

Now, the typical developer will just stay working in their application like they've always done. They'll open up app two, and they'll work in it. Where it becomes interesting is when they need to modify something in the shared folder. So they may modify something in one of our packages that affects app seven, eight, and nine. So now suddenly, they have to go to app seven, eight, and nine and work with them to make sure their changes are propagating across properly. That's frustrating, and there are two ways that this works successfully for us. The first is we have great tests, and we do test-driven development, and we have tests that these people can run on those apps to see if everything is working. Did anything break when I made this change? But as you technical people may know, that may not be enough. So on top of that, you actually have to go over there and talk to those developers and say, "I've upgraded body-parser to version two. Is that okay with you?" It's that type of collaboration.
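Working out which applications a shared-package change touches is a reverse walk over the dependency graph. The sketch below illustrates that idea only; the app names, package names, and the `DEPENDS_ON` table are hypothetical and do not describe the actual repository or its tooling.

```python
from collections import defaultdict, deque


def affected_apps(depends_on: dict[str, list[str]], apps: set[str], changed: str) -> list[str]:
    """List the apps that depend, directly or indirectly, on the changed node."""
    # Invert the graph: for each package, who depends on it?
    dependents: dict[str, set[str]] = defaultdict(set)
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)

    # Breadth-first walk upward from the changed package.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    return sorted(seen & apps)


# Hypothetical layout: apps and shared packages with their direct dependencies.
APPS = {"app2", "app7", "app8", "app9"}
DEPENDS_ON = {
    "app2": ["shared/nav", "shared/flight-service"],
    "app7": ["shared/flight-service"],
    "app8": ["shared/flight-service", "shared/nav"],
    "app9": ["shared/aircraft-service"],
    "shared/aircraft-service": ["shared/flight-service"],
}
```

In this invented layout, changing `shared/flight-service` reaches all four apps, including `app9`, which only depends on it through `shared/aircraft-service`. A list like this tells the developer which test suites to run and which teams to talk to; it does not replace the conversation.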

So in the workflow space of monorepo, it's fairly similar to what I was describing fundamentally. You sync the repo space. We're not 86 terabytes like Google. I think we're at 19, so it's still feasible for most laptops to sync everything. You write code as you always did, and then you check it into this master trunk, this huge branch, and that's the daunting task. That's the hurdle that we need to try and get better with. And then we automatically deploy.

So I'm going to talk about the deployment first before the merge process. There's this notion of... And I'm wondering if a lot of you guys are thinking, how do you release 15 apps out of one trunk? So we actually do this notion called cherry picking, and it's a fancy word. Honestly, all it is, is a JSON file that has a portfolio of all the files and modules in there, and it knows how to compile them. But you cherry pick. You pick the stuff that you want to deploy in your app. So we could actually, from the trunk, we can create 15 of these and deploy at the same time. And we have this thing called tagging, where we actually know the files attributed to each application. So as soon as they are modified in the trunk, it will trigger off an automated build into that branch. So we're actually at that point. So there's a company up here, LaunchDarkly, that you should speak to because they do a lot around feature flagging, and the ability to control functionality as you go through.
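The JSON portfolio and tagging idea can be sketched as a manifest that maps each application to the paths it owns, plus a function that decides which application builds a trunk commit should trigger. Everything here is an assumption made for illustration: the manifest format, the paths, and the function name are invented, since the talk does not show the real file.

```python
import json

# Hypothetical manifest: each app is tagged with the path prefixes it is built from.
MANIFEST_JSON = """
{
  "app1": ["apps/app1/", "libs/nav/"],
  "app2": ["apps/app2/", "libs/nav/", "libs/flight-service/"],
  "app3": ["apps/app3/", "libs/flight-service/"]
}
"""


def builds_to_trigger(manifest: dict[str, list[str]], changed_files: list[str]) -> list[str]:
    """Return the apps whose tagged paths contain at least one changed file."""
    triggered = []
    for app, prefixes in manifest.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if any(path.startswith(tuple(prefixes)) for path in changed_files):
            triggered.append(app)
    return sorted(triggered)


manifest = json.loads(MANIFEST_JSON)
```

With this manifest, a commit touching `libs/flight-service/index.ts` would trigger builds for `app2` and `app3` only, while a change to a file that no app is tagged with triggers nothing. That is how a single trunk can still produce independent per-application deployments.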

But before we get to that point, I want to just go back to that discussion around breaking the build. This is the pain every technical team feels for one app. Don't break the build. It's daunting for one app. Now you're telling me I have to worry about 15 or more? It's a scary thought, and I think one of the mindsets behind the monorepo is understanding your application suite, not just your app, your suite, at a high level, understanding the functionalities of how it works from the top. So this becomes less and less painful as time goes on, and that's what we learn.

So the commit process is a little more scary. This is our Microsoft team, this is our .NET team, and this is TFS, and you'll see that we have a commit that happened 17 minutes ago, and then there are some that go down the scroll bar, doesn't show up, but we have some commits that are a week old. And the scary thing is the longer you leave it out to commit, the more likely that something wrong could happen. But the reason why this is slow is that we spend a lot of time in code review. We spend a lot of time looking at code, especially ones that are changing the shared componentry. We spend a lot of time analyzing that and trying to figure out how we actually merge this in successfully without breaking anything. So code review is very complete. It goes through code coverage, it reviews test cases. We look at the app. Sometimes we'll drag in our QA teams or our UAT to actually help make sure it works. So it is a bit slow. We would like to speed this up, but we're also very diligent in ensuring that we have that quality as well.

So how are our results so far on the monorepo? So there are some fancy numbers here, 330 times faster to deploy, multiple applications. Because we've consolidated a lot of these files, we have one source of truth, and that's music to my ears. We don't have 15 different left navs and top navs and shared componentry like a web service or an MQ. Now we have one version source of truth. That's huge. Extensive code sharing, which I've noticed, this has been a big thing for us, and I've said in here that there's a 250% increase in developer collaboration, and the bottom line is you have to. You can't be alone. You have to talk to your comrades to actually say, "How does this work?" So it's kind of a forced, but it's not, because our teams are very happy. They're enjoying that experience of collaborating more with each other. I've seen a dramatic increase in productivity as a result of this. We are continually onboarding new developers. There's an interest on this. Now there's a flexible boundary. It's like, "Oh, that team, I don't know much about them." That kind of thinking is now gone because now the monorepo allows us to openly have everything available. We're realizing the benefits of write once, use many. I'm sure a lot of you guys here are frustrated when you see services being written, rewritten, and rewritten, and the version of that service is applicable to that application. That's frustrating. So we've estimated, with a lot of this refactoring work, that it's about 4,200 hours regained by eliminating duplication, and there are about 1,600 working hours in a year, so that's a massive number for us.
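To put the regained hours in perspective, the two figures quoted above can be divided directly. This is only the arithmetic implied by the talk, using its own numbers.

```python
HOURS_REGAINED = 4200          # estimate quoted in the talk
WORKING_HOURS_PER_YEAR = 1600  # working hours per year, as quoted in the talk

person_years = HOURS_REGAINED / WORKING_HOURS_PER_YEAR
print(f"{person_years:.3f} person-years")  # prints "2.625 person-years"
```

In other words, the estimate is equivalent to a little over two and a half full-time developers for a year.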

That being said, it's not all rosy. We do have challenges and risks. We're still very small in the monorepo, and as it gets larger and the complexities grow, so too does our need to maintain it. So from Google's side, they use Piper as a source repository, and I think CitC is one of their deployment tools. We're noticing our need for more tooling. It's 15 apps right now, but what happens, because we have thousands of apps, what happens when we start scaling this out on a large level? That's our next step in this journey. So as I said before, TDD is a very important part of the monorepo. If you don't do TDD, you're probably going to fail. Monorepos need test-driven development. Feature toggling has been a struggle for us. I mentioned LaunchDarkly a bit earlier. We're doing our own, but for those of you that don't know what that is, that is the ability to turn things off, but deploy them to production, and then later you can turn them on. And the reason why you do that is you may have a dependency on another module going out, and you need to make sure it works. So we've had some struggles around that. So to do this, we ended up bringing a consulting group in. Their name is Nrwl, ex-Google developers that worked on the Google monorepo. They've been great, and they have guided us all the way through this, with a lot of the patterns, practices, principles, and stuff like that. Nrwl have been great. But from my perspective, I'd like to cut the cord a bit more and swim a bit more. So we're still very highly dependent on them for questions, and hopefully we'll get better with it as time goes on because we are new to this.

And then the other big thing is that the old polyrepo style of source management is still prevalent for the other thousand apps. How do we bring those into the monorepo? If we're going to get serious about this, we need to somehow figure out a way to bring them in.

Which brings me to my last question: could we do this for all of AA, just like the Googles and the Facebooks? I'd like to. I don't know if we can. Okay? It's our next step. The question that we're challenged with, hopefully for 2019 and 2020, is how far do we want this to go?

So I think I'm just on time here. So three different topics, trying to bring the mainframe together, trying to bring monorepo together, make them build in one harmony, one repo, one pipeline, and to include the mainframe. It's a big deal, and it's a challenge that we've been tasked with. It's just another one of the things in our modus operandi that we're trying to get better with.

So my takeaway for you guys here is, if this is something that you think you can do in your company, you should consider it. If you have a lot of duplication in code, if you have a lot of code management, the monorepo may be the way for you. But if you want to do it, you have to build a model of transparency, and collaboration just begins with trust. Let's go back to what Dr. Maslach said about burnout and fear. It's not just in the monorepo. I'm sure these are words that apply to everything, but definitely something we've noticed in the monorepo. And with that, I'm out of time, and thank you for your time and for your interest with me.