Why You Shouldn't Run Business Applications with a Build Tool
You wouldn’t drive kids to school in a golf cart. Why run business applications with a build tool?
Full transcript
Joe Goldberg
So I'm going to talk about something today, and I certainly don't intend it, in any way, as bashing build tools or any other kind of automation.
A line from Julius Caesar came to mind, loosely paraphrased: I come not to bury build tools, but to praise, or really to push for, domain specificity.
There are many different categories of automation. We certainly talk about them at events like this one and others in the DevOps marketplace. There's a lot of different tooling, and this slide probably doesn't do all of it justice, but it's an attempt to sort it into a few groups we can talk about.
There are obviously some things here that you might say reflect our own bias, but I'll come to that in a moment.
There's a lot of tooling for configuration management, the Chefs, Puppets, Ansibles, and Salts of the world, used to manage infrastructure and networks.
There's another category that's probably nearer and dearer to the hearts of DevOps practitioners: all the tooling related to your toolchain, or your automated CI/CD pipeline.
And then there's the automation that runs things in production, which is what I really want to focus on. I'm referring to it here as business service orchestration: it's what runs your apps in production.
Now, the arrow on the left obviously reflects our point of view, but I'd like to justify it: no matter how long it takes to build your apps, and no matter how long their total lifecycle, the vast majority of that lifecycle is spent in production, delivering value to the business and to customers and interacting with the outside world.
So when we look at these kinds of tools and what they do, the ones that fall into application orchestration have a very high impact on the quality of an application over its entire lifecycle: how the world perceives your app, the business value you get from it, and the impact on the business if it fails or has trouble in production.
And I would argue that this is the part we don't spend enough time talking about. DevOps is supposed to be at least somewhat about ops, arguably as much as it is about dev. We talk about injecting new tools and practices into DevOps, and we talk about SecOps and DevSecOps because there needs to be a focus on security. But ops has always been in the DevOps vernacular, and yet I don't think we talk about it enough.
I think there are a lot of reasons for that. Many large organizations and enterprises have spent years running production on tools that were owned by ops and were difficult to plug into a DevOps toolchain.
That's part of what we mean by jobs as code: the orchestration that is part of your apps, the part that runs them in production, needs to be treated like any other component of the application, injected into and managed within your automated toolchain or CI/CD pipeline.
Now, some people say, "Well, orchestration, we used to call it batch or scheduling. It's decreasing in importance." And I would say that, absolutely, the kind and nature of orchestration is changing.
But we recently did a survey with Forrester, and whether you call it orchestration, workflow management, DAGs, schedules, or batch, whatever terminology you prefer, there's a huge need for it. The vast majority of practitioners in our marketplace expect that need to continue, or even increase.
Think about all the components that make up modern applications. There's a huge emphasis on cloud and containerization and on re-architecting applications into more components, whether you call them micro- or maybe mini-services. Going forward, there are going to be more and more components that need to interact, and that you need to look at and understand.
So you need orchestration with out-of-the-box knowledge of how to deal with an application: business service management from an SLA perspective, consistent consolidation and management of logs, and an understanding of the relationships between components. All of these are critical to ensuring that an application runs at the highest level of service quality once it reaches production and is exposed to your customers.
From our perspective, certain characteristics are critical for orchestrating applications in an enterprise, at-scale environment, and we think that calls for a platform-like approach.
One of the most critical characteristics is what I just referred to: understanding all of these components and how they interact, and being able to visualize a service end-to-end from a business perspective. In a data context we talk about data lineage, seeing what is going on in an application from a data perspective. I'd call this process lineage.
How does everything hang together? When something goes wrong, what sequence of events brought us here? If there's a delay, or things aren't happening the way they should, I need to be able to see that. I call this hyper-heterogeneity, which is the new way to think about what an enterprise at that scale looks like.
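The process-lineage question, "what sequence of events brought us here?", amounts to walking a job's upstream dependencies. Here is a minimal Python sketch of that idea. It assumes a hypothetical job graph, loosely based on the truck-maintenance example, in which each job lists the jobs it depends on, along with a hypothetical status map. It is an illustration of the concept, not how any particular orchestration product computes lineage.

```python
from collections import deque

# Hypothetical job graph: each job maps to the jobs it directly depends on.
UPSTREAM = {
    "ingest_sensor_stream": [],
    "extract_warranty_data": [],
    "train_model": ["ingest_sensor_stream"],
    "detect_anomalies": ["ingest_sensor_stream", "train_model"],
    "notify_driver": ["detect_anomalies", "extract_warranty_data"],
}


def process_lineage(job, upstream):
    """Return every job that `job` transitively depends on, nearest first (breadth-first)."""
    seen = set()
    lineage = []
    queue = deque(upstream.get(job, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # shared ancestors are reported once
        seen.add(current)
        lineage.append(current)
        queue.extend(upstream.get(current, []))
    return lineage


def suspects(job, upstream, status):
    """Upstream jobs whose (hypothetical) status is anything other than 'ok', nearest first."""
    return [name for name in process_lineage(job, upstream) if status.get(name) != "ok"]
```

If `notify_driver` is late and the status map marks `extract_warranty_data` as `"late"` while everything else is `"ok"`, then `suspects("notify_driver", UPSTREAM, status)` points straight at the warranty extract.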
Certainly, there's an expectation that anything operating in your business environment has to be not only scalable but secure. You have to be able to audit it, and it has to comply with whatever regulations and governance apply. So that's a major requirement.
It has to serve a broad range of users. When we develop an application, we usually think about a very technical, very capable set of personas. But once it's out in the wild, in an organization's production or operational environment, there are business users, business analysts, and maybe customers who want to interact with it but have much less technical knowledge, or even interest. Even if they could learn, they don't want to become experts in these kinds of tools. So you need a broad, diverse range of interfaces for interacting with the application.
Of course, it has to be ready for today's environment. Nowadays we talk about cloud and containers; a year or two from now it might be something else, and a few years ago it was virtualization. What counts as a modern environment keeps evolving, and that's another key element: the tool has to be ready not only for your environment today, but also adaptable enough to keep meeting your requirements as that environment changes.
And finally, the piece I want to focus on today, and the one I think has been missing in the past: the ability to include all the elements that make up this orchestration in a delivery pipeline, just like any other component of your application.
In fact, if you think about an application very broadly, your Java or C or Python or Scala is your business logic, and your infrastructure as code covers the infrastructure, networking, security, and everything else. For the orchestration, the term we use is jobs as code. You should be able to build it at inception, push it through a cycle where a build tool builds and validates it, test it in a dynamically instantiated test environment, and deploy it through the entire CI/CD toolchain into production, where its domain-specific capabilities deliver their real value in ongoing operation.
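To make "jobs as code" concrete, here is a minimal Python sketch. It assumes hypothetical job definitions kept as ordinary code in the same repository as the business logic. Because they are code, a build step can check them mechanically before anything reaches production, here by rejecting undefined dependencies and cycles. The `Job` class, job names, and command strings are illustrative assumptions, not any product's format. The sketch uses the standard-library `graphlib` module, available in Python 3.9 and later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Job:
    """One unit of orchestration, defined in code and versioned with the application."""
    name: str
    command: str  # hypothetical command string; nothing in this sketch executes it
    depends_on: tuple = ()


# Hypothetical definitions living in the same repository as the Scala business logic.
JOBS = [
    Job("extract_parts", "extract --source parts"),
    Job("extract_warranty", "extract --source warranty"),
    Job("score_trucks", "spark-submit score.jar", ("extract_parts", "extract_warranty")),
    Job("notify_drivers", "notify --channel sms", ("score_trucks",)),
]


def run_order(jobs):
    """Return job names in an order that respects every dependency.

    Raises ValueError for a dependency on an undefined job, and
    graphlib.CycleError (a ValueError subclass) if dependencies form a cycle.
    """
    names = {job.name for job in jobs}
    for job in jobs:
        unknown = set(job.depends_on) - names
        if unknown:
            raise ValueError(f"{job.name} depends on undefined jobs: {sorted(unknown)}")
    graph = {job.name: set(job.depends_on) for job in jobs}
    return list(TopologicalSorter(graph).static_order())
```

A pipeline stage could simply call `run_order(JOBS)` and fail the build if it raises, which is the "built and validated by a build tool" step applied to orchestration.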
So, a couple of examples, and then hopefully we'll get to a demo. Here are two customers; I believe both have spoken at previous DOES events.
What I think is important about both is that they now include their orchestration in their automated CI/CD pipelines and take a modern approach to building it. Both organizations had used particular solutions in their operational world for a very long time.
In PayPal's case, all of their payment processing is handled by this kind of orchestration. It happens to be our solution, but really I'm arguing for a much more general way of thinking about this kind of orchestration.
They went from an environment where almost any action related to an application, including getting it into production, took days, weeks, or even months, to providing self-service to all of their developers. That includes building this orchestration self-service, plugging it into their delivery pipeline, having it tested, and landing it in production, not only more quickly but at higher quality, because their automated tools now let them do testing they were never able to do before.
Amadeus is a travel services company; they have some eye-popping statistics on their website if you're not familiar with them. They moved from a mainframe-based environment to a classic distributed-systems environment, and have now implemented a private cloud in combination with some public providers. They've been able to move their orchestration between each of those environments, back and forth. Because this transition will take a number of years, they'll have seamless access throughout and can run their workload wherever they want.
Now I'd like to very quickly step through a process. Let's say I'm a developer. By the way, this is just hand-drawn, so don't check it for accuracy; it's intended as a quick example.
I'm building a data pipeline. Streaming data arrives from truck sensors, and because this is a predictive maintenance application, when that data arrives I look for anomalies that indicate the need for service. The goal, obviously, is to keep the trucks on the road.
Now, the sensor data may all arrive in real time, but I've got a model that I need to retrain every so often, and I've got customer, warranty, parts, CRM, and all kinds of other data that may even be coming from mainframes or other traditional systems of record. There are extracts I need to pull.
The goal is to determine that a truck needs service, notify the driver, and get it serviced. But maybe the service center is busy, or the part isn't available, so I have to work with my inventory system, make sure the part gets ordered, wait until the part arrives, or at least until I have a delivery date, and wait until the service center is available. It's this kind of waiting and coordination.
That's the general scenario I'm talking about. Let's pop out of this and look at my development environment.
Now I'll focus on some of the other work I have here. I'm in Eclipse, and here is some of the orchestration you saw in the previous flowchart. I need to pull data from systems of record, so I'll be doing some file transfers, and I need that processing to run only after the data is there.
The extracts might normally run at a particular time, but I want them to run as soon as the data is available. So rather than an arbitrary "I'm going to run it at 5:00," I want them to be event-triggered.
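Here is a minimal Python sketch of the difference between "run at 5:00" and "run when the data arrives". It assumes the triggering event is a file landing at a known path and uses simple polling. The path and the action are hypothetical placeholders. A production scheduler would typically use its own file-watch mechanism and would also confirm that the file has finished being written; this sketch only checks that the file exists.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout_s, poll_interval_s=0.5):
    """Poll until `path` exists; return True if it appeared before `timeout_s` elapsed."""
    target = Path(path)
    deadline = time.monotonic() + timeout_s
    while True:
        if target.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def run_when_available(path, action, timeout_s, poll_interval_s=0.5):
    """Event-triggered run: call `action(path)` as soon as the file arrives, not at a fixed clock time."""
    if wait_for_file(path, timeout_s, poll_interval_s):
        return action(Path(path))
    # Missing the window is itself an event an orchestrator should surface (e.g. an SLA alert).
    raise TimeoutError(f"{path} did not arrive within {timeout_s} seconds")
```

For example, a call like `run_when_available("/data/inbound/warranty_extract.csv", load_extract, timeout_s=3600)` would start a hypothetical `load_extract` function the moment the extract lands, and raise if it hasn't arrived within an hour.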
I won't go into the details of what you're seeing, but this is JSON that defines that kind of orchestration, and I, as a developer, am working on my piece of it. So I've got a bunch of Scala code, but I also have this orchestration code.
Obviously, one of the first things I need to do is validate it. One of the available services does exactly that, and I can run it either externally from the command line or right here inside the IDE.
I'm validating against a virtual appliance running my operational solution, which, to name it, is something called Control-M. The virtual appliance happens to be running on AWS, and I'm also working with a big data environment, which happens to be an EMR cluster.
So I'm doing my work and validating my code. I want to see not only that it's valid, but then actually run it. Let's say I make a typing error here and the syntax is invalid; just to show you, you get an indication that there's an error.
Okay, so that's the error I just made. I'll fix it.
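The validate step can be pictured as two passes: parse the file, reporting the line and column of a syntax error like the one just shown, and then run structural checks. Here is a minimal Python sketch under an assumed, hypothetical JSON layout, in which job names map to objects with `Type`, `Command`, and an optional `DependsOn` list. This layout is an illustration only; it is not Control-M's actual schema or the behavior of its validation service.

```python
import json

# Hypothetical required fields for every job in this illustrative format.
REQUIRED_KEYS = {"Type", "Command"}


def validate_definitions(text):
    """Return a list of human-readable errors; an empty list means the definitions look valid."""
    try:
        jobs = json.loads(text)
    except json.JSONDecodeError as exc:
        # Pass 1: syntax. Report position so the developer can jump straight to the typo.
        return [f"syntax error at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(jobs, dict):
        return ["top level must be an object mapping job names to job definitions"]
    # Pass 2: structure. Every job needs its required keys and must depend only on defined jobs.
    errors = []
    for name, job in jobs.items():
        if not isinstance(job, dict):
            errors.append(f"{name}: definition must be an object")
            continue
        missing = REQUIRED_KEYS - set(job)
        if missing:
            errors.append(f"{name}: missing {', '.join(sorted(missing))}")
        for dep in job.get("DependsOn", []):
            if dep not in jobs:
                errors.append(f"{name}: depends on unknown job {dep!r}")
    return errors
```

Returning a list instead of stopping at the first structural problem lets a developer fix several mistakes per iteration. A syntax error has to short-circuit, though, because nothing can be checked until the document parses.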
I iterate through this process until the syntax is correct. Now the syntax is fine, but since this is work that's going to run, I need to actually run it and make sure it runs successfully.
Another available service runs a test, and this test really runs these jobs. Besides running them, I get a response in the window down here, which I'll show you in a moment, that lets me interact with this collection of jobs, and I also get a visual representation.
This is part of what we call our workbench environment: my own private testing environment, where I can really run these jobs and make sure they're valid not only syntactically but also in how they flow together. Again, this is a very simple flow, but I can look at things like the log and the output, rerun jobs if something goes wrong, and experiment.
The intent is to give me an environment where I can do whatever testing I need, completely unencumbered. Another problem with tools that provided this kind of functionality in the past is that they were difficult to access for anyone outside the data center or operations.
In fact, one of the conflicts that I think led us to DevOps is this kind of ownership conflict. We're not only combining teams; we're also democratizing the tool, so that everybody has access to it whenever and however they need.
So I can run this. Just to show you some of the services: these are all exposed as REST, so we're using industry-standard interfaces. The CLI, by the way, is a Node.js CLI that's just a thin wrapper over the REST APIs, and it lets you do things like check the status, which gives me, in command-line form, exactly the same information we just saw graphically.
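To illustrate what "a thin wrapper over REST APIs" means in practice, here is a minimal Python sketch. It is not the Node.js CLI described above. The base URL, the `/run` and `/run/status/{id}` paths, and bearer-token authentication are all hypothetical placeholders, not Control-M's documented endpoints. Each method only builds the corresponding HTTP request, so that all the real behavior stays on the server side.

```python
import json
import urllib.request


class AutomationClient:
    """Hypothetical thin client: each method maps one-to-one onto a single REST call."""

    def __init__(self, base_url, token):
        self.base_url = base_url.rstrip("/")
        self.token = token  # hypothetical bearer token

    def _request(self, method, path, body=None):
        data = None if body is None else json.dumps(body).encode("utf-8")
        headers = {"Authorization": f"Bearer {self.token}"}
        if data is not None:
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            f"{self.base_url}{path}", data=data, headers=headers, method=method
        )

    def run_request(self, definitions):
        # Hypothetical endpoint: submit job definitions for a test run.
        return self._request("POST", "/run", definitions)

    def status_request(self, run_id):
        # Hypothetical endpoint: fetch the status of every job in a run.
        return self._request("GET", f"/run/status/{run_id}")

    def send(self, request):
        """Perform the HTTP call and decode the JSON response body."""
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
```

Because the wrapper adds nothing beyond request construction, the same calls are available to the IDE, a Jenkins stage, or a test framework. That reuse is the point of the "building blocks" that come up next.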
So I can look at each of the jobs and whether they ran successfully. I can also interact with these jobs, just as we did in the graphical view. I don't think I showed you the log, but I could have.
The reason I'm showing you these is that they're building blocks, and in a moment we'll see how they get used further downstream.
The point isn't the individual functions but the process flow. I've developed my code, and, in case you hadn't noticed, it's already in Git. Once I commit, if there's an upstream repository, I can push to GitHub or wherever.
The intent is that I, as a developer, can do my work, validate that it's syntactically and logically correct, and check it in so it moves on to the next phase.
For the next phase, let's see, I've got a Jenkins instance here somewhere. I'll log in again just to make sure.
So I have a pipeline built; a very simple one, of course. If all of this is wired together, my commit triggers the build.
It's a very simple build, and I'm not going to bother running it because I don't think I have time. There's a bunch of Scala, so I'm using sbt, the Scala build tool, to build the code. Then the pipeline uses the same APIs I used as a developer to validate and run the orchestration code. If that all succeeds, it deploys, which pushes the code into the next stage of my environment, and then it runs some tests.
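The stage ordering just described can be sketched as a runner that executes stages in sequence and stops at the first failure, so nothing is deployed unless the build and validation pass. This is a minimal Python illustration, not a Jenkins pipeline. Each stage here is a hypothetical callable that returns `True` on success; in a real setup each would invoke sbt or the validate, run, and deploy services.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order; stop at the first step that returns False.

    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, step in stages:
        if not step():
            return completed, name  # later stages (e.g. deploy) never execute
        completed.append(name)
    return completed, None


# Hypothetical stand-ins for the real steps described in the talk.
PIPELINE = [
    ("build business logic", lambda: True),    # e.g. run sbt and check its exit code
    ("validate orchestration", lambda: True),  # same validate service the developer used
    ("test-run orchestration", lambda: True),  # run the jobs and check their status
    ("deploy", lambda: True),                  # push definitions to the next environment
    ("acceptance tests", lambda: True),        # e.g. a Robot Framework suite
]
```

The design choice worth noting is that the orchestration definitions pass through exactly the same gates as the Scala code, rather than being deployed through a separate, manual path.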
I'm using Robot Framework for testing. And this is a key element: you don't just want to build something and pass it along; you want to apply the intended benefits of a CI/CD approach.
You not only validate as you go, you test at every stage. As your test environments grow more complex, the testing becomes more rigorous and the result more refined.
So let's pop over to my tests.
I don't know how many of you are familiar with Robot, but these are my test scenarios. Here again, I'm using the same services I showed you before to validate, run, and operate the jobs, and then retrieve their outputs.
Once I've run my jobs, I can interrogate the output to see whether it was successful. As the code moves down the pipeline, you can see in Jenkins, as I showed you before, the results of all three tests I put together, and as I find new scenarios, I can add them.
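The "interrogate the output" step reduces to a check that every job ended successfully and that selected jobs produced the expected output. Robot Framework can import plain Python functions as keywords, so a check like the following could be called from a Robot suite. This is a minimal sketch under assumptions. The status records (dicts with `name`, `status`, and `output`) are a hypothetical shape, and the `"Ended OK"` status string is an assumption; use whatever your status service actually returns.

```python
def check_run(jobs, expected_output=None):
    """Raise AssertionError listing every job that failed or produced unexpected output.

    `jobs` is a list of hypothetical status records: {"name", "status", "output"}.
    `expected_output` maps a job name to a substring its output must contain.
    """
    expected_output = expected_output or {}
    by_name = {job["name"]: job for job in jobs}
    problems = []
    for job in jobs:
        if job["status"] != "Ended OK":  # assumed success status string
            problems.append(f"{job['name']}: status {job['status']!r}")
    for name, fragment in expected_output.items():
        job = by_name.get(name)
        if job is None:
            problems.append(f"{name}: not found in run")
        elif fragment not in job.get("output", ""):
            problems.append(f"{name}: output missing {fragment!r}")
    if problems:
        # Report everything at once so one test run surfaces every failure.
        raise AssertionError("; ".join(problems))
```

Adding a new scenario as it is discovered then means adding a new set of expectations, not writing new checking logic.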
The one piece we don't really have time for, although it's part and parcel of this, is provisioning the environments. Here's the EMR cluster I was using for testing, and here's a test server where I'm running some of my jobs, but these are all my own personal environments. If you also have test, QA, pre-prod, UAT, and so on, you'd do the same thing for each.
Each of these environments has all the tooling required to run and manage the jobs, see whether they succeeded, trigger dependencies, and do all of the orchestration. And you can provision each environment the same way, so the entire end-to-end process can be fully automated.
So whatever phase of the orchestration you're working on, the intent is to support a fully DevOps, CI/CD approach to building, testing, deploying, and managing it.
Even the authoritative source of truth changes. In traditional operational tools, the operational environment has its own persisted data, so if you needed to change something running in production, the traditional approach was to change it in production.
Here you invert that: instead of changing it in the target system, you change it in Git, or whatever version control system you use, and push it through the pipeline again. That way you keep benefiting from automated deployment, and you manage the orchestration as a genuine part of the whole application.
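Treating Git as the source of truth also makes drift detectable: anything deployed that doesn't match the repository was changed outside the pipeline. Here is a minimal Python sketch that assumes both sides can be loaded as dictionaries mapping job names to definitions. How you fetch the deployed side is left out, and the example data in the test is hypothetical. The remedy for any difference is the one just described: change the repository and rerun the pipeline, never patch production directly.

```python
def diff_definitions(in_git, deployed):
    """Compare job definitions from version control with what is currently deployed."""
    to_add = sorted(in_git.keys() - deployed.keys())       # committed but not yet deployed
    to_remove = sorted(deployed.keys() - in_git.keys())    # deployed but absent from Git
    changed = sorted(
        name for name in in_git.keys() & deployed.keys() if in_git[name] != deployed[name]
    )
    return {"to_add": to_add, "to_remove": to_remove, "changed": changed}


def is_in_sync(in_git, deployed):
    """True when production matches the repository exactly."""
    return not any(diff_definitions(in_git, deployed).values())
```

A drift check like this could run as a scheduled pipeline stage. Flagging the difference, rather than silently overwriting it, keeps a hotfix made directly in production from being lost without anyone noticing.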
I wanted to leave a few minutes for discussion or questions. That's what I wanted to talk about: a very brief overview of how we view this kind of orchestration.
Of course, there's a lot more to say about capabilities: what makes up an operational instrumentation or business application orchestration tool? I've left that for a later conversation, if you're interested. At the very least, I hope this gets you thinking that whatever you use should offer capabilities specific to operating in a production environment.
We've had a lot of conversations with people who say, "Well, I use Jenkins for my build, so I'm going to use Jenkins to run my stuff." Usually, I invert the question: "Would you ever consider doing your build automation with an operational tool like, let's say, cron?" People are aghast. "Of course not."
"Why not?"
"Well, you have integration with code management, and you have all these build functions, and all kinds of other..."
Similarly, what I want you to think about is that operations has its own domain-specific requirements. Use the right automation for the capabilities you need, but make sure it meets the test of being developed, tested, and deployed in a modern, automated environment.
So I'll pause there for any questions or comments. If there are none, thank you for your time.
Q&A
Q: The gap between what Jenkins can do and what other automation management systems can do wasn't quite clear to me. You seem to be saying that lots of things, like monitoring, are the responsibility of the automation, but monitoring could be another application that Jenkins deploys. Why do you think those all need to be owned by the same system?
A: I'm not suggesting they have to be owned by it. No, not at all. I'm just saying there are different categories of automation, and you need to pick the right tool for each, but apply the test that the tool's objects, artifacts, and components can be deployed through an automated delivery pipeline. That's all.
Of course, I'm suggesting that ours is such a tool, but that's the point I'm making. Did that answer your question?
Q: Yeah, I need to think a bit more.
A: Okay. Any other questions?
No? Thank you very much. I think I get two minutes back.