Has Everyone Forgotten Application Workflows in Kubernetes?
Google became the latest cloud vendor to announce their strategy for bringing their cloud to your datacenter (Google Next Keynote April 8, 2019) following Azure Stack and AWS Outposts. And the industry as a whole is embracing K8S as the de facto standard for the modern datacenter! Developers are infatuated with containers and microservices that are tailor made for K8S. But who is thinking about orchestrating the backend applications that power the business?
Come to this session to:
- Learn about K8S Job and DaemonSet objects
- Hear how organizations are leveraging them to manage business workflows
- See real-world examples of customer 360 and sentiment analysis
Joe Goldberg is an IT professional with several decades of experience in the design, development, implementation, sales and marketing of enterprise solutions to Global 2000 organizations. Joe has been active in helping BMC products leverage new technology to deliver market-leading solutions with a focus on Workload Automation, Big Data, Cloud and DevOps.
Full transcript
The complete talk, organized by section.
Joe Goldberg
I am going to be talking about the need for orchestrating business applications in the Kubernetes environment, what are some of the considerations, and discuss a little bit about a sample implementation.
And so I want to spend a minute or two -- I was going to spend a little bit more -- talking about why I think, I believe, and what we have heard from discussions with organizations as to why you need to think about applications or business applications differently than just the resource management and the other capabilities that are provided by Kubernetes. And I will argue, and hopefully you will agree, that Kubernetes itself and the structures and the controls and the objects it puts in place really contemplate that as well.
So, very generally, if you look at what is a stack in an OS or how you manage an environment, there are a whole bunch of layers, and usually the component or set of components that manage the hardware, the networking, and manage resources and workload at a resource level are different and have different considerations and think about different things than the software or the tools that you are going to use to manage business applications. And I think it is important to understand that.
And so Kubernetes really focuses on resources, and there are several applications, of which ours is one, that focus on managing the business application aspect of things.
So, to speak a little bit more about what that really means: when you think about Kubernetes and managing resources, whether it is Kubernetes per se or a lot of other solutions that are available for a lot of other environments, they do very similar things. And so I have a sample here or a cross-section. If anybody is familiar with Hadoop and YARN, or if you are running Mesos, or even arguably if you think about the Linux kernel or even the mainframe and the OS, in any of those environments, they all think about these kinds of things: maximizing the resource capabilities, the resource facilities that you have, and worrying about things like memory allocation and how many tasks are running and latency and networking and all kinds of other things.
And those are the domains, or the general domain, of resource scheduling or workload management. And I think you can even go beyond that and think about cloud and just about every computing environment, and that is a domain of Kubernetes specifically. Whereas when you think about business applications, you have other considerations. So you have to worry about things like, what are business cycles? Not only simply time and date, but holidays, specific business periods that are relevant to business applications. You think about things like service levels and when things have to be done from a business perspective, not necessarily from a resource management perspective, and a whole bunch of other things. And again, I was going to spend a little more time on this, but given the time, I do not want to spend too much time on it.
And all of this requires operational management, visibility of upstream and downstream dependencies from a business perspective, the ability to analyze and triage problems when they occur, and collection of logs, the ability to either kill or restart processes and business applications, business components. So there is a whole bunch of stuff that really is at a higher level of abstraction from a workload management perspective that applies to business and business applications as opposed to managing resources, managing the technology layers.
So I meant to ask this question to begin with, and I forgot. How many of you are either running or thinking about Kubernetes today in your environment? Okay. Pretty good collection. Anybody, of those people raising your hands, running Kubernetes in production? Okay, a much smaller set, but still a good number.
Okay, so if you happen to not be terribly familiar with Kubernetes, this is generally its structure. So it is what has become pretty much a classic distributed computing kind of an application. There is a master node and a bunch of worker nodes that are called Kubernetes nodes.
A little bit of history, in case you are not familiar: Kubernetes is the evolution of an internal Google project called Borg, more or less. If you are familiar with, or if you are a fan of Star Trek, you probably know the Borg. It came from that. And in fact, the project, when it was launched at Google for Kubernetes, was called Seven of Nine, which is why the logo of Kubernetes has these seven spokes in it and so forth.
From a facilities perspective, and again, I think that I would argue that this is true of all of those kind of resource managers, they all contemplate the need for this kind of higher level of abstraction and so provide a variety of facilities for managing what are the job objects from their level, but enabling that capability from a higher level facility that is thinking more about business and more from the application perspective.
So in the case of Kubernetes, obviously this is not an exhaustive diagram, but the atomic unit of execution of things that run in Kubernetes is a pod, where usually a pod is probably a single container, but it could be multiple containers. Looking at how those pods are run, this is where Kubernetes now applies its level of management, and you can have either services or DaemonSets and Jobs, which is what I am going to be talking about specifically, and there is a whole bunch of other kinds of controllers. And every once in a while, as you see new releases of Kubernetes, they either modify or add new services or new kinds of controllers.
The way you communicate with this environment, again, very typical, is through a set of APIs. Today in the industry, the de facto standard pretty much is REST. And so the Kubernetes API is RESTful web services, and you can either talk to it directly via REST in whatever facility or language you are familiar with or comfortable with, or via kubectl, which is a CLI that implements those same RESTful APIs or RESTful web services and provides a CLI for you to communicate with Kubernetes with.
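To make the REST point concrete, here is a minimal Python sketch that builds, but does not send, the kind of request a client could make to list Job objects. The path `/apis/batch/v1/namespaces/<namespace>/jobs` is the standard Kubernetes endpoint for Jobs; the server address and token are placeholders assumed for illustration. It is roughly the call that sits behind `kubectl get jobs`.

```python
import urllib.request

# Placeholder values, assumed for illustration only.
API_SERVER = "https://kubernetes.example.invalid:6443"
TOKEN = "REPLACE_ME"


def list_jobs_request(namespace: str) -> urllib.request.Request:
    """Build (without sending) a GET request that lists Jobs in a namespace."""
    url = f"{API_SERVER}/apis/batch/v1/namespaces/{namespace}/jobs"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/json",
        },
    )


req = list_jobs_request("default")
print(req.get_method(), req.full_url)
```

Nothing here touches the network; sending the request would additionally need a reachable cluster, valid credentials, and TLS configuration.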
In the things that you are going to be seeing, I will spend a little bit of time on both, and we are using the same APIs. And, again, what I want to draw your attention to here is the Job and CronJob objects that exist in Kubernetes. If you break business application workload down into two general categories, you have stuff that runs all the time, which could be real-time streaming or transaction-based, and then you have things that begin and end. Those are batch jobs, or however you want to describe them, and that is the kind of workload the Job object is meant to run.
The only way to do some kind of periodic scheduling in Kubernetes natively is through the CronJob object. If you are familiar with cron, and probably you all are, it provides that same kind of capability. Very, very rudimentary kind of stuff that allows you to start stuff at a particular time, pretty much, and that is about the extent of it. Again, it has none of the kinds of capabilities I described. Whether cron is business application workload management or not is part of a bigger discussion, really. It certainly is a way to run tasks, but it does not provide dependencies, not even within a particular environment. So if you are looking at a single, let us say, Kubernetes cluster, there is no dependency mechanism among either Jobs or CronJobs. You have to implement that yourself.
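As an illustration of how little a CronJob expresses, here is a sketch of a CronJob manifest written as a Python dictionary (Kubernetes accepts manifests as JSON as well as YAML). The object name, image, and schedule are hypothetical. The `apiVersion` shown is `batch/v1`, which current clusters use; clusters from the time of this talk used `batch/v1beta1`.

```python
import json

cron_job = {
    "apiVersion": "batch/v1",  # older clusters used batch/v1beta1
    "kind": "CronJob",
    "metadata": {"name": "tweet-analysis-nightly"},  # hypothetical name
    "spec": {
        "schedule": "0 2 * * *",  # standard cron syntax: 02:00 every day
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "OnFailure",
                        "containers": [
                            {
                                "name": "analyzer",
                                # hypothetical image reference
                                "image": "example.invalid/tweet-analyzer:1.0",
                            }
                        ],
                    }
                }
            }
        },
    },
}


def describe_schedule(expr: str) -> dict:
    """Split a five-field cron expression into its named fields."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
    }


print(describe_schedule(cron_job["spec"]["schedule"]))
print(json.dumps(cron_job, indent=2))
```

The five time fields are all the scheduling information there is: nothing in the object refers to a business calendar, a service level, or another Job.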
And certainly if you have multiple clusters, and a lot of organizations that we talk to do have a lot of clusters, the CronJob capability immediately becomes extremely rudimentary and really very restrictive. It does not give you the kind of visibility or control or facilities to coordinate among a collection of things that run.
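To give a sense of what coordinating a collection of Jobs involves at its simplest, here is a sketch that derives a run order from declared dependencies using Python's standard `graphlib` module (Python 3.9 or later). The job names and the placement labels are hypothetical, loosely modeled on the use case in this talk; a real orchestrator also handles calendars, service levels, failures, and restarts, none of which is shown.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key maps to the set of jobs it must wait for.
DEPENDS_ON = {
    "poll-s3-geolocation": set(),
    "load-snowflake": {"poll-s3-geolocation"},
    "analyze-tweets": {"load-snowflake"},
}

# Hypothetical placement: dependencies can cross platform boundaries.
RUNS_ON = {
    "poll-s3-geolocation": "outside-kubernetes",
    "load-snowflake": "outside-kubernetes",
    "analyze-tweets": "cluster-a",
}


def run_order(depends_on: dict) -> list:
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


for name in run_order(DEPENDS_ON):
    print(f"{RUNS_ON[name]}: {name}")
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular, which is the kind of check a native CronJob setup leaves to you.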
So what I would like to do is spend a little bit of time on a use case and show you some of the things that we have done to implement this use case. The use case here is a customer 360 view/sentiment analysis. It is based on an organization, or a use case implemented by an organization, where they wanted to significantly, or at least substantially, increase their wealth management business, and they thought that they could do that in part by adding sentiment analysis, seeing what people are saying or doing on the Twittersphere, so to speak, and in the social environments, capture that information, use that to either identify new potential customers, identify customers that may be having problems, and a bunch of other things. So their implementation looked something like this.
So you may know that when you tweet, it is an option, and about 10 to 15% of people who tweet enable or do not disable the function of geotagging their tweets. So you tweet, it is really kind of anonymous. You can have whatever handle or whatever your Twitter identification is. It is relatively anonymous. But if there is geolocation data tagged to it, then there is the opportunity to try and identify who you are by looking at the geolocation data and mapping it to the information you have.
So in their case, obviously, as an organization, they have a large customer database. They have their customer names and addresses, and so they took the geolocation data that is available. In this case, we used a sample open addresses dataset from Kaggle, if anybody is familiar with that source, which has the geolocation coordinates for every address in the United States and in many other parts of the world. This was an American company that did this.
And so the geolocation data is updated on some kind of irregular basis, when new houses are built or new subdivisions are created or any of those kinds of changes occur. And so there are intermittent updates to that information. They want to be able to take that information, capture it, so then when they get tweets that are coming in that have geolocation data, they can look up the geolocation data. And by the way, not really all that important here, but Mongo has a geolocation lookup, and we are actually using that. So you can find the potential address. If that is an address of a customer, then you can identify that particular tweet as being at least potentially from a customer. Okay, so that was kind of the mechanism.
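The lookup described here can be sketched in plain Python. This is a stand-in, not the implementation from the talk: it uses a haversine distance over a small in-memory list, whereas a MongoDB deployment would more likely use a geospatial index and a query operator such as `$near`. The addresses, coordinates, customer identifiers, and the 50-metre threshold are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

# Hypothetical address records joined with CRM data.
ADDRESSES = [
    {"address": "1 Example St", "lat": 40.7128, "lon": -74.0060, "customer_id": "C-001"},
    {"address": "2 Sample Ave", "lat": 40.7306, "lon": -73.9352, "customer_id": None},
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_address(lat, lon, max_distance_m=50.0):
    """Return the closest known address within max_distance_m, or None."""
    best = min(ADDRESSES, key=lambda a: haversine_m(lat, lon, a["lat"], a["lon"]))
    if haversine_m(lat, lon, best["lat"], best["lon"]) <= max_distance_m:
        return best
    return None


match = nearest_address(40.71281, -74.00601)  # a geotagged tweet, hypothetical
if match and match["customer_id"]:
    print("possibly from customer", match["customer_id"])
```

A linear scan is fine for two records; at the scale of a national address dataset, the indexed database lookup mentioned in the talk is the realistic choice.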
In order to maintain that environment, they were capturing everything into a Snowflake database. That was their data lake, at least for that application. And the processing consisted of pulling tweets, taking those tweets, identifying potential candidates, doing the lookup, and then pushing to downstream applications. If the tweet said something like, "Gee, I wish my broker told me that Zoom was going public and I could make a billion dollars," that might be an opportunity, if they were not a customer, to reach out to them and offer them our services, or maybe apologize for the broker that they did have and undertake some kind of triage of that. So it could possibly be a service or a customer support kind of a situation.
They were also using it to identify which were potentially the same family members who are living at least in the same address and had some kind of relationship, but in their environment, they did not know them to be of the same family. And so that was an opportunity for efficiency and combining accounts and being able to hit sort of a greater, broader target with the same email campaigns or whatever they were doing, or simply to push personalized information. So if somebody expressed an interest in a particular topic or a particular company, that they could forward or push to that kind of a person very specific and targeted marketing information.
So that is kind of the environment. And from our perspective, where our solution and arguably, more generically, any application or business application orchestration solution fits in is where you see the little icons that we popped up. So the core of this is a Kubernetes environment, and we are running a bunch of containers, let us say call them microservices, that are part of this application. They are pulling from Twitter using Twitter API and publishing to Kafka. Kafka is the mechanism for capturing this information, doing some of the analytics by pulling from the Kafka topic, looking up the information that we have resolved in our data lake that now is combined CRM customer information together with the geolocation data that may have arrived from time to time, and pushing out the identification of potential targets for these kind of activities.
Now, before I leave this, what is important here is that obviously only 10 to 15% of tweets have geolocation. So this is augmenting their 360 customer program. It is not the full thing. And they wanted to be able to manage this on a business priority basis. Sometimes they wanted to be able to reduce the number of instances of containers that are running and pulling and pushing and doing the analysis if more important work was going on. And there were specific business periods, the ones that we always use in examples like Black Friday or whatever, where they wanted to be able to accelerate this kind of activity and open up the aperture. They wanted to be able to drive and manage this from their application.
And so this was a pretty good example of being able to coordinate information and processing that is occurring not only in the Kubernetes environment, but outside of it, and being able to manage the workload levels and thresholds, if you will, in that Kubernetes environment using business priorities and business inputs rather than just simply relying on resource levels and computing availability.
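One way to picture "business inputs rather than resource levels" is a small policy function that decides how many consumer instances should run. Everything in this sketch is hypothetical: the function name, the defaults, and the single peak date (Black Friday 2019) are illustrative, and the talk does not describe the actual rules the organization used.

```python
from datetime import date

# Hypothetical business calendar: dates on which tweet analysis is accelerated.
PEAK_PERIODS = {date(2019, 11, 29)}  # Black Friday 2019


def desired_consumers(today, higher_priority_work_running,
                      base=4, minimum=1, peak_multiplier=3):
    """Pick an instance count from business inputs, not from CPU or memory."""
    if higher_priority_work_running:
        return minimum  # step aside for more important work
    if today in PEAK_PERIODS:
        return base * peak_multiplier  # open up the aperture
    return base


print(desired_consumers(date(2019, 11, 29), False))
print(desired_consumers(date(2019, 6, 3), True))
```

The resulting number could then be handed to Kubernetes, for example as the `parallelism` of a Job, so that Kubernetes keeps maintaining the desired state while the business layer decides what that state should be.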
So, again, as I mentioned, we have had some technical issues. I am trying to switch to an environment where all of this is running, and hopefully it will be successful. So let us see how we do.
So this is my very complex single-node Kubernetes cluster. From a demonstration perspective, it is real. By the way, it is based on Bitnami Sandbox available on AWS. If anybody wants to get familiar with Kubernetes, it is a great way to do it. We have no association or affiliation with Bitnami, but it is a plug for them. I think it is a nice facility.
If you look at the kind of controls, and I am using kubectl. Again, this is a CLI that allows me to communicate with Kubernetes and to do things like see what kind of objects do we have running here. So you can see that the only pods -- remember the pod is the smallest sort of execution unit, it is the atomic unit in a Kubernetes environment that we are running. Right now, the only thing that we are doing, or the only pod that is running there, is a Control-M agent. By the way, that is our solution. But it is an agent that is going to manage the workload in this environment.
And this agent, when it starts up, communicates and connects directly to a bunch of infrastructure. So take, let us say, the workload that you saw in the diagram, where we are polling an S3 bucket looking for incoming geolocation data and pushing that into a Snowflake environment. That is being managed somewhere else, on different platforms. But we may want to coordinate that activity with this, so that maybe if we know that there is a big geolocation update, we may want to defer some of this tweet analysis until we have done the update, and things of that sort.
I will meanwhile go back to our environment, and what I am going to do is just submit some work. Okay, so one of the other things that is really important in this kind of environment is that if you are doing things like managing applications in this kind of an environment, the expectation is that you should be able to do this in a fully automated fashion. So that is kind of a part of another discussion. What we have here is -- maybe just type this real quick. The way that you define this workload is by using JSON.
So there is a whole bunch of discussion as to how you actually do that, but you can define all of the workflows that I am going to be discussing or that I have been kind of alluding to using JSON. You could put it into version control and manage it, version control it, apply all the programming and engineering best practices for CI/CD, which is one of the reasons that we are discussing this in this particular venue. But ultimately, the effect of this is that it travels together with the rest of the application as part of an application release. We can not only code it, but test it, and eventually promote it into production and have it run. So this is what we are running.
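As a rough illustration of treating workflow definitions as code, here is a sketch of a JSON workflow document plus the kind of check a CI pipeline could run before promoting it. The schema (`folder`, `jobs`, `depends_on`) is invented for this example and is not the product's actual JSON format; the job names are hypothetical as well.

```python
import json

# Hypothetical workflow definition, as it might be stored in version control.
WORKFLOW_JSON = """
{
  "folder": "customer360",
  "jobs": {
    "poll-s3-geolocation": {"depends_on": []},
    "load-snowflake": {"depends_on": ["poll-s3-geolocation"]},
    "consume-tweets": {"depends_on": ["load-snowflake"]}
  }
}
"""


def undefined_dependencies(workflow: dict) -> list:
    """List dependency names that do not correspond to any defined job."""
    jobs = workflow["jobs"]
    return sorted(
        {dep for spec in jobs.values() for dep in spec["depends_on"] if dep not in jobs}
    )


workflow = json.loads(WORKFLOW_JSON)
problems = undefined_dependencies(workflow)
print("OK" if not problems else f"undefined dependencies: {problems}")
```

Because the definition is plain text, a check like this can run on every commit alongside the application's own tests, which is the point being made about the workflow traveling with the release.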
Now, if we go back to our Kubernetes cluster, and if the demo gods have not completely abandoned me, then hopefully we should see at least... Well, my typing seems to have. Whoops. Ugh. Come on. Okay.
Okay, so we see a couple of additional pods that have been created. And instead of pods, if we look at, and again, these are Kubernetes objects, there is the notion of Jobs. And again, so these are Jobs. I am trying to remember where I put the manifests so that we can spend a little bit of time on those.
So we can look at some of the... Okay, so if you are familiar with Kubernetes, you know that Kubernetes is a declarative environment where you specify via these YAML manifests what you want to be the case, and Kubernetes will make it so. And so when we initiate a Job, what we say to Kubernetes is, "There is this YAML that defines a Job or several Job types. Please ensure these Jobs run." And so Kubernetes will do that.
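For readers who have not seen one, here is a sketch of what a minimal Job manifest declares, written as a Python dictionary and printed as JSON, which Kubernetes accepts in place of YAML. The Job name and image are hypothetical; the manifests used in the demo are not shown in the talk.

```python
import json

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "tweet-consumer"},  # hypothetical name
    "spec": {
        "backoffLimit": 2,  # retries allowed before the Job is marked failed
        "template": {  # the pod definition nested inside the Job
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    {
                        "name": "consumer",
                        # hypothetical image reference
                        "image": "example.invalid/tweet-consumer:1.0",
                    }
                ],
            }
        },
    },
}


def container_images(manifest: dict) -> list:
    """Walk Job -> pod template -> containers and collect the image names."""
    containers = manifest["spec"]["template"]["spec"]["containers"]
    return [c["image"] for c in containers]


print(container_images(job))
print(json.dumps(job, indent=2))
```

The manifest states what should exist, a Job whose pod runs this image to completion, and leaves it to Kubernetes to create the pod and retry it within the stated limit.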
With a DaemonSet, so if you are familiar with Kubernetes, you are probably familiar with Unix or Linux in general and what a daemon is, and the notion of a daemon is that it is something that runs forever. It is kind of a service that is running. So in Kubernetes, there are a variety of different ways to define services. We have, and let me just, I guess, look at very briefly our agent DaemonSet definition, where we have specified a bunch of attributes to make sure that the agent is running.
And one of the things that is kind of interesting, and that is a facility of Kubernetes that we use that really is very nice, is this nodeSelector mechanism. So when we define the manifest for the agent, you may have a cluster that has 10 nodes, 100 nodes, thousands of nodes. There certainly is no reason to run more than a small number of these kind of agents that are going to manage job execution.
And so we are using nodeSelector as a mechanism that tells Kubernetes, "Run this particular daemon only, or apply this DaemonSet only to nodes that meet this particular nodeSelector value." So what that allows us to do is if we were to add another node, if we wanted to run our agent on that node for maybe high availability or resiliency purposes, all we then do is assign a label to that node of this workflow manager type, and it would start up our DaemonSet. And so you gain the benefits of Kubernetes management, ensuring that what you need to be running is running, what you do not want or do not need to be running will not be running, and to be able to manage the native capabilities of Kubernetes and Kubernetes objects to achieve business goals.
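The nodeSelector behavior can be sketched as follows. The DaemonSet fields (`apps/v1`, `spec.template.spec.nodeSelector`) are standard Kubernetes, but the label key and value, the names, and the image are hypothetical, and the matching function is a simplified simulation of what the scheduler does rather than Kubernetes code.

```python
daemon_set = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "workflow-agent"},  # hypothetical name
    "spec": {
        "selector": {"matchLabels": {"app": "workflow-agent"}},
        "template": {
            "metadata": {"labels": {"app": "workflow-agent"}},
            "spec": {
                # Only nodes carrying this (hypothetical) label run the agent.
                "nodeSelector": {"role": "workflow-manager"},
                "containers": [
                    {"name": "agent", "image": "example.invalid/agent:1.0"}
                ],
            },
        },
    },
}


def eligible_nodes(node_selector: dict, nodes: dict) -> list:
    """Return the nodes whose labels contain every selector key/value pair."""
    return [
        name
        for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in node_selector.items())
    ]


selector = daemon_set["spec"]["template"]["spec"]["nodeSelector"]
nodes = {"node-1": {"role": "workflow-manager"}, "node-2": {}, "node-3": {}}
print(eligible_nodes(selector, nodes))

# Labeling a second node makes it eligible without touching the manifest.
nodes["node-2"]["role"] = "workflow-manager"
print(eligible_nodes(selector, nodes))
```

On a real cluster the labeling step would be something like `kubectl label nodes node-2 role=workflow-manager`, after which the DaemonSet controller starts the agent pod on that node.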
Now, our higher level, and again, I would have spent a little bit more time on this, but unfortunately, I do not have the luxury of that time. So just a quick way to view, in our case, this is the set of Jobs. This display is a little smaller than I had configured, so let me just monkey around with it. But this is that architectural diagram that I showed. So here we have a Job that is polling an S3 bucket waiting for updates, and when that geolocation data arrives, it will pull it and then apply or push that to Snowflake.
Sorry, I am trying to reconcile this. This says I have got eight minutes left, but I thought I had two minutes left so I can keep going. Okay. I will slow down a little bit.
Okay, so these Jobs, you can see they are not necessarily related via dependencies, but the ones down here are. So we have what we call a folder, which is a container object that has all of our Kafka producers and Kafka consumers. We are able to apply application resource levels to manage the number of instances of those containers that are running. So yellow represents running. Green represents ones that have completed. It is here that we can, if we need to, let us say, kill one.
And when we kill one, so think about what is happening in a pod or in the Kubernetes environment. You have got a Job which points to a pod or contains within the Job manifest a pod definition. The pod definition refers to a container object. So there is an image that is going to be started up when this Job is started. This image happens to be the containerized version of our application that is consuming or subscribing to a Kafka topic, pulling off tweets, doing the analysis, and determining whether this is one we are interested in or not, and that is running.
In order to be able to operate on it, we are using the Kubernetes API to start the Job, to kill the Job, which I just did, and, in our case, to retrieve the output of the Job, so let us take a look at that. So what we are doing is tracking the Job, making sure that it is running. Let me pop this out so you can see it a little bit better. So you can see this Job is pulling or consuming, if you will, from a Kafka topic. It runs, and eventually, when we have killed it, we do the cleanup. And so you see that we have deleted the Job and we have also deleted the pod. There could have been multiple pods, which can be the case sometimes if the Job is being rerun. But we are using Kubernetes facilities where they are appropriate to start the Job, to track the Job, to define how many instances of the Job we want to have running.
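Tracking and cleanup of this kind can be sketched against the data the Kubernetes API returns. A Job's `status` carries `active`, `succeeded`, and `failed` pod counts, and a delete request can carry a `propagationPolicy` so that the Job's pods are removed along with it. The classification below is a simplification that assumes a Job needing a single successful completion; how the agent in the demo actually tracks and cleans up Jobs is not described in the talk.

```python
def job_state(status: dict) -> str:
    """Classify a Job from the pod counters in its status block.

    Assumes a Job that needs a single successful completion.
    """
    if status.get("succeeded", 0) > 0:
        return "completed"
    if status.get("active", 0) > 0:
        return "running"
    if status.get("failed", 0) > 0:
        return "failed"
    return "pending"


# Body for a DELETE on the Job resource; "Background" asks Kubernetes to
# garbage-collect the Job's pods after the Job object itself is deleted.
DELETE_OPTIONS = {
    "apiVersion": "v1",
    "kind": "DeleteOptions",
    "propagationPolicy": "Background",
}

print(job_state({"active": 1}))
print(job_state({"succeeded": 1}))
```

A production tracker would more likely read the Job's `conditions` and handle multi-completion Jobs; the counters are enough to show the idea.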
I do not know what that was.
So for anything related to resources and maintaining the desired state, which is what Kubernetes is particularly good at, we rely on Kubernetes, but we are applying a higher level of abstraction via our facilities to determine how the business application aspects are being governed and managed.
And ultimately, in many of the organizations that we talk to, and I think it is probably representative of what we saw in the room as to the number of people that are running Kubernetes in production, nobody, or at least nobody that we have encountered, has switched 100% of their workload to Kubernetes. And so if you have application workflows or applications that have dependencies within a single cluster, then arguably you might find a way to, I think, orchestrate that and build it in. But if you have multiple clusters, it becomes a lot more difficult. And if you have Kubernetes and then you have a bunch of the other stuff that we have had in our environment, then when you have application dependencies, this is where a solution like this becomes really, I think, particularly powerful.
And I think I will just close, at least for my portion, with this. I would urge you to look at the great material coming out of Kubernetes developers, people that are committers, Google engineers, that talks about the notion of how Kubernetes is meant to manage this kind of application workload: that from the inception of the project, there has always been this notion that it is going to be focusing on managing resources and resource allocation and workload, and making sure that services that have to be running are running.
Whereas the application and the business dependencies are really intended for a higher level facility, a higher level of abstraction, and certainly ours is one of the solutions that are available, but there are several others. And so I think that is kind of a clear indication, and something to keep in mind if you are looking at Kubernetes in your environment and you are thinking about production for business applications, which eventually I think is the goal. And I think eventually everybody will be running Kubernetes in some manifestation, as evidenced by the fact that not only can you install it on your own, but pretty much every appliance now is providing Kubernetes, as is every cloud provider, and you get it if you are running OpenShift. Kubernetes seems destined to be the de facto standard for what distributed computing and data centers look like in the future.
When you have -- I guess when is the right word -- application workloads and you need to manage them from an application perspective, then Kubernetes gives you that great foundation. But there is a need for a higher level of business application abstraction that gives you operational capabilities and this kind of cross-cutting visualization and management layer that lets you look out across your entire estate regardless of what it is composed of.
So I will stop for a moment here, see if there are any questions, but I think I am very close to running out of time. Any questions on anything I talked about then?
Thank you very much for your time.