Have Your Cake and Eat It: VMware IT Adoption of PKS
VMware IT has been on its container journey for more than two years and now runs over 4,000 containers. Having gone through various container orchestration frameworks, from hand-rolled ones to proprietary ones, the team found that orchestration was limiting its ability to adopt container technology broadly. With Kubernetes established as the de facto standard for container orchestration, VMware IT adopted it as its go-forward container platform.
In this session, you will discover how VMware IT supercharged container adoption, with all the benefits Kubernetes brings to containerized workloads, by deploying Pivotal Container Service (PKS) with VMware NSX-T integration instead of vanilla Kubernetes.
You can have your cake and eat it too.
Eric Rong leads the strategy and architecture of the next-gen Cloud Native Platform and application transformation for VMware Business IT. He has worked on architecture for various business areas within VMware IT, from Marketing and Sales to Support. Prior to VMware, Eric worked at Franklin Templeton Investment as a technical consultant in web technology. Eric has 20 years of industry experience in complex IT application delivery.
Full transcript
Eric Rong
Today, I want to share VMware IT's journey with cloud-native, container-based applications: how and why we chose PKS, and how we deployed it. Hopefully our experience will be of some help to you.
In terms of agenda, we have already done the brief introductions. I will share VMware IT's cloud-native journey, then talk about VMware IT's PKS deployment, and finish with the takeaways.
A little bit about VMware IT: my group is called the Business IT group. We are responsible for all of the applications that VMware uses to run its business, from the internal intranet and HR systems all the way to the portals and services that customers access. We have more than 400 applications in our landscape and more than 10 million lines of code that we have written ourselves on top of them.
Because VMware is a 20-year-old, $8 billion company, we have a very heterogeneous environment with a mix of packaged applications, SaaS applications, and custom applications. Pretty much any big-name application you can think of, we have it in one form or another. It is a very diverse portfolio, and our group manages a lot of it.
One key challenge with this heterogeneous environment is that as you bring more and more applications together and integrate them, making changes becomes more and more difficult, and the speed of change gets slower and slower. At the same time, the business is asking us to deliver as fast and as cheaply as we possibly can. To meet those challenges, we started looking at how we develop our own applications, how we make changes to packaged applications, and what the best way is to achieve the velocity and agility the business wants.
Our journey truly started toward the end of 2014, when we first started looking at containers. Docker had just become popular, and our initial motivation was quite simple. In our traditional process, we handed our code off to operations and held once-a-month release parties: everybody came in on Friday, sat together, and did the release, which could last anywhere from 12 to 24 hours depending on what happened.
The biggest challenge in that process was the handoff between dev and ops. Developers do write release notes, but developers are lazy. I come from development; I am lazy too, so I do not like writing those things. Our ops friends are busy running systems, monitoring systems, fixing systems that break, and releasing new things, so they do not have time to read the release notes either. So on release weekend, everybody shows up Friday night and starts pointing fingers at each other: you did not read my document; you did not write it in the document.
There must be a better way. When we looked at Docker, it seemed to make sense: instead of writing release notes, I put everything into the Docker container and hand that off to operations. Operations then has one way of dealing with whatever container somebody hands over: just make it run. It is great.
We picked a simple LAMP-based blog system, put that into a container, and tried it out. Both the development and ops people loved it because it simplified the handover. You just give them a container and they run it. We actually pushed that container all the way to production.
Then we presented our findings to our executives. Like any good executives, they challenged the team: the next thing they asked us to do was find the most complex, most difficult application and try to containerize it. At the beginning of 2015, we started on that journey. The application we picked was our customer support portal, which serves all of VMware's customers' post-sale needs.
As you can imagine in our heterogeneous environment, that application had tentacles into about 60% of the back-end applications that run the company, in one form or another. We started from the web layer and went down layer by layer, packaging each into containers. The web layer was easy. The app layer was a little more difficult. By the time we got to the integration layer, we were looking at about 36 different images; run at full scale, that was about 250 containers, and running them manually was just not an option. The ops guy looked at it and said, I am not going to run 250 containers by hand.
We also learned that legacy applications require things like clustering, shared sessions, and replication, and Docker and containers do not natively support those. It is very difficult, and there are a lot of manual steps to make it work. In the end, the efficiency gain for legacy applications did not warrant the effort of shoehorning them into containers, so we abandoned the shift-and-lift approach for those complex applications.
But we did see the value of containers. We allowed teams doing greenfield application development to keep using containers, which we continued to run on VMs, and we sourced a third-party tool to orchestrate small sets of containers together. That actually worked out quite well. At that time, Docker Compose and Kubernetes were not yet established, and there was no open, standards-based orchestration framework for containers. That is why we did it that way.
By 2016, we took stock of our journey, our priorities, and where we were. We said we should not be on a random journey; we needed to make deliberate changes. We decided to move forward by adopting cloud-native applications and developing applications truly in microservice fashion. We decided to leave the legacy applications alone for the time being; once we had mastered developing microservices, we would refactor them into microservices and run them in containers.
As we were building that out, we still had not solved the platform problem: how do we get those containers running and orchestrated? In 2016, we could not find a mature tool on the market for orchestrating Docker containers, but we found that Pivotal Cloud Foundry gave us quite a good platform to run them. We are primarily a Java shop, and most of our applications are developed in Java using the Spring frameworks, so Pivotal Cloud Foundry seemed a logical place to run them. We started with a dojo with Pivotal, deployed Cloud Foundry, and got the team running on the platform. In 2017, we rolled the platform out to a much larger team. Today, between non-production and production environments, we have about 2,000 to 2,500 containers running on that platform, with all kinds of microservices serving internal, external, and partner integration needs.
Pivotal Cloud Foundry was great. It solved a problem, but it also introduced a different set of problems, because it is a very opinionated platform: you have to do things exactly the way it expects. If your application is slightly wonky or different and does not fit neatly, shoehorning it in is quite difficult. Even though Cloud Foundry gives us a very nice BOSH-managed service broker construct, the availability of services on the platform is a challenge, and packaging our own systems as services is quite difficult too. We packaged a cross-data-center MariaDB cluster, and just packaging that one service took us three months to make it work consistently and solidly. Doing that for every data service we might use was just not a viable option.
Also, as we moved to refactor the legacy applications, some of them required access to shared storage or to actual file systems, which Pivotal Cloud Foundry could not provide, so we needed to look at alternatives. In 2018, Kubernetes really took off; it is now basically the de facto orchestration platform for containers. When we looked at Kubernetes, it solved the majority of the problems we had with PCF and let us move many workloads that did not fit natively into PCF into containers on Kubernetes.
But Kubernetes introduced its own overhead. Coming from the BOSH-managed Pivotal Cloud Foundry world, hand-rolling and managing Kubernetes clusters seemed way too complicated: there are too many moving pieces, and they have to be manually orchestrated in a consistent way. We did set up plain vanilla Kubernetes clusters ourselves. It took a while, with a lot of trial and error, and making it run at large scale was quite difficult. That is why we did not go that route and started looking for better alternatives.
That was when Pivotal Container Service was officially announced as a product. We looked at the features it provided; it fit nicely into our ecosystem and gave us a lot of what we were looking for.
What were we looking for in a Kubernetes offering? Kubernetes takes care of the containers, but who takes care of Kubernetes? In the Cloud Foundry world, PCF takes care of the containers and BOSH takes care of Cloud Foundry itself. BOSH provides a very neat day-one and day-two provisioning and operations model that is consistent and easy for people to understand and practice. We wanted the same kind of operations model for Kubernetes.
From a network topology perspective, because VMware IT is a very heterogeneous environment, we needed a network plane that was not just native to the containers but could stretch across legacy applications and traditional workloads as well. Traditionally, you NAT traffic from the container to the host and then onto the physical network. The challenge is that this obscures the container workload and its traffic: you cannot really tell whether traffic is coming from a container, or from which one. When you apply network policy at the host level, it becomes very difficult and tedious to manage.
From a storage perspective, we needed to support persistent volumes for containers. We did not want a band-aid solution; we wanted storage that is consistent with our storage policies and the way we provision and run storage in our ecosystem. We also wanted to take security into consideration: we wanted policy-driven control over which containers developers can push into the systems. We did not want any developer to be able to pull a random container off the internet and run it.
We also wanted to make sure that, operationally, Kubernetes fit neatly into our support infrastructure: how we monitor our systems and how we do logging. Last but not least, we wanted our developers to get access to Kubernetes through self-service, because we did not want to be handling tickets, answering phone calls, or fielding Slack requests for a cluster.
That is where PKS solved a lot of what we were looking for. This is the PKS architecture. It sits on top of our vSphere-based infrastructure, with NSX-T as the network layer. On top of that sits the BOSH-managed Kubernetes cluster, integrated with Harbor, VMware's open source image registry, which I think had just been accepted into the Cloud Native Computing Foundation. PKS also provides a lot of out-of-the-box integration with tools we use: we use vRealize Operations to monitor the infrastructure and Wavefront and Log Insight to monitor our applications, and PKS fits neatly into that.
NSX-T truly helps us here; our deployment is shown in the diagram. We have separate networks and subnets for the management network of the whole infrastructure and for the administration network of Kubernetes itself. Each Kubernetes cluster that gets provisioned gets its own subnet as well. The networks are segregated so that unauthorized access cannot cross between pods or clusters, and so that workloads cannot reach the administration network or the management network.
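To make the segregation concrete, here is a minimal Python sketch of an address plan in that spirit: separate management and Kubernetes administration networks, plus one subnet carved out per cluster. All CIDRs and cluster names are hypothetical, and the code only plans addresses; in the actual deployment, NSX-T and PKS allocate the subnets and enforce the isolation.

```python
import ipaddress

# Hypothetical address plan; these CIDRs are illustrative, not VMware IT's layout.
MANAGEMENT_NET = ipaddress.ip_network("10.0.0.0/24")  # infrastructure management
K8S_ADMIN_NET = ipaddress.ip_network("10.0.1.0/24")   # Kubernetes administration
CLUSTER_POOL = ipaddress.ip_network("10.100.0.0/16")  # carved into per-cluster subnets


def allocate_cluster_subnets(cluster_names, prefix=24):
    """Give each cluster its own subnet from the pool, in order."""
    available = CLUSTER_POOL.subnets(new_prefix=prefix)
    allocation = {}
    for name in cluster_names:
        try:
            allocation[name] = next(available)
        except StopIteration:
            raise RuntimeError("cluster pool exhausted") from None
    return allocation


def is_segregated(allocation):
    """True if no two networks (management, admin, clusters) overlap."""
    networks = [MANAGEMENT_NET, K8S_ADMIN_NET, *allocation.values()]
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.overlaps(b):
                return False
    return True


subnets = allocate_cluster_subnets(["payments", "support-portal", "intranet"])
for cluster, net in subnets.items():
    print(cluster, net)
print("segregated:", is_segregated(subnets))
```

Non-overlapping ranges only make the segments distinct; the isolation itself comes from the firewall rules NSX-T applies between them.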
For ingress and egress, we use the NSX Edge, which serves as ingress point, load balancer, and firewall all in one, so we can define firewall policy at that level as well. One neat feature of the PKS and NSX-T integration is that when we push a workload into Kubernetes, an NSX-T virtual load balancer is automatically created for it and keeps track of the workload, so we do not have to manage load balancing separately on the network side. In the NSX-T landscape, every pod gets a real IP address that is visible to the network, so when we define firewall policies, we can truly see which container is talking to which and what each container has access to, and we can control traffic at that level of granularity.
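The difference between NAT and routable pod IPs can be shown with a toy model in Python. It is not the NSX-T API, and the pod names, addresses, and rules are hypothetical. It shows that once traffic is NATed to the host address, a firewall rule cannot tell the pods on that host apart, whereas with routable pod IPs a rule can name a single pod.

```python
from dataclasses import dataclass


# Toy model only: names, IPs, and rules are hypothetical, not NSX-T objects.
@dataclass(frozen=True)
class Pod:
    name: str
    ip: str       # pod IP (routable in the NSX-T model)
    host_ip: str  # IP of the node the pod runs on


HOST_IP = "192.168.10.5"
DB_IP = "10.20.0.50"
web = Pod("web", "172.16.1.10", HOST_IP)
batch = Pod("batch", "172.16.1.11", HOST_IP)


def visible_source(pod, nat):
    """Source IP the network sees: the host's under NAT, the pod's own otherwise."""
    return pod.host_ip if nat else pod.ip


def permitted(pod, dst_ip, nat, allow):
    """Default deny: traffic passes only if (source, destination) is allow-listed."""
    return (visible_source(pod, nat), dst_ip) in allow


# Routable pod IPs: the rule can name the web pod alone.
pod_rules = {(web.ip, DB_IP)}
print(permitted(web, DB_IP, nat=False, allow=pod_rules))    # True
print(permitted(batch, DB_IP, nat=False, allow=pod_rules))  # False

# Behind NAT the only visible source is the host, so allowing web
# also allows every other pod on that host.
host_rules = {(HOST_IP, DB_IP)}
print(permitted(web, DB_IP, nat=True, allow=host_rules))    # True
print(permitted(batch, DB_IP, nat=True, allow=host_rules))  # True
```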
From a storage perspective, we are a heavy user of vSAN, which virtualizes storage and provides different tiers and policy-driven storage provisioning. In our PKS deployment, we use local drives in combination with network-attached storage, depending on the data and the performance characteristics users need, and vSAN gives us the right storage for each case.
On top of that, PKS integrates the open source Project Hatchway, which provides a storage plugin to the underlying physical infrastructure and enables things like persistent volumes. In the PKS world, when we declare a persistent volume attached to a container, that persistent volume is created as a VMDK file on the storage. From an operations perspective, even though these files are persistent volumes, they are no different from any other VMDK files the team already manages today, so existing backup and restore capabilities apply to them. We can attach policies to those persistent volumes specifying what backup and restore needs to be applied or what performance must be guaranteed, and all of that can be managed centrally.
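For illustration, here is a Python sketch that emits a StorageClass and a PersistentVolumeClaim as a JSON `List`, which `kubectl apply -f` accepts. It assumes the in-tree vSphere provisioner `kubernetes.io/vsphere-volume` with its `diskformat` and `storagePolicyName` parameters; whether a particular PKS release provisions volumes exactly this way, and the policy name `gold-vsan`, are assumptions. The point is that the storage policy travels with the claim, so the resulting VMDK is governed by an ordinary vSAN policy.

```python
import json


# Sketch only: "gold-vsan" is a hypothetical vSAN storage policy name.
def storage_class(name, policy, disk_format="thin"):
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/vsphere-volume",
        "parameters": {"diskformat": disk_format, "storagePolicyName": policy},
    }


def pvc(name, storage_class_name, size):
    """A claim that asks the storage class above for a single-writer volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [storage_class("gold", "gold-vsan"), pvc("orders-db-data", "gold", "20Gi")],
}
print(json.dumps(manifests, indent=2))
```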
We also use Harbor heavily. We chose Harbor over the open source Docker registry, first, because it is integrated and, second, because Harbor provides role-based access control (RBAC), which the open source Docker registry does not have. In an enterprise, you really need that: you need to make sure the right people have access to update and change the right Docker images and pod scripts in the registry. Harbor is also integrated with Notary and Clair, which lets us define policy for what can run in our clusters. In our build pipelines, we use Clair to scan images for vulnerabilities and then use Notary to sign them. Our Kubernetes clusters are configured not to run any container that is not signed by the registry. That is how we control what is allowed to run.
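The scan-then-sign flow can be sketched as a pair of checks in Python: the pipeline signs an image only if the scan finds nothing above a severity threshold, and the cluster runs an image only if it comes from the internal registry and is signed. This is a hypothetical model; it does not call the Harbor, Clair, or Notary APIs, and the registry hostname, severity scale, and threshold are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical model of the pipeline gate; no Harbor, Clair, or Notary calls.
TRUSTED_REGISTRY = "harbor.example.internal"
SEVERITIES = ["negligible", "low", "medium", "high", "critical"]


@dataclass
class Image:
    reference: str                                # registry/project/repo:tag
    findings: list = field(default_factory=list)  # severities reported by the scan
    signed: bool = False


def passes_scan(image, max_allowed="low"):
    """True if no finding is more severe than max_allowed."""
    limit = SEVERITIES.index(max_allowed)
    return all(SEVERITIES.index(sev) <= limit for sev in image.findings)


def sign_if_clean(image):
    """Pipeline step: only images that pass the scan get signed."""
    if passes_scan(image):
        image.signed = True
    return image.signed


def may_run(image):
    """Cluster-side rule: internal registry and a signature are both required."""
    return image.reference.startswith(TRUSTED_REGISTRY + "/") and image.signed


clean = Image("harbor.example.internal/support/portal:2.1", ["low"])
risky = Image("harbor.example.internal/support/portal:2.2", ["high"])
public = Image("docker.io/library/nginx:latest", signed=True)

sign_if_clean(clean)
sign_if_clean(risky)
print(may_run(clean), may_run(risky), may_run(public))  # True False False
```

Keeping the signing decision in the pipeline and the run decision in the cluster means a developer cannot bypass the scan simply by pushing an image by hand.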
In VMware IT, we create our own golden base images: we start from public images, harden them, and add the VMware-mandated daemons and agent processes. Everything else is built from those, and we make sure every image that runs in our clusters is based on them, so we can ensure the security and integrity of the containers.
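One way to enforce the golden-base rule in a build pipeline is to check the `FROM` lines of each Dockerfile against an allow-list. The Python sketch below does that; the repository names are hypothetical, and the parser is deliberately simple (it ignores line continuations and `ARG` substitution in `FROM`).

```python
# Hypothetical allow-list of golden base repositories (names are illustrative).
GOLDEN_BASES = {
    "harbor.example.internal/golden/base-os",
    "harbor.example.internal/golden/java-runtime",
}


def repository(image):
    """Strip any digest or tag: 'host:5000/golden/base-os:1.2' -> 'host:5000/golden/base-os'."""
    name = image.split("@", 1)[0]
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name = name[:colon]
    return name


def external_bases(dockerfile_text):
    """Images named in FROM instructions, skipping references to earlier build stages."""
    stages, bases = set(), []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]  # drop --platform etc.
        if not args:
            continue
        if args[0] not in stages:
            bases.append(args[0])
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return bases


def built_on_golden_base(dockerfile_text):
    bases = external_bases(dockerfile_text)
    return bool(bases) and all(repository(b) in GOLDEN_BASES for b in bases)


good = """\
FROM harbor.example.internal/golden/base-os:2.3 AS build
RUN make
FROM harbor.example.internal/golden/java-runtime:11
COPY --from=build /out/app.jar /app.jar
"""
bad = "FROM docker.io/library/ubuntu:latest\n"
print(built_on_golden_base(good), built_on_golden_base(bad))  # True False
```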
From an operations perspective, VMware is a heavy user of Log Insight, which provides log ingestion, log analytics, alerting, reporting, and so on. PKS is integrated with Log Insight out of the box, but because it uses Fluentd internally for log shipping, it can fairly easily be integrated with any logging system that supports a syslog endpoint. In our case, we chose Log Insight because we use it to monitor the rest of our infrastructure, including the legacy stack where the majority of our workload still runs and which is not yet containerized, so it lets us look at logs end to end. The integration also automatically populates fields like cluster tag IDs, pod IDs, and namespace IDs, so it is very easy to look at a log and figure out where a problem came from.
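As an illustration of what tagging logs with cluster, namespace, and pod IDs can look like on the wire, here is a Python sketch that builds an RFC 5424 syslog line carrying that metadata as structured data. It is not the format PKS or Log Insight actually uses: the field names are assumptions, and the SD-ID uses 32473, the enterprise number reserved for documentation examples. In practice Fluentd does this shipping; the sketch only builds the line.

```python
from datetime import datetime, timezone

# Sketch only: field names are assumptions; 32473 is the documentation example
# enterprise number, used here so the SD-ID is well-formed.
FACILITY_LOCAL0 = 16
SEVERITY = {"error": 3, "warning": 4, "info": 6}


def sd_escape(value):
    """Escape the characters RFC 5424 requires inside a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def syslog_line(message, level, cluster, namespace, pod, app, host="-", when=None):
    when = when or datetime.now(timezone.utc)
    pri = FACILITY_LOCAL0 * 8 + SEVERITY[level]
    timestamp = when.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    sd = '[k8s@32473 cluster="{}" namespace="{}" pod="{}"]'.format(
        sd_escape(cluster), sd_escape(namespace), sd_escape(pod))
    # HEADER is <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then SD and MSG.
    return f"<{pri}>1 {timestamp} {host} {app} - - {sd} {message}"


line = syslog_line(
    "payment gateway timeout", "error",
    cluster="prod-dc1", namespace="orders", pod="orders-api-7d9f",
    app="orders-api", host="node-3",
    when=datetime(2018, 9, 25, 17, 0, tzinfo=timezone.utc),
)
print(line)
```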
From a monitoring perspective, Kubernetes out of the box uses Heapster and Telegraf, and we push those metrics to Wavefront using the Wavefront proxy integrated into PKS. Wavefront provides a very good out-of-the-box Kubernetes monitoring dashboard that gives you a 360-degree view of your Kubernetes deployment. We use Wavefront because we also use it to monitor our PCF platform and our applications, so all the metrics from our whole ecosystem go there. The benefit of a single monitoring tool is that it makes it easier to troubleshoot, correlate stats and data points, and look at trends and system behavior.
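For a sense of what reaches the Wavefront proxy, here is a Python sketch that formats one metric point in the Wavefront data format (`<metricName> <metricValue> [<timestamp>] source=<source> [pointTags]`). The metric and tag names are hypothetical; a line like this would be written to the proxy's Wavefront-format listener, which as far as we know defaults to port 2878. The sketch only builds the line.

```python
import time


# Hypothetical metric and tag names; the layout follows the Wavefront data format.
def wavefront_line(metric, value, source, tags, timestamp=None):
    if any(ch.isspace() for ch in source):
        raise ValueError("source must not contain whitespace")
    ts = int(time.time() if timestamp is None else timestamp)  # epoch seconds
    point_tags = " ".join(
        '{}="{}"'.format(key, str(val).replace('"', '\\"'))
        for key, val in sorted(tags.items())
    )
    return f"{metric} {value} {ts} source={source} {point_tags}".rstrip()


line = wavefront_line(
    "kubernetes.pod.cpu.usage_rate", 12.5, "prod-dc1-node-3",
    {"cluster": "prod-dc1", "namespace": "orders", "pod_name": "orders-api-7d9f"},
    timestamp=1537894800,
)
print(line)
```

Tagging every point with cluster, namespace, and pod is what lets one Wavefront instance correlate Kubernetes metrics with PCF and application metrics.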
We deploy our platform in two data centers in active-active fashion to provide higher availability to our developers. In each data center, we only guarantee 99% to 99.5% availability from an infrastructure perspective; with this setup, we expect applications to be able to achieve four-nines to five-nines availability. But applications bear the responsibility for the last mile: correctly routing their traffic, correctly monitoring themselves, and correctly signaling the infrastructure when they are in trouble, so the infrastructure can fail over automatically and reroute traffic to the other data center.
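The arithmetic behind the four-to-five-nines expectation can be checked with a short Python sketch. It assumes the two data centers fail independently and that either one alone can carry the full load, which is exactly why the application's own failover behavior matters.

```python
def combined_availability(per_site, sites=2):
    """Active-active availability, assuming sites fail independently and any one
    healthy site can carry the full load (with the application handling failover)."""
    return 1 - (1 - per_site) ** sites


def downtime_minutes_per_year(availability):
    return (1 - availability) * 365 * 24 * 60


for per_site in (0.99, 0.995):
    both = combined_availability(per_site)
    print(f"per site {per_site:.1%}: combined {both:.4%}, "
          f"~{downtime_minutes_per_year(both):.0f} min/year of downtime")
```

With 99% per site the combination comes to 99.99% (about 53 minutes of downtime a year), and with 99.5% per site to 99.9975% (about 13 minutes), which lands in the quoted range. Correlated failures, or an application that does not fail over cleanly, pull the real figure down.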
Takeaways: why is Pivotal Container Service different from plain vanilla Kubernetes clusters, and why is it good for enterprises? First, PKS is committed to constant compatibility with mainline Kubernetes, which means you get the latest features Kubernetes brings on board within 30 days. Second, automated provisioning, scaling, and patching through BOSH is very valuable from an operations perspective. It gives your operations team peace of mind that every deployment will be consistent and repeatable, and operational best practices are already ingrained in BOSH itself, so you do not have to reinvent them.
The NSX-T integration greatly simplifies network and traffic management and lets us create a much flatter, more secure network that gives applications access to all the components that need it. From a storage perspective, vSAN gives us a solution that works across both container and traditional workloads, giving us much better efficiency and storage utilization. On top of that, PKS ships with Harbor, a secure image store where you can manage your image blueprints and deploy them securely. Out-of-the-box integration with Log Insight and Wavefront is the icing on the cake that makes operations even better. That is the talk, and that is why it is called Have Your Cake and Eat It.
Q&A
Question: VMware and Pivotal are both part of Dell now. Did that influence your decision toward PKS at all, or was it strictly based on requirements?
Eric Rong: I would say it is about 80/20. Yes, we are all part of the Dell family, but you would be surprised how much competition there is between VMware and Pivotal. We are encouraged to explore products from our sister companies, but we are not mandated to use them. Pivotal or Dell is the first place we look for solutions, but a solution has to meet our requirements, fit into VMware's ecosystem, and ultimately benefit VMware; if it does not, it is not helping. I am a VMware stockholder, not a Pivotal stockholder, and making Pivotal stock go up is not helping me. So we evaluate their products on a common scale with other solutions. For example, right now we are having a very big conversation with Pivotal about Cloud Foundry, because its cost is ridiculous. It helps you go fast and go big, but the faster and bigger you go, the more the price tag goes up with it. If the cost is not going to benefit VMware, we will get rid of it and go back to Kubernetes.
Question: I know VMware and Pivotal, but one of your slides said something about what this says about VMware.
Eric Rong: To be honest, I have some of those conversations with Pivotal people, but I do not have a complete preview of their plans. If you ask my view, I think Pivotal Cloud Foundry should port its experience over to Kubernetes: basically replace Diego with Kubernetes and keep the rest, like the Gorouters and service brokers. You actually see that happening. If you look at the Cloud Native Computing Foundation, it has started adopting service brokers, buildpacks, and other contracts introduced by Cloud Foundry. I would not be surprised if, in a year and a half or two, Cloud Foundry becomes just an experience layer deployed on top of Kubernetes, like Istio. That is how I think it will happen, though it may or may not; I do not know. It will take some time, because Diego still provides some unique capabilities that I think they are waiting for Kubernetes to catch up on.
Question: The integrations with services such as storage, and the other capabilities in the pipeline: are those a direct result of the work you are doing?
Eric Rong: Yeah, a lot. One of VMware IT's biggest missions, besides keeping the company running and making money, is to be customer zero for all our products. That started about three or four years ago. Generally, when a product reaches beta, VMware IT gets the bits and starts running experimental workloads on it, and we are usually in production with the actual product about two months before it becomes GA, running real production workloads. We do that to validate our products in real-world scenarios. Our assumption is that if a product works for VMware IT, it probably works for 95% of companies and customers on the planet, because VMware is very typical of a large enterprise. With PKS, we were very, very early adopters. We started the conversation while the product was still a concept, got the first bits, and started running it. A lot of what you see in the latest 1.2 release comes directly from our work and from features we asked for that were missing for us.
Question: You mentioned that after your proof of concept, you tried to containerize some legacy applications and had to abandon it because of cost. Can you talk more about the challenges?
Eric Rong: Sure. It is a very interesting set of challenges. Our legacy applications are traditional multi-tier applications: a web tier, an application tier, an integration tier, a database, an ERP system, and whatever packaged applications sit underneath. There are a couple of challenges. One is that our legacy applications run on things like Tomcat and WebLogic and rely on the clustering capabilities of those servers. That clustering depends on multicast and network constructs that containers do not natively give you, so once you put the applications into containers, you literally cut that out. Is there a solution? Yes, but it requires changes to the legacy applications. I can externalize all the sessions to an external store, but then that is an external dependency I have to introduce, and another set of containers I have to manage.
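The externalized-session workaround can be sketched in Python: app replicas keep no session state in memory and read and write it through a shared store, so any replica can serve any request without application-server clustering. The dict-backed `SessionStore` is a hypothetical stand-in for a real external service, which is exactly the extra dependency described above.

```python
import uuid


class SessionStore:
    """Stand-in for an external session store such as a cache or database.
    Backed by a dict here so the sketch runs anywhere."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class AppReplica:
    """A stateless instance: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = str(uuid.uuid4())
        self.store.save(session_id, {"user": user, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        session["cart"] = session.get("cart", []) + [item]
        self.store.save(session_id, session)
        return session


store = SessionStore()
pod_a, pod_b = AppReplica("pod-a", store), AppReplica("pod-b", store)
sid = pod_a.login("alice")               # handled by one replica
print(pod_b.add_to_cart(sid, "sku-1"))   # another replica sees the same session
```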
The other challenge, given the way our workloads are integrated, is dependencies: one service cannot come up until another has come up. Those dependencies start with a simple A-to-B and gradually become a spaghetti mishmash of startup sequences, and orchestrating that startup sequence in a container world is very difficult. As far as I know, even today the only way to do it is manually; there is no reliable way. We solve it by writing custom scripts in each dependent container that check the containers it depends on: when the container first starts, the script pings the other containers to see whether they are up, and it does not start itself until they are. But that is, first, very custom and proprietary and, second, not very reliable. You will find a lot of challenges like these as you take on bigger, more complex legacy applications. If you have a simple traditional application, like the sample people always like to give, an Apache server with a Tomcat server and a MySQL database underneath it, no problem. You can stick that into a pod or a Docker Compose file and it will work just fine. But once your application involves a certain degree of complexity, it becomes very challenging.
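The ping-before-start scripts can be sketched in Python as a TCP readiness check run in the dependent container's entrypoint before the main process starts. This is a generic illustration, not VMware IT's script. A port accepting connections does not prove the service behind it is healthy, which is part of why this approach is unreliable. In Kubernetes, init containers are a common place to run this kind of check.

```python
import socket
import time


def _wait_until(host, port, deadline, interval):
    """Retry a TCP connect until it succeeds or the deadline passes."""
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_all(dependencies, timeout=60.0, interval=2.0):
    """Check (host, port) dependencies in order under one shared deadline.
    Returns those that never accepted a connection; empty means all are up."""
    deadline = time.monotonic() + timeout
    return [
        (host, port)
        for host, port in dependencies
        if not _wait_until(host, port, deadline, interval)
    ]
```

An entrypoint would call, for example, `wait_for_all([("orders-db", 5432), ("auth-service", 8080)], timeout=120)` with hypothetical service names, and exit with an error if the returned list is not empty.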
Question: Is the Kubernetes API helping with that?
Eric Rong: No, it does not. That is why we did not try to shift-and-lift those applications. We leave them where they are, and our approach is what I call choke and release. We take a component of the legacy application as a module, first rewrite its interface, and rewrite the module as microservices. Then, in the copy of that module still inside the legacy application, we take out all the back-end code and replace it with a proxy to the microservices. Then we go slice by slice. Eventually, our hope is that we slice away enough that there is nothing left.
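The routing decision at the heart of this slice-by-slice approach can be sketched in Python. Here it sits in a standalone router rather than inside the legacy module, a common variant of what is usually called the strangler fig pattern; the paths and hostnames are hypothetical.

```python
# Hypothetical route table: prefixes already carved out of the legacy app map to
# new microservices; everything else still goes to the legacy backend.
LEGACY_BACKEND = "http://legacy-portal.internal"
MIGRATED = {
    "/support/cases": "http://case-service.internal",
    "/support/entitlements": "http://entitlement-service.internal",
}


def route(path):
    """Return the upstream URL for a request path, preferring the longest migrated prefix."""
    matches = [p for p in MIGRATED if path == p or path.startswith(p + "/")]
    if matches:
        return MIGRATED[max(matches, key=len)] + path
    return LEGACY_BACKEND + path


print(route("/support/cases/123"))      # http://case-service.internal/support/cases/123
print(route("/support/downloads/abc"))  # http://legacy-portal.internal/support/downloads/abc
```

Each migrated slice adds one entry to the table; once the legacy fallback stops receiving traffic, the slicing is done.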
Question: Basically refactoring those sorts of applications?
Eric Rong: Yes, into microservices with clear, controlled dependencies. For all the nice things said about containers, at the end of the day you realize that your agility comes from how you manage your dependencies. You have to be very thoughtful about which dependencies you take on and which ones you do not want. I think that is it. Thank you.