What Tech Leaders Must Know About Microservices


Las Vegas 2019



Leslie Chapman

Distinguished Engineer · Comcast

Michael Winslow

Director · Comcast

If your team practices DevOps, it's only a matter of time before they say "We want to move to microservices".

How do we, the tech leaders, arm ourselves with enough knowledge to make good decisions on the move to microservices? Many times engineers will emphatically push for this move because microservices represent the bleeding edge. They do not always assess the value of this change in the same way we do.

Using our real-life experiences at Comcast, we will answer the following questions to help tech leaders make good decisions:

1. What are microservices, exactly?

2. What are the main reasons to move to microservices?

3. What are reasons to NOT move to microservices?

4. What maintenance considerations should my team prepare for?

5. Where are the hidden costs that come with microservices?

At the end of this talk, you will have an appropriate level of understanding of microservices and will be prepared to make informed decisions with your team.

Leslie Chapman is a Distinguished Engineer and Architect for Comcast's X1 platform. She has a passion for inspiring young women to explore STEM fields. When she's not busy designing the future of television, she is at home taking care of her 2 cats.

Michael Winslow picked up his love for programming when he was 10 years old writing GW-Basic code on his Tandy-1000. With his passion for designing simple solutions to complex problems, Michael has played key roles at companies like Aramark, Ortho-McNeil, Oracle and Xfinity Mobile. He is currently a DevOps advocate, Agile enthusiast, and dedicated people-leader for the Core Applications group at Comcast.


Full transcript


Leslie Chapman and Michael Winslow

Leslie Chapman: Thank you all so much for being here and learning about what tech leaders need to know about microservices. Microservices. Fantastic.

My name is Leslie Chapman. I am a Distinguished Engineer at Comcast. What does that mean? It means I get to write code, and I don't have to worry about much other than that, so it is kind of like the best job in the entire universe.

This story that we are going to walk you through, and this explanation that we are going to walk you through, is super near and dear to my heart because we are going through this right now on my team. So get ready.

Michael Winslow: All right, and I am Michael Winslow. Just as a homage to Philadelphia, I am going to do the classic DJ Jazzy Jeff and Fresh Prince stance and say, "She's the coder, I'm the manager."

I am a Director at Comcast. I am what is called "used to be a coder." My fun fact is I still code, just for management tasks. I try to do it in a way that my leader does not realize that it is a coded report that he gets. He is just like, "Wow, you're just so detail-oriented in your report." I am like, "Yeah, it's generated."

Feel free to keep in touch with us on Twitter or LinkedIn. I think we will get started.

Michael Winslow: Tell the nice people about Comcast.

Leslie Chapman: Comcast is in Philadelphia, right downtown. We work in a sparkly new building called the Comcast Technology Center, and it is full of a ton of nerds. We have a gym and a cafeteria, which is super fun.

We build amazing products that millions of people use every night when they go home and relax. While a lot of people might think, "It's just television," it is not. It is kind of your friend. Who loves their TV?

Michael Winslow: It is the most fulfilling job in the universe.

Leslie Chapman: It is. We love it. We have a huge footprint that you probably do not even know is our footprint. You know about Comcast as a cable provider and as a high-speed data provider. We have a lot of apps you can use to interact with your high-speed data. A lot of people do not know we recently got into the mobile business. We also have Comcast Spotlight, which does advertising for local businesses in the Philadelphia region, and FreeWheel, which is advertising. There is too much going on on this slide to explain it all.

Michael Winslow: How did we expand recently? Who did we buy?

Leslie Chapman: Sky.

Michael Winslow: Kind of a big deal.

Leslie Chapman: We have all these networks that people do not know about. We employ engineers far beyond software engineers, including mechanical engineers at our Universal theme parks. One thing people do not really realize about Comcast is we are a technology company. We are not a cable company, so let's dispel that myth.

Michael Winslow: Absolutely. In true DevOps form, we do not like this chart for one main reason. This is what we want the chart to look more like: all those lines disappearing. There should be no difference. We should be able to talk across all those lines, and we work every day to make sure those lines get blurred and eventually go away.

Leslie Chapman: I am going to hand it over to him. He is the DevOps guy. I am just here for the sizzle.

Michael Winslow: I am not sure how I got that moniker at Comcast, but recently I have been known as the DevOps guy all the time. I have been coming to DevOps meetups and events for years. Last year was my first time at DevOps Enterprise Summit, and now I am completely honored to be up here. I am a huge DevOps fan.

I always thought the name DevOps was the one thing I did not like because it makes you think it is just about dev and ops, and it is so much more than that. What I like is the acronym John Willis and Damon Edwards came up with, CAMS. They have expanded it to CALMS recently with Jez Humble, but I like to keep things simple and I do not take on new letters unless I have to. It is culture, automation, measurement, which so many people forget about, and sharing. We want to share between companies. CAMS describes my thought of DevOps much more than the word DevOps itself.

You are here for microservices. We are going to go through some fun slides. A little background on how I originally came up with this deck, and then we worked on it together: I started as a Principal Engineer at Xfinity Mobile. One day the team decided to move to microservices. We grew fast, and I ended up taking a leadership role.

When I was in that leadership role, I talked to one of my upstream leaders and said, "By the way, for the last two months on our timesheets, we've been putting microservices. Do you know what microservices are?" He said, "No." I said, "What do you do if your boss asks you what this microservices thing is?" He said, "I just say the engineers told me we had to do it."

Leslie Chapman: That does not sound good.

Michael Winslow: It does not sound great. This whole deck started as a way to teach my leader what microservices are. I had two unique perspectives because of the leadership role: I thought microservices would be cool because it is cool tech, but I also wanted it to prove out its business value. When we made this change, things in the end worked out well, but it was not always sunshine.

Leslie Chapman: This is the part I love. We get to make a choice. Would you rather hear about the lessons we learned along the way moving to microservices, or do you just want to hear about how great the team was? Red pill or blue pill?

Michael Winslow: Sorry, this is my presentation, and there is no way I would come up here and not talk about what a great team we had putting together microservices. I promise we will keep it short. It was such a great, amazing team, and I want to point out the diversity of the team. I talk about all kinds of diversity.

Leslie Chapman: Michael, I am old.

Michael Winslow: You are old, and you are actually not in there, so sorry about that.

But seriously, microservices.

Leslie Chapman: I do not know what that means.

Michael Winslow: A quote I like to use comes from my uncle and one of my first mentors: "A true seasoned professional should be able to speak for five minutes intelligently about any subject." I have carried that with me all my life. When someone talks about a subject I have never heard about before, I do not decide, "I don't know anything about it. I'm not interested." I want to get my five minutes in and figure out how I can speak intelligently about it. That is how I felt when I wanted to make this deck for my leader who did not know about microservices.

Leslie Chapman: By the way, I am playing the role of the leader who does not know. I actually do know what a microservice is, but for the purposes of this talk, I do not. Michael, why do we need to do this? What is this going to bring to our business? What is the difference between a monolith and a microservice?

Michael Winslow: Glad you asked. Monolith versus microservice: these are the words you always hear. Many of you know this already, but I am going to do it for the couple of people who will not admit that they do not know it.

We used to call this thing just a service. When microservices came around, this thing that was just a normal service before was given the moniker "the monolith." A monolith takes in a call from an HTTP request. It comes to a controller. The controller calls a service with the business logic in it. That service calls a repository of some kind, and that repository normally goes to a database, but sometimes may go to other services.

In your controller you define all kinds of endpoints: eligibility, user, and device. If you are writing code systematically, you will have matching business logic for those and possibly exact matching repository layers, which match particular tables in your database. There is variation, but this is a good example of a standard monolith.
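As an illustration of the layering just described (not code from the talk), here is a minimal sketch of a monolith in which one controller exposes the eligibility, user, and device endpoints, each backed by a service and a repository. All class names, routes, and sample records are hypothetical, and the dictionaries stand in for database tables.

```python
# Hypothetical monolith: one process, one controller, every endpoint inside it.
class Repository:
    """Stands in for one database table; the rows are made-up sample data."""

    def __init__(self, rows):
        self._rows = rows

    def find(self, key):
        return self._rows.get(key)


class UserService:
    def __init__(self, repo):
        self._repo = repo

    def get_user(self, user_id):
        return self._repo.find(user_id)


class DeviceService:
    def __init__(self, repo):
        self._repo = repo

    def get_device(self, device_id):
        return self._repo.find(device_id)


class EligibilityService:
    """Business logic that calls the other services in-process."""

    def __init__(self, users, devices):
        self._users = users
        self._devices = devices

    def is_eligible(self, user_id, device_id):
        user = self._users.get_user(user_id)
        device = self._devices.get_device(device_id)
        return bool(user and device and user["credit_ok"] and device["in_stock"])


class MonolithController:
    """Single controller that defines all three endpoints."""

    def __init__(self):
        users = UserService(Repository({"u1": {"name": "Ada", "credit_ok": True}}))
        devices = DeviceService(Repository({"d1": {"model": "Phone X", "in_stock": True}}))
        self._routes = {
            "/user": users.get_user,
            "/device": devices.get_device,
            "/eligibility": EligibilityService(users, devices).is_eligible,
        }

    def handle(self, path, *args):
        return self._routes[path](*args)
```

Calling `MonolithController().handle("/eligibility", "u1", "d1")` walks controller, service, and repository inside one process, which is the property the rest of the talk contrasts with microservices.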

A microservice does not mean it is physically smaller than the monolith you started with. Many times, because of boilerplate code, the file size of a microservice can be larger than the monolith you started with. That was definitely the case for us.

HTTP requests come into a microservice, and instead of hitting a controller that has all endpoints, it has a very specific controller with one endpoint, such as device. Then it looks similar as it goes down and has its own data store. We would have two more microservices for eligibility and user.

Leslie Chapman: I think what you are saying is we have all these people on this team, and we can break them off and have them work on microservices.

Michael Winslow: You are getting it. This might be a reason to move to microservices. If you have a great number of developers and they keep stepping on each other's feet, you might break those teams up and give them independent projects to work on.

But here is a challenge: in the monolith, one HTTP address served several different endpoints. In microservices you now have three unique services, and when you are migrating users over, you cannot tell them, "Instead of calling one endpoint, call three." You need a traffic cop. You need to add a gateway. We used Netflix Zuul as a gateway at the time. AWS API Gateway is the equivalent of what you would build yourself with Netflix Zuul.
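To make the "traffic cop" idea concrete, here is a minimal sketch of a gateway that keeps one entry point for callers and forwards each request to the service that owns the path. It is not how Zuul or AWS API Gateway is implemented; the `Gateway` class, the backend functions, and the paths are hypothetical, and plain functions stand in for network calls.

```python
# Hypothetical backends: in a real system these would be separate services
# reached over the network, not local functions.
def eligibility_service(path):
    return {"status": 200, "body": "eligibility handled " + path}


def user_service(path):
    return {"status": 200, "body": "user handled " + path}


def device_service(path):
    return {"status": 200, "body": "device handled " + path}


class Gateway:
    """Single entry point that forwards a request based on its path prefix."""

    def __init__(self, routes):
        self._routes = routes  # path prefix -> backend callable

    def handle(self, path):
        for prefix, backend in self._routes.items():
            # Match the prefix exactly or as a leading path segment.
            if path == prefix or path.startswith(prefix + "/"):
                return backend(path)
        return {"status": 404, "body": "no route"}


gateway = Gateway({
    "/eligibility": eligibility_service,
    "/user": user_service,
    "/device": device_service,
})
```

Callers keep using one address; only the routing table changes as services are split out.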

Leslie Chapman: Is this going to scale? Our monolith is working. Why are we messing with it? Are we going to deliver on the same cadence? Is this going to take longer?

Michael Winslow: If the eligibility microservice starts getting traffic that outpaces the rest, you can add two more eligibility microservices. Then you need to discover that new microservices are there. In the Netflix stack, you use Eureka for this. On Amazon, ECS and other services handle service discovery.

Once those services are discovered, the gateway can register them and start sending traffic. Netflix Ribbon gives you the ability to balance eligibility calls between the three eligibility microservices. In Amazon, you use Elastic Load Balancing.
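The discovery and balancing steps can be sketched together: instances register themselves under a service name, and the caller picks the next instance in turn. This is a simplified round-robin illustration, not the Eureka or Ribbon API; the `Registry` class and the addresses are hypothetical.

```python
class Registry:
    """Toy service registry with round-robin selection per service name."""

    def __init__(self):
        self._instances = {}  # service name -> list of addresses
        self._next = {}       # service name -> index of the next pick

    def register(self, service, address):
        # A newly started instance announces itself here.
        self._instances.setdefault(service, []).append(address)

    def pick(self, service):
        instances = self._instances.get(service)
        if not instances:
            raise LookupError("no instances registered for " + service)
        index = self._next.get(service, 0) % len(instances)
        self._next[service] = index + 1
        return instances[index]


registry = Registry()
for address in ("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"):
    registry.register("eligibility", address)
```

Four consecutive calls to `registry.pick("eligibility")` return the three addresses in order and then wrap around to the first.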

Leslie Chapman: What happens when traffic gets high? How do we load balance this? What happens if one of those nodes goes down?

Michael Winslow: Fault tolerance and the circuit breaker pattern. If the database for one eligibility microservice fails and causes the microservice to fail, you might retry once, twice, three times, and then say, "I want this out of my mix." Hystrix handles that on the Netflix stack. It is graceful fault tolerance. It takes that eligibility service out of the rotation. If you are hand-rolling your own, you may need to start another microservice yourself. In AWS, it can be handled for you.
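A minimal sketch of the circuit breaker pattern described here, assuming a threshold of three consecutive failures and a fixed cool-down before one trial call is allowed. It is not Hystrix; the class name, defaults, and injectable clock are hypothetical choices made so the example is easy to test.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast: the instance is out of the rotation.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit again
        return result
```

After three consecutive failures, further calls raise `CircuitOpenError` immediately instead of hitting the failing dependency; once `reset_after` seconds pass, a single successful call closes the circuit.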

Leslie Chapman: This is sounding really great. We can be super fault-tolerant, take this big team, assign them to microservices, and not take a hit to productivity. Where is the but?

Michael Winslow: Great breakdown. Now we should let the audience decide: would you rather have us tell you what our team learned, or have us tell you what our team learned?

If you start your microservices journey, you may find snarky comments on the internet. One flowchart asks, "Are you Netflix?" If no, then no. That was a joke then and is still kind of a joke, but there are great use cases to move to microservices. A lot of people make a mistake when they jump in too soon.

I want to quote a friend of mine, Ryan Emerle, a developer on Xfinity Mobile: "Complexity begets complexity. Soon tools are needed to manage the complexity, then tools are needed to manage the tools."

Leslie Chapman: I think our Splunk costs went up. How are we going to figure this out?

Michael Winslow: Maybe we can figure out how that happened with microservices. Suppose we need to call our API to determine if a particular user is eligible to purchase a device. In a monolith, every time we cross a process boundary, we make a syslog entry. The HTTP request comes into the eligibility service, goes to the service layer, calls the user service, and goes to the database. Then it gets device information. Let's say we have three process-boundary jumps and three syslog entries.

On the microservices side, there are many more process boundaries. We come into the API gateway, call eligibility, eligibility calls user through the gateway, then it goes to the database, then device, then device database. For the same request, the microservices version produced six syslog entries compared with three for the monolith. Increased process boundaries also create more surface area for possible security threats.
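The three-versus-six comparison is simple arithmetic, but it compounds with traffic. The sketch below counts one syslog entry per process-boundary hop; the hop lists are one plausible reading of the walkthrough above, and the request volume in the usage note is a made-up number.

```python
# One syslog entry per process-boundary crossing (an assumption taken from the talk).
MONOLITH_HOPS = [
    "client -> monolith (HTTP request)",
    "monolith -> user database",
    "monolith -> device database",
]

MICROSERVICE_HOPS = [
    "client -> API gateway",
    "gateway -> eligibility service",
    "eligibility -> user service (via gateway)",
    "user service -> user database",
    "eligibility -> device service",
    "device service -> device database",
]


def syslog_entries(hops):
    """Entries written for a single request."""
    return len(hops)


def daily_entries(hops, requests_per_day):
    """Entries written per day at a given request volume."""
    return len(hops) * requests_per_day
```

At a hypothetical 1,000,000 requests per day, the same call goes from 3,000,000 to 6,000,000 syslog entries, which is the kind of growth that shows up on a log-indexing bill.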

Leslie Chapman: I was feeling great about this earlier, but now you are making me worried. How can we make sure it is worth it?

Michael Winslow: Calm down, boss. We are going to be fine. But we are not done with the logs conversation yet.

Let's talk about application logs and how we used to troubleshoot in production. In the monolith scenario, if eligibility tries to go to the user database and fails, the stack trace tells us the username was null. We can climb the stack: user repository, user service, eligibility service, controller. With one app log entry, we can find the path of the error because the stack trace has visibility of everything inside the monolith.

In microservices, the stack trace only has visibility for the current microservice. It does not know where the problem originated, so all context from the previous microservice is lost.

Leslie Chapman: My developers need to be able to look at a full stack trace so that we can debug issues.

Michael Winslow: Google wrote a paper called Dapper that foresaw this problem with distributed systems. Twitter came out with Zipkin. Comcast came out with Money. It is open source and works well for us. OpenTelemetry is probably going to be the new hotness for distributed tracing.

We used to use distributed tracing to track transactions across applications. Now we need to track transactions within a single application because everything is distributed now.

How do we track application problems in a microservices world? Something comes into the API gateway and gets a correlation ID. I would not recommend you start with the number one in production. When it goes to the eligibility endpoint, you log that correlation ID. Every time you log something, you put that correlation ID in there. Then if something goes wrong and the final stack trace does not know anything about the previous path, you can use the correlation ID to reconstruct what happened along the whole path.
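A minimal sketch of correlation-ID logging, assuming the gateway assigns the ID and every service passes it along and includes it in each log entry. The service functions, the in-memory `LOG` list, and the request shape are hypothetical; a real system would carry the ID in a request header and write to its normal log pipeline.

```python
import uuid

LOG = []  # stands in for a central log store


def log(service, correlation_id, message):
    LOG.append({"service": service, "correlation_id": correlation_id, "message": message})


def gateway(request):
    # Assign a random ID at the edge rather than a counter starting at one.
    cid = request.get("correlation_id") or uuid.uuid4().hex
    log("gateway", cid, "received request")
    return eligibility_service({"user_id": request["user_id"], "correlation_id": cid})


def eligibility_service(request):
    cid = request["correlation_id"]
    log("eligibility", cid, "checking eligibility")
    try:
        return user_service({"user_id": request["user_id"], "correlation_id": cid})
    except ValueError as exc:
        log("eligibility", cid, f"user lookup failed: {exc}")
        raise


def user_service(request):
    cid = request["correlation_id"]
    log("user", cid, "loading user")
    if request["user_id"] is None:
        log("user", cid, "error: username was null")
        raise ValueError("username was null")
    return {"eligible": True}


def trace(correlation_id):
    """Rebuild the path of one request from the shared log."""
    return [f'{e["service"]}: {e["message"]}' for e in LOG if e["correlation_id"] == correlation_id]
```

When the user service fails, its own stack trace stops at its process boundary, but `trace(cid)` returns the gateway, eligibility, and user entries in order, which recovers the path a monolith's stack trace used to give for free.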

Where you had one app log before with a single stack trace, now you have seven log entries to get the same information. Over time, as a conservative estimate, you could have four times as much log file size if you are not careful when you move to microservices.

What you used to do for one in the monolith, you now have to do for many. I had a real spreadsheet for that same boss. When we had one monolith, we had greens across the board. When we started moving to microservices, we were so enthralled that we were not checking all the boxes that needed to get done.

Code review and continuous integration with automated testing were one area. In the beginning, we were just dropping WAR files and JAR files onto the server. Continuous deployment to all environments with automated testing was another. We said, "We'll get to that later." Firewalls and observability are huge because before you could talk inside one service; now you might have a firewall problem talking to another microservice. Security scans also had to be set up.

My suggestions: do not create all your microservices at once. Scott Proulx mentioned the Strangler pattern yesterday; use it to move from the monolith to microservices one at a time. Monitor usage to find your most utilized endpoints and prioritize migration that way. Have a template for your microservices so every developer does not start from scratch every time.
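These suggestions can be sketched as two small pieces: rank endpoints by observed usage to choose what to migrate first, and route each request to the new microservice if its endpoint has been migrated, otherwise to the monolith. This is a simplified illustration of the Strangler pattern; the `StranglerRouter` class, the paths, and the sample access log are hypothetical.

```python
from collections import Counter


def migration_order(access_log):
    """Return endpoints sorted from most to least called."""
    return [path for path, _count in Counter(access_log).most_common()]


class StranglerRouter:
    """Sends migrated endpoints to new services and everything else to the monolith."""

    def __init__(self, monolith):
        self._monolith = monolith
        self._migrated = {}  # path -> new service callable

    def migrate(self, path, service):
        self._migrated[path] = service

    def handle(self, path):
        handler = self._migrated.get(path, self._monolith)
        return handler(path)


def monolith(path):
    return "monolith handled " + path


def eligibility_microservice(path):
    return "eligibility microservice handled " + path


# Made-up access log: eligibility is the busiest endpoint, so it moves first.
ACCESS_LOG = ["/eligibility"] * 3 + ["/user"] * 2 + ["/device"]

router = StranglerRouter(monolith)
router.migrate(migration_order(ACCESS_LOG)[0], eligibility_microservice)
```

Only `/eligibility` is served by the new service; `/user` and `/device` keep going to the monolith until their turn comes.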

The change management team wants release notes. If you still have a change management team, this is important. Things got so crazy that we automated release notes. When we checked code in, the report laid out every microservice, linked to tests, linked to the Jenkins job that deployed it, showed Veracode security scan scores, and showed our rollback method. Those were all things the change control board needed before rollout. Some people would say, "Just get rid of the change control board." We satisfied them with an automated report.
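A rough sketch of what a generated release-notes report could look like, assuming each microservice record carries its version, test link, deploy job link, security scan score, and rollback method. The field names, URLs, scores, and output layout are all hypothetical; the talk does not describe the actual report format.

```python
# Made-up sample records; example.invalid URLs are placeholders.
SERVICES = [
    {
        "name": "user-service",
        "version": "2.0.1",
        "tests_url": "https://ci.example.invalid/user-service/tests/88",
        "deploy_job_url": "https://ci.example.invalid/job/user-service-deploy/41",
        "scan_score": 95,
        "rollback": "redeploy 2.0.0",
    },
    {
        "name": "device-service",
        "version": "1.4.2",
        "tests_url": "https://ci.example.invalid/device-service/tests/102",
        "deploy_job_url": "https://ci.example.invalid/job/device-service-deploy/57",
        "scan_score": 91,
        "rollback": "redeploy 1.4.1",
    },
]


def release_notes(release_name, services):
    """Render one plain-text section per microservice, sorted by name."""
    lines = [f"Release notes for {release_name}"]
    for svc in sorted(services, key=lambda s: s["name"]):
        lines += [
            f"{svc['name']} {svc['version']}",
            f"  tests: {svc['tests_url']}",
            f"  deploy job: {svc['deploy_job_url']}",
            f"  security scan score: {svc['scan_score']}",
            f"  rollback: {svc['rollback']}",
        ]
    return "\n".join(lines)
```

Running `release_notes("2019-10", SERVICES)` on every check-in gives the change control board the same checklist each time without anyone writing it by hand.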

Version control and compatibility can be a nightmare. A release is no longer completely tied to the version of a binary file. A release is now an aggregation of all your microservice versions, and each microservice version can move independently. It is tough to track over time. We ran into this when we sold code to Charter, and Charter wanted the version of the code from November of the previous year. I said, "Good luck." We had to find what code was checked in and what version each piece was. It was not easy.
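One way to avoid the "good luck" moment is to record a manifest of every microservice version each time a release ships, so the state at any past date can be looked up. The sketch below is an assumption about how that could be done, not what the team built; the `ReleaseHistory` class, service names, versions, and dates are hypothetical.

```python
import bisect
from datetime import date


class ReleaseHistory:
    """Keeps dated snapshots of the version of every microservice."""

    def __init__(self):
        self._dates = []      # sorted release dates
        self._manifests = []  # manifest recorded on the matching date

    def record(self, released_on, versions):
        # Insert in date order so lookups can use binary search.
        index = bisect.bisect_right(self._dates, released_on)
        self._dates.insert(index, released_on)
        self._manifests.insert(index, dict(versions))

    def as_of(self, day):
        """Return the most recent manifest recorded on or before the given day."""
        index = bisect.bisect_right(self._dates, day)
        if index == 0:
            raise LookupError("no release recorded on or before " + day.isoformat())
        return self._manifests[index - 1]


history = ReleaseHistory()
history.record(date(2018, 10, 15), {"eligibility": "1.2.0", "user": "2.0.1", "device": "1.0.0"})
history.record(date(2018, 11, 20), {"eligibility": "1.3.0", "user": "2.0.1", "device": "1.1.0"})
history.record(date(2019, 1, 10), {"eligibility": "1.4.0", "user": "2.1.0", "device": "1.1.0"})
```

With this in place, `history.as_of(date(2018, 11, 30))` answers "what was running at the end of November" with the exact version of each service.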

Burnout is real. We learned this lesson. We were under tight deadlines at Xfinity Mobile, had not yet released to production, and at the same time decided to go from a monolith to microservices. The World Health Organization recently said this is an actual condition. Try not to overwork your people.

Leslie Chapman: I would like to add a couple things to drive it home. One great thing about moving to microservices is how highly unit-testable and automated-testable they are because they separate the front end from the business logic. That can get you to speedier deployments because you can depend on the unit tests people write as they write these microservices to make sure they go out with quality.

Microservices also enable people, again through the separation of front end from business logic, to focus on a core area of expertise. It is important to think about microservices as something that employs business logic. If you have a data service that gives you every piece of data you need and you do not need to munge it or do a union on it, perhaps that is not a great rationale for a microservice. But when you are hitting multiple data services and having to employ your own business logic on top, that is the sweet spot for a microservice.

Michael Winslow: Agreed. In 20 seconds: do have a reason to move to microservices, maybe resilience or cost optimization. Do not practice resume-driven development. Do have a plan for monitoring, telemetry, and distributed tracing. Do not count on traditional logging techniques. Do peel off one microservice at a time; try the Strangler application pattern. Do not refactor everything at once. Do understand that all teams are impacted: QA, release management, security. You are not in a vacuum. Do not say, "We're doing microservices, YOLO." Do staff appropriately and monitor employee health. Do not ignore the human impact of your decision. Do encourage DevOps teams to pursue microservices if there are benefits, and do not underestimate the time, effort, and money involved. Thank you, everybody.