The Path to Scaling a Digital Enterprise with EssentiallySports and Vercel
DevOps in enterprise businesses requires collaboration and agility in today's fast-paced tech landscape. But how can enterprise companies ensure they have the tools and processes needed to build and iterate efficiently, without sacrificing quality or stability? This fireside chat with EssentiallySports and Vercel shares insights on the technical foundations necessary for building highly efficient enterprise-level teams and successfully navigating the path to digital transformation.
Presented by Vercel.
Full transcript
The complete talk, organized by section.
Malte Ubl and Mayank Badola
Malte Ubl: Hi everyone. My name is Malte. I am the CTO for Vercel. We are the provider of the Frontend Cloud and the creators and maintainers of Next.js. We work with customers such as Under Armour, Nintendo, Zapier, and many other of the largest enterprise brands in the world, getting their web experiences to the best and most stable that they can be. Today I am speaking with Mayank, who works at EssentiallySports, which is a global brand for sports entertainment news, I think. Yes. It has gone through a pretty substantial growth cycle over the last years. Great to have you here. How are you doing today?
Mayank Badola: I am doing amazing. Quite anxious as well. It has definitely been a great journey for us over the last few years, and it is good to be here and talk about all the good things we have done. I would love to share our learnings with everyone here.
Malte Ubl: Awesome. I must admit I am not a sports person, so maybe others are not either. What is EssentiallySports? What are you doing?
Mayank Badola: EssentiallySports is a digital brand focused on telling the best stories about America's favorite sports celebrities. That pretty much sums up our vision. In terms of business scale, last year we published over 400 unique content pieces every day and crossed 1 billion page views aggregated over the year.
Malte Ubl: Wow. By the way, before I joined Vercel, which was a year ago, I ran the desktop search team at Google. That sounds like the type of site I would have to deal with every so often.
Mayank Badola: I am glad you do not have that penalty anymore that you just talked to me about.
Malte Ubl: One thing I think is interesting in the realm of operations is that you are an engineer, you are pushing code, and there is this news team who are pushing articles. Does that create any particular challenges in how you operate?
Mayank Badola: To answer that, it goes down to how we evolved as a business. Initially it was just a fun project. We started with three CS engineers from university. That also plays into the fact that the company is very tech savvy. Back then, because it was more like a hobby, we were just running this WordPress service somewhere. Everything was tightly coupled into that single service. We had themes to put the user interface out there and the whole CMS system.
When the website started getting big, the service would crash all the time. There would be firefighting in the night. People would be rebooting the service again and again. I recall somewhere in June or July of 2019, there was one Sunday where we had Wimbledon finals, a cricket World Cup as well, and a British GP. It was a disastrous night because we had that really innocent WordPress service running somewhere trying to do its job, and the engineers just pressing the reboot button.
It was a bad day for the company because a lot of people in the content org worked a lot to make that day a big success for the company, but the technology did not really stand up to that test. That is probably the original story of how we scaled from there to a point where I think this year we scaled from 800 active reader sessions to around 75,000 active reader sessions, and we did not even realize it because we were out having drinks.
Malte Ubl: To clarify, now you are basically Next.js on Vercel going headless against WordPress. Is that the tech stack?
Mayank Badola: Yes, exactly that. We tried to make sure that we could decouple the publishing flow from the reader or audience experience because we want to invest in these two areas independently, and also make sure we do not introduce side effects into these two processes. These two are really separate processes from our point of view. They also give us the necessary agility to be more adventurous when it comes to creating newer experiences for our readers, but at the same time to be more conscious about how we store data, what kind of things we add to our CMS, and how we take care of that data internally.
Malte Ubl: Can you talk a little bit more about the DevOps aspects of your scaling adventure?
Mayank Badola: Of course. Initially, when we had that nightmare of a Sunday, we decided that scalability and availability were definitely things we needed to invest in. We needed a system that could scale on demand because we operate in a domain where virality is a really big factor. We focus on sports celebrities and their happenings in general, so we cannot predict when the loads will go up or down. We needed a system that could scale effortlessly and efficiently.
Over the course of figuring out different technology choices and frameworks that would work for us, we decided to come to Next.js and deploy the front-end infrastructure on Vercel Cloud because it was a fully serverless architecture for us. That meant we would pay only for the things we use. We would not have to maintain bulky servers that cost the same amount all the time, every second.
At the same time, the framework gave us a lot of things off the shelf that we would have had to invest a lot of effort into bootstrapping, like easy support for AMP integrations, which was really important for us, and the ability to have server-side rendering so that we can do SEO effectively as we port and decouple our WordPress monolith and separate out the presentation layer.
Malte Ubl: We have customers with a similar need to scale. The Grammys are a good example: they basically get almost no traffic and then a lot of traffic that one day. One of my favorite examples is The Washington Post, who host their election experience on our platform. That is one of the highest-pressure things. In many ways, 2016 was a problematic year, and then 2020 went much better for them. The feedback we got from 2020 was that this was the smoothest election experience they ever had. It is a good example of iteration, because it is such a custom app for one night, and it changes every four years because elections are a little bit different, so you have to innovate really quickly.
What was it like to make the decision to go with a serverless stack at your company? What was the process?
Mayank Badola: The first and foremost thing for us was to be able to scale. We wanted to be highly available and able to scale, more so because we did not want our system to add high latency to our content and discovery processes. Search engines have a limited compute bandwidth that they allocate for discovery of content. If your website is not performant, you get crawled less, which means your content will not get discovered as soon as you want it to be discovered.
For a business like ours, where virality and attention are key and attention is limited, every second counts for the content to be out there for consumption by our readers. First we started by making sure we could get to that point. After that, since we are a bootstrapped company, we started optimizing our cost overages. There is a really fine line between compute and bandwidth. Depending on your domain and how you deal with things, there is a lot of potential to fine-tune between those two elements and come to a point that serves you from an economical point of view as well. That is what we did.
Malte Ubl: For people who are not that much into SEO, Google can basically take down any site, but it is trying not to. Especially for a large site like yours, you have infinite content that could be crawled. Google crawls your site and tries to do it as fast as possible, but if you start getting slower, it says, oh my God, we have to back off here or we are going to take them down. The faster you respond, the more crawling you get done in the end.
Mayank Badola: The penalty is also way bigger if you are already getting a lot of traffic but you are failing the expectations of the crawlers.
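As a rough illustration of the crawl-rate point above, the sketch below uses a deliberately simplified model: it assumes a crawler spends a fixed amount of fetch time per day on a site and fetches pages one after another. That model, the function name, and all numbers are assumptions made for illustration; real crawlers adapt their rate in more sophisticated ways.

```typescript
// Toy model (an assumption, not any search engine's actual algorithm):
// the crawler has a fixed daily time budget for one site and fetches
// pages sequentially, so faster responses mean more pages per day.
function pagesCrawledPerDay(
  dailyBudgetSeconds: number,
  avgResponseSeconds: number,
): number {
  if (avgResponseSeconds <= 0) {
    throw new RangeError("avgResponseSeconds must be positive");
  }
  return Math.floor(dailyBudgetSeconds / avgResponseSeconds);
}

// Hypothetical numbers: 10 minutes of fetch time per day.
const crawlBudgetSeconds = 600;
const crawledWhenSlow = pagesCrawledPerDay(crawlBudgetSeconds, 2.0); // 300 pages
const crawledWhenFast = pagesCrawledPerDay(crawlBudgetSeconds, 0.25); // 2400 pages
```

Under these assumptions, cutting the average response time from 2 seconds to 250 milliseconds lets the same budget cover eight times as many pages.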
Malte Ubl: In this world of serverless and a Frontend Cloud like Vercel, you are outsourcing the operations of your site to someone else's cloud infrastructure, but you are also running your own backend. In your case it is headless WordPress. How do you think about the connectivity between the serverless front-end stack and your backend, and the security aspects of it?
Mayank Badola: For us, we initially thought we wanted to have more control over our data. That is why we spun up our own VPC environments with a cloud service provider. Apart from that, we did not want our Vercel front end to directly hit our WordPress services. We created an infrastructure pipeline behind the scenes which transforms content whenever a publish event is logged on WordPress, so that we could completely decouple every API call that would go into rendering the pages and set it up so that, for the front end, it is just static content. Then we deal with multi-tiered cache invalidation, which is tricky territory of course.
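The publish pipeline described above could look something like the following minimal sketch. Every name here (PublishEvent, StaticArticle, the tier names, the field layout) is hypothetical and not taken from the talk; the sketch only shows the general shape of turning a CMS publish event into render-ready static content and then planning invalidation across cache tiers.

```typescript
// Hypothetical event emitted when the CMS publishes or updates a post.
interface PublishEvent {
  postId: number;
  slug: string;
  title: string;
  html: string;
  modifiedAt: string; // ISO 8601 timestamp
}

// Render-ready document that the front end reads instead of calling the CMS.
interface StaticArticle {
  path: string;
  title: string;
  body: string;
  version: string;
}

// Assumed tiers, ordered from closest to the origin to closest to the reader.
type CacheTier = "static-store" | "cdn";

interface InvalidationStep {
  tier: CacheTier;
  key: string;
}

// Transform a publish event into static content.
function toStaticArticle(event: PublishEvent): StaticArticle {
  return {
    path: `/${event.slug}`,
    title: event.title.trim(),
    body: event.html,
    version: event.modifiedAt,
  };
}

// Plan invalidation from the innermost tier outward, so an outer cache
// never refills itself from a stale inner one.
function planInvalidation(article: StaticArticle): InvalidationStep[] {
  const tiers: CacheTier[] = ["static-store", "cdn"];
  return tiers.map((tier) => ({ tier, key: article.path }));
}
```

Ordering the steps from the origin outward is one common way to avoid the race where an edge cache is purged first and immediately repopulates from content that has not yet been refreshed.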
Malte Ubl: This is something we are investing a lot of time in, because cache invalidation is very difficult. One of the magical parts of Vercel is our ISR service, where you can say, I have this piece of content and it changes every so often, but I can tell you when it changes. We guarantee that within a few seconds on the global delivery network, everyone gets a new version. You can build that yourself, but it is not easy.
When we talk about Frontend Cloud, what we mean is that you can go to a backend cloud and get all these primitives and put them together and build very sophisticated systems, but they were meant for generic use cases. If instead you get something that is actually useful for publishing global front-end applications, then the velocity of being able to come to solutions goes up dramatically.
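To make the "I can tell you when it changes" idea above concrete, here is a minimal sketch of an on-demand revalidation webhook. In a Next.js Pages Router API route the response object provides revalidate(path); to keep the sketch self-contained, that capability is passed in through a small stand-in interface. The function names, the query parameter names, and the shared-secret check are assumptions made for illustration.

```typescript
// Stand-in for the part of the response object used here. In a Next.js
// Pages Router API route, the response exposes revalidate(path).
interface Revalidator {
  revalidate(path: string): Promise<void>;
}

interface Decision {
  status: number;
  message: string;
  path?: string; // set only when the request is accepted
}

// Decide whether a webhook call is allowed to trigger revalidation.
function checkRevalidateRequest(
  query: { secret?: string; path?: string },
  expectedSecret: string,
): Decision {
  if (query.secret !== expectedSecret) {
    return { status: 401, message: "Invalid token" };
  }
  if (!query.path || !query.path.startsWith("/")) {
    return { status: 400, message: "Expected a path such as /some-article" };
  }
  return { status: 200, message: "OK", path: query.path };
}

// Called by the CMS-side pipeline after a publish event (hypothetical wiring).
async function handlePublishWebhook(
  query: { secret?: string; path?: string },
  res: Revalidator,
  expectedSecret: string,
): Promise<{ status: number; message: string }> {
  const decision = checkRevalidateRequest(query, expectedSecret);
  if (decision.status !== 200 || decision.path === undefined) {
    return { status: decision.status, message: decision.message };
  }
  try {
    await res.revalidate(decision.path); // regenerate only this page
    return { status: 200, message: `Revalidated ${decision.path}` };
  } catch {
    return { status: 500, message: "Revalidation failed" };
  }
}
```

Keeping the validation in a pure function makes the security-sensitive part easy to test without a running server.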
Mayank Badola: I would chime in that this was also one of the reasons we decided to go ahead with Next.js and Vercel: the sheer amount of infrastructure investment there is with Vercel Frontend Cloud, and the amount of bootstrapping that Next.js takes away in order to deliver a performant web experience. To do something in-house would take a lot of specialization and a lot of skill. For us, we had to choose between the cost of ownership and the cost of delegation. For a long time since the start of our relationship with Vercel, that decision has been really great for us.
Malte Ubl: Coming back to connecting your serverless, high-scale front end to the backend, one of the offerings we recently launched is Secure Compute. It has been a game changer for squaring the circle of someone else taking over a high-scale infrastructure but being able to securely tie that to the backend that sophisticated enterprise customers are running. Almost by definition, you have some form of sophisticated infrastructure and you want to make sure the connection is secure and stable.
Mayank Badola: These kinds of new services that Vercel comes up with, I would be really curious to try them out. Sometimes we have use cases where we are using proprietary APIs and we do not want them open on the internet. To tie these two environments together would be a great use case for those situations. I would take it up with my co-founder for sure.
Malte Ubl: Let us talk a little bit about team efficiency, developer experience, and velocity. How is that important for you, and how are you managing it with respect to Vercel?
Mayank Badola: Our engineering team is quite small. In terms of size, we have a content org of around 250 people, and they are powered by an engineering team of four. We do not need a lot of processes in place. We really want to empower everyone, which is four people in the team, to push to master and put it out on production.
The way it works for us is that we get really strong, fast feedback. We have processes in place where we have one-click redeployments. We can roll back to the previous version easily. We have a lot of observability in the system. We keep an eye out for our Core Web Vitals, page experience scores, costs, and compute usage. We have all the tools. We also have real-time visibility into how the content is performing.
Malte Ubl: Do you use our instant rollback feature?
Mayank Badola: Yes, and it works for everyone. You do not need to be an engineer to use it.
Malte Ubl: You can put the RBAC controls in place that you want. One funny thing is that Vercel itself is a Kubernetes app. You can revert changes, but it takes a while, especially if you have web-facing connections and you do not want to break them. Vercel also runs on Vercel, and we take advantage of instant rollback. Not every day; we are not that bad at pushing code. But in the last week it has been utilized a couple of times. It is such a lifesaver.
Let us talk about Core Web Vitals and your success metrics as an engineering organization for web-facing properties.
Mayank Badola: Our pursuit of Core Web Vitals started around the time when we had that big availability problem with our website, because we got to know that the ranking algorithms with Google were going to include these vitals. Back then, when we tested our website, we were at around 30 out of 100. We said, okay, this WordPress theme needs to go away as soon as possible.
That was the start of the search for a great framework which could bootstrap us in a way where we did not have to deal with a lot of boilerplate, but just had to write good, performant code, and the bundling would take care of things. We ended up optimizing everything like LCP and CLS. For CLS, we made sure that we did not bombard our audience with random ads everywhere. We wanted to be aware that our audience comes to our brand to experience news. The focus should always be news. The ads should be well declared: this ad is starting now, and after that the content continues. Not just in-your-face. That also helped with CLS because there is no shift when the page loads on first render.
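The CLS benefit of reserving space for ads can be seen in how a single layout shift is scored: the score is the impact fraction (the share of the viewport touched by the element before and after it moves) multiplied by the distance fraction (how far it moved relative to the viewport's larger dimension). The sketch below applies that definition to a simplified case, assuming one full-width element that moves vertically. The type names and the numbers are illustrative assumptions.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// One full-width element that moves vertically by deltaY pixels.
interface VerticalShift {
  top: number;
  height: number;
  deltaY: number;
}

// Clip a vertical interval to the visible viewport.
function clipToViewport(start: number, end: number, max: number): [number, number] {
  const clamp = (v: number) => Math.max(0, Math.min(v, max));
  return [clamp(start), clamp(end)];
}

// Layout shift score = impact fraction * distance fraction.
function layoutShiftScore(viewport: Viewport, shift: VerticalShift): number {
  const [a0, a1] = clipToViewport(shift.top, shift.top + shift.height, viewport.height);
  const [b0, b1] = clipToViewport(
    shift.top + shift.deltaY,
    shift.top + shift.deltaY + shift.height,
    viewport.height,
  );
  const overlap = Math.max(0, Math.min(a1, b1) - Math.max(a0, b0));
  const unionHeight = a1 - a0 + (b1 - b0) - overlap;
  // The element is full width, so the width cancels out of the area ratio.
  const impactFraction = unionHeight / viewport.height;
  const distanceFraction =
    Math.abs(shift.deltaY) / Math.max(viewport.width, viewport.height);
  return impactFraction * distanceFraction;
}

const phone: Viewport = { width: 400, height: 800 };
// A late-loading 200px ad pushes a 400px-tall article block down.
const lateAdScore = layoutShiftScore(phone, { top: 0, height: 400, deltaY: 200 });
// The same ad with its slot reserved up front: nothing moves.
const reservedSlotScore = layoutShiftScore(phone, { top: 200, height: 400, deltaY: 0 });
```

In this example the late ad alone produces a score of 0.1875, which is already above the 0.1 threshold commonly cited for a good CLS, while the reserved slot contributes 0.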
Malte Ubl: I want to interject there because I think it is an important point of a revolution that is happening today. If folks are here in the media business, the way ads have been working is through third-party scripts where you effectively hand off control to someone else to manage that side of your business, which has many downstream consequences, including being problematic from a privacy point of view.
One of the core reasons it has been like this is because it was convenient. It was so easy to smash that script on your site and hope for the best. But one of the things that comes with Vercel is that because you have a platform that controls the web delivery side, you can integrate things on a first-party layer as easily as it would have been to use the third-party stuff, but you are fully in control of your data.
The way Vercel's analytics product works, for example, is that you only have to import a piece of JavaScript to initialize the tracking. But the entire data flow is fully first-party and fully automatic. That means you know where the data is going, and you are also not subject to ad blocking, which can be extraordinary. We are migrating from an external service to Vercel's own analytics on our own site. We have a developer audience with a higher propensity to use ad blockers, so we see about double the data when migrating into our own pipeline. It inherently cannot be blocked because it is part of the normal requests that operate the websites. If you get twice the data, the amount of time you need to make decisions goes down by half or even more.
Mayank Badola: I totally resonate with that point. Slapping a script on your page is so easy. You are almost tempted to do it every time. But you incur huge penalties when it comes to Core Web Vitals. You do not know how many other scripts that one script will load. You do not have that visibility. The data that gets tracked is also a single perspective, the perspective of the third party that you get when you use these scripts.
That is why for us, we capture first-party data ourselves. Nowadays, data is probably the biggest asset any brand can have, especially when they are scaling or when they want to grow. Having access to your own data and interpreting it in a way that is more aligned with your domain understanding unlocks a lot of opportunities.
Malte Ubl: I know you are cost conscious, sadly for us, but that is awesome. How do you go about cost optimization?
Mayank Badola: Vercel has notifications on overages.
Malte Ubl: I like to call it on-demand users.
Mayank Badola: Sometimes we receive way too much traffic, so I am not complaining about it. We have those notifications that keep us on our toes. In no way do we stop anyone from implementing new features or trying out new things. Our first priority is to get things out there for the readers to try.
While we have oversight on these metrics, we carefully make sure that we know how much compute we have used, how much compute we need, and how much data bandwidth or network bandwidth we need in order to continue with our growth. We also have alerts set up on our own cloud service providers so that we can have more control over the expenses we are incurring. We are bootstrapped and do not have any funding; that is the primary reason why we are extremely cost conscious. That is also why we want to amplify our ROI on every investment we make.
Malte Ubl: One of the most interesting trends in how you pay for compute on our platform is the change in paradigm of the unit of cost for serverless. In traditional Lambda-originated serverless products, the cost is gigabyte-seconds, and that is gross compute. But on a front-end workload, it is very common that you talk to some backend. With traditional serverless products, you pay while you do nothing. You wait for the backend to come back and idle the machine. Because there is no concurrency on these compute services, you pay for waiting.
One thing that is really interesting about our Edge Functions product is that you pay for net CPU. If you wait for IO, if you wait for your backend to return with some kind of call, you only pay for when you actually do processing. That is a good opportunity to substantially lower serverless bills in a meaningful way.
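The difference between the two billing units described above can be worked through with a small sketch. The invocation numbers and function names are hypothetical, and no real prices are used; the point is only how much of a typical front-end request is waiting rather than computing.

```typescript
// One function invocation, split into time spent computing and time
// spent idle while waiting on a backend call (hypothetical numbers).
interface Invocation {
  cpuMs: number;
  ioWaitMs: number;
}

// Wall-clock model: GB-seconds = memory (GB) * total duration (s),
// so time spent waiting on the backend is billed too.
function gbSeconds(inv: Invocation, memoryGb: number): number {
  return (memoryGb * (inv.cpuMs + inv.ioWaitMs)) / 1000;
}

// Net-CPU model: only the time spent actually executing counts.
function billedCpuMs(inv: Invocation): number {
  return inv.cpuMs;
}

// Share of the wall-clock duration that was spent waiting on IO.
function ioWaitShare(inv: Invocation): number {
  return inv.ioWaitMs / (inv.cpuMs + inv.ioWaitMs);
}

// A page render that computes for 20 ms and waits 180 ms on the CMS.
const render: Invocation = { cpuMs: 20, ioWaitMs: 180 };
const wallClockUnits = gbSeconds(render, 1); // 0.2 GB-seconds
const netCpuUnits = billedCpuMs(render); // 20 ms of CPU
const waitingShare = ioWaitShare(render); // 0.9, i.e. 90% waiting
```

With these assumed numbers, nine tenths of the wall-clock duration is idle waiting, which is the portion a net-CPU billing unit does not charge for.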
Mayank Badola: Definitely. A lot of these things can also help us reduce the operational overheads we have within our own backend infrastructure that integrates between our headless CMS and the Vercel Cloud setup. I think we are going to save more money soon.
Q&A
Malte Ubl: We are so well on time. Is there maybe an audience question? We have time for one.
Audience member: I am curious: how has moving to Vercel improved your Core Web Vitals?
Malte Ubl: I will repeat it quickly for the audience. The question is about moving to Vercel serverless and how that improved Core Web Vitals.
Mayank Badola: The first thing is the whole bootstrapping process. The way the assets are chunked and bundled. We do not need minute investments to optimize each and every module that we include. That process helps us. Another process that helps us is the ability to quickly roll out changes and see how they behave.
Another thing that helps us with Core Web Vitals is repeat performance. With the usage of edge cache, we speed up repeat visits significantly, and they mean a lot for us because we have viral content. At any given point, if a piece is ranking really well, it has a lot of traffic and the cache stays hot. Incrementally, our averages go down more and more. I do not mean to say the first load is really slow, but the next loads are blazingly fast, as good as an AMP page. Even though we do support AMP, our web pages are as good as AMP.
Malte Ubl: When I was at Google, I rolled out the page experience change. It is interesting to be on the other side. The biggest thing, especially when adopting Next.js, that you can do for CLS in particular and performance in general is adopting next/image. Even easier and more impactful is next/font. It is sadly incredibly difficult to do font loading well on the web, but we have effectively solved it in a way that is actually doable. It is sad that it is 2023 and loading a font on a web page is almost rocket science, but I think we are very close to solving that problem.
Mayank Badola: It is funny because there are so many websites you go to and see the font looks like this, and then in front of you, one second later, it changes everything on the screen.
Malte Ubl: What we do is take your font, extract metrics from it, and generate a specific fallback font for the web font so that when the custom font loads, it replaces in place without pushing everything else around. That is the degree of optimization you have to go through, and the benefit of getting it from a platform versus doing everything like this yourself.
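The fallback-font technique described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Vercel's actual implementation: the metric values are made up, the function names are hypothetical, and scaling by the ratio of average glyph widths is one common way to derive the CSS size-adjust, ascent-override, descent-override, and line-gap-override descriptors for a local fallback font.

```typescript
// Metrics read from a font file, in font design units.
interface FontMetrics {
  unitsPerEm: number;
  ascent: number;
  descent: number; // usually negative
  lineGap: number;
  xWidthAvg: number; // average glyph advance width
}

// Overrides expressed as fractions (1.25 means 125%).
interface FallbackOverrides {
  sizeAdjust: number;
  ascentOverride: number;
  descentOverride: number;
  lineGapOverride: number;
}

// Scale the fallback so its average glyph width and vertical metrics
// match the web font, keeping text in place when the web font loads.
function computeOverrides(web: FontMetrics, fallback: FontMetrics): FallbackOverrides {
  const webAvg = web.xWidthAvg / web.unitsPerEm;
  const fallbackAvg = fallback.xWidthAvg / fallback.unitsPerEm;
  const sizeAdjust = webAvg / fallbackAvg;
  const scale = web.unitsPerEm * sizeAdjust;
  return {
    sizeAdjust,
    ascentOverride: web.ascent / scale,
    descentOverride: Math.abs(web.descent) / scale,
    lineGapOverride: web.lineGap / scale,
  };
}

const asPercent = (fraction: number): string => `${(fraction * 100).toFixed(2)}%`;

// Emit an @font-face rule that wraps a locally installed font.
function fallbackFontFace(family: string, localFont: string, o: FallbackOverrides): string {
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  src: local("${localFont}");`,
    `  size-adjust: ${asPercent(o.sizeAdjust)};`,
    `  ascent-override: ${asPercent(o.ascentOverride)};`,
    `  descent-override: ${asPercent(o.descentOverride)};`,
    `  line-gap-override: ${asPercent(o.lineGapOverride)};`,
    "}",
  ].join("\n");
}

// Made-up metrics for a custom web font and a system fallback.
const webFont: FontMetrics = { unitsPerEm: 1024, ascent: 960, descent: -256, lineGap: 0, xWidthAvg: 640 };
const systemFont: FontMetrics = { unitsPerEm: 2048, ascent: 1854, descent: -434, lineGap: 67, xWidthAvg: 1024 };
const fallbackCss = fallbackFontFace("Brand Fallback", "Arial", computeOverrides(webFont, systemFont));
```

The generated family name would then be listed after the web font in the font-family stack, so text rendered with the fallback occupies about the same space as it will once the custom font arrives.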
Do you have any parting words for scaling enterprise web businesses?
Mayank Badola: Own your own data as soon as possible.
Malte Ubl: I love that. This is a really good one to end on. If people want to get in touch, how can they reach you?
Mayank Badola: I like drinking, so you can meet me at any bar in Amsterdam.
Malte Ubl: Excellent. I will definitely be at our booth if people want to chat, or up here. This was fun. Thanks for coming, everyone, and have a great day.
Mayank Badola: Thank you.