Las Vegas 2022

Operational Excellence in April Fools’ Pranks: Being Funny Is Serious Work!

April Fools’ Day is less than 6 months away! Time to start planning! Your website's April Fools’ prank may be the most high-stakes launch of the year! The usual best practices don’t work. There are no second chances: you can’t just re-launch on April 2nd. Many silos must sign off on the plan, yet the plan must be kept secret. Load shifts from zero to millions in one day with little room for mistakes. And what if the prank backfires and has to be disabled quickly? In this talk I’ll discuss how to communicate across the company to get buy-in, how to load test using “dark launches”, how to roll back using feature flags, and other techniques for ensuring a high-stakes launch happens correctly. I’ll include stories from many AFPs, both ones I’ve observed and ones I’ve been a part of... and maybe you'll learn something useful for other launches too!

Full transcript

The complete talk, organized by section.

Tom Limoncelli

Hello. Hello. How is everybody today? Excellent. Thank you for coming to my talk. Hi, I'm Tom Limoncelli.

01 Stack Egg: a self-inflicted denial of service

The year was 2015. The date: April 1st at 10:23 UTC. StackOverflow.com enabled an April Fool's prank called Stack Egg. It was a simple Tamagotchi-like thing in the upper-right-hand corner of the website, and although it had been tested thoroughly, we didn't account for the additional network traffic that would be generated.

About three hours later, the company's load balancers were completely overloaded, making the site unusable. All of the company's web properties were affected. The prank had essentially created a self-inflicted denial-of-service attack.

Three minutes later, the engineers involved went to a control panel and calmly disabled the feature. A half hour later, the site was entirely back to normal. In the next hour, the problem was diagnosed, fixed, and new code was pushed into production. The prank was saved.

Was Stack Overflow lucky that the engineers had made it easy to disable this feature? No, it wasn't luck. It was all in the playbook for operational excellence in April Fool's pranks. Thunderous applause. Thank you.

02 Credibility and agenda

So hi. My name's Tom Limoncelli. I've noticed at this conference all the presentations always have two slides in common. One I call the credibility slide. This is my credibility slide: I'm an SRE TPM at Stack Overflow. Previously I worked at Google and Bell Labs and a number of other companies. I write, I blog, I tweet.

I've written a number of books. The Practice of Cloud System Administration is the most relevant to this talk. Also relevant: back in 2006, I published all the IETF April Fool's RFCs in a book. These are documents that are funny, but you can download them for free, so why did I publish them in a book? To be honest, I'm a fan of coffee table books.

The other slide I see in all the presentations at this conference is some kind of table-of-contents slide. This is my table-of-contents slide. I'm going to talk about something, then something I'm not going to talk about, then the bulk of the talk is this top-five best-practices list. I hope you enjoy it. We're going to end with a three-part extravaganza. Is everybody ready? If you weren't sure where we are, this is the introduction.

03 What makes an April Fool's prank funny

What makes an April Fool's prank funny? I believe it has to have two key ingredients: it has to be topical, and it has to be absurdist.

Topical means it's current, relevant, and has something to do with what's happening in the news. If your company's April Fool's prank relies on, oh, I don't know, a Madonna song from the '80s, that's not exactly topical.

It needs to be absurdist. Absurdist is not just being silly for silly's sake. It takes something to an extreme to reveal a deeper truth. For example, if you think the talks at this conference are a little too commercial, an absurdist joke would be to sell the next 30 seconds to McDonald's for a commercial. I'm not going to do that because you're not the right demographic, and Tom, please stop calling us. Anyway, I will say that this talk is brought to you by Stack Overflow for Teams. You love Stack Overflow. You could have a Stack Overflow just for your company. Isn't that awesome? Okay, moving on.

The third thing that a good, successful April Fool's prank has to have is that element of surprise: something unexpected that makes people smile.

04 Case study: the mustache story

Back in 2010 I was working at Google and we were brainstorming April Fool's pranks. Google is known for its external April Fool's pranks, but we were brainstorming internal pranks, which Google does a lot of, probably more than the external stuff. We wanted to be topical, absurdist, and have an element of surprise.

What was big in the news in 2010? Face recognition. It was becoming popular. This was before filters on social media, and Google had just acquired a face-recognition company. I said, gosh darn it, shouldn't we be able to accurately place a mustache on everyone's photo in the corporate directory? That's what we ended up doing.

It was topical: face recognition was in the news. It was absurdist: it's pretty absurd. And the element of surprise was that the company corporate directory is a stodgy business function, so seeing something like this had all three.

05 Case study: Dance Dance Authentication

Fast-forward to 2017. I'm working at Stack Overflow and we made this video.

At Stack Overflow, we help a monthly audience of 40 million developers get answers to their questions, find jobs, and solve problems. But that success in turn has made us a target. Attacks on our security spurred us into action. We had to adapt. We had to get better. We're excited to present the latest evolution in computer security: Dance Dance Authentication.

We built the... It was all three. It was topical: this was the year a lot of people were starting to deal with two-factor authentication. It was absurdist: two-factor authentication can be a little annoying, so we took it to an extreme. You have to dance in front of a camera. That's absurdist. And it was a surprise because who knew Matt Sherman knew how to dance? Okay, that was an inside joke. We'll talk about that in a second.

06 What makes pranks not funny

What makes an April Fool's prank not funny? A lot of things, but three I'd like to talk about: hurting people, punching down, and inside jokes.

Hurting people: don't cause real harm. Gmail did an April Fool's prank once that literally blocked people from getting email from you. That was real damage. Not a good idea.

Punching down means making fun of the downtrodden. Racist, homophobic, and sexist jokes are by their very nature punching down, and you don't want to do them. Also because they're racist, homophobic, and sexist.

Inside jokes: no one laughed when I said, who knew Matt Sherman was such a good dancer? That's an inside joke. It doesn't get a good reaction. In tons and tons of corporate April Fool's pranks that I see, it's like they didn't really think about the audience. Sorry, that was an inside joke.

07 Why April Fool's pranks are operationally difficult

Why are April Fool's pranks difficult at a DevOps, operational level? They're difficult because they are high stakes. The launch date is inflexible. It can't be late. You can't do it on April 2nd. That's not going to be funny. It's highly visible. Everyone's watching, and it could go viral. And there are no second chances. If it fails, it kills the project. You can't just say, well, we'll try again next year. Next year it might not be topical anymore.

In this talk, we're going to cover five best practices for April Fool's pranks. Let's drill down.

08 Best practice 1: feature flags

Best practice number one: feature flags. You want to hide the feature behind a flag. Instead of rolling out new software to enable the feature, you roll out the new software with the feature disabled. Then, say at 9:00 a.m. when you want to launch, you flip a flag and the feature goes live. At the end of the day, you can flip the flag in the other direction and the feature disappears.

This is so much better than rolling out new software to enable the feature. You could have the greatest DevOps CI/CD toolchain in the world, but how difficult would it be if you said, okay, to enable this feature, your SRE team or whoever is going to have to hit the right button at exactly 9:00 a.m.? That's difficult to coordinate.

It also brings up the problem of how you revert or turn off the feature. Do you revert to the previous release, which is risky, or do you roll forward, which is risky in other ways? It's much better to have a feature flag. Shout out to my friends at the number of companies that do feature flags as a service now.
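As an illustration of the flag-flip pattern described in this section, here is a minimal Python sketch. It is not Stack Overflow's implementation: the `FeatureFlags` class, the `april_fools_prank` flag name, and `render_page` are all hypothetical, and a production system would keep flags in a shared store or a feature-flag service rather than in process memory.

```python
import threading


class FeatureFlags:
    """Hypothetical in-memory flag store (a real one would be shared across servers)."""

    def __init__(self):
        self._flags = {}
        self._lock = threading.Lock()

    def set(self, name, enabled):
        with self._lock:
            self._flags[name] = bool(enabled)

    def is_enabled(self, name):
        # Unknown flags default to off, so deploying the code alone
        # never turns the feature on by accident.
        with self._lock:
            return self._flags.get(name, False)


def render_page(flags):
    """Build the page; the prank widget appears only while the flag is on."""
    parts = ["<header>Site</header>"]
    if flags.is_enabled("april_fools_prank"):
        parts.append("<div id='prank'>Prank widget</div>")
    parts.append("<main>Questions</main>")
    return "".join(parts)


flags = FeatureFlags()
before_launch = render_page(flags)        # code is deployed, flag is off
flags.set("april_fools_prank", True)      # launch time: flip the flag
during_prank = render_page(flags)
flags.set("april_fools_prank", False)     # end of day or emergency: flip it back
after_rollback = render_page(flags)
```

The point of the design is that launch and rollback are the same cheap operation, with no deploy in either direction.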

09 Best practice 2: load testing

Best practice number two: load testing. See how it performs under load. Something we obviously didn't do really well in the opening case study with the Stack Egg prank.

Basically, you want to A/B test your benchmarks with and without the prank. Pretty simple, but so few people do load testing. I think load testing is like flossing your teeth. Everyone knows you should do it. Everyone says they do it, but no one actually does it. How many people here flossed seven out of seven days recently? We got one hand, two hands, three. How many people load test major features? A few. About the same number. How many people floss and load test? Or load test your flossing? Okay, let's move on.
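The with-and-without comparison can be sketched as follows. This is a toy model, not a real load test: a real one would drive actual traffic at a staging environment. The request counts are invented for illustration (one request per page view, plus a hypothetical widget that polls six times per view), and the function names are assumptions.

```python
def simulate_session(prank_enabled, page_views=10, polls_per_view=6):
    """Return how many HTTP requests one simulated user session generates.

    All numbers here are made up: each page view costs one request, and the
    hypothetical prank widget adds several polling requests per view.
    """
    requests = 0
    for _ in range(page_views):
        requests += 1                    # the page itself
        if prank_enabled:
            requests += polls_per_view   # extra traffic from the widget
    return requests


def benchmark(prank_enabled, sessions=1000):
    """Total requests for the same workload, with or without the prank."""
    return sum(simulate_session(prank_enabled) for _ in range(sessions))


baseline = benchmark(prank_enabled=False)
with_prank = benchmark(prank_enabled=True)
ratio = with_prank / baseline
print(f"baseline={baseline} with_prank={with_prank} ratio={ratio:.1f}x")
# prints: baseline=10000 with_prank=70000 ratio=7.0x
```

Even a crude model like this surfaces the kind of question the Stack Egg story turns on: how much extra traffic does the feature add for each user?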

10 Best practice 3: dark launches

Load testing is good, but there's still a problem. You can be really good at load testing, but it's hard to load test and simulate the full impact of millions of users. You can't just load test with one-tenth the number of users, multiply by ten, and cross your fingers.

There's something called dark launches. This was popularized by Facebook. The story goes that when they launched Facebook Chat, now Facebook Messenger, they knew they were going to go from zero users to potentially a billion users during the press conference where they announced the new feature. Wouldn't it be horrible if it didn't scale to a billion users? All the reporters would be writing about not the feature, but the fact that it was a failed launch.

Six months prior to launch, they actually launched the code in your web browser, but it didn't display anything. You didn't see the feature, but it was being downloaded. Some people got code that would send test messages, simulating sending chat messages. Once they got that working with 1% of users, they turned it up to 2%, 3%, found a performance bug, then 5%, 6%, and found another performance bug. They were able to fix these, and by launch date they had tested Facebook Chat with almost a million users.
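The percentage ramp described above can be sketched in Python. This is an assumption-laden illustration, not Facebook's mechanism: `bucket`, `in_dark_launch`, and `handle_page_view` are hypothetical names, and hashing the user ID is just one common way to pick a stable slice of users.

```python
import hashlib


def bucket(user_id):
    """Map a user ID to a stable bucket in [0, 100) using a hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_dark_launch(user_id, percent):
    """True if this user should silently exercise the hidden code path."""
    return bucket(user_id) < percent


def handle_page_view(user_id, percent, send_test_message):
    """Serve the normal page; selected users also generate hidden test traffic."""
    if in_dark_launch(user_id, percent):
        send_test_message(user_id)       # real backend load, nothing displayed
    return "<main>Normal page</main>"    # the visible page never changes


sent = []
users = range(10_000)
for percent in (1, 2, 5):                # ramp up, fixing bugs between steps
    sent.clear()
    for u in users:
        handle_page_view(u, percent, sent.append)
    print(f"{percent}% setting -> {len(sent)} of {len(users)} users sent test traffic")
```

Because each user's bucket is fixed, raising the percentage only adds users. Anyone included at 1% is still included at 5%, which keeps the ramp steps comparable.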

There are a lot of different opportunities to do this, but looking at the clock, I'm going to keep moving on. For example, before launching IPv6, Google did something similar. There was a little JavaScript in your browser testing whether your ISP had configuration problems with IPv6.

Some negative examples: the Stack Egg case study from the beginning of my talk. Boy, we could have avoided a lot of problems if we had done a dark launch. Also, the Apple Watch launch. Apple does these big product-introduction productions, and for that one they added not just video streaming to their website but real-time updates, which overloaded the system in much the same way as the Stack Egg problem. They could have prevented that if they had done a dark launch.

11 Best practice 4: involve all the silos

Best practice number four: involve all the silos. If you're going to launch an external April Fool's prank, you need to make sure everyone is involved: marketing knows what's going on, the PR department knows, sales. You probably don't want to tell sales what the prank is, because no offense, but they're going to blab, but at least let them know something is coming.

The executives should know, especially if the prank mentions your CEO. The executives need to know ahead of time. You also want your diversity and inclusion organization to know about the prank. This can prevent some embarrassing accidental things that aren't funny. We'll leave it at that.

Customer support also needs to know. For example, in the mustache story, we had informed and trained all of Google's help desks worldwide on what to do if someone called and complained. We had a mechanism where you could disable the mustache on a per-person basis if that person didn't think it was funny. One person didn't. Actually, they called back 15 minutes later and asked for it to be re-enabled because then they realized they were the only person who didn't have a mustache. True story.
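A per-person off switch of the kind the help desks used could look something like the sketch below. The talk does not describe how Google implemented it. `show_mustache`, the `opted_out` set, and the user names are all hypothetical.

```python
def show_mustache(user_id, prank_enabled, opted_out):
    """Decide whether to draw the mustache on this person's directory photo."""
    return prank_enabled and user_id not in opted_out


opted_out = set()

# The help desk gets a complaint and disables the prank for one person.
opted_out.add("alice")
alice_after_complaint = show_mustache("alice", True, opted_out)
bob_unaffected = show_mustache("bob", True, opted_out)

# Fifteen minutes later the same person asks for it back.
opted_out.discard("alice")
alice_re_enabled = show_mustache("alice", True, opted_out)
```

The global flag still wins: with `prank_enabled` set to `False`, nobody gets a mustache regardless of the opt-out list.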

12 Best practice 5: perform a project retrospective

Best practice number five: perform a project retrospective. Companies call these different things. Some people call them a postmortem or a learning review. Some people spell post-mortem with a hyphen, or as two words, or as one word. Whatever you call it, do a postmortem whether it's successful or not.

People think of postmortems as something you only do if something bad happened. I love doing a learning review or project retrospective no matter what, because there's just as much to learn if you're successful as if you had an outage or failure.

If you're not doing postmortems, start by keeping it simple. Just a simple outline. Discuss what went right, what went wrong, what did we learn, and really focus on what did we learn.

What I love about postmortems is that they make the whole organization smarter. If I make a mistake and learn from it, good for me. But if I make a mistake and have the humility to tell my whole team about this mistake and how to prevent it in the future, my whole team has gotten smarter. If I have an outage and we study it, learn from it, and inform the entire company or at least all of engineering, then outages are no longer a problem. Outages are just unscheduled learning events that make the entire company smarter.

I remember when I joined Stack Overflow, one of the first things I did was institute a postmortem policy. I set up the meeting. We discussed what went right, what went wrong, what did we learn. We wrote up a report, and then I emailed the report to the entire company.

Someone hit reply-all and sent this: "I don't know about anyone else, but I really like getting these postmortem reports. Not only is it nice to know what happened, but it's also great to see how you guys handle the in-the-moment and how you plan to prevent these events going forward. Really neat. Thanks for the great work."

I'm so proud to work at a company where after an outage, the email I get isn't "you jerks can't keep this website up," but instead: "really neat, thanks for the great work."

Postmortems have a super-powerful effect. Not only the things I covered, but also it's an opportunity to share gratitude with all the teams involved. That was a little emotional there.

13 Three-part extravaganza: the lazy way

To calm down, where are we? It's time for the three-part extravaganza closing. Let's talk about the lazy way. I don't mean lazy as a bad thing. I mean lazy the way a good engineer is lazy: as efficient as you can be for the goal in mind.

What are three lazy ways to do April Fool's pranks? Lazy technique number one: use someone else's resources. Remember that video I showed you? That was a blog post and a YouTube video. If that went so viral that it overloaded YouTube's website, well, that's YouTube's problem.

Lazy technique number two: creatively describe something that already exists. Maybe you don't want to write a lot of code, but you could find a funny way of explaining something that you're already shipping.

For example, if you're familiar with the Go programming language, Go is intentionally a bare-bones language. It leaves out a lot of features intentionally. But if you look at various online forums, many new users learn the language and immediately post, I like this language, but it's missing this feature and this feature and this feature. People are always complaining: why don't you add these features? You're not getting it. It's intentionally a bare-bones language.

A great April Fool's prank that the Go team could do, and if anyone here is from that team you can use this for free, is just announce that Go 3.0 will have all of those features. Every feature that people ever ask for. Not only are we going to add these features, but we've added these features and we're ready to ship Go 3.0. Just click here to download Go 3.0. When people click on that link, it would take them to this page. [The slide shows a Java download page.]

Actually, they shouldn't do this, because making fun of Java is punching down, and we kind of said that's not good.

Technique number three: mislead your audience. For example, I was about to pitch to Gene Kim that I would do a talk about high-stakes launches with information based on my book The Practice of Cloud System Administration. But then at the last minute I changed the title of the talk to Operational Excellence in April Fool's Pranks, and that's how we got here.

Thank you very much. You have six months to April Fool's Day, so start planning now. Thank you.