How a Single Selfie Took Down Twitter

Ellen's Oscar selfie took Twitter down in 20 minutes. There were plenty of servers; everyone just kept hitting the same cache key.

40 million people were watching the Oscars. Ellen pointed at the camera and basically said — retweet this, right now, all of you.

[Image: Ellen's Oscar selfie, the tweet that broke Twitter]

The original tweet, for years the most retweeted post on the platform.

Twitter was down 20 minutes later.

Everyone assumes it was just too much traffic. Too many users, not enough servers. But that's not what happened. The real reason is something that will eventually show up in your system too — and you probably won't see it coming.

By the end of this post you'll know what a Hot Key is, why it's scarier than a traffic spike, and how to make sure it doesn't take your app down.


How Twitter Normally Works

Twitter keeps popular tweets in a cache — fast memory that sits in front of the database. Instead of hitting the DB every time someone loads a tweet, the cache just hands it back instantly.
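
This read path is commonly called cache-aside. Here is a minimal sketch of it, assuming an in-memory `Map` stands in for the cache and a hypothetical `fetchTweetFromDb` stands in for the database. Neither is Twitter's actual code; the names are only for illustration.

```javascript
// Hypothetical stand-ins: a Map for the cache, a fake DB lookup that counts calls.
const cache = new Map();
let dbReads = 0;

function fetchTweetFromDb(tweetId) {
  dbReads += 1; // track how often we fall through to the slow path
  return { id: tweetId, text: `tweet ${tweetId}` };
}

// Cache-aside read: try the cache first, fall back to the DB on a miss,
// then store the result so the next reader gets a cache hit.
function getTweet(tweetId) {
  const key = `tweet_${tweetId}`;
  if (cache.has(key)) {
    return cache.get(key);
  }
  const tweet = fetchTweetFromDb(tweetId);
  cache.set(key, tweet);
  return tweet;
}

getTweet(123); // miss: reads the DB
getTweet(123); // hit: served from the cache
console.log(dbReads); // 1
```

The second call never touches the database, which is the whole point of keeping the cache in front of it.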

Normal day: millions of people, millions of different tweets, traffic spreading naturally across all cache servers. Every node carries roughly its fair share. Nothing breaks.

[Image: Normal traffic, load balanced evenly across all cache nodes]

Three nodes. Balanced. Quiet. This is what healthy looks like.


So What Actually Broke?

Ellen didn't just post a tweet. She told 40 million live TV viewers to go retweet it. Right now. Tonight.

So they all opened Twitter at once. And they all went looking for the exact same Tweet ID at the exact same second.

Every request went to the one node holding that key. The other two just sat there. That one node took the full hit — and fell over.

That's a Hot Key. One piece of data, one overwhelmed server, the rest of your infrastructure completely useless.

[Image: Viral spike, all traffic funnels into Cache Node 1]

All traffic hits Node 1. Nodes 2 and 3 idle. Node 1 is finished.
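
To see why one key means one node, here is a rough simulation. It assumes the cache picks a node by hashing the key and taking the result modulo the node count. The `hashKey` function is a toy hash written for this sketch, and real deployments usually use consistent hashing instead, but the property that matters is the same: a given key always maps to the same node.

```javascript
// Toy string hash (an assumption for this sketch, not a production hash).
function hashKey(key) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return hash;
}

// The same key always lands on the same node.
function nodeFor(key, nodeCount) {
  return hashKey(key) % nodeCount;
}

// Count how many requests each node receives for a list of requested keys.
function loadPerNode(keys, nodeCount) {
  const load = new Array(nodeCount).fill(0);
  for (const key of keys) {
    load[nodeFor(key, nodeCount)] += 1;
  }
  return load;
}

// Normal day: 9,000 requests for 9,000 different tweets.
const normal = Array.from({ length: 9000 }, (_, i) => `tweet_${i}`);
// Viral spike: 9,000 requests for the same tweet.
const viral = Array.from({ length: 9000 }, () => 'tweet_123');

console.log(loadPerNode(normal, 3)); // spread across all three nodes
console.log(loadPerNode(viral, 3));  // every request on a single node
```

Adding more nodes changes nothing for the viral case: the key still hashes to exactly one of them.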


And Then It Got Worse

When that cache node went down, users got errors. Their apps did the sensible thing — retry the request. Automatically. Immediately.

Millions of phones. All retrying. All hitting the same dead server. Over and over.

Twitter had turned its own users into a DDoS attack against its own infrastructure. Nobody planned it. Nobody could stop it. The apps were just doing their job.

Don't do this:

// Retries instantly — will finish off your dying server
function retry() {
  fetchTweet();
}

Do this:

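// Backs off, adds randomness, gives the server room to recover
const MAX_ATTEMPTS = 5;

function retry(attempt = 0) {
  if (attempt >= MAX_ATTEMPTS) return; // give up instead of hammering forever
  const delay = Math.min(1000 * 2 ** attempt, 30000); // 1s, 2s, 4s... capped at 30s
  const jitter = Math.random() * 1000; // up to 1s of random spread
  setTimeout(fetchTweet, delay + jitter);
}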

The jitter is the part people skip. Without it, a thousand clients wait exactly 2 seconds and then all retry at the exact same moment. You haven't solved anything, you've just delayed it.
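
A common variation, sometimes called "full jitter", randomizes the entire delay rather than adding up to a second on top of it. The sketch below is one way to write that, not the version shown earlier in this post. The helper names `backoffDelay` and `retryWithBackoff`, the default limits, and the injectable `random` parameter are all assumptions made for illustration.

```javascript
// Compute the wait before retry number `attempt` (0-based).
// `random` is injectable so the behaviour can be checked deterministically.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(baseMs * 2 ** attempt, capMs); // exponential, capped
  return random() * ceiling; // full jitter: anywhere from 0 up to the ceiling
}

// Retry an async operation, sleeping a jittered backoff between attempts.
// `operation` is a hypothetical function that returns a Promise.
async function retryWithBackoff(operation, { maxAttempts = 5, ...delayOptions } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts: surface the error
      const waitMs = backoffDelay(attempt, delayOptions);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

// Usage sketch: retryWithBackoff(() => fetchTweet(), { maxAttempts: 4 });
```

Because each client draws its own random delay, a thousand clients that fail at the same instant come back spread over the whole window instead of in one synchronized wave.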


Meanwhile, Something Else Was Breaking

While the cache was dying, something else was quietly falling apart.

When you retweet something on Twitter, the system doesn't just update a number. It writes that tweet into the personal timeline of every single one of your followers. Every one. This pattern is called fan-out on write.

2 million retweets. Multiply by each person's follower count. Even at a few hundred followers apiece, that's hundreds of millions of individual writes, all queued in a short window. The background workers couldn't keep up. Timelines froze. And now the failure had jumped to a completely different part of the system.

This is how real outages spread. They don't stay where they start.
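
The write amplification described in this section can be sketched with a toy follower graph. Everything here (the `followers` map, the `timelines` store, the user names) is made up for illustration and is far simpler than a real timeline service.

```javascript
// Hypothetical follower graph: userId -> list of follower ids.
const followers = new Map([
  ['alice', ['bob', 'carol', 'dave']],
  ['bob', ['alice']],
  ['carol', ['alice', 'bob', 'dave', 'erin']],
]);

// Hypothetical per-user timelines, filled in at write time.
const timelines = new Map();

// Fan-out on write: one retweet becomes one timeline write per follower.
// Returns the number of writes this single retweet caused.
function fanOutRetweet(retweeterId, tweetId) {
  const targets = followers.get(retweeterId) ?? [];
  for (const followerId of targets) {
    if (!timelines.has(followerId)) timelines.set(followerId, []);
    timelines.get(followerId).push(tweetId);
  }
  return targets.length;
}

// Three retweets of the same tweet...
const writes = ['alice', 'bob', 'carol']
  .map((user) => fanOutRetweet(user, 'tweet_123'))
  .reduce((total, n) => total + n, 0);

console.log(writes); // 8 timeline writes for 3 retweets
```

Three retweets already cost eight writes in this tiny graph. The cost of each retweet scales with the retweeter's follower count, which is what lets a viral tweet flood the write queue.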


The Fix: Shard Your Hot Keys

Don't keep a viral tweet on one node. Copy it across several.

// All pressure on one key
cache.set('tweet_123', data);

// Spread copies across several keys so they can land on different nodes
const SHARD_COUNT = 3;
for (let i = 1; i <= SHARD_COUNT; i++) {
  cache.set(`tweet_123_shard${i}`, data);
}

// Each user routes to a different shard (assumes a numeric userId)
const shard = (userId % SHARD_COUNT) + 1;
cache.get(`tweet_123_shard${shard}`);

Same traffic. Three servers sharing it instead of one dying alone. The spike becomes a non-event.

[Image: Sharded hot key, viral load split evenly across all nodes]

Same load. Three nodes. 33% each. System doesn't even notice.
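
The same idea can be wrapped in a pair of helpers so callers never build shard keys by hand. This is a sketch under a few assumptions: a `Map` stands in for the cache client, `SHARD_COUNT` is an arbitrary value, and readers pick a shard at random rather than by user ID, which is a swapped-in choice that also works when IDs are not numeric.

```javascript
// Hypothetical in-memory stand-in for a distributed cache client.
const cache = new Map();
const SHARD_COUNT = 3; // assumed value; tune to how hot the key is

// Build the key for one shard, e.g. "tweet_123_shard2".
function shardKey(baseKey, shardIndex) {
  return `${baseKey}_shard${shardIndex + 1}`;
}

// Write the same value under every shard key.
function setSharded(baseKey, value) {
  for (let i = 0; i < SHARD_COUNT; i++) {
    cache.set(shardKey(baseKey, i), value);
  }
}

// Read from one shard, chosen at random so reads spread across the copies.
// `random` is injectable to make the choice testable.
function getSharded(baseKey, random = Math.random) {
  const shardIndex = Math.floor(random() * SHARD_COUNT);
  return cache.get(shardKey(baseKey, shardIndex));
}

setSharded('tweet_123', { text: 'Oscar selfie' });
console.log(getSharded('tweet_123').text); // Oscar selfie
```

The trade-off is on the write side: every update or invalidation now has to touch all the copies, so this is best reserved for keys that are read far more often than they change.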

Twitter made around 50 architectural changes after this. Sharding the cache was the one that mattered most.


Don't Make These Mistakes

Trusting your load balancer too much. It spreads requests across servers, but a cache routes by key, so every request for the same key still lands on the same node. One hot key and that balance means nothing.

Retrying without backoff. Instant retries on a failing server don't help it recover. They finish it off. Exponential backoff and jitter aren't a nice-to-have.

Fan-out for everyone. Fine for regular users. For accounts with massive followings, go pull-based — fetch the tweet when someone opens the app instead of pushing it to millions of timelines the second it's posted.
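
One way to picture the mixed approach: push at write time for ordinary accounts, pull at read time for huge ones. The sketch below is a simplified illustration. The `PUSH_FOLLOWER_LIMIT` cutoff, the store names, and the `postedAt` field are all assumptions and do not describe how Twitter implements it.

```javascript
// Assumed cutoff between "regular" and "massive" accounts; purely illustrative.
const PUSH_FOLLOWER_LIMIT = 10000;

// Hypothetical in-memory stores.
const followersOf = new Map(); // authorId -> array of follower ids
const followingOf = new Map(); // userId -> array of author ids they follow
const timelines = new Map();   // userId -> tweets pushed at write time
const outboxes = new Map();    // authorId -> tweets kept for pull at read time

function append(map, key, value) {
  if (!map.has(key)) map.set(key, []);
  map.get(key).push(value);
}

// Write path: push for small accounts, store once for huge ones.
function publish(authorId, tweet) {
  const followers = followersOf.get(authorId) ?? [];
  if (followers.length > PUSH_FOLLOWER_LIMIT) {
    append(outboxes, authorId, tweet); // one write, no fan-out
    return;
  }
  for (const followerId of followers) {
    append(timelines, followerId, tweet); // one write per follower
  }
}

// Read path: merge what was pushed with what must be pulled, newest first.
function readTimeline(userId) {
  const pushed = timelines.get(userId) ?? [];
  const pulled = (followingOf.get(userId) ?? [])
    .flatMap((authorId) => outboxes.get(authorId) ?? []);
  return [...pushed, ...pulled].sort((a, b) => b.postedAt - a.postedAt);
}
```

The cost moves from write time to read time: a celebrity post is a single write, and each reader pays a small merge when they open the app.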


TL;DR

  • A Hot Key is when one piece of data gets so much traffic it kills the single server holding it
  • Retry storms happen when your own app hammers its dying servers — fix it with exponential backoff and jitter
  • Shard hot keys across multiple nodes so no single one takes the full hit
  • Fan-out write amplification quietly buries your queues during spikes

Design & Developed by Asim
© 2026. All rights reserved.