14% of the Web Is Actually Dead
By Tony Wang
Only 14% of the top 10 million domains are genuinely dead — not the usual 27.6%. Most 'dead' sites are just blocking bots or serving errors.
You have probably seen the stat: 27.6% of the web is dead. It comes from a 2024 crawl of the top 10 million domains, and it gets repeated because it is striking and a little bleak. We ran that study. And when we re-scanned the same 10-million-domain list in 2026 — this time separating a domain that is genuinely gone from one that is merely refusing a bot — the real number came out at 14.2%.
The web didn't suddenly heal. The original number was counting the wrong things. A naive crawler can't tell a dead domain from a live one hiding behind Cloudflare, and it counts a server that politely returns "404 Not Found" the same as one that never answers at all. Fix the classification and roughly half of "dead" turns out to be alive — it just wasn't talking to a bot. Here is the full picture, from 9,992,781 probed domains.
What the outcome of a 10-million-domain scan actually looks like
Every domain gets one of four labels. alive means it answered (a 2xx, or even a 404/5xx — the server is up). blocked means it answered but refused our automated client (a 403, 429, or anti-bot challenge). redirect means it bounced somewhere we couldn't resolve. dead means it never answered at all — no DNS record, or nothing accepts a connection.
Three-quarters of the top web is alive and answering. The interesting part is the bottom 23% — the slice everyone argues about — and how you split it.
The real number: 14% dead, not 27.6%
Same list, same scale, one difference: in 2026 we refuse to call a domain dead just because our bot couldn't read it. A genuinely dead domain fails early — DNS returns nothing, or the connection is refused. A live-but-defended domain fails late, with a 403 or a challenge page, which is a completely different signal. Counting honestly moves the headline from 27.6% to 14.2%.
- 2024 crawl (27.6% "dead"): DNS failure, anti-bot 403s, served 404/5xx and timeouts all lumped together
- 2026 re-scan (14.2% dead): no DNS, connection refused, or nothing accepts a connection
Where do the missing ~13 points go? Almost all of them go to two things a naive crawl mislabels:
- 8.9% (891,672 sites) answer but block bots. A 403, a 429, or a Cloudflare "Just a moment" challenge to a datacenter IP. These are some of the most alive sites on the web — they run active defenses precisely because people want their data.
- ~4% serve a 404 or 5xx from a live server. A "404 Not Found" or a "503 Service Unavailable" is proof the host answered. The original crawl counted them as dead; a server that returns an error is the opposite of gone.
The remainder is a 2024 measurement artifact: that crawl resolved each domain through a single DNS resolver, and a flaky lookup falsely marked resolvable domains dead. We now cross-check across resolvers before declaring a DNS failure.
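The cross-check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual code: `resolves_anywhere` and the `Resolver` interface are hypothetical names, and each resolver is assumed to return a list of IP strings or raise `OSError` on failure. Only `system_resolver` touches the network, through the standard library's `socket.getaddrinfo`.

```python
import socket
from typing import Callable, Iterable, List

# A resolver is any callable that maps a hostname to a list of IP strings
# and raises OSError when the lookup fails (hypothetical interface).
Resolver = Callable[[str], List[str]]


def resolves_anywhere(domain: str, resolvers: Iterable[Resolver]) -> bool:
    """Return True if at least one resolver returns an address for the domain."""
    for resolve in resolvers:
        try:
            if resolve(domain):
                return True
        except OSError:
            # One flaky or failing resolver is not enough for a "DNS failed" verdict.
            continue
    return False


def system_resolver(domain: str) -> List[str]:
    """Resolve through the operating system's configured resolver."""
    infos = socket.getaddrinfo(domain, None)
    return sorted({info[4][0] for info in infos})
```

A domain is only declared a DNS failure when `resolves_anywhere` returns `False` for every resolver passed in. Querying specific public resolvers would need a DNS library, which is left out of this sketch.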
What a no-follow crawler gets wrong
The gap between 27.6% and 14.2% is largely a measurement choice: whether you follow redirects and read what the server actually says. A crawler that stops at the first response sees only 45.9% return a clean 200 and writes off the rest. Follow the redirects and read the bodies, and 71.9% are alive. Here is where every first response actually ends up:
| First response → final verdict | Domains | Share of flows shown |
|---|---|---|
| 200 OK → Alive | 4,584,611 | 46.3% |
| 3xx redirect → Alive | 2,677,304 | 27% |
| No response → Dead | 1,413,013 | 14.3% |
| 403 / 429 → Blocked | 410,511 | 4.1% |
| 3xx redirect → Blocked | 365,368 | 3.7% |
| 404 → Alive | 236,685 | 2.4% |
| No response → Blocked | 105,222 | 1.1% |
| 5xx → Alive | 85,728 | 0.9% |
| 3xx redirect → Redirect | 31,267 | 0.3% |
| 3xx redirect → Dead | 1,775 | <0.1% |
The big rivers carry the point: a 301 is not a dead end — 87% of redirects resolve to a live page, and a 403 or 429 is a live site refusing a bot, not a corpse. The only response that reliably means dead is no response at all — and that single No response → Dead band is almost the entire dead web.
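The 87% figure can be reproduced from the four redirect rows of the flow table. The counts below are copied from that table; the variable names are only for illustration.

```python
# Counts copied from the flow table: where a first-response 3xx finally lands.
redirect_flows = {
    "alive": 2_677_304,
    "blocked": 365_368,
    "redirect": 31_267,
    "dead": 1_775,
}

total_redirects = sum(redirect_flows.values())
alive_share = redirect_flows["alive"] / total_redirects

print(f"{total_redirects:,} first-response redirects")   # 3,075,714
print(f"{alive_share:.1%} resolve to a live page")       # 87.0%
```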
The genuinely dead web is mostly DNS that's gone
So what is the 14.2%? Overwhelmingly, it's domains that have left DNS entirely. Of the 1,414,788 genuinely dead domains, 1,077,715 — about 76% — no longer resolve to any IP at all. The registration lapsed, the zone was deleted, the project was abandoned. The rest refuse or reset every connection, or fail TLS to a host that is truly down. A dead domain almost never answers and errors — it simply isn't there.
This matters if you build anything that follows links or crawls a list: the failures you'll actually hit are split between "this domain is gone" (retry never helps) and "this site is blocking me" (a different request gets in). Treating them the same is the single most common way web-health numbers get inflated — and the most common way a scraper wastes a budget retrying domains that will never answer.
The famous dead
Aggregate percentages are abstract. So we sorted the genuinely-dead domains by popularity rank and went looking for names you'd recognise — and the graveyard is remarkable. The single highest-ranked dead domain in the entire top 10 million makes the point on its own.
At #568 sits fanlink.to, the music "smart-link" service artists and labels used for pre-save and streaming links. In March 2024 its parent — Eventbrite's ToneDen — lost control of the .to domain and never recovered it, instantly breaking millions of links sitting in artist bios, ads, and press releases.
Which raises the obvious question: how is a dead domain the 568th most popular on the web? Because the web never stopped knocking. Every un-updated link, embed, and bookmark keeps firing requests at an address that no longer answers — the rank is a fossil of past popularity. That is precisely why a popularity-ranked list is full of corpses at all.
Music & video
fanlink.to · † 2024 · Music smart-links · ToneDen / Eventbrite
The single highest-ranked dead domain in the whole top 10M (#568). In March 2024 Eventbrite lost control of the .to domain overnight, instantly breaking millions of artists' pre-save and streaming links sitting in bios, ads, and press releases.
grooveshark.com · † 2015 · Free music streaming · ~20M users
Forced shut by the major labels' copyright suit (willful infringement, ~$700M of exposure). The entire catalogue was wiped the day the settlement landed; a co-founder died months later at 28.
rdio.com · † 2015 · Music subscription service
Bankrupt after burning ~$2M a month. Pandora bought the technology for $75M and shut the service down the day before the sale closed.
gfycat.com · † 2023 · GIF host for Reddit & Discord · ~220M users
Bought by Snap in 2020, then switched off as a non-core asset. It was one of the largest single link-rot events ever, breaking millions of embedded GIFs across the web.
veoh.com · † 2024 · Video-sharing site
Won a landmark DMCA case that helped protect every YouTube-style site, limped on for years under Japan's FC2, and finally went dark in November 2024.
metacafe.com · † 2021 · Top-3 video site of 2006
One of YouTube's first serious rivals. It simply went offline one day in 2021 with no announcement at all.
The social web
del.icio.us · † 2017 · Delicious · invented social bookmarking
The site that coined web-scale tagging. Passed through five owners (Yahoo → AVOS → Science → Delicious Media → Pinboard for $35,000) before going read-only.
dmoz.org · † 2017 · The Open Directory · a human-curated map of the web
91,000 volunteers cataloguing 3.8M sites. Once a near-prerequisite for SEO, it was made obsolete by Google's algorithm. Lives on as the community fork Curlie.
pipes.yahoo.com · † 2015 · Yahoo Pipes · visual no-code data mashups
The “Zapier of 2007.” Killed in a Yahoo cost-cut; thousands of live RSS and data pipelines broke on the same day.
topsy.com · † 2015 · The only full historical Twitter search
Indexed hundreds of billions of tweets back to 2006. Apple bought it for ~$200M and quietly switched it off two years later; the searchable archive simply vanished.
aviary.com · † 2018 · Photo-editing SDK embedded in 7,000+ apps
Powered in-app photo editing across the mobile economy (10B edits). Adobe acquired it, folded the tech into Creative Cloud, then sunset the free SDK.
The developer web
s7.addthis.com · † 2023 · Share buttons + tracking on 15M websites
Oracle bought it for the behavioural data, then killed it under GDPR pressure. A single shutdown darkened share widgets across millions of sites at once.
programmableweb.com · † 2023 · The public directory of ~19,000 web APIs
The index of the “API economy” for 17 years. Salesforce / MuleSoft erased the whole thing with no archive.
securityfocus.com · † 2021 · Home of the Bugtraq disclosure list (since 1993)
The security world's noticeboard for nearly 30 years. Symantec → Broadcom → Accenture let it freeze; the Bugtraq archive survives only at seclists.org.
opensolaris.org · † 2013 · Sun's open-source operating system
Oracle froze it the moment it bought Sun and pulled the domain in 2013. The community kept the code alive as the illumos fork.
sorbs.net · † 2024 · Spam blocklist covering 512M IP addresses
A DNS blocklist that mail servers queried for over two decades. Proofpoint pulled the plug in 2024; servers worldwide still query a list that no longer answers.
Government & institutions
patft.uspto.gov · † 2022 · US patent full-text search (1790–present)
Retired for a new search tool, breaking decades of direct patent links embedded in academic papers, legal briefs, and analysis tools.
petitions.whitehouse.gov · † 2021 · Obama's “We the People” e-petitions
A petition once topped a million signatures. The platform was quietly discontinued on Inauguration Day 2021 and never revived.
weblogs.com · † ~2009 · Dave Winer's blog-ping server · the early blogosphere's heartbeat
Every new blog post once pinged this host; VeriSign paid $2.3M for it. It faded after 2009, yet old WordPress installs still ping the dead address to this day.
europa.eu.int · † 2006 · The European Union's original web address
The canonical home of EU law and institutions for over a decade. Migrated to europa.eu on Europe Day 2006, stranding a generation of links.
Read those twenty obituaries back-to-back and one cause of death stands out: being acquired. Seven of the twenty were bought by a bigger company that then switched them off — Snap killed Gfycat, Apple killed Topsy, Oracle killed both AddThis and OpenSolaris, Adobe killed Aviary, Salesforce killed ProgrammableWeb, Broadcom let SecurityFocus rot. "Acqui-killed" beats bankruptcy, lawsuits, and neglect combined.
Twenty headliners can't show the shape of the whole graveyard. So we widened the lens — pulling ~100 widely-recognised, verifiable shutdowns (from this scan's dead domains and the public record), dating each to the year its service ended and sorting them into six corners of the web. Stacked by year, two decades of the dying web look like this:
| Topic | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | 2026 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Social & community | 0 | 0 | 0 | 1 | 1 | 2 | 4 | 2 | 1 | 2 | 1 | 2 | 4 | 1 | 2 | 1 | 1 | 3 | 1 | 1 | 0 |
| Developer & infrastructure | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 1 | 3 | 2 | 4 | 4 | 2 | 1 | 1 | 2 | 0 | 4 | 1 | 1 | 0 |
| Music & video | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 | 3 | 2 | 1 | 0 | 3 | 2 | 0 | 2 | 2 | 0 | 0 |
| Search & reference | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Media & news | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| Commerce & government | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
Two things stand out. Social platforms and developer tools are the bulk of the dead web — the social graveyard (Friendster, Orkut, Bebo, Google+, Path, Yik Yak, Ello, Digg…) and the dev-tools column (Google Code, Parse, Google Wave, Gitorious, Sunrise, Mailbox…) are dead even, and together they're more than half of everything here. And the deaths cluster: a first swell in 2012–2017 as the Web 2.0 and check-in/anonymous-app generation collapsed, then a second from 2020 as pandemic-era and big-tech bets were cut (Quibi, Mixer, CNN+, Stadia, Google Play Music). Before 2009 the stream barely exists — most of the web simply wasn't old enough to have died yet.
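Both size claims can be checked by summing the rows of the table above. The counts below are transcribed from it (2006 through 2026, one value per year); the dictionary layout is just a convenient way to hold them.

```python
# Yearly shutdown counts per topic, copied from the table (2006..2026).
shutdowns = {
    "Social & community":         [0, 0, 0, 1, 1, 2, 4, 2, 1, 2, 1, 2, 4, 1, 2, 1, 1, 3, 1, 1, 0],
    "Developer & infrastructure": [0, 0, 0, 1, 0, 0, 3, 1, 3, 2, 4, 4, 2, 1, 1, 2, 0, 4, 1, 1, 0],
    "Music & video":              [0, 0, 0, 1, 0, 0, 1, 0, 0, 2, 3, 2, 1, 0, 3, 2, 0, 2, 2, 0, 0],
    "Search & reference":         [0, 0, 1, 1, 0, 0, 0, 2, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "Media & news":               [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    "Commerce & government":      [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0],
}

totals = {topic: sum(counts) for topic, counts in shutdowns.items()}
grand_total = sum(totals.values())
top_two = totals["Social & community"] + totals["Developer & infrastructure"]

for topic, count in totals.items():
    print(f"{topic}: {count}")
print(f"Social + developer share: {top_two / grand_total:.0%} of {grand_total}")
```

Social and developer each sum to 30 of 102 shutdowns, so the two together account for a little under 60% of the set.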
One honest caveat, and the reason we re-checked every domain by hand: a dead domain is not always a dead thing. Some only look dead because the service rebranded or moved — money.yandex.ru became YooMoney, the old suicidepreventionlifeline.org host gave way to 988lifeline.org, the EU's europa.eu.int simply became europa.eu. We re-probed every domain above against live DNS in June 2026 and dropped the false positives (nrel.gov and angelfire.com still resolve fine). What remains genuinely no longer answers.
Death is uneven: which TLDs rot fastest
Dead rate is not evenly spread. Split the 10 million by top-level domain and a clear gradient appears — cheap-registration and institutional TLDs rot far faster than the .com baseline.
The standouts tell two stories. .cn, .info, and .in lead because they are cheap and heavily registered for short-lived or speculative sites that lapse quickly. The more striking finding is that .gov (26%) and .edu (22%) sit near the top: institutional domains rot badly because content is reorganized, departments are dissolved, and old project sites are simply switched off. This is the same digital decay Pew Research documented in 2024, where government and reference pages had some of the worst link rot. The web's most authoritative corners are some of its least permanent.
The geography of the dead web
Group the country-code domains by country and the decay draws a map. The emerging-market registration booms of the last decade left the biggest graveyards, with China's .cn leading at a third dead, while Central Europe (Czechia, Germany, and Austria) runs the most durable web in the table.
| Country (ccTLD) | Dead rate |
|---|---|
| China (.cn) | 33% |
| India (.in) | 25.9% |
| United States (.us) | 22% |
| Brazil (.br) | 20.9% |
| Spain (.es) | 16.6% |
| Japan (.jp) | 15.6% |
| United Kingdom (.uk) | 15.3% |
| Russia (.ru) | 14.9% |
| France (.fr) | 14.5% |
| Canada (.ca) | 14.1% |
| Italy (.it) | 13.5% |
| Poland (.pl) | 13.2% |
| Sweden (.se) | 11.6% |
| Switzerland (.ch) | 9.8% |
| Netherlands (.nl) | 9.7% |
| Austria (.at) | 8.6% |
| Germany (.de) | 7.6% |
| Czechia (.cz) | 7.3% |
A domain in China's .cn space is more than four times as likely to be dead as one in Germany's .de. Fast, cheap, speculative registration (and, for .cn, a churn-heavy market behind the Great Firewall) leaves more abandoned domains behind; the mature, costlier-to-register German-speaking TLDs (.de, .at, .ch) all stay under 10% dead.
What the top 10 million is even made of
For context, here's the shape of the corpus itself. .com is not just first: it is nearly half of the entire top 10 million, larger than all country-code TLDs and new gTLDs combined.
Two details worth flagging: .io (3.6%) has quietly become the third-largest TLD on the popular web — the developer/startup default — and the AI-era .ai (0.30%, ~30,000 domains) has already overtaken established country domains like .fi, .no, and .tw in the top 10 million.
The dead web is the long tail nobody visits
Death is not spread evenly through the ranking. Split the 10 million by popularity and the dead rate climbs more than 20× — from 0.8% in the top 1,000 to 16.1% past rank 5 million. blocked runs the other way: the most-trafficked sites wall bots hardest, then the defenses thin out down the tail.
That gradient reframes the headline. The 14% is real by domain count — but those dead domains are almost all in the part of the web nobody visits. 99.8% of dead domains sit below rank 100,000, and the popular top-100K — where the overwhelming majority of web traffic lives — is only 2.2% dead. Weighted by attention instead of raw count, the dead web nearly disappears:
- All of the top 10M domains: 14.2% dead
- The popular top-100K, where most web traffic is: only 2.2% dead
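The 99.8% figure follows from the other two numbers in this section. A quick check, assuming the top-100K bucket holds exactly 100,000 domains and treating the rounded 2.2% as exact:

```python
dead_total = 1_414_788       # genuinely dead domains in the scan
top_100k_size = 100_000      # assumption: the bucket holds exactly 100,000 domains
top_100k_dead_rate = 0.022   # 2.2%, a rounded figure from the text

dead_in_top_100k = top_100k_size * top_100k_dead_rate
share_below_100k = 1 - dead_in_top_100k / dead_total

print(f"{dead_in_top_100k:,.0f} dead domains in the top 100K")            # 2,200
print(f"{share_below_100k:.1%} of dead domains rank below 100,000")       # 99.8%
```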
"Dead web" is not "link rot" — and definitely not "dead internet theory"
Three different things get blurred together. Keeping them separate is the whole point:
- This study (dead domains): does the domain still resolve and answer? We find 14.2% of the top 10M do not.
- Link rot (Pew, Ahrefs): are the links inside living pages still good? Pew Research found 25% of pages from 2013–2023 are gone and 38% of 2013 pages have vanished; Ahrefs found 66.5% of tracked links have rotted. Those measure decay within the living web — a complement to this, not the same number.
- Dead internet theory: the claim that AI-generated content and bots have displaced human activity online. That is about what's on the living web, not whether domains are reachable. It is a separate conversation, and conflating it with link rot is how bad statistics spread.
If you only remember one distinction: link rot is about the pages that are still up; the dead web is about the domains that aren't.
What this means if you're building a scraper or a data pipeline
The practical takeaway is the 8.9% blocked slice, because it is the part most likely to break your project. When a request fails, the reason dictates the fix, and the two fixes are nothing alike:
- A dead domain (no DNS, refused) will never answer. Retrying, rotating proxies, or switching to a browser does nothing. Drop it and move on.
- A blocked domain is alive and reachable — it just refused your client. A matched browser TLS/JA3 fingerprint or a residential IP gets in where a datacenter bot gets a 403. This is a transport problem, not a dead site.
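The two cases above translate into a small decision function. This is a sketch only: the `Verdict` enum, the action strings, and the escalation budget are hypothetical names chosen for illustration, and sending an unresolved redirect to manual inspection is an assumption rather than something the study prescribes.

```python
from enum import Enum


class Verdict(Enum):
    ALIVE = "alive"
    BLOCKED = "blocked"
    REDIRECT = "redirect"
    DEAD = "dead"


def next_action(verdict: Verdict, escalations_used: int, max_escalations: int = 2) -> str:
    """Pick the next step for a URL given how its last probe was classified."""
    if verdict is Verdict.DEAD:
        # No DNS or refused connection: no client change will help.
        return "drop"
    if verdict is Verdict.BLOCKED:
        # Alive but refusing this client: try a stronger client, within budget.
        if escalations_used < max_escalations:
            return "escalate_client"
        return "give_up"
    if verdict is Verdict.REDIRECT:
        # Unresolved redirect: worth a look rather than a blind retry (assumption).
        return "inspect"
    return "done"
```

The point of the split is that a retry budget is only ever spent on `BLOCKED`, never on `DEAD`.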
This isn't theoretical. Probing every domain a second time with a real Chrome TLS/JA3 fingerprint recovered ~72,000 of the ~890,000 sites the polite bot was blocked from — enough to pull the blocked rate from 8.9% down to 8.2%. Every one of those is a live site reachable with the right client, not a dead end.
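The drop from 8.9% to 8.2% is straightforward arithmetic on the counts quoted in this article. The recovered figure is the approximate "~72,000", so the result is approximate too.

```python
probed = 9_992_781
blocked_as_bot = 891_672
recovered_with_browser_fingerprint = 72_000  # approximate figure from the text

blocked_before = blocked_as_bot / probed
blocked_after = (blocked_as_bot - recovered_with_browser_fingerprint) / probed

print(f"blocked as a polite bot:      {blocked_before:.1%}")  # 8.9%
print(f"blocked after browser re-probe: {blocked_after:.1%}")  # 8.2%
```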
Naive crawlers can't tell these apart, so they either give up on reachable sites or burn a budget retrying gone ones. The cost-efficient pattern is to escalate only as far as a site forces you to — which is exactly how Crawlora's anti-bot unblocker works, and why it bills on success rather than per attempt. If you want to know which bucket a specific URL is in before you build, the free anti-bot checker tells you in about 30 seconds, and our companion Anti-Bot Adoption Index measures how much of the live web runs a wall at all.
Two more things the scan turned up
The web is a maze of redirects. Only 69% of domains serve their final page directly; 31% bounce through at least one redirect — and a stubborn sliver loops until our 10-hop cap. That is exactly why a crawler that doesn't follow redirects sees a web that looks half-broken.
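A hop-capped redirect follower can be sketched like this. The `fetch` callable is a hypothetical interface (it returns a status code and the `Location` value, if any) so the logic stays independent of any HTTP library, and the sketch assumes `Location` values are absolute URLs; a real client would resolve relative ones, for example with `urllib.parse.urljoin`.

```python
from typing import Callable, Optional, Tuple

# fetch(url) -> (status_code, location_or_None); hypothetical interface.
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int, int]:
    """Follow 3xx responses up to max_hops.

    Returns (final_url, final_status, hops_taken). If the cap is hit, the
    final status is still a 3xx, which the caller can label "redirect".
    """
    hops = 0
    status, location = fetch(url)
    while 300 <= status < 400 and location and hops < max_hops:
        url = location
        hops += 1
        status, location = fetch(url)
    return url, status, hops
```

With a fake `fetch` backed by a dictionary, a chain `a → b → c` ends at `c` after two hops, while a loop `x → y → x` stops at the cap with a 3xx still in hand.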
The dead web is stuck on HTTP. A decade into the HTTPS transition, the living web is ~78% encrypted — but dead and bot-blocked domains are barely half, abandoned before they ever got a certificate.
How we measured it
No magic — a deliberately simple, reproducible probe, run at 10-million scale.
The list. The full top 10 million domains (a DomCop/Tranco-style popularity ranking). We probed 9,992,781 of them, about 99.9% coverage.
The probe. Each domain is fetched HTTPS-first from a datacenter IP, following redirects, with a short timeout and a cross-resolver DNS retry before any "DNS failed" verdict. We never submit a form, solve a CAPTCHA, log in, or fetch anything behind a wall. Every domain is probed twice — once as an honest bot, and once as a browser-like client with a real Chrome TLS/JA3 fingerprint — so we can separate "nobody's home" from "the bot wasn't let in."
The classification. A final 2xx, or a served 404/5xx (the host answered), is alive. A 403/429 or anti-bot challenge is blocked. A 3xx we can't resolve is redirect. Only no DNS, a refused/reset connection, or nothing accepting a connection is dead. That single rule — a server that answers anything is up — is the entire difference between 14.2% and 27.6%.
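The rule above fits in a few lines. This is a minimal sketch, not the study's code: `ProbeResult` and its fields are hypothetical, and treating "connected but no HTTP response" as dead is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    dns_resolved: bool
    connected: bool
    status: Optional[int] = None  # final HTTP status after following redirects
    challenge: bool = False       # True if the body looked like an anti-bot challenge


def classify(result: ProbeResult) -> str:
    """Map one probe outcome to alive / blocked / redirect / dead."""
    if not result.dns_resolved or not result.connected:
        return "dead"
    if result.status is None:
        # Assumption: a connection that never yields an HTTP response counts as dead.
        return "dead"
    if result.challenge or result.status in (403, 429):
        return "blocked"
    if 300 <= result.status < 400:
        # A 3xx that is still a 3xx after redirect-following could not be resolved.
        return "redirect"
    # Any other answer, including a served 404 or 5xx, means the host is up.
    return "alive"
```

For example, `classify(ProbeResult(True, True, 404))` returns `"alive"`, which is the case the 2024 crawl counted as dead.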
Limits. This is homepage-level reachability from a datacenter vantage, so read the figures as bounds rather than exact counts: a domain that blocks a datacenter bot may open for a residential browser (so the blocked share is an upper bound), and a deep page can be deader (or more defended) than the homepage (so page-level decay can be higher). Snapshot: June 2026. The full per-domain dataset (every domain, every arm) is open, and the live, searchable version is the Dead-Web Index.
Reach the live web, not the dead one
14% of the top web is gone — but 9% is alive and just blocking your bot. Crawlora escalates from a plain request to a real browser fingerprint only as far as a site demands, and bills on success. Stop retrying dead domains and stop getting 403s from live ones.
Frequently asked questions
How many of the world's top websites are dead?
14.2% of the top 10 million domains are genuinely dead — about 1.41 million sites that no longer resolve in DNS or refuse every connection. That is far below the often-quoted 27.6%, which counted anti-bot blocks and answered errors as death.
What's the difference between a dead website and a blocked one?
A dead site never answers — no DNS record, or nothing accepts a TCP connection. A blocked site is alive and answering, it just refuses an automated client (a 403, 429, or anti-bot challenge). 8.9% of the top web — 891,672 sites — is blocked, not dead, a distinction naive crawlers miss.
Is the dead web the same as the dead internet theory?
No. The dead internet theory is a claim that AI-generated content and bots have replaced human activity on the living web. This study measures the opposite, concrete thing: how many domains have gone completely dark and unreachable — DNS gone, connection refused, server gone.
Why is this lower than the 27.6% dead-web figure?
Earlier top-10M crawls counted three non-dead things as dead: anti-bot 403/429 blocks, 404/5xx pages served by a live server, and domains a single flaky DNS resolver failed to look up. Classifying honestly — dead means genuinely unreachable — brings the real figure to 14.2%.
Which TLD has the most dead domains?
.cn has the highest death rate among common TLDs at 33%. Institutional TLDs like .gov (26%) and .edu (22%) also rank high, in line with Pew Research's finding that government and reference pages suffer some of the worst link rot.
Why does a site look dead to a scraper but load fine in my browser?
Anti-bot systems serve a 403 or a challenge to a datacenter IP while letting a real browser through. A matched browser TLS/JA3 fingerprint reaches the site where a naive bot is blocked — which is why this study probes every domain twice, as a polite bot and as a browser-like client.