Tony Wang · June 14, 2026 · 22 min read

14% of the Web Is Actually Dead

Only 14% of the top 10 million domains are genuinely dead — not the usual 27.6%. Nearly half of the 'dead' web is just blocking bots or serving errors.

Data Study · Anti-Bot · Web Scraping API

Key takeaways

We probed 9,992,781 of the top 10 million domains in June 2026. 14.2% are genuinely dead — no DNS, no connection, nothing answers — not the 27.6% a naive crawl of the same list reports.
Nearly half of 'dead' was never dead. 8.9% of the top web (891,672 sites) answers but blocks an automated client (403/429/anti-bot), and another ~4% serves a 404 or 5xx from a live server. Naive crawls count all of that as death.
The genuinely dead web is mostly DNS that no longer resolves: 1,077,715 domains — 76% of all dead — have left DNS entirely. The rest refuse or reset the connection. A 404 page is not death; a missing DNS record is.
Death is uneven by TLD. .cn (33%), .info (28%), .in and .gov (26%), and .edu (22%) rot fastest — institutional and cheap-registration domains lead, echoing Pew's finding that government and reference pages suffer the worst link rot. .com sits near the 14% line.
This is not 'link rot' or 'dead internet theory.' We measure whether the domain itself still resolves and answers — a different question from broken links inside pages (Pew, Ahrefs) or AI-generated content flooding the web.

You have probably seen the stat: 27.6% of the web is dead. It comes from a 2024 crawl of the top 10 million domains, and it gets repeated because it is striking and a little bleak. We ran that study. And when we re-scanned the same 10-million-domain list in 2026 — this time separating a domain that is genuinely gone from one that is merely refusing a bot — the real number came out at 14.2%.

The web didn't suddenly heal. The original number was counting the wrong things. A naive crawler can't tell a dead domain from a live one hiding behind Cloudflare, and it counts a server that politely returns "404 Not Found" the same as one that never answers at all. Fix the classification and roughly half of "dead" turns out to be alive — it just wasn't talking to a bot. Here is the full picture, from 9,992,781 probed domains.

What the outcome of a 10-million-domain scan actually looks like

Every domain gets one of four labels. alive means it answered (a 2xx, or even a 404/5xx — the server is up). blocked means it answered but refused our automated client (a 403, 429, or anti-bot challenge). redirect means it bounced somewhere we couldn't resolve. dead means it never answered at all — no DNS record, or nothing accepts a connection.
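The four-label rule fits in a small function. This is a minimal sketch, not the study's code. The `ProbeResult` fields are hypothetical names for what a probe might record. Any status other than 403/429 or a detected challenge page (a 401 or 410, for example) is treated as alive, on the principle that a server which answers is up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Hypothetical record of one homepage probe (field names are illustrative)."""
    dns_resolved: bool            # did any resolver return an address?
    connected: bool               # did a TCP/TLS connection succeed?
    final_status: Optional[int]   # last HTTP status after following redirects, if any
    challenge_page: bool = False  # anti-bot interstitial detected in the body


def classify(r: ProbeResult) -> str:
    """Map a probe to one of the four labels: alive, blocked, redirect, dead."""
    if not r.dns_resolved or not r.connected or r.final_status is None:
        return "dead"      # nothing answered at all
    if r.final_status in (403, 429) or r.challenge_page:
        return "blocked"   # the server answered but refused an automated client
    if 300 <= r.final_status < 400:
        return "redirect"  # the chain never settled on a page we could reach
    return "alive"         # any 2xx, and also a served 404 or 5xx: the host is up
```

The order of the checks matters. "Dead" is decided before any status code is read, so a server that returns an error can never be counted as dead.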

Outcome	Share	Domains
Alive	76.6%	7,655,028
Blocked	8.9%	891,672
Dead	14.2%	1,414,788
Redirect	0.3%	31,293

9,992,781 of the top 10 million domains, probed as a polite bot from a datacenter IP, June 2026.

Three-quarters of the top web is alive and answering. The interesting part is the bottom 23% — the slice everyone argues about — and how you split it.

The real number: 14% dead, not 27.6%

Same list, same scale, one difference: in 2026 we refuse to call a domain dead just because our bot couldn't read it. A genuinely dead domain fails early — DNS returns nothing, or the connection is refused. A live-but-defended domain fails late, with a 403 or a challenge page, which is a completely different signal. Counting honestly moves the headline from 27.6% to 14.2%.

Naive 2024 crawl, counted as dead: 27.6%

DNS failure, anti-bot 403s, served 404/5xx and timeouts all lumped together

Honest 2026 classification, actually dead: 14.2%

No DNS, connection refused, or nothing accepts a connection

The same top-10-million list, classified two ways. The 13-point gap is mostly anti-bot blocking and answered errors counted as death.

Where do the missing ~13 points go? Almost all of it is two things a naive crawl mislabels:

8.9% (891,672 sites) answer but block bots. A 403, a 429, or a Cloudflare "Just a moment" challenge to a datacenter IP. These are some of the most alive sites on the web — they run active defenses precisely because people want their data.
~4% serve a 404 or 5xx from a live server. A "404 Not Found" or a "503 Service Unavailable" is proof the host answered. The original crawl counted them as dead; a server that returns an error is the opposite of gone.

The remainder is a 2024 measurement artifact: that crawl resolved each domain through a single DNS resolver, and a flaky lookup falsely marked resolvable domains dead. We now cross-check across resolvers before declaring a DNS failure.
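With a cross-resolver check, a DNS failure is declared only when every resolver agrees. The sketch below assumes each resolver is a callable that returns a list of addresses and raises `OSError` on failure, as `socket.getaddrinfo` does. The article doesn't say which resolvers the study queried, so `system_resolver` is just one illustrative choice.

```python
import socket
from typing import Callable, Iterable, List, Optional

# Hypothetical resolver interface: hostname -> list of IP strings; raises OSError on failure.
Resolver = Callable[[str], List[str]]


def system_resolver(host: str) -> List[str]:
    """One possible resolver: the OS stub resolver via socket.getaddrinfo."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})


def dns_verdict(host: str, resolvers: Iterable[Resolver]) -> Optional[List[str]]:
    """Return the first non-empty answer; None only if every resolver fails or comes back empty."""
    for resolve in resolvers:
        try:
            addrs = resolve(host)
        except OSError:
            continue  # a single flaky lookup is not evidence that the domain is gone
        if addrs:
            return addrs
    return None  # all resolvers agree there is no usable record: a genuine DNS failure
```

In practice each entry in `resolvers` would point at a different upstream. Wiring that up needs a DNS client library and is left out here.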

What a no-follow crawler gets wrong

The gap between 27.6% and 14.2% is largely a measurement choice: whether you follow redirects and read what the server actually says. A crawler that stops at the first response sees only 45.9% return a clean 200 and writes off the rest. Follow the redirects and read the bodies, and 71.9% are alive. Here is where every first response actually ends up:

Where each first response actually ends up (top 10M, 2026). A no-follow crawler counts the whole 3xx band as 'not 200' — but most of it resolves to a live page.

First response → final outcome	Domains (share of the 9,911,484 in this table)

200 OK → Alive	4,584,611 (46.3%)
3xx redirect → Alive	2,677,304 (27%)
No response → Dead	1,413,013 (14.3%)
403 / 429 → Blocked	410,511 (4.1%)
3xx redirect → Blocked	365,368 (3.7%)
404 → Alive	236,685 (2.4%)
No response → Blocked	105,222 (1.1%)
5xx → Alive	85,728 (0.9%)
3xx redirect → Redirect	31,267 (0.3%)
3xx redirect → Dead	1,775 (<0.1%)

The big rivers carry the point: a 301 is not a dead end — 87% of redirects resolve to a live page, and a 403 or 429 is a live site refusing a bot, not a corpse. The only response that reliably means dead is no response at all — and that single No response → Dead band is almost the entire dead web.
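The 87% figure and the "almost the entire dead web" claim both follow directly from the flow table. This sketch re-derives them from the counts listed above. The helper names are illustrative.

```python
# First response -> final outcome, copied from the flow table above.
FLOWS = {
    ("200 OK", "Alive"): 4_584_611,
    ("3xx redirect", "Alive"): 2_677_304,
    ("No response", "Dead"): 1_413_013,
    ("403 / 429", "Blocked"): 410_511,
    ("3xx redirect", "Blocked"): 365_368,
    ("404", "Alive"): 236_685,
    ("No response", "Blocked"): 105_222,
    ("5xx", "Alive"): 85_728,
    ("3xx redirect", "Redirect"): 31_267,
    ("3xx redirect", "Dead"): 1_775,
}


def share_given_first(first: str, outcome: str) -> float:
    """Of the domains whose first response was `first`, the fraction that ended as `outcome`."""
    total = sum(n for (f, _), n in FLOWS.items() if f == first)
    return FLOWS.get((first, outcome), 0) / total


def share_given_outcome(first: str, outcome: str) -> float:
    """Of the domains that ended as `outcome`, the fraction whose first response was `first`."""
    total = sum(n for (_, o), n in FLOWS.items() if o == outcome)
    return FLOWS.get((first, outcome), 0) / total


print(f"{share_given_first('3xx redirect', 'Alive'):.1%}")   # 87.0% of redirects end alive
print(f"{share_given_outcome('No response', 'Dead'):.1%}")  # 99.9% of the dead never answered
```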

The genuinely dead web is mostly DNS that's gone

So what is the 14.2%? Overwhelmingly, it's domains that have left DNS entirely. Of the 1,414,788 genuinely dead domains, 1,077,715 — about 76% — no longer resolve to any IP at all. The registration lapsed, the zone was deleted, the project was abandoned. The rest refuse or reset every connection, or fail TLS to a host that is truly down. A dead domain almost never answers and errors — it simply isn't there.
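These failure modes map fairly directly onto Python's standard socket exceptions. The sketch below illustrates that mapping; it is not the study's probe. It assumes a plain TCP connect to port 443. Timeouts are reported separately because deciding whether a timeout (after retries) counts as dead is a policy choice.

```python
import socket
import ssl


def failure_kind(exc: BaseException) -> str:
    """Bucket a low-level connection error into the failure modes described above."""
    if isinstance(exc, socket.gaierror):
        return "no-dns"            # the name no longer resolves to any address
    if isinstance(exc, (ConnectionRefusedError, ConnectionResetError)):
        return "refused-or-reset"  # an address exists, but nothing accepts the connection
    if isinstance(exc, ssl.SSLError):
        return "tls-failure"       # TCP connected, but the TLS handshake failed
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"           # ambiguous on its own; retry before calling it dead
    return "other"


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open and immediately close a TCP connection, returning 'connected' or a failure kind.

    This stops at TCP; ssl.SSLError only arises if a caller wraps the socket in TLS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return failure_kind(exc)
```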

This matters if you build anything that follows links or crawls a list: the failures you'll actually hit are split between "this domain is gone" (retry never helps) and "this site is blocking me" (a different request gets in). Treating them the same is the single most common way web-health numbers get inflated — and the most common way a scraper wastes a budget retrying domains that will never answer.

The famous dead

Aggregate percentages are abstract. So we sorted the genuinely-dead domains by popularity rank and went looking for names you'd recognise — and the graveyard is remarkable. The single highest-ranked dead domain in the entire top 10 million makes the point on its own.

At #568 sits fanlink.to, the music "smart-link" service artists and labels used for pre-save and streaming links. In March 2024 its parent — Eventbrite's ToneDen — lost control of the .to domain and never recovered it, instantly breaking millions of links sitting in artist bios, ads, and press releases.

Which raises the obvious question: how is a dead domain the 568th most popular on the web? Because the web never stopped knocking. Every un-updated link, embed, and bookmark keeps firing requests at an address that no longer answers — the rank is a fossil of past popularity. That is precisely why a popularity-ranked list is full of corpses at all.

Music & video

fanlink.to† 2024
Music smart-links · ToneDen / Eventbrite
The single highest-ranked dead domain in the whole top 10M (#568). In March 2024 Eventbrite lost control of the .to domain overnight, instantly breaking millions of artists' pre-save and streaming links sitting in bios, ads, and press releases. Wayback ↗
grooveshark.com† 2015
Free music streaming · ~20M users
Forced shut by the major labels' copyright suit (willful infringement, ~$700M of exposure). The entire catalogue was wiped the day the settlement landed; a co-founder died months later at 28. Wayback ↗
rdio.com† 2015
Music subscription service
Bankrupt after burning ~$2M a month. Pandora bought the technology for $75M and shut the service down the day before the sale closed. Wayback ↗
gfycat.com† 2023
GIF host for Reddit & Discord · ~220M users
Bought by Snap in 2020, then switched off as a non-core asset — one of the largest single link-rot events ever, breaking millions of embedded GIFs across the web. Wayback ↗
veoh.com† 2024
Video-sharing site
Won a landmark DMCA case that helped protect every YouTube-style site, limped on for years under Japan's FC2, and finally went dark in November 2024. Wayback ↗
metacafe.com† 2021
Top-3 video site of 2006
One of YouTube's first serious rivals — it simply went offline one day in 2021 with no announcement at all. Wayback ↗

The social web

del.icio.us† 2017
Delicious · invented social bookmarking
The site that coined web-scale tagging. Passed through five owners (Yahoo → AVOS → Science → Delicious Media → Pinboard for $35,000) before going read-only. Wayback ↗
dmoz.org† 2017
The Open Directory · a human-curated map of the web
91,000 volunteers cataloguing 3.8M sites — once a near-prerequisite for SEO, then made obsolete by Google's algorithm. Lives on as the community fork Curlie. Wayback ↗
pipes.yahoo.com† 2015
Yahoo Pipes · visual no-code data mashups
The “Zapier of 2007.” Killed in a Yahoo cost-cut; thousands of live RSS and data pipelines broke on the same day. Wayback ↗
topsy.com† 2015
The only full historical Twitter search
Indexed hundreds of billions of tweets back to 2006. Apple bought it for ~$200M and quietly switched it off two years later; the searchable archive simply vanished. Wayback ↗
aviary.com† 2018
Photo-editing SDK embedded in 7,000+ apps
Powered in-app photo editing across the mobile economy (10B edits). Adobe acquired it, folded the tech into Creative Cloud, then sunset the free SDK. Wayback ↗

The developer web

s7.addthis.com† 2023
Share buttons + tracking on 15M websites
Oracle bought it for the behavioural data, then killed it under GDPR pressure — a single shutdown darkened share widgets across millions of sites at once. Wayback ↗
programmableweb.com† 2023
The public directory of ~19,000 web APIs
The index of the “API economy” for 17 years. Salesforce / MuleSoft erased the whole thing with no archive. Wayback ↗
securityfocus.com† 2021
Home of the Bugtraq disclosure list (since 1993)
The security world's noticeboard for nearly 30 years. Symantec → Broadcom → Accenture let it freeze; the Bugtraq archive survives only at seclists.org. Wayback ↗
opensolaris.org† 2013
Sun's open-source operating system
Oracle froze it the moment it bought Sun and pulled the domain in 2013. The community kept the code alive as the illumos fork. Wayback ↗
sorbs.net† 2024
Spam blocklist covering 512M IP addresses
A DNS blocklist that mail servers queried for over two decades. Proofpoint pulled the plug in 2024; servers worldwide still query a list that no longer answers. Wayback ↗

Government & institutions

patft.uspto.gov† 2022
US patent full-text search (1790–present)
Retired for a new search tool — breaking decades of direct patent links embedded in academic papers, legal briefs, and analysis tools. Wayback ↗
petitions.whitehouse.gov† 2021
Obama's “We the People” e-petitions
A petition once topped a million signatures. The platform was quietly discontinued on Inauguration Day 2021 and never revived. Wayback ↗
weblogs.com† ~2009
Dave Winer's blog-ping server · the early blogosphere's heartbeat
Every new blog post once pinged this host; VeriSign paid $2.3M for it. It faded after 2009 — yet old WordPress installs still ping the dead address to this day. Wayback ↗
europa.eu.int† 2006
The European Union's original web address
The canonical home of EU law and institutions for over a decade. Migrated to europa.eu on Europe Day 2006, stranding a generation of links. Wayback ↗

Read those twenty obituaries back-to-back and one cause of death stands out: being acquired. Seven of the twenty were bought by a bigger company that then switched them off — Snap killed Gfycat, Apple killed Topsy, Oracle killed both AddThis and OpenSolaris, Adobe killed Aviary, Salesforce killed ProgrammableWeb, Broadcom let SecurityFocus rot. "Acqui-killed" beats bankruptcy, lawsuits, and neglect combined.

Cause of death	Sites
Acquired, then killed	7
Strategic shutdown / cost-cut	5
Neglect / abandoned domain	4
Bankruptcy or lawsuit	2
Migrated / retired elsewhere	2

How twenty of the web's most famous dead domains actually died. Acquisition is the leading cause — more than bankruptcy, lawsuits, and neglect together.

Twenty headliners can't show the shape of the whole graveyard. So we widened the lens — pulling ~100 widely-recognised, verifiable shutdowns (from this scan's dead domains and the public record), dating each to the year its service ended and sorting them into six corners of the web. Stacked by year, two decades of the dying web look like this:

Chart: notable shutdowns per year, 2006 to 2026, stacked by category (Social & community, Developer & infrastructure, Music & video, Search & reference, Media & news, Commerce & government).

Topic	2006	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025	2026
Social & community	0	0	1	1	2	4	2	1	2	1	2	4	1	2	1	1	3	1	1	0
Developer & infrastructure	0	0	1	0	0	3	1	3	2	4	4	2	1	1	2	0	4	1	1	0
Music & video	0	0	1	0	0	1	0	0	2	3	2	1	0	3	2	0	2	2	0	0
Search & reference	0	1	1	0	0	0	2	0	1	0	1	0	1	1	1	1	0	0	0	1
Media & news	0	0	0	0	0	0	0	0	1	1	0	0	0	0	1	1	0	1	0	1
Commerce & government	1	0	0	0	0	0	0	0	1	0	1	0	1	0	1	1	0	0	0	0

When ~100 notable websites died, by category and year. A curated set of widely-recognised shutdowns, drawn from this scan's dead domains plus public records and dated to the year each service ended, so it's illustrative of the eras, not an exhaustive census.

Two things stand out. Social platforms and developer tools are the bulk of the dead web — the social graveyard (Friendster, Orkut, Bebo, Google+, Path, Yik Yak, Ello, Digg…) and the dev-tools column (Google Code, Parse, Google Wave, Gitorious, Sunrise, Mailbox…) are dead even, and together they're more than half of everything here. And the deaths cluster: a first swell in 2012–2017 as the Web 2.0 and check-in/anonymous-app generation collapsed, then a second from 2020 as pandemic-era and big-tech bets were cut (Quibi, Mixer, CNN+, Stadia, Google Play Music). Before 2009 the stream barely exists — most of the web simply wasn't old enough to have died yet.
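Both claims in that paragraph can be checked against the table. This sketch copies the rows above and totals them by category and by year. The table has no 2007 column, so the year list skips it.

```python
# Rows copied from the table above; columns are 2006, then 2008-2026 (no 2007 column).
YEARS = [2006] + list(range(2008, 2027))
SHUTDOWNS = {
    "Social & community":         [0, 0, 1, 1, 2, 4, 2, 1, 2, 1, 2, 4, 1, 2, 1, 1, 3, 1, 1, 0],
    "Developer & infrastructure": [0, 0, 1, 0, 0, 3, 1, 3, 2, 4, 4, 2, 1, 1, 2, 0, 4, 1, 1, 0],
    "Music & video":              [0, 0, 1, 0, 0, 1, 0, 0, 2, 3, 2, 1, 0, 3, 2, 0, 2, 2, 0, 0],
    "Search & reference":         [0, 1, 1, 0, 0, 0, 2, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "Media & news":               [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    "Commerce & government":      [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0],
}

# Totals per category, and the combined share of the two biggest categories.
totals = {category: sum(row) for category, row in SHUTDOWNS.items()}
grand_total = sum(totals.values())
social_plus_dev = totals["Social & community"] + totals["Developer & infrastructure"]

# Totals per year (column sums), and the single busiest year.
per_year = [sum(column) for column in zip(*SHUTDOWNS.values())]
peak_count, peak_year = max(zip(per_year, YEARS))

print(totals)
print(f"social + developer: {social_plus_dev}/{grand_total} = {social_plus_dev / grand_total:.0%}")
print(f"busiest year: {peak_year} ({peak_count} shutdowns)")
```

Social and developer tools come out at 30 each, together 60 of 102 (about 59%), and 2017 is the single busiest year, at the end of the first swell.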

One honest caveat, and the reason we re-checked every domain by hand: a dead domain is not always a dead thing. Some only look dead because the service rebranded or moved — money.yandex.ru became YooMoney, the old suicidepreventionlifeline.org host gave way to 988lifeline.org, the EU's europa.eu.int simply became europa.eu. We re-probed every domain above against live DNS in June 2026 and dropped the false positives (nrel.gov and angelfire.com still resolve fine). What remains genuinely no longer answers.

Death is uneven: which TLDs rot fastest

Dead rate is not evenly spread. Split the 10 million by top-level domain and a clear gradient appears — cheap-registration and institutional TLDs rot far faster than the .com baseline.

TLD	Dead rate	Domains
.cn	33.0%	42,827
.info	28.4%	61,440
.in	25.9%	63,614
.gov	25.9%	43,435
.edu	22.0%	128,968
.us	22.0%	41,645
.br	20.9%	99,202
.net	19.9%	347,414

Dead rate among common TLDs (≥20k domains in the top 10M).

The standouts tell two stories. .cn, .info, and .in lead because they are cheap and heavily registered for short-lived or speculative sites that lapse quickly. But .gov (26%) and .edu (22%) near the top is the more striking finding: institutional domains rot badly because content is reorganized, departments are dissolved, and old project sites are simply switched off — exactly the digital decay Pew Research documented in 2024, where government and reference pages had some of the worst link rot. The web's most authoritative corners are some of its least permanent.

The geography of the dead web

Group the country-code domains by country and the decay draws a map. The emerging-market registration booms of the last decade left the biggest graveyards, with China's .cn leading at a third dead, while Central Europe (Czechia, Germany, Austria, Switzerland) runs the most durable web in the sample.

Map: dead-domain rate by country-code TLD, 2026, on a scale from 7% to 33% (redder is deader; some countries have no data).

China (.cn)	33%
India (.in)	25.9%
United States (.us)	22%
Brazil (.br)	20.9%
Spain (.es)	16.6%
Japan (.jp)	15.6%
United Kingdom (.uk)	15.3%
Russia (.ru)	14.9%
France (.fr)	14.5%
Canada (.ca)	14.1%
Italy (.it)	13.5%
Poland (.pl)	13.2%
Sweden (.se)	11.6%
Switzerland (.ch)	9.8%
Netherlands (.nl)	9.7%
Austria (.at)	8.6%
Germany (.de)	7.6%
Czechia (.cz)	7.3%

Country (TLD)	Dead rate	Domains
China (.cn)	33.0%	42,827
India (.in)	25.9%	63,614
United States (.us)	22.0%	41,645
Brazil (.br)	20.9%	99,202
Spain (.es)	16.6%	67,984
Japan (.jp)	15.6%	253,187
UK (.uk)	15.3%	244,776
Russia (.ru)	14.9%	301,639
France (.fr)	14.5%	135,021
Italy (.it)	13.5%	107,638
Netherlands (.nl)	9.7%	86,383
Germany (.de)	7.6%	348,251

Dead rate among major country-code TLDs (≥40,000 domains each). China's .cn leads; Germany's .de is the most durable.

A domain in China's .cn space is more than four times as likely to be dead as one in Germany's .de. Fast, cheap, speculative registration — and, for .cn, a churn-heavy market behind the Great Firewall — leaves more abandoned domains behind; the mature, costlier-to-register German-speaking TLDs barely rot at all.

What the top 10 million is even made of

For context, here's the shape of the corpus itself. .com is not just first — it is nearly half of the entire top 10 million, larger than every country-code and new-gTLD combined.

TLD	Share	Domains
.com	44.1%	4,403,688
.org	8.8%	878,764
.io	3.6%	363,234
.de	3.5%	348,251
.net	3.5%	347,414
.ru	3.0%	301,639
.jp	2.5%	253,187
.uk	2.5%	244,776
.fr	1.4%	135,021
.edu	1.3%	128,968

The 10 largest TLDs in the top 10 million by domain count. .com alone is 44%.

Two details worth flagging: .io (3.6%) has quietly become the third-largest TLD on the popular web — the developer/startup default — and the AI-era .ai (0.30%, ~30,000 domains) has already overtaken established country domains like .fi, .no, and .tw in the top 10 million.

The dead web is the long tail nobody visits

Death is not spread evenly through the ranking. Split the 10 million by popularity and the dead rate climbs more than 20× — from 0.8% in the top 1,000 to 16.1% past rank 5 million. blocked runs the other way: the most-trafficked sites wall bots hardest, then the defenses thin out down the tail.

Rank band	Dead	Blocked
Top 1K	0.8%	12.9%
1K–10K	1.2%	15.1%
10K–100K	2.3%	10.8%
100K–1M	8.7%	9.3%
1M–5M	13.3%	9.3%
5M–10M	16.1%	8.5%

Dead and blocked rate by popularity-rank band (top 10M, 2026). Dead climbs 20× into the long tail; blocked peaks at the popular head.

That gradient reframes the headline. The 14% is real by domain count — but those dead domains are almost all in the part of the web nobody visits. 99.8% of dead domains sit below rank 100,000, and the popular top-100K — where the overwhelming majority of web traffic lives — is only 2.2% dead. Weighted by attention instead of raw count, the dead web nearly disappears:

By domain count: 14.2% (share of the top 10M domains that are dead)

Weighted by traffic: ~3% (the popular top-100K, where most web traffic is, is only 2.2% dead)

The dead web concentrates in the unvisited tail — 99.8% of dead domains sit below rank 100K. Traffic weighting is estimated from the rank distribution.
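The rank-band rates above are enough to recover the 2.2% top-100K figure and the 14.2% headline itself. The traffic-weighted ~3%, however, depends on a traffic model the article doesn't spell out. Purely for illustration, the sketch below assumes traffic falls off as 1/rank (a Zipf curve). Under that assumption the weighted rate lands in the same low single digits, but the sketch is not meant to reproduce the ~3% exactly.

```python
import math

EULER_GAMMA = 0.5772156649015329

# (exclusive lower rank, inclusive upper rank, dead rate) for each band in the chart above.
BANDS = [
    (0, 1_000, 0.008),
    (1_000, 10_000, 0.012),
    (10_000, 100_000, 0.023),
    (100_000, 1_000_000, 0.087),
    (1_000_000, 5_000_000, 0.133),
    (5_000_000, 10_000_000, 0.161),
]


def harmonic(n: int) -> float:
    """H(n) = 1 + 1/2 + ... + 1/n via the standard asymptotic expansion (H(0) = 0)."""
    if n == 0:
        return 0.0
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)


def by_count(lo: int, hi: int) -> float:
    """Every domain counts once."""
    return hi - lo


def by_zipf(lo: int, hi: int) -> float:
    """ASSUMPTION: traffic to the domain at rank r is proportional to 1/r."""
    return harmonic(hi) - harmonic(lo)


def weighted_dead_rate(weight) -> float:
    """Average the band dead rates, weighting each band by weight(lo, hi)."""
    total = sum(weight(lo, hi) for lo, hi, _ in BANDS)
    return sum(weight(lo, hi) * rate for lo, hi, rate in BANDS) / total


top_100k_dead = sum((hi - lo) * rate for lo, hi, rate in BANDS[:3]) / 100_000
print(f"top-100K dead rate: {top_100k_dead:.1%}")                 # 2.2%
print(f"count-weighted:     {weighted_dead_rate(by_count):.1%}")  # 14.2%
print(f"1/rank-weighted:    {weighted_dead_rate(by_zipf):.1%}")
```

The count-weighted line reproduces the 14.2% headline, and the top-100K line reproduces the 2.2% figure. The weighting function is the only moving part, so a different traffic model can be plugged in by passing another `weight`.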

"Dead web" is not "link rot" — and definitely not "dead internet theory"

Three different things get blurred together. Keeping them separate is the whole point:

This study (dead domains): does the domain still resolve and answer? We find 14.2% of the top 10M do not.
Link rot (Pew, Ahrefs): are the links inside living pages still good? Pew Research found 25% of pages from 2013–2023 are gone and 38% of 2013 pages have vanished; Ahrefs found 66.5% of tracked links have rotted. Those measure decay within the living web — a complement to this, not the same number.
Dead internet theory: the claim that AI-generated content and bots have displaced human activity online. That is about what's on the living web, not whether domains are reachable. It is a separate conversation, and conflating it with link rot is how bad statistics spread.

If you only remember one distinction: link rot is about the pages that are still up; the dead web is about the domains that aren't.

What this means if you're building a scraper or a data pipeline

The practical takeaway is the 8.9% blocked slice, because it is the part most likely to break your project. When a request fails, the reason dictates the fix, and they are nothing alike:

A dead domain (no DNS, refused) will never answer. Retrying, rotating proxies, or switching to a browser does nothing. Drop it and move on.
A blocked domain is alive and reachable — it just refused your client. A matched browser TLS/JA3 fingerprint or a residential IP gets in where a datacenter bot gets a 403. This is a transport problem, not a dead site.
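The drop-versus-escalate rule is easy to encode. This minimal sketch assumes each tier is a hypothetical fetcher (for example a plain request, then a browser TLS fingerprint, then a residential IP) that returns a label from your own classifier. It illustrates the pattern only; it is not Crawlora's implementation.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher tier: URL -> (label, body); label is "alive", "blocked" or "dead".
Fetcher = Callable[[str], Tuple[str, Optional[str]]]


def fetch_with_escalation(url: str, tiers: List[Fetcher]) -> Tuple[str, Optional[str], int]:
    """Try the cheapest client first and escalate only while the site answers with a block.

    Returns (final label, body or None, number of tiers tried).
    """
    if not tiers:
        raise ValueError("need at least one fetcher tier")
    label: str = "blocked"
    body: Optional[str] = None
    for attempt, fetch in enumerate(tiers, start=1):
        label, body = fetch(url)
        if label == "alive":
            return label, body, attempt  # got in: stop before paying for a heavier tier
        if label == "dead":
            return label, None, attempt  # no DNS or refused: no proxy or browser will help
        # "blocked": the site is up but refused this client, so try the next tier
    return label, body, len(tiers)       # still blocked after every tier
```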

This isn't theoretical. Probing every domain a second time with a real Chrome TLS/JA3 fingerprint recovered ~72,000 of the ~890,000 sites the polite bot was blocked from — enough to pull the blocked rate from 8.9% down to 8.2%. Every one of those is a live site reachable with the right client, not a dead end.

Naive crawlers can't tell these apart, so they either give up on reachable sites or burn a budget retrying gone ones. The cost-efficient pattern is to escalate only as far as a site forces you to — which is exactly how Crawlora's anti-bot unblocker works, and why it bills on success rather than per attempt. If you want to know which bucket a specific URL is in before you build, the free anti-bot checker tells you in about 30 seconds, and our companion Anti-Bot Adoption Index measures how much of the live web runs a wall at all.

Two more things the scan turned up

The web is a maze of redirects. Only 69% of domains serve their final page directly; 31% bounce through at least one redirect — and a stubborn sliver loops until our 10-hop cap. That is exactly why a crawler that doesn't follow redirects sees a web that looks half-broken.

Redirect hops	Share of domains
Direct (0 hops)	69.4%
1 redirect	23.9%
2 redirects	5.2%
3 redirects	1.1%
4+ redirects	0.4%

Redirects before the final page (top 10M, 2026). 31% of the web is at least one hop deep.
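Following redirects with a hop cap, and telling a loop apart from a long chain, takes only a few lines. So that it runs without a network, the sketch models the web as a hypothetical dict mapping each URL to its Location target. The 10-hop cap matches the one mentioned above.

```python
from typing import Dict, Tuple

MAX_HOPS = 10  # chains that would need more hops than this are reported as unsettled


def follow(start: str, redirects: Dict[str, str], max_hops: int = MAX_HOPS) -> Tuple[str, int, bool]:
    """Walk a redirect map from `start`.

    `redirects` maps a URL to its Location target; a URL absent from the map serves
    a final page. Returns (last URL reached, hops taken, settled on a final page?).
    """
    url, hops, seen = start, 0, {start}
    while url in redirects:
        if hops == max_hops:
            return url, hops, False  # hit the cap without reaching a final page
        url = redirects[url]
        hops += 1
        if url in seen:
            return url, hops, False  # a loop: stop early instead of burning the cap
        seen.add(url)
    return url, hops, True
```

With the `requests` library, `len(response.history)` gives the hop count of a followed chain, and `Session.max_redirects` sets the cap (default 30).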

The dead web is stuck on HTTP. A decade into the HTTPS transition, the living web is ~78% encrypted, but dead and bot-blocked domains are only about half, many of the dead ones abandoned before they ever got a certificate.

Outcome	Served over HTTPS
Alive	78.4%
Redirect	78.6%
Dead	52.8%
Blocked	47.5%

Share served over HTTPS by outcome (2026). The living web is ~78% encrypted; the dead and bot-walled web is barely half.

How we measured it

No magic — a deliberately simple, reproducible probe, run at 10-million scale.

The list. The full top 10 million domains (a DomCop/Tranco-style popularity ranking). We reached 9,992,781 of them, 99.93% coverage.

The probe. Each domain is fetched HTTPS-first from a datacenter IP, following redirects, with a short timeout and a cross-resolver DNS retry before any "DNS failed" verdict. We never submit a form, solve a CAPTCHA, log in, or fetch anything behind a wall. Every domain is probed twice — once as an honest bot, and once as a browser-like client with a real Chrome TLS/JA3 fingerprint — so we can separate "nobody's home" from "the bot wasn't let in."
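The article doesn't publish how the two probe arms are merged, so the rule below is hypothetical. A domain is dead only if neither client got an answer. A site that ignored the bot but answered the browser-like client counts as blocked rather than dead, which is consistent with the "No response → Blocked" band in the flow table.

```python
def combine_arms(bot: str, browser: str) -> str:
    """Merge a polite-bot verdict and a browser-like verdict (labels: alive/blocked/redirect/dead).

    Hypothetical rule: a domain is dead only when neither client got any answer.
    """
    if bot in ("alive", "redirect"):
        return bot                    # the bot's own verdict already stands
    if bot == "dead" and browser == "dead":
        return "dead"                 # nobody's home for either client
    if browser == "alive":
        return "blocked-recoverable"  # refused (or silently dropped) the bot, let the browser in
    return "blocked"                  # the host answered at least one client, so it is not dead
```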

The classification. A final 2xx, or a served 404/5xx (the host answered), is alive. A 403/429 or anti-bot challenge is blocked. A 3xx we can't resolve is redirect. Only no DNS, a refused/reset connection, or nothing accepting a connection is dead. That single rule (a server that answers anything is up), together with the cross-resolver DNS check, accounts for the difference between 14.2% and 27.6%.

Limits. This is homepage-level reachability from a datacenter vantage, so the reachable share is a lower bound: a domain that blocks a datacenter bot may open for a residential browser. Conversely, a deep page can be deader (or more defended) than the homepage. Snapshot: June 2026. The full per-domain dataset (every domain, every arm) is open, and the live, searchable version is the Dead-Web Index.

Reach the live web, not the dead one

14% of the top web is gone — but 9% is alive and just blocking your bot. Crawlora escalates from a plain request to a real browser fingerprint only as far as a site demands, and bills on success. Stop retrying dead domains and stop getting 403s from live ones.


Frequently asked questions

How many of the world's top websites are dead?

14.2% of the top 10 million domains are genuinely dead — about 1.41 million sites that no longer resolve in DNS or refuse every connection. That is far below the often-quoted 27.6%, which counted anti-bot blocks and answered errors as death.

What's the difference between a dead website and a blocked one?

A dead site never answers — no DNS record, or nothing accepts a TCP connection. A blocked site is alive and answering, it just refuses an automated client (a 403, 429, or anti-bot challenge). 8.9% of the top web — 891,672 sites — is blocked, not dead, a distinction naive crawlers miss.

Is the dead web the same as the dead internet theory?

No. The dead internet theory is a claim that AI-generated content and bots have replaced human activity on the living web. This study measures a different, concrete thing: how many domains have gone completely dark and unreachable (DNS gone, connection refused, server gone).

Why is this lower than the 27.6% dead-web figure?

Earlier top-10M crawls counted three non-dead things as dead: anti-bot 403/429 blocks, 404/5xx pages served by a live server, and domains a single flaky DNS resolver failed to look up. Classifying honestly — dead means genuinely unreachable — brings the real figure to 14.2%.

Which TLD has the most dead domains?

.cn has the highest death rate among common TLDs at 33%. Institutional TLDs like .gov (26%) and .edu (22%) also rank high — matching Pew Research's finding that government and reference pages suffer the worst link rot.

Why does a site look dead to a scraper but load fine in my browser?

Anti-bot systems serve a 403 or a challenge to a datacenter IP while letting a real browser through. A matched browser TLS/JA3 fingerprint reaches the site where a naive bot is blocked — which is why this study probes every domain twice, as a polite bot and as a browser-like client.

Tony WangJune 14, 202622 min read

14% of the Web Is Actually Dead

Only 14% of the top 10 million domains are genuinely dead — not the usual 27.6%. Nearly half of the 'dead' web is just blocking bots or serving errors.

Data Study Anti-Bot Web Scraping API

Key takeaways

We probed 9,992,781 of the top 10 million domains in June 2026. 14.2% are genuinely dead — no DNS, no connection, nothing answers — not the 27.6% a naive crawl of the same list reports.
Nearly half of 'dead' was never dead. 8.9% of the top web (891,672 sites) answers but blocks an automated client (403/429/anti-bot), and another ~4% serves a 404 or 5xx from a live server. Naive crawls count all of that as death.
The genuinely dead web is mostly DNS that no longer resolves: 1,077,715 domains — 76% of all dead — have left DNS entirely. The rest refuse or reset the connection. A 404 page is not death; a missing DNS record is.
Death is uneven by TLD. .cn (33%), .info (28%), .in and .gov (26%), and .edu (22%) rot fastest — institutional and cheap-registration domains lead, echoing Pew's finding that government and reference pages suffer the worst link rot. .com sits near the 14% line.
This is not 'link rot' or 'dead internet theory.' We measure whether the domain itself still resolves and answers — a different question from broken links inside pages (Pew, Ahrefs) or AI-generated content flooding the web.

What the outcome of a 10-million-domain scan actually looks like

Alive76.6% · 7,655,028Blocked8.9% · 891,672Dead14.2% · 1,414,788Redirect0.3% · 31,293

9,992,781 of the top 10 million domains, probed as a polite bot from a datacenter IP, June 2026. Hover a segment to isolate it.

Three-quarters of the top web is alive and answering. The interesting part is the bottom 23% — the slice everyone argues about — and how you split it.

The real number: 14% dead, not 27.6%

Naive 2024 crawl — counted as dead27.6%

DNS failure, anti-bot 403s, served 404/5xx and timeouts all lumped together

Honest 2026 classification — actually dead14.2%

No DNS, connection refused, or nothing accepts a connection

The same top-10-million list, classified two ways. The 13-point gap is anti-bot blocking and answered errors counted as death.

Where do the missing ~13 points go? Almost all of it is two things a naive crawl mislabels:

8.9% (891,672 sites) answer but block bots. A 403, a 429, or a Cloudflare "Just a moment" challenge to a datacenter IP. These are some of the most alive sites on the web — they run active defenses precisely because people want their data.
~4% serve a 404 or 5xx from a live server. A "404 Not Found" or a "503 Service Unavailable" is proof the host answered. The original crawl counted them as dead; a server that returns an error is the opposite of gone.

What a no-follow crawler gets wrong

Where each first response actually ends up (top 10M, 2026). A no-follow crawler counts the whole 3xx band as 'not 200' — but most of it resolves to a live page.

Show the flows

200 OK → Alive	4,584,611 (46.3%)
3xx redirect → Alive	2,677,304 (27%)
No response → Dead	1,413,013 (14.3%)
403 / 429 → Blocked	410,511 (4.1%)
3xx redirect → Blocked	365,368 (3.7%)
404 → Alive	236,685 (2.4%)
No response → Blocked	105,222 (1.1%)
5xx → Alive	85,728 (0.9%)
3xx redirect → Redirect	31,267 (0.3%)
3xx redirect → Dead	1,775 (0%)

The genuinely dead web is mostly DNS that's gone

The famous dead

Music & video

fanlink.to† 2024
Music smart-links · ToneDen / Eventbrite
The single highest-ranked dead domain in the whole top 10M (#568). In March 2024 Eventbrite lost control of the .to domain overnight, instantly breaking millions of artists' pre-save and streaming links sitting in bios, ads, and press releases. Wayback ↗
grooveshark.com† 2015
Free music streaming · ~20M users
Forced shut by the major labels' copyright suit (willful infringement, ~$700M of exposure). The entire catalogue was wiped the day the settlement landed; a co-founder died months later at 28. Wayback ↗
rdio.com† 2015
Music subscription service
Bankrupt after burning ~$2M a month. Pandora bought the technology for $75M and shut the service down the day before the sale closed. Wayback ↗
gfycat.com† 2023
GIF host for Reddit & Discord · ~220M users
Bought by Snap in 2020, then switched off as a non-core asset — one of the largest single link-rot events ever, breaking millions of embedded GIFs across the web. Wayback ↗
veoh.com† 2024
Video-sharing site
Won a landmark DMCA case that helped protect every YouTube-style site, limped on for years under Japan's FC2, and finally went dark in November 2024. Wayback ↗
metacafe.com† 2021
Top-3 video site of 2006
One of YouTube's first serious rivals — it simply went offline one day in 2021 with no announcement at all. Wayback ↗

The social web

del.icio.us† 2017
Delicious · invented social bookmarking
The site that popularized tagging at web scale. Passed through five owners (Yahoo → AVOS → Science → Delicious Media → Pinboard for $35,000) before going read-only.
dmoz.org† 2017
The Open Directory · a human-curated map of the web
91,000 volunteers cataloguing 3.8M sites — once a near-prerequisite for SEO, then made obsolete by Google's algorithm. Lives on as the community fork Curlie.
pipes.yahoo.com† 2015
Yahoo Pipes · visual no-code data mashups
The “Zapier of 2007.” Killed in a Yahoo cost-cut; thousands of live RSS and data pipelines broke on the same day.
topsy.com† 2015
The only full historical Twitter search
Indexed hundreds of billions of tweets back to 2006. Apple bought it for ~$200M and quietly switched it off two years later; the searchable archive simply vanished.
aviary.com† 2018
Photo-editing SDK embedded in 7,000+ apps
Powered in-app photo editing across the mobile economy (10B edits). Adobe acquired it, folded the tech into Creative Cloud, then sunset the free SDK.

The developer web

s7.addthis.com† 2023
Share buttons + tracking on 15M websites
Oracle bought it for the behavioural data, then killed it under GDPR pressure — a single shutdown darkened share widgets across millions of sites at once.
programmableweb.com† 2023
The public directory of ~19,000 web APIs
The index of the “API economy” for 17 years. Salesforce / MuleSoft erased the whole thing with no archive.
securityfocus.com† 2021
Home of the Bugtraq disclosure list (since 1993)
The security world's noticeboard for nearly 30 years. Symantec → Broadcom → Accenture let it freeze; the Bugtraq archive survives only at seclists.org.
opensolaris.org† 2013
Sun's open-source operating system
Oracle froze it the moment it bought Sun and pulled the domain in 2013. The community kept the code alive as the illumos fork.
sorbs.net† 2024
Spam blocklist covering 512M IP addresses
A DNS blocklist that mail servers queried for over two decades. Proofpoint pulled the plug in 2024; servers worldwide still query a list that no longer answers.

Government & institutions

patft.uspto.gov† 2022
US patent full-text search (1790–present)
Retired for a new search tool — breaking decades of direct patent links embedded in academic papers, legal briefs, and analysis tools.
petitions.whitehouse.gov† 2021
Obama's “We the People” e-petitions
A petition once topped a million signatures. The platform was quietly discontinued on Inauguration Day 2021 and never revived.
weblogs.com† ~2009
Dave Winer's blog-ping server · the early blogosphere's heartbeat
Every new blog post once pinged this host; VeriSign paid $2.3M for it. It faded after 2009 — yet old WordPress installs still ping the dead address to this day.
europa.eu.int† 2006
The European Union's original web address
The canonical home of EU law and institutions for over a decade. Migrated to europa.eu on Europe Day 2006, stranding a generation of links.

Acquired, then killed: 7
Strategic shutdown / cost-cut: 5
Neglect / abandoned domain: 4
Bankruptcy or lawsuit: 2
Migrated / retired elsewhere: 2

How twenty of the web's most famous dead domains actually died. Acquisition is the leading cause — more than bankruptcy, lawsuits, and neglect together.

Chart: dead domains by year, 2006–2026, grouped by topic (Social & community, Developer & infrastructure, Music & video, Search & reference, Media & news, Commerce & government). The underlying counts:

Topic	2006	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025	2026
Social & community	0	0	1	1	2	4	2	1	2	1	2	4	1	2	1	1	3	1	1	0
Developer & infrastructure	0	0	1	0	0	3	1	3	2	4	4	2	1	1	2	0	4	1	1	0
Music & video	0	0	1	0	0	1	0	0	2	3	2	1	0	3	2	0	2	2	0	0
Search & reference	0	1	1	0	0	0	2	0	1	0	1	0	1	1	1	1	0	0	0	1
Media & news	0	0	0	0	0	0	0	0	1	1	0	0	0	0	1	1	0	1	0	1
Commerce & government	1	0	0	0	0	0	0	0	1	0	1	0	1	0	1	1	0	0	0	0

Death is uneven: which TLDs rot fastest

Dead rate is not evenly spread. Split the 10 million by top-level domain and a clear gradient appears — cheap-registration and institutional TLDs rot far faster than the .com baseline.

.cn: 33.0% dead (42,827 domains)
.info: 28.4% dead (61,440 domains)
.in: 25.9% dead (63,614 domains)
.gov: 25.9% dead (43,435 domains)
.edu: 22.0% dead (128,968 domains)
.us: 22.0% dead (41,645 domains)
.br: 20.9% dead (99,202 domains)
.net: 19.9% dead (347,414 domains)

Dead rate among common TLDs (≥20k domains in the top 10M).
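Splitting the corpus by TLD is a simple group-by. Below is a minimal sketch that assumes you already have one `(domain, outcome)` pair per domain, with outcome labels such as `"alive"`, `"blocked"` and `"dead"` (hypothetical labels). It takes the TLD to be the last label of the name, so `example.co.uk` counts under `.uk`, and it applies the same ≥20k-domain floor used for the chart. The function name `dead_rate_by_tld` is illustrative.

```python
from collections import Counter
from typing import Iterable


def dead_rate_by_tld(
    results: Iterable[tuple[str, str]], min_domains: int = 20_000
) -> list[tuple[str, float, int]]:
    """Return (tld, dead_rate, domain_count) rows, highest dead rate first.

    results: (domain, outcome) pairs; only outcome == "dead" counts as dead.
    min_domains: drop TLDs with too few domains for a stable rate.
    """
    total: Counter = Counter()
    dead: Counter = Counter()
    for domain, outcome in results:
        # Last DNS label only: "shop.example.co.uk" -> ".uk".
        tld = "." + domain.rstrip(".").rsplit(".", 1)[-1].lower()
        total[tld] += 1
        if outcome == "dead":
            dead[tld] += 1
    rows = [(tld, dead[tld] / n, n) for tld, n in total.items() if n >= min_domains]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    sample = [("a.cn", "dead"), ("b.cn", "alive"), ("c.com", "alive"),
              ("d.com", "blocked"), ("e.com", "dead"), ("f.de", "alive")]
    for tld, rate, n in dead_rate_by_tld(sample, min_domains=2):
        print(f"{tld}: {rate:.1%} dead ({n} domains)")
```

The minimum-count filter matters. Without it, TLDs with only a handful of entries produce extreme rates (0% or 100%) that say nothing about the TLD.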

The geography of the dead web

Map: dead-domain rate by country-code TLD, 2026, on a scale from 7% to 33%.

Country (ccTLD)	Dead rate
China (.cn)	33%
India (.in)	25.9%
United States (.us)	22%
Brazil (.br)	20.9%
Spain (.es)	16.6%
Japan (.jp)	15.6%
United Kingdom (.uk)	15.3%
Russia (.ru)	14.9%
France (.fr)	14.5%
Canada (.ca)	14.1%
Italy (.it)	13.5%
Poland (.pl)	13.2%
Sweden (.se)	11.6%
Switzerland (.ch)	9.8%
Netherlands (.nl)	9.7%
Austria (.at)	8.6%
Germany (.de)	7.6%
Czechia (.cz)	7.3%

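China (.cn): 33.0% (42,827 domains)
India (.in): 25.9% (63,614 domains)
United States (.us): 22.0% (41,645 domains)
Brazil (.br): 20.9% (99,202 domains)
Spain (.es): 16.6% (67,984 domains)
Japan (.jp): 15.6% (253,187 domains)
UK (.uk): 15.3% (244,776 domains)
Russia (.ru): 14.9% (301,639 domains)
France (.fr): 14.5% (135,021 domains)
Italy (.it): 13.5% (107,638 domains)
Netherlands (.nl): 9.7% (86,383 domains)
Germany (.de): 7.6% (348,251 domains)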

Dead rate among major country-code TLDs (≥40,000 domains each). China's .cn leads; Germany's .de is the most durable.

What the top 10 million is even made of

For context, here is the shape of the corpus itself. .com is not just the largest TLD; at 44.1% it is nearly half of the entire top 10 million, larger than all country-code TLDs and new gTLDs combined.

.com: 44.1% (4,403,688)
.org: 8.8% (878,764)
.io: 3.6% (363,234)
.de: 3.5% (348,251)
.net: 3.5% (347,414)
.ru: 3.0% (301,639)
.jp: 2.5% (253,187)
.uk: 2.5% (244,776)
.fr: 1.4% (135,021)
.edu: 1.3% (128,968)

The 10 largest TLDs in the top 10 million by domain count. .com alone is 44%.

The dead web is the long tail nobody visits

Rank band	Dead	Blocked
Top 1K	0.8%	12.9%
1K–10K	1.2%	15.1%
10K–100K	2.3%	10.8%
100K–1M	8.7%	9.3%
1M–5M	13.3%	9.3%
5M–10M	16.1%	8.5%

Dead and blocked rate by popularity-rank band (top 10M, 2026). Dead climbs 20× into the long tail; blocked peaks at the popular head.

By domain count: 14.2% of the top 10M domains are dead.

Weighted by traffic: ~3%. The popular top 100K, where most web traffic goes, is only 2.2% dead.

The dead web concentrates in the unvisited tail — 99.8% of dead domains sit below rank 100K. Traffic weighting is estimated from the rank distribution.
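The 2.2% figure for the top 100K can be recovered from the rank-band table by weighting each band's dead rate by the number of domains in it. The sketch below does that and also computes a traffic-weighted rate for the whole list under an assumed Zipf (1/rank) traffic curve. The article does not say which traffic model it used, and the exact band boundaries (1,001 to 10,000 and so on) are also assumptions, so the `zipf_weight` result is illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649

# (first_rank, last_rank, dead_rate) per popularity band, from the rank-band table.
# Exact boundaries are assumed.
BANDS = [
    (1, 1_000, 0.008),
    (1_001, 10_000, 0.012),
    (10_001, 100_000, 0.023),
    (100_001, 1_000_000, 0.087),
    (1_000_001, 5_000_000, 0.133),
    (5_000_001, 10_000_000, 0.161),
]


def harmonic(n: int) -> float:
    """Approximate H(n) = 1 + 1/2 + ... + 1/n via its asymptotic expansion."""
    if n <= 0:
        return 0.0
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)


def count_weight(lo: int, hi: int) -> float:
    # Every domain counts once.
    return hi - lo + 1


def zipf_weight(lo: int, hi: int) -> float:
    # Assumed traffic model: the site at rank r gets traffic proportional to 1/r,
    # so a band's share is H(hi) - H(lo - 1).
    return harmonic(hi) - harmonic(lo - 1)


def weighted_dead_rate(bands, weight) -> float:
    """Average the bands' dead rates, weighting each band by weight(lo, hi)."""
    total = sum(weight(lo, hi) for lo, hi, _ in bands)
    return sum(weight(lo, hi) * rate for lo, hi, rate in bands) / total


if __name__ == "__main__":
    top_100k = [band for band in BANDS if band[1] <= 100_000]
    print(f"top 100K, by domain count: {weighted_dead_rate(top_100k, count_weight):.2%}")
    print(f"top 10M, 1/rank traffic:   {weighted_dead_rate(BANDS, zipf_weight):.2%}")
```

With these inputs, the count-weighted top-100K rate comes out at about 2.19%, in line with the article's 2.2%. The 1/rank-weighted figure for the whole list lands near 4%, above the article's ~3%. This mainly shows how much a traffic-weighted estimate depends on the traffic curve you assume.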

"Dead web" is not "link rot" — and definitely not "dead internet theory"

Three different things get blurred together. Keeping them separate is the whole point:

This study (dead domains): does the domain still resolve and answer? We find 14.2% of the top 10M do not.
Link rot (Pew, Ahrefs): are the links inside living pages still good? Pew Research found 25% of pages from 2013–2023 are gone and 38% of 2013 pages have vanished; Ahrefs found 66.5% of tracked links have rotted. Those measure decay within the living web — a complement to this, not the same number.
Dead internet theory: the claim that AI-generated content and bots have displaced human activity online. That is about what is on the living web, not whether domains are reachable. It is a separate conversation, and conflating it with link rot or with dead domains is how bad statistics spread.

If you only remember one distinction: link rot is about the pages that are still up; the dead web is about the domains that aren't.

What this means if you're building a scraper or a data pipeline

The practical takeaway is the 8.9% blocked slice, because it is the part most likely to break your project. When a request fails, the reason dictates the fix, and they are nothing alike:

A dead domain (no DNS record, or a refused connection) will never answer. Retrying, rotating proxies, or switching to a browser does nothing. Drop it and move on.
A blocked domain is alive and reachable — it just refused your client. A matched browser TLS/JA3 fingerprint or a residential IP gets in where a datacenter bot gets a 403. This is a transport problem, not a dead site.
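The cheapest guard against spending proxy or browser budget on a dead domain is to re-check DNS before escalating. Below is a minimal sketch that uses the system resolver through `socket.getaddrinfo`. The `next_step` name and its return labels (`"escalate"`, `"drop"`, `"recheck"`, `"keep"`) are hypothetical, and the sketch treats only 403/429 as a block signal, matching the article's definition of blocked.

```python
import socket
from typing import Optional


def domain_resolves(domain: str) -> bool:
    """True if the system resolver returns at least one address for the name."""
    try:
        return bool(socket.getaddrinfo(domain, None))
    except socket.gaierror:
        # NXDOMAIN, no data, or resolver failure: treat the name as gone.
        return False


def next_step(domain: str, status: Optional[int]) -> str:
    """Decide what to do after a cheap client's fetch.

    status: HTTP status the cheap client saw, or None if nothing answered.
    """
    if status in (403, 429):
        # Alive but refusing this client: a transport problem, not a dead site.
        # Escalate, e.g. to a browser-matched TLS fingerprint or a residential IP.
        return "escalate"
    if status is None and not domain_resolves(domain):
        # No DNS record: no proxy or browser will ever reach it.
        return "drop"
    if status is None:
        # Name resolves but nothing answered (refused, reset or timed out).
        return "recheck"
    # A server answered (2xx, 404, 5xx, ...): the domain is alive.
    return "keep"
```

Checking DNS only when nothing answered keeps the common path cheap. One resolver lookup is far less expensive than a round of proxy or browser retries.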

Two more things the scan turned up

Direct (0 hops): 69.4%
1 redirect: 23.9%
2 redirects: 5.2%
3 redirects: 1.1%
4+ redirects: 0.4%

Redirects before the final page (top 10M, 2026). 31% of the web is at least one hop deep.
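A no-follow crawler stops at the first 3xx, which is how redirects end up misread as outcomes. Below is a minimal sketch that follows the chain by hand with the standard library's `http.client`, counting hops and returning the final status and URL. The `follow_redirects` name, the GET method and the 10-hop cap are assumptions, not the study's settings.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def follow_redirects(url: str, max_hops: int = 10, timeout: float = 10.0):
    """Return (hops, final_status, final_url), following Location headers.

    If the chain is still redirecting after max_hops, the last 3xx status
    is returned so the caller can tell "too deep" apart from a real answer.
    """
    hops = 0
    while True:
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=timeout)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("GET", target)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        if 300 <= status < 400 and location and hops < max_hops:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            hops += 1
            continue
        return hops, status, url
```

The hop count and the final URL's scheme give you both of the measurements in this section: redirect depth and whether the page ends up on HTTPS.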

Alive: 78.4%
Redirect: 78.6%
Dead: 52.8%
Blocked: 47.5%

Share served over HTTPS by outcome (2026). The living web is ~78% encrypted; the dead and bot-walled web is only about half.

How we measured it

No magic — a deliberately simple, reproducible probe, run at 10-million scale.

The list. The full top 10 million domains (a DomCop/Tranco-style popularity ranking). We reached 9,992,781 of them (99.93% coverage).
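Running a simple probe at 10-million scale is mostly a concurrency problem. Below is a minimal sketch that assumes a TCP connect to port 443 is enough to separate "nothing answers" from "something is listening". It uses `asyncio` with a semaphore to cap the number of sockets open at once. The `probe` and `probe_all` names, the port, the timeout and the concurrency limit are illustrative, not the study's parameters.

```python
import asyncio
import socket


async def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try one TCP connect and label the result."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout)
    except socket.gaierror:
        return "dns"        # name does not resolve
    except ConnectionRefusedError:
        return "refused"    # host is up, nothing listening on the port
    except ConnectionResetError:
        return "reset"
    except (asyncio.TimeoutError, TimeoutError):
        return "timeout"    # nothing answered within the timeout
    except OSError:
        # Includes a host whose several addresses all fail in different ways,
        # which can surface as a plain OSError.
        return "other"
    writer.close()
    await writer.wait_closed()
    return "connected"


async def probe_all(hosts, port: int = 443, concurrency: int = 500) -> dict:
    """Probe many hosts, with at most `concurrency` connects in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(host):
        async with sem:
            return host, await probe(host, port)

    return dict(await asyncio.gather(*(one(h) for h in hosts)))
```

A typical call looks like `asyncio.run(probe_all(domains))`. For a real 10-million-domain list, you would feed hosts in chunks rather than creating every task up front, because `gather` holds one task per host in memory.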

Reach the live web, not the dead one


Frequently asked questions

How many of the world's top websites are dead?

What's the difference between a dead website and a blocked one?

Is the dead web the same as the dead internet theory?

Why is this lower than the 27.6% dead-web figure?

Which TLD has the most dead domains?

Why does a site look dead to a scraper but load fine in my browser?

What the outcome of a 10-million-domain scan actually looks like

The real number: 14% dead, not 27.6%

What a no-follow crawler gets wrong

The genuinely dead web is mostly DNS that's gone

The famous dead

Music & video

The social web

The developer web

Government & institutions

Death is uneven: which TLDs rot fastest

The geography of the dead web

What the top 10 million is even made of

The dead web is the long tail nobody visits

"Dead web" is not "link rot" — and definitely not "dead internet theory"

What this means if you're building a scraper or a data pipeline

Two more things the scan turned up

How we measured it

Reach the live web, not the dead one

Frequently asked questions

Cloudflare Will Crawl the Web for You. It's Locked Out of 29% of Its Own Customers.

The State of Web Scraping & Anti-Bot 2026

We Fingerprinted 978K Websites: Cloudflare, WordPress, and the CAPTCHA Cross-Check

We Followed 173 Million Domains for 8 Years. ~40M Died.

How Much of the Web Runs Anti-Bot? We Scanned the Top 1,000,000 Sites

World Cup Final Tickets Hit $2.3M — The Anti-Bot Data Behind Ticketing
