Crawlora

Structured public web data APIs for search, maps, geocoding, streaming, travel, real estate, marketplaces, apps, social, audio, crypto, finance, and AI workflows with managed execution and credit-based usage.

© 2026 Crawlora. All rights reserved.·Built by Tony Wang

How we measured it.

The Anti-Bot Adoption Index is a passive, reproducible fingerprint of 1005 top sites. We publish the method in full so the numbers are checkable, and so you know exactly what a “protected” result does and doesn’t mean.

The sample

Tranco-seeded, categorised.

The 1005 domains are seeded from the Tranco research list 6WNKX (2026-06-11) — an aggregated, citable top-sites ranking with a permanent ID, so the seed is reproducible rather than hand-picked.

We walk the ranking from the top and keep real, public, content-bearing sites, bucketed into 28 categories. Pure CDN/infrastructure, ad/tracking, auth-only and adult domains are skipped. On the run date, 960 of 1005 were reachable; the rest are excluded from percentages.

The probe

One passive request.

Each site’s homepage is fetched once with a real Chrome user-agent, following redirects, from a datacenter IP: the honest “what a basic cloud scraper sees” vantage. We capture the response headers, the Set-Cookie names, and a capped slice of the body.

We never run a CAPTCHA, submit a form, or attempt to access anything. It is a single GET of a public homepage, the same signal set as Crawlora’s anti-bot checker.
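A probe of this shape can be sketched with Python's standard library alone. The user-agent string, the 64 KiB body cap and the 15-second timeout below are assumptions for illustration; the study does not state the values it used.

```python
import urllib.error
import urllib.request
from email.message import Message

# Assumed values: the exact UA string, body cap and timeout are not published.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)
BODY_CAP = 64 * 1024


def cookie_names(set_cookie_values):
    """Return only the cookie names from raw Set-Cookie header values."""
    names = []
    for value in set_cookie_values:
        name = value.split(";", 1)[0].split("=", 1)[0].strip()
        if name:
            names.append(name)
    return names


def summarize(status: int, final_url: str, headers: Message, body: bytes) -> dict:
    """Reduce one response to header names, cookie names and a capped body."""
    return {
        "status": status,
        "final_url": final_url,
        "header_names": sorted({k.lower() for k in headers.keys()}),
        "cookie_names": cookie_names(headers.get_all("Set-Cookie") or []),
        "body": body[:BODY_CAP],
    }


def probe(url: str, timeout: float = 15) -> dict:
    """Single GET of a public page; urllib follows redirects by default."""
    req = urllib.request.Request(url, headers={"User-Agent": CHROME_UA})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return summarize(resp.status, resp.geturl(), resp.headers, resp.read(BODY_CAP))
    except urllib.error.HTTPError as err:
        # Error responses (403, 429, 503) still carry the headers and cookies.
        return summarize(err.code, err.geturl(), err.headers, err.read(BODY_CAP))
```

Only cookie names are kept, never values, which matches the signal set described above. Calling `probe("https://example.com/")` makes exactly one request chain and nothing more.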

The signatures

What names a vendor.

The vendor is identified from public, documented fingerprints — response header names (e.g. cf-ray, x-datadome), Set-Cookie names (_abck, datadome, _px), and, only on a challenge-shaped response, body markers. Header/cookie matches are high-confidence; body markers are medium.
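The matching rule can be illustrated with a small lookup. This is a sketch, not the index's actual rule set: the header and cookie names are the ones quoted above, the vendor labels attached to them are our reading of the vendor list, treating `_px` as a prefix is an assumption, and the body marker is a hypothetical example.

```python
# Header and cookie names quoted in the text; the vendor labels are assumed.
HEADER_SIGNATURES = {"cf-ray": "Cloudflare", "x-datadome": "DataDome"}
COOKIE_SIGNATURES = {"_abck": "Akamai Bot Manager", "datadome": "DataDome"}
# Assumption: "_px" is matched as a prefix (e.g. "_px3"), not an exact name.
COOKIE_PREFIXES = {"_px": "PerimeterX (HUMAN)"}
# Hypothetical body marker, for illustration only.
BODY_MARKERS = {"captcha-delivery.com": "DataDome"}


def identify_vendors(header_names, cookie_names, body="", challenge_shaped=False):
    """Map observed names to {vendor: confidence}."""
    found = {}
    for name in header_names:
        vendor = HEADER_SIGNATURES.get(name.lower())
        if vendor:
            found[vendor] = "high"
    for name in cookie_names:
        vendor = COOKIE_SIGNATURES.get(name)
        if vendor is None:
            vendor = next(
                (v for prefix, v in COOKIE_PREFIXES.items() if name.startswith(prefix)),
                None,
            )
        if vendor:
            found[vendor] = "high"
    # Body markers count only on a challenge-shaped response, and never
    # downgrade a vendor already matched by header or cookie.
    if challenge_shaped:
        for marker, vendor in BODY_MARKERS.items():
            if marker in body:
                found.setdefault(vendor, "medium")
    return found
```

Gating body markers on `challenge_shaped` keeps an ordinary page that merely mentions a vendor from being counted as protected by it.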

CAPTCHA widgets are typed and version-classified from script sources and markup — reCAPTCHA v2/v3/Enterprise, hCaptcha, Cloudflare Turnstile, Arkose FunCaptcha, GeeTest v3/v4, AWS WAF and others. Most sites only show a CAPTCHA on login/checkout, so homepage CAPTCHA counts are a floor.
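As a rough illustration of typing a widget from its script source, the sketch below covers three widget families using the loader URLs those vendors document publicly. The exact rules used for the index are not published, so the patterns and the v2/v3 test on the `render` parameter are assumptions, and Enterprise is not split further.

```python
import re
from urllib.parse import parse_qs, urlparse

SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def classify_captcha(src):
    """Return a CAPTCHA type for one script URL, or None if unrecognised."""
    parsed = urlparse(src if "//" in src else "//" + src)
    host, path = parsed.netloc.lower(), parsed.path.lower()
    if "/recaptcha/" in path and host.endswith(("google.com", "recaptcha.net")):
        if path.endswith("/enterprise.js"):
            return "reCAPTCHA Enterprise"
        # v3 loads api.js with render=<site key>; v2 uses explicit/onload or nothing.
        render = parse_qs(parsed.query).get("render", ["explicit"])[0]
        return "reCAPTCHA v2" if render in ("explicit", "onload") else "reCAPTCHA v3"
    if host.endswith("hcaptcha.com"):
        return "hCaptcha"
    if host == "challenges.cloudflare.com" and "/turnstile/" in path:
        return "Cloudflare Turnstile"
    return None


def captcha_types(html):
    """Collect the distinct CAPTCHA types referenced by script tags in a page."""
    types = {classify_captcha(src) for src in SCRIPT_SRC.findall(html)}
    return sorted(t for t in types if t)
```

Because this only sees scripts present in the fetched HTML, a widget injected later by JavaScript would be missed, which is one more reason homepage counts are a floor.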

Proprietary signed-payload VMs (TikTok’s webmssdk/X-Bogus, Kasada, F5/Shape) are flagged as a distinct “closed VM” class.

Vendors detected in this run

  • Cloudflare: 328
  • Akamai Bot Manager: 126
  • Google reCAPTCHA: 40
  • DataDome: 27
  • PerimeterX (HUMAN): 16
  • Imperva (Incapsula): 15
  • AWS WAF: 7
  • AWS WAF CAPTCHA: 7
  • GeeTest: 5
  • Arkose Labs (FunCaptcha): 5
  • Cloudflare Turnstile: 4
  • Citrix NetScaler: 3
  • Kasada: 2
  • Alibaba slider: 1
  • TikTok (proprietary VM): 1
  • Sucuri: 1
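Since unreachable sites are excluded from percentages, a vendor's share is its count over the 960 reachable sites rather than all 1005. The short calculation below works that out for the largest vendors; the rounding to one decimal place is our choice, and the published index may round differently.

```python
REACHABLE = 960  # of 1005 sampled domains, per "The sample" above

vendor_counts = {
    "Cloudflare": 328,
    "Akamai Bot Manager": 126,
    "Google reCAPTCHA": 40,
    "DataDome": 27,
}


def share(count, reachable=REACHABLE):
    """Percentage of reachable sites, rounded to one decimal place."""
    return round(100 * count / reachable, 1)


for vendor, count in vendor_counts.items():
    print(f"{vendor}: {share(count)}%")
```

If a site can match more than one vendor, these shares need not sum to 100.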

CAPTCHA types surfaced

  • reCAPTCHA v2: 20
  • reCAPTCHA Enterprise: 11
  • reCAPTCHA v3: 9
  • AWS WAF CAPTCHA: 7
  • GeeTest v3: 5
  • Arkose FunCaptcha: 5
  • Cloudflare Turnstile: 4
  • Alibaba slider: 1

The difficulty score

A heuristic, not a live test.

We map the strongest detected protection to the typical tier of tooling generally required to reliably access public pages — bumped one tier when we actually saw a challenge. It's derived from headers/HTML, not a live multi-transport measurement, so read it as directional.

T1

Plain HTTP client (band: Easy)

No managed anti-bot detected — a plain HTTP request reaches it.

T2

Browser-impersonation HTTP (band: Medium)

Wants a matched TLS/JA3-JA4 fingerprint, correct HTTP/2 frame order, and realistic headers. Akamai and open Cloudflare paths live here.

T3

Headless browser (JS) (band: Hard)

Needs a real browser to execute the vendor's JavaScript challenge — Cloudflare managed challenge, Imperva, AWS WAF challenge, most CAPTCHA gates.

T4

Stealth browser + residential IP + behavior (band: Very hard)

Weighs behavior and IP reputation on top of JS — DataDome and PerimeterX/HUMAN.

T5

Closed signed-payload VM (band: Closed VM)

Signs every request with a proprietary in-browser bytecode VM — TikTok (webmssdk), Kasada, F5/Shape. Generic transport tooling can't mint valid tokens.

A CAPTCHA gate lifts an otherwise-open site to at least T3; a detected closed-VM defense sets T5. The 1–10 difficulty score is nudged up when we see a CAPTCHA or a hard block. Measuring difficulty against real engines, by running the actual transport fleet against a URL, is a paid per-URL check; that is what Crawlora’s anti-bot checker does.
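The tier rules can be expressed in a few lines. This is a sketch under stated assumptions: the vendor-to-tier table is read off the tier descriptions above, a detected vendor not named there defaults to T2, and the one-tier bump for an observed challenge is capped at T4 because T5 is described as a distinct closed-VM class. The 1–10 score is not reproduced, since its formula is not given.

```python
# Base tiers read off the tier descriptions; unlisted vendors are an assumption.
BASE_TIER = {
    "Akamai Bot Manager": 2,
    "Cloudflare": 2,
    "Imperva (Incapsula)": 3,
    "AWS WAF": 3,
    "DataDome": 4,
    "PerimeterX (HUMAN)": 4,
    "TikTok (proprietary VM)": 5,
    "Kasada": 5,
    "F5/Shape": 5,
}
BANDS = {1: "Easy", 2: "Medium", 3: "Hard", 4: "Very hard", 5: "Closed VM"}


def tier(vendors, saw_challenge=False, has_captcha=False):
    """Map detected protections to a tooling tier from 1 to 5."""
    # Strongest detected protection wins; no vendor at all means T1.
    base = max((BASE_TIER.get(v, 2) for v in vendors), default=1)
    if base == 5:
        return 5  # a closed-VM defense always sets T5
    if saw_challenge:
        base = min(base + 1, 4)  # assumption: the bump never reaches T5
    if has_captcha:
        base = max(base, 3)  # a CAPTCHA gate lifts a site to at least T3
    return base
```

For example, `tier(["Cloudflare"], saw_challenge=True)` gives 3, and `BANDS[3]` names that band as "Hard".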

When a request doesn’t pass, we read why. From the status code, response headers and cookies we separate a rate limit (429 / Retry-After), an IP ban (Cloudflare 1006–1008), a bot challenge (cf-mitigated), a CAPTCHA, a geo-block (451/1009), and a login wall (401 / a redirect to /login). The fix differs each time (rotate IPs for a rate limit, use a real browser for a challenge), so each cause is reported separately; a login wall in particular is its own first-class category, not a difficulty level. Per-site pages add an advisory deep-page test plan, because homepage protection is a poor proxy for the profile, search and checkout pages you actually scrape.
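The block classes listed above can be separated with a simple decision chain. In this sketch the order of the checks is an assumption, and `cf_error` (a Cloudflare error code parsed from the body) and `has_captcha` are hypothetical inputs that a caller would extract beforehand.

```python
from urllib.parse import urlparse


def classify_block(status, headers, cf_error=None, final_url="", has_captcha=False):
    """Name the likely reason a request did not pass."""
    h = {k.lower(): v for k, v in headers.items()}
    if cf_error in (1006, 1007, 1008):
        return "ip_ban"
    if status == 451 or cf_error == 1009:
        return "geo_block"
    if "cf-mitigated" in h:
        return "bot_challenge"
    if has_captcha:
        return "captcha"
    # Retry-After alone is only treated as a rate limit on an error status.
    if status == 429 or (status >= 400 and "retry-after" in h):
        return "rate_limit"
    if status == 401 or urlparse(final_url).path.startswith("/login"):
        return "login_wall"
    return "passed" if status < 400 else "unknown_block"
```

Keeping `login_wall` as its own outcome reflects the point above: no amount of IP rotation or browser tooling fixes a page that requires an account.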

The fine print

What this does not tell you.

This check can be inaccurate or out of date

Anti-bot is deliberately dynamic, so a snapshot like this can be wrong in both directions — and the vendors update their models constantly, often daily.

  • Homepages are the open front door. Login, checkout, search and deep listings are usually protected more heavily than the homepage we tested.
  • What you see depends on where you connect from. The same page can return content from a residential IP but a challenge from a datacenter IP. This study ran from a datacenter.
  • Challenges are conditional. Cloudflare managed challenge, DataDome and PerimeterX only trip on suspicious signals, so a protection can be present but invisible to a passive scan.
  • Vendors ship updates and per-customer configs constantly. Akamai added JA4 fingerprinting in 2026; a signature that’s correct today can be renamed or reconfigured tomorrow.
  • “Not detected” does not mean “easy.” It can mean a protection we didn’t recognise, a challenge that hadn’t triggered, or behavioural/TLS defenses that don’t show up in passive HTML.

Snapshot 2026-06-12. Licensed CC BY 4.0 — cite as “Crawlora Anti-Bot Adoption Index” with a link.