By Tony Wang | March 13, 2026 | Updated June 2, 2026 | 4 min read

Is Web Scraping Legal in 2026? A Practical Guide

A practical 2026 guide to web scraping and the law: public vs private data, hiQ/CFAA, terms of service, copyright, and GDPR/CCPA, with a do/don't checklist.

Web scraping is generally legal when you collect publicly available data and access it responsibly. But "legal" depends on the data, the method, and what you do with the results, not on scraping as an activity. This guide breaks down the rules that actually matter in 2026 so you can scope a project on the right side of the line.

The three questions that decide it

Whether a specific project is defensible usually comes down to three things:

Question	Lower risk	Higher risk
What data?	Public facts — prices, addresses, ratings	Personal data; copyrighted/creative content
How do you access it?	Public pages, respectful rate	Bypassing logins, paywalls, CAPTCHAs, explicit blocks
What do you do with it?	Internal research, analytics	Republishing content; cloning a database

If you are in the "Lower risk" column for all three, you are usually in defensible territory. Each step toward the "Higher risk" column adds risk.
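One way to make the three questions concrete is to record the answers for each project before any code runs. The sketch below is purely illustrative: `ProjectScope`, its field names, and `higher_risk_flags` are hypothetical names invented here, and the result is a prompt for review rather than a legal determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectScope:
    """Answers to the three questions for one project (hypothetical model)."""
    collects_personal_or_creative_content: bool  # What data?
    bypasses_access_controls: bool               # How do you access it?
    republishes_or_clones_source: bool           # What do you do with it?


def higher_risk_flags(scope: ProjectScope) -> list[str]:
    """Return the questions on which the project sits in the higher-risk column."""
    flags = []
    if scope.collects_personal_or_creative_content:
        flags.append("data")
    if scope.bypasses_access_controls:
        flags.append("access")
    if scope.republishes_or_clones_source:
        flags.append("use")
    return flags


price_monitor = ProjectScope(False, False, False)
profile_mirror = ProjectScope(True, True, True)

print(higher_risk_flags(price_monitor))   # []
print(higher_risk_flags(profile_mirror))  # ['data', 'access', 'use']
```

An empty list corresponds to the lower-risk column on all three questions; each flag marks a question worth a closer look before the project starts.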

Public vs. private data

The cleanest rule of thumb: collect public, non-personal, factual data. In the US, the Ninth Circuit held in hiQ Labs v. LinkedIn that accessing data available to the general public without authentication likely does not constitute "unauthorized access" under the Computer Fraud and Abuse Act (CFAA). The Supreme Court's narrowing of the CFAA in Van Buren v. United States (2021) reinforces that reading. Facts themselves (a price, an address, a star rating) are not copyrightable.

Two important caveats. First, the CFAA question is separate from contract law: the hiQ litigation ultimately turned on LinkedIn's terms (see the next section). Second, risk rises sharply when data sits behind a login, includes personal information, or is creative/copyrighted content you intend to reuse.

Terms of service and robots.txt

A site's Terms of Service can prohibit automated access even when the data is public. Violating ToS is generally a contract matter (account bans, cease-and-desist letters), not a criminal one. It is still a real risk, especially for logged-in scraping, and it is the lever many platforms now pull. robots.txt is a crawling convention, not a law, but ignoring explicit disallows weakens your position. Respect rate limits and don't degrade the service; see "Proxies for web scraping, explained" for how responsible pacing works.
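As a minimal sketch of honoring robots.txt, the example below uses Python's standard-library `urllib.robotparser` to check whether a URL is disallowed and to pick a pause between requests. The crawler name, the sample rules, and the one-second fallback are assumptions made for illustration. `Crawl-delay` is a nonstandard directive that only some sites publish, so the fallback applies whenever it is absent.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical crawler name
DEFAULT_DELAY_SECONDS = 1.0          # assumed fallback pause, not a standard

# Example rules, inlined so the sketch runs offline. A real crawler would
# load the target site's own /robots.txt instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """True when the rules do not disallow this URL for our user agent."""
    return parser.can_fetch(user_agent, url)


def pause_between_requests(parser: RobotFileParser, user_agent: str = USER_AGENT) -> float:
    """Use the site's Crawl-delay when present, else the assumed default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS


rules = load_rules(SAMPLE_ROBOTS_TXT)
print(is_allowed(rules, "https://example.com/products"))      # True
print(is_allowed(rules, "https://example.com/private/data"))  # False
print(pause_between_requests(rules))                          # 2.0
```

In a fetch loop, a call such as `time.sleep(pause_between_requests(rules))` between requests keeps the pacing within what the site asks for. Passing the robots.txt check says nothing about the site's Terms of Service, which still need a separate read.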

Personal data: GDPR and CCPA

If you collect personal data about people in the EU or California, privacy laws such as GDPR and CCPA/CPRA apply regardless of whether the data was public. That means you need a lawful basis, must limit how you use the data to a stated purpose, and must honor data-subject rights. The safest path for most products is to avoid personal data and stick to business- and product-level facts. The most sensitive sources deserve extra care; see "How to scrape LinkedIn in 2026 (legally)".
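One way to act on "avoid personal data" is to drop personal fields before a record is ever stored. The sketch below assumes records arrive as flat dictionaries; the field names in `PERSONAL_FIELDS` and the sample listing are hypothetical. Filtering by field name is only a first pass, since free-text fields can still contain personal details.

```python
# Hypothetical field names; adjust to whatever your own records contain.
PERSONAL_FIELDS = frozenset({"name", "email", "phone", "profile_url"})


def strip_personal_fields(record: dict, personal_fields=PERSONAL_FIELDS) -> dict:
    """Return a copy of the record without the listed personal fields."""
    return {
        key: value
        for key, value in record.items()
        if key.lower() not in personal_fields
    }


listing = {
    "business": "Example Cafe",
    "rating": 4.5,
    "email": "owner@example.com",
    "Phone": "555-0100",
}
print(strip_personal_fields(listing))
# {'business': 'Example Cafe', 'rating': 4.5}
```

Matching on the lowercased key means `Phone` and `phone` are treated the same, and the original record is left unmodified.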

Copyright, databases, and the 2025–26 signal

Facts are free to collect; creative expression and wholesale copies are not. Reproducing articles, photos, or large verbatim slices of a site — or rebuilding its database — moves you from "collecting facts" into copyright and misappropriation territory. This is the live edge of the law: in December 2025, Google sued SerpApi over scraping and reselling Search results, citing the DMCA and copyright (context in SerpApi alternatives). The practical lesson is not "scraping is illegal" — it is that circumventing technical protections and reselling protected content is where the real exposure sits.

A practical do / don't checklist

Do

Collect public, factual, non-personal data.
Respect rate limits, robots.txt, and reasonable load.
Review the source's terms and your own compliance obligations.
Keep a clear, legitimate purpose for what you collect.

Don't

Bypass logins, paywalls, or CAPTCHAs to reach gated data.
Collect personal data without a lawful basis.
Republish copyrighted content wholesale.
Hammer a site or evade explicit blocks.

How Crawlora fits

Crawlora is built for responsible public web data: documented platform APIs that return normalized JSON for public sources, with managed rate-limited access. It is infrastructure — you remain responsible for lawful, compliant use of the data. See rate limits for pacing guidance, and how to choose a web scraping API once you know the rules.

Build on responsible public-data infrastructure

Documented endpoints, normalized JSON, managed rate limits. 2,000 free credits a month, no card.

Frequently asked questions

Is web scraping illegal?

No, not inherently. Collecting public data is generally lawful in the US, UK, and EU; problems arise from how you access it (bypassing auth) and what you collect (personal or copyrighted data).

Does hiQ v. LinkedIn mean I can scrape anything public?

It means accessing public data isn't "unauthorized access" under the CFAA — but ToS, copyright, and privacy law still apply.

Is violating a site's Terms of Service a crime?

Generally it's a contract issue (bans, cease-and-desist), not criminal — but it's still a risk, particularly for logged-in or evasive scraping.

Can I scrape personal data if it's public?

Public availability doesn't exempt you from GDPR/CCPA. Avoid personal data unless you have a lawful basis.

Is this legal advice?

No. This is general information; consult a lawyer for your specific situation.
