Why Reddit Blocked Unauthenticated JSON in 2026 (and How to Still Get Reddit Data)
By Tony Wang · 7 min read
Reddit deprecated unauthenticated .json endpoints in 2026 (now 403). Why it happened — AI data licensing and bots — and how to get Reddit data now.
For years, the simplest way to get structured data out of Reddit was a trick everyone knew: append .json to any Reddit URL and get clean JSON back — no API key, no OAuth, no account. It quietly powered most open-source Reddit scrapers, research scripts, bots, and data pipelines.
That door is now closed. On May 28, 2026, Reddit posted "Protecting communities from scrapers and platform abuse" to r/modnews, announcing it would shut down unauthenticated .json access. Within days, requests started coming back 403 Forbidden, with no deprecation window. If your scraper "still runs" but returns nothing, this is why.
This post explains why Reddit did it — the answer is mostly AI and money — and the compliant ways to still get Reddit data in 2026.
What actually broke
In Reddit's own words: "Deprecating unauthenticated JSON access: We'll also be shutting down unauthenticated .json endpoints. These endpoints can be used to scrape Reddit without accountability. Logged-in and authenticated access won't be impacted."
So:
- Anonymous .json requests now 403. https://www.reddit.com/r/<sub>/top.json and friends no longer return data without authentication.
- It fails silently in a lot of tools. Many scrapers get a 403 (or an empty/redirect response) but appear to "succeed," so pipelines quietly go dark instead of erroring loudly.
- Authenticated access still works. Logged-in sessions and the official OAuth API are unaffected — that is the entire point of the change.
- RSS is next. In the same post Reddit called RSS "another common surface for scraping," so feed-based access is on notice too.
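Because a blocked request can look like a success to a pipeline, it may help to validate each response explicitly rather than trust that "no exception" means "data arrived." The following is a minimal sketch, not Reddit's or any library's API: `validate_listing` and `RedditAccessError` are hypothetical names, and the check assumes the familiar listing shape (`kind` equal to `"Listing"` with posts under `data.children`).

```python
import json


class RedditAccessError(RuntimeError):
    """Raised when a response is not a usable Reddit listing."""


def validate_listing(status_code: int, body: str, allow_empty: bool = False) -> list:
    """Return the listing's children, or raise instead of silently returning nothing."""
    if status_code == 403:
        raise RedditAccessError("403 Forbidden: unauthenticated access is blocked")
    if status_code != 200:
        # Redirects and other non-200 answers are treated as failures here.
        raise RedditAccessError(f"unexpected HTTP status {status_code}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        # An HTML block page or an empty body ends up here.
        raise RedditAccessError("response body is not JSON") from exc
    if not isinstance(payload, dict) or payload.get("kind") != "Listing":
        raise RedditAccessError("JSON is not a Reddit listing")
    data = payload.get("data")
    children = data.get("children") if isinstance(data, dict) else None
    if not isinstance(children, list):
        raise RedditAccessError("listing has no children array")
    if not children and not allow_empty:
        # Assumption: the feed is expected to be non-empty, so empty means broken.
        raise RedditAccessError("listing is empty")
    return children
```

Calling this right after every fetch turns a quiet outage into an error your scheduler or alerting can see. Pass `allow_empty=True` for feeds that can legitimately be empty.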
Why Reddit did it
The technical change is small. The motivation behind it is the bigger story — and yes, it is largely about AI chatbots and bot traffic.
Reddit's data became an AI goldmine — and a product
Reddit is two decades of real human questions, answers, and opinions — exactly the text that makes large language models useful, and one of the most-cited sources in AI answers. Once that became obvious, Reddit turned its archive into a licensed product:
- A ~$60M/year licensing deal with Google (February 2024) to train Gemini on Reddit data.
- A licensing deal with OpenAI (May 2024) for ChatGPT.
- ~$130M in data-licensing revenue in 2024 — roughly 10% of Reddit's total revenue.
When the data is the product, the free append-.json endpoint is a leak: it let anyone — especially AI companies — take the same data for nothing, undercutting the paid deals.
AI bots were taking it for free — "without accountability"
This is the part most people's instinct gets right. The explosion of AI training crawlers and live "grounding" agents (assistants that fetch Reddit threads at answer time) created enormous automated traffic against the exact endpoints that required no identity. Reddit's framing names it directly: "large-scale scraping, spam networks, agentic account creation, and automated abuse." The unauthenticated .json route was the anonymous front door for all of it — data taken with no key to rate-limit, bill, or ban.
So Reddit started enforcing — in court
Killing .json is the technical half of a broader campaign:
- Reddit sued Anthropic (June 2025), alleging its bots crawled Reddit 100,000+ times and bypassed robots.txt after declining to license.
- Reddit then sued Perplexity and three scraping firms, SerpApi, Oxylabs, and AWM Proxy (October 2025).
- Reddit blocked the Internet Archive's Wayback Machine (August 2025) over AI-scraping concerns.
Cutting off anonymous .json is how you enforce "license it or don't take it" at the protocol level.
It's part of the bigger "closing web"
Reddit is the highest-profile example of a wider shift: as AI made web data commercially valuable, the open, anonymous, append-.json web is closing. Sites are gating and monetizing data, Cloudflare now blocks AI crawlers by default for many customers, and "pay-per-crawl" is becoming real. The era of casual anonymous public-data access is ending.
Why your scraper gets 403 now (it is not your credentials)
Teams hitting this assume it is an auth or rate-limit bug. It usually is not. Reddit's 2026 enforcement also leans on:
- TLS fingerprinting: generic clients (requests, wget, default curl) are identified by their TLS handshake and blocked, even with perfect headers.
- IP reputation: datacenter and cloud IPs (GitHub Actions, Vercel, common hosts) are heavily flagged; the same request often works from a residential browser and 403s from a server.
- No anonymous fallback: the .json path that used to absorb all this is gone.
That is why "add a User-Agent" or "back off the rate" no longer fixes it — the block is at the access-policy and fingerprint layer, not the request rate.
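One practical consequence is that retry logic should treat a 403 differently from a 429. The sketch below encodes that distinction as a small decision function. The mapping is an assumption drawn from the description above, not a documented Reddit contract, and `Action` and `next_action` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"              # response is usable
    BACK_OFF = "back_off"            # slowing down and retrying can help
    CHANGE_ACCESS = "change_access"  # retrying the same way will not help
    FAIL = "fail"                    # anything else: surface the error


def next_action(status_code: int) -> Action:
    """Decide what a client should do next, based only on the HTTP status."""
    if 200 <= status_code < 300:
        return Action.PROCEED
    if status_code == 429 or 500 <= status_code < 600:
        # Rate limiting or a server-side problem: retry later, more slowly.
        return Action.BACK_OFF
    if status_code in (401, 403):
        # Policy or fingerprint block: switch to an authenticated or
        # sanctioned route instead of retrying the same request.
        return Action.CHANGE_ACCESS
    return Action.FAIL
```

The point of the separate `CHANGE_ACCESS` branch is to stop a scraper from burning hours in exponential backoff against a block that no delay will lift.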
How to get Reddit data in 2026 (compliant options)
The free anonymous path is over, but public Reddit data is still reachable through sanctioned routes. Ranked:
1. The official Reddit Data API / Devvit
Reddit points developers to its authenticated Data API (OAuth) and the Devvit developer platform — the sanctioned path:
- Free for non-commercial use, capped at ~100 requests/minute.
- Commercial access runs about $0.24 per 1,000 requests; enterprise agreements start near $12,000/year.
Best when you can register an app, do the OAuth dance, and your use fits Reddit's terms.
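As a rough illustration of the OAuth dance, here is a minimal app-only sketch using only the Python standard library. It assumes Reddit's long-documented token endpoint, `client_credentials` grant, and `oauth.reddit.com` API host still apply to your app, so check the current Data API docs before relying on it. The environment variable names, the User-Agent string, and the helper names are placeholders.

```python
import base64
import json
import os
import time
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"
API_BASE = "https://oauth.reddit.com"
# Placeholder; Reddit asks API clients for a descriptive, unique User-Agent.
USER_AGENT = "python:example.reddit.reader:v0.1 (by /u/your_username)"
# Pacing derived from the ~100 requests/minute free-tier cap quoted above.
MIN_INTERVAL_S = 60 / 100


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """POST for an app-only token (client_credentials grant, HTTP Basic auth)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {basic}", "User-Agent": USER_AGENT},
        method="POST",
    )


def build_listing_request(token: str, subreddit: str, sort: str = "top",
                          limit: int = 25) -> urllib.request.Request:
    """GET a subreddit listing from the OAuth host with a bearer token."""
    query = urllib.parse.urlencode({"limit": limit})
    url = f"{API_BASE}/r/{urllib.parse.quote(subreddit, safe='')}/{sort}?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"bearer {token}", "User-Agent": USER_AGENT},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request; urlopen raises HTTPError on 4xx/5xx, so failures are loud."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def main() -> None:
    """Needs network access plus REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET (assumed names)."""
    token = fetch_json(
        build_token_request(os.environ["REDDIT_CLIENT_ID"], os.environ["REDDIT_CLIENT_SECRET"])
    )["access_token"]
    for subreddit in ("webdev", "python"):
        listing = fetch_json(build_listing_request(token, subreddit))
        for child in listing["data"]["children"]:
            print(child["data"]["score"], child["data"]["title"])
        time.sleep(MIN_INTERVAL_S)  # stay under the per-minute cap
```

Call `main()` once your app credentials are set. Building the requests separately from sending them keeps the auth details easy to inspect and test without touching the network.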
2. Authenticated / session-based access
A logged-in browser session (cookies, a real browser via Playwright) still works, because authenticated access is unaffected. It is viable for small, careful jobs — but it is fragile (sessions expire, fingerprints get flagged) and you own all the maintenance and the terms-of-service risk.
3. A managed Reddit API (Crawlora)
If you want structured Reddit data without maintaining auth, proxies, and fingerprints — or rewriting your scraper every time Reddit changes the rules — a managed API does that for you. Crawlora's Reddit API returns normalized JSON for search, posts, comment threads, and subreddit feeds from one key, and maintains the access path as Reddit tightens it:
curl -G "https://api.crawlora.net/api/v1/reddit/subreddit/webdev/posts" \
  -H "x-api-key: $CRAWLORA_API_KEY" \
  --data-urlencode "sort=hot" \
  --data-urlencode "limit=25"
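
import requests

# Search Reddit through the managed API; replace YOUR_API_KEY with a real key.
resp = requests.get(
    "https://api.crawlora.net/api/v1/reddit/search",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"q": "web scraping", "sort": "top", "limit": 25},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body

for post in resp.json()["data"]["posts"]:
    print(post["score"], post["subreddit"], post["title"])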
You get posts, comments, and feeds as clean JSON, and you stop chasing Reddit's changes — that is the trade you are buying.
A note on compliance
Reddit's updated Data API terms and Rule 8 now explicitly cover automated abuse and unauthorized scraping, and the May 2026 change makes Reddit's stance clear. Whatever route you choose:
- Collect only public posts, comments, and subreddits — never private, quarantined, or personal data.
- Treat usernames and comment text as personal data (GDPR/CCPA) — minimize what you store and have a lawful basis, especially for AI-training use.
- Prefer the official API or a licensed/managed path, and review Reddit's terms and your local law before commercial or AI use.
This is not legal advice. See "Is web scraping legal in 2026?" for the public-vs-personal-data detail.
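The data-minimization point in the list above can be sketched in code: keep only the fields an analysis needs and replace the username with a keyed hash before anything is stored. This is an illustrative sketch, not a compliance recipe. The field names (`author`, `subreddit`, `title`, `score`) and the helper names are assumptions about your record shape, and pseudonymized data generally still counts as personal data under GDPR.

```python
import hashlib
import hmac

# Assumption: the only fields this hypothetical analysis needs.
KEEP_FIELDS = ("subreddit", "title", "score")


def pseudonymize_author(author: str, secret_key: bytes) -> str:
    """Keyed hash: the same author maps to the same token, without storing the username."""
    return hmac.new(secret_key, author.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_post(post: dict, secret_key: bytes) -> dict:
    """Drop everything except KEEP_FIELDS and a pseudonymous author token."""
    slim = {key: post[key] for key in KEEP_FIELDS if key in post}
    if post.get("author"):
        slim["author_token"] = pseudonymize_author(post["author"], secret_key)
    return slim
```

Using HMAC with a secret key, rather than a bare hash, makes it harder to recover usernames by hashing a list of known accounts. Keep the key out of the dataset and rotate or destroy it when the project ends.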
Where this fits
The append-.json era is over, but Reddit remains one of the richest sources for community research, brand and product sentiment, and grounding data for AI. For the practical how-to (search, posts, comments, subreddit feeds, pagination), see how to scrape Reddit in 2026; to feed threads into a retrieval pipeline or agent, see the MCP integration and the AI-agent web data workflow.
Try it first, free: test the endpoint in the Playground, read the schema in the API docs, and review credit costs on the pricing page.
Frequently asked questions
Why did Reddit block unauthenticated .json endpoints?
On May 28, 2026, Reddit announced it was deprecating unauthenticated .json access to stop scraping "without accountability" and curb bot and agentic abuse. The bigger driver is commercial: Reddit's data is now a licensed AI-training asset (about $130M in data-licensing revenue in 2024, including deals with Google and OpenAI), and the free .json path let anyone, especially AI companies, take that data without paying.
Are Reddit .json URLs still working in 2026?
No. Since late May 2026, appending .json to a Reddit URL returns 403 Forbidden for unauthenticated requests. Logged-in sessions and the official OAuth API still work, and Reddit has flagged RSS as the next surface it may close.
Why does my Reddit scraper get 403 even with a User-Agent?
Because the block is no longer about rate or headers. Reddit uses TLS fingerprinting and IP-reputation checks, so generic clients (requests, wget, default curl) and datacenter or cloud IPs get 403 even with a valid User-Agent. The anonymous .json fallback that used to absorb this is gone.
What is the official way to get Reddit data now?
Reddit's authenticated Data API (OAuth) and the Devvit developer platform. It is free for non-commercial use at about 100 requests/minute; commercial access is roughly $0.24 per 1,000 requests, with enterprise agreements starting near $12,000/year.
Is scraping Reddit legal or allowed in 2026?
Reddit's updated Rule 8 and Data API terms restrict unauthorized scraping. Public data is generally accessible, but collect only public content, treat usernames and comments as personal data, and prefer the official API or a licensed/managed path — review Reddit's terms and your local law before commercial or AI use. This is not legal advice.
How can I still get Reddit data without maintaining a scraper?
A managed API like Crawlora returns normalized JSON for Reddit search, posts, comment threads, and subreddit feeds from one key, and maintains the access path as Reddit tightens it — so you avoid auth, proxies, fingerprinting, and constant breakage.