How to Scrape Reddit in 2026 (API & Python)
Tony Wang, 3 min read
Three ways to scrape Reddit posts, comments, and subreddits in 2026 — DIY Python, no-code tools, or a structured API — what each returns and the legal basics.
The fastest way to scrape Reddit in 2026 is to call a structured API that returns normalized JSON — posts, comments, and subreddit feeds with scores, authors, and timestamps — instead of working around Reddit's rate limits and login walls yourself. DIY is possible, but since Reddit's official Data API became paid and heavily rate-limited in 2023, casual scraping breaks quickly.
Reddit is one of the most cited sources in AI answers, which is exactly why teams collect it for community research, sentiment, and grounding data. A structured endpoint gives you the threads as clean JSON without managing tokens and quotas.
Is it legal to scrape Reddit?
Reddit content is public, but posts and comments are written by people, so treat them as personal data where relevant:
- Collect only public posts, comments, and subreddits — no private or quarantined-access content.
- Reddit publishes API terms and rate limits; review them and your local law before commercial or AI-training use.
- Minimize and protect any user-identifying data you store (usernames, etc.).
- Don't republish content in ways that violate platform terms or user expectations.
This is not legal advice. See "Is web scraping legal in 2026?" for the public-vs-personal-data detail.
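One way to act on the "minimize and protect" point is to drop fields you do not need and store a keyed hash instead of the raw username. The sketch below is illustrative only: the helper names `pseudonymize_author` and `minimize_post`, the kept field list, and the 16-character truncation are assumptions, not part of any Reddit or Crawlora API.

```python
import hashlib
import hmac


def pseudonymize_author(author: str, secret_key: bytes) -> str:
    """Return a stable keyed hash to store in place of a raw username."""
    digest = hmac.new(secret_key, author.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]


def minimize_post(post: dict, secret_key: bytes) -> dict:
    """Keep only the fields needed for analysis and pseudonymize the author."""
    wanted = ("id", "subreddit", "title", "score", "num_comments", "created_utc")
    kept = {name: post[name] for name in wanted if name in post}
    if post.get("author"):
        kept["author_hash"] = pseudonymize_author(post["author"], secret_key)
    return kept
```

Because the hash is keyed, the same user maps to the same value across runs (useful for counting distinct authors), while anyone without the key cannot recompute it from a username. Keep the key outside the dataset, for example in an environment variable.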
Option 1: DIY in Python (and why it breaks)
You can hit Reddit's JSON endpoints directly:
import requests

resp = requests.get(
    "https://www.reddit.com/r/webdev/top.json",
    params={"limit": 25},
    headers={"User-Agent": "my-app/0.1"},
    timeout=10,  # requests has no default timeout
)
resp.raise_for_status()  # surfaces 429s and other HTTP errors
data = resp.json()
# ...then handle 429s, pagination cursors, and comment trees yourself
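To make the "handle 429s and pagination cursors" part concrete, here is a minimal sketch of both. It assumes the standard Reddit listing shape (`data.children[].data` plus a `data.after` cursor) and a `session` object that behaves like `requests.Session`. The function names, retry count, and delays are illustrative choices, and the `Retry-After` header is only used if the server happens to send it.

```python
import time
from typing import Any, Callable, Iterator, Optional


def get_with_backoff(session: Any, url: str, params: dict,
                     max_retries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """GET a JSON page, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        resp = session.get(url, params=params,
                           headers={"User-Agent": "my-app/0.1"}, timeout=10)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Prefer the server's Retry-After hint; otherwise wait 1s, 2s, 4s, ...
        hint = resp.headers.get("Retry-After", "")
        sleep(float(hint) if hint.isdigit() else float(2 ** attempt))
    raise RuntimeError(f"still rate limited after {max_retries} attempts")


def iter_posts(session: Any, url: str, page_size: int = 25,
               max_pages: int = 4, **backoff_kwargs: Any) -> Iterator[dict]:
    """Yield post dicts across pages by threading the `after` cursor through."""
    after: Optional[str] = None
    for _ in range(max_pages):
        params = {"limit": page_size}
        if after:
            params["after"] = after
        listing = get_with_backoff(session, url, params, **backoff_kwargs)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:  # a null cursor means there are no more pages
            break
```

With the request above, usage would look like `iter_posts(requests.Session(), "https://www.reddit.com/r/webdev/top.json")`. Passing the session in keeps the pagination logic testable without network access.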
The recurring pain:
- Aggressive rate limiting: unauthenticated requests get 429s fast; you need backoff and proxies.
- Auth overhead: the official route now means OAuth apps, tokens, and quota management.
- Comment trees: nested replies require recursive traversal and careful flattening.
- Pagination cursors: `after` tokens must be threaded through every call.
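The comment-tree point is the least obvious of these, so here is a rough sketch of the recursive traversal. It assumes Reddit's usual thread JSON, where each comment is a `{"kind": "t1", "data": {...}}` node, `replies` is either a nested listing or an empty string, and `"more"` nodes are stubs for comments that were not loaded. The function name and the output row fields are illustrative.

```python
from typing import Iterator


def flatten_comments(children: list, depth: int = 0) -> Iterator[dict]:
    """Walk a comment listing depth-first, yielding one flat row per comment."""
    for child in children:
        if child.get("kind") != "t1":
            # "more" stubs only list ids of unloaded comments; skipped here.
            continue
        data = child["data"]
        yield {
            "id": data["id"],
            "parent_id": data.get("parent_id"),
            "author": data.get("author"),
            "body": data.get("body", ""),
            "score": data.get("score"),
            "depth": depth,
        }
        replies = data.get("replies")
        # An empty string (not null) marks a comment with no replies.
        if isinstance(replies, dict):
            yield from flatten_comments(replies["data"]["children"], depth + 1)
```

A thread response is typically a two-element array (the post listing, then the comment listing), so the call would be along the lines of `flatten_comments(thread[1]["data"]["children"])`. Skipping `"more"` stubs means deep or very large threads come back incomplete; fetching them is a separate request this sketch does not cover.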
Option 2: No-code and ready-made tools
Browser tools can grab a thread, but social listening means watching many subreddits and queries over time and storing the history. That is a pipeline, which an API serves better than a manual export.
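As a rough illustration of what "storing the history" involves, the sketch below keeps timestamped snapshots of posts in SQLite so that score and comment counts can be compared over time. The table name, columns, and helper functions are assumptions made for this example; the post fields mirror the illustrative response shown later in this article.

```python
import sqlite3
import time
from typing import Iterable, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a snapshot store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS post_snapshots (
               post_id      TEXT    NOT NULL,
               fetched_at   INTEGER NOT NULL,
               subreddit    TEXT,
               title        TEXT,
               score        INTEGER,
               num_comments INTEGER,
               PRIMARY KEY (post_id, fetched_at)
           )"""
    )
    return conn


def record_snapshot(conn: sqlite3.Connection, posts: Iterable[dict],
                    fetched_at: Optional[int] = None) -> int:
    """Store one row per post for this fetch time; returns the row count."""
    ts = int(time.time()) if fetched_at is None else fetched_at
    rows = [(p["id"], ts, p.get("subreddit"), p.get("title"),
             p.get("score"), p.get("num_comments")) for p in posts]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO post_snapshots VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def score_history(conn: sqlite3.Connection, post_id: str) -> list:
    """Return (fetched_at, score) pairs for one post, oldest first."""
    cur = conn.execute(
        "SELECT fetched_at, score FROM post_snapshots "
        "WHERE post_id = ? ORDER BY fetched_at", (post_id,))
    return cur.fetchall()
```

Running `record_snapshot` on a schedule for each subreddit or query is what turns one-off exports into a history you can chart.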
Option 3: A structured Reddit API
Crawlora's Reddit API exposes documented endpoints for search, individual posts, comment threads, and subreddit feeds, returning normalized JSON from one API key.
curl -G "https://api.crawlora.net/api/v1/reddit/search" \
  -H "x-api-key: $CRAWLORA_API_KEY" \
  --data-urlencode "q=web scraping" \
  --data-urlencode "sort=top" \
  --data-urlencode "limit=25"

import requests

resp = requests.get(
    "https://api.crawlora.net/api/v1/reddit/search",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"q": "web scraping", "sort": "top", "limit": 25},
    timeout=30,
)
resp.raise_for_status()  # fail on HTTP errors before parsing
for post in resp.json()["data"]["posts"]:
    print(post["score"], post["subreddit"], post["title"])
A response is normalized JSON (fields are illustrative — confirm the schema in the docs):
{
  "code": 200,
  "msg": "OK",
  "data": {
    "posts": [
      {
        "id": "abc123",
        "subreddit": "webdev",
        "title": "Example post title",
        "author": "example_user",
        "score": 482,
        "num_comments": 73,
        "created_utc": 1780387665
      }
    ]
  }
}
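If you want typed objects rather than raw dictionaries, a small parser along these lines may help. It is a sketch that assumes the illustrative response shape above (`code`, `msg`, `data.posts`) and treats `created_utc` as Unix seconds; the `Post` class and `parse_posts` function are hypothetical names, so confirm the real schema in the docs before relying on any field.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    id: str
    subreddit: str
    title: str
    author: str
    score: int
    num_comments: int
    created: datetime  # timezone-aware, UTC


def parse_posts(payload: dict) -> list[Post]:
    """Convert a search response into Post objects, raising on API errors."""
    if payload.get("code") != 200:
        raise ValueError(f"API error {payload.get('code')}: {payload.get('msg')}")
    posts = []
    for raw in payload.get("data", {}).get("posts", []):
        posts.append(Post(
            id=raw["id"],
            subreddit=raw["subreddit"],
            title=raw["title"],
            author=raw.get("author", ""),
            score=int(raw.get("score", 0)),
            num_comments=int(raw.get("num_comments", 0)),
            # created_utc is assumed to be seconds since the Unix epoch
            created=datetime.fromtimestamp(raw["created_utc"], tz=timezone.utc),
        ))
    return posts
```

Called as `parse_posts(resp.json())`, this fails early on a non-200 `code` instead of raising a `KeyError` deeper in your pipeline.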
From a post id, call the comments endpoint to pull the full thread (with sort, limit, and depth), or the subreddit-posts endpoint to follow a community's feed. These are the same endpoints we used to pull the r/SaaS case study referenced in our content and SEO strategy — handy when you want the post body and top comments as structured data.
What you can collect
- Search results across Reddit: title, subreddit, author, score, and comment count
- Full post detail and nested comment threads by `id`
- Subreddit feeds sorted by top, new, hot, and more
- Pagination via `after`, plus `time` and `sort` filters
Where this fits
Reddit data powers community research, brand and product sentiment, and grounding data for AI. Feed threads into a retrieval pipeline or agent with the MCP integration and the AI-agent web data workflow, or track brand mentions alongside review and reputation monitoring.
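For the retrieval use case, a thread usually needs to be turned into small text documents with metadata before it is embedded or indexed. The sketch below shows one possible shape. The `thread_to_documents` function, the document layout, and the `body` field on posts and comments are assumptions for illustration, not a documented Crawlora or MCP format.

```python
def thread_to_documents(post: dict, comments: list[dict],
                        max_comments: int = 5) -> list[dict]:
    """Build one document for the post and one per top-scoring comment."""
    base_meta = {
        "source": "reddit",
        "post_id": post["id"],
        "subreddit": post.get("subreddit"),
    }
    # Title and body (if any) become the post document's text.
    post_text = f"{post.get('title', '')}\n\n{post.get('body', '')}".strip()
    docs = [{
        "id": f"post:{post['id']}",
        "text": post_text,
        "metadata": {**base_meta, "kind": "post", "score": post.get("score")},
    }]
    # Drop empty or removed comments, then keep the highest-scoring ones.
    usable = [c for c in comments if c.get("body")]
    top = sorted(usable, key=lambda c: c.get("score") or 0, reverse=True)
    for comment in top[:max_comments]:
        docs.append({
            "id": f"comment:{comment['id']}",
            "text": comment["body"],
            "metadata": {**base_meta, "kind": "comment",
                         "score": comment.get("score")},
        })
    return docs
```

Keeping `post_id` and `subreddit` in the metadata lets a retrieval layer cite the originating thread, and leaving author names out of the documents is consistent with the data-minimization advice earlier in this article.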
Get started by testing the endpoint in the Playground, reading the request and response schema in the API docs, and reviewing credit costs on the pricing page.
Frequently asked questions
Can I scrape Reddit without getting blocked?
Crawlora handles rate limiting, proxies, and retries behind the API, so you avoid the 429s and token management of hitting Reddit directly and get normalized JSON.
Does Reddit still have a free API?
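Only in a limited form. Reddit introduced paid pricing for higher-volume and commercial access to its official Data API and tightened rate limits in 2023, so free access suits low-volume use at best. Crawlora's Reddit endpoints return public posts, comments, and subreddit feeds as normalized JSON from a single Crawlora API key.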
Can I scrape private or personal data?
No. Only public posts, comments, and subreddits are in scope. Treat usernames and content as personal data, and review Reddit's terms and your local law before commercial or AI-training use.
What can I collect from Reddit?
Search results, full post detail, nested comment threads, and subreddit feeds, with sort, time, limit, and pagination controls.
Can I use Reddit data for AI or RAG?
You can feed public threads into retrieval pipelines and agents, for example via the MCP integration, subject to Reddit's terms and applicable law for your use case.