Tony Wang · May 4, 2026 · Updated June 15, 2026 · 5 min read

How to Scrape Reddit in 2026 (API & Python)

Three ways to scrape Reddit posts, comments, and subreddits in 2026 — DIY Python, no-code tools, or a structured API — what each returns and the legal basics.

Reddit Guide Web Scraping API

The fastest way to scrape Reddit in 2026 is to call a structured API that returns normalized JSON — posts, comments, and subreddit feeds with scores, authors, and timestamps — instead of working around Reddit's rate limits and login walls yourself. DIY is possible, but since Reddit's official Data API became paid and heavily rate-limited in 2023, casual scraping breaks quickly.

Reddit is one of the most cited sources in AI answers, which is exactly why teams collect it for community research, sentiment, and grounding data. A structured endpoint gives you the threads as clean JSON without managing tokens and quotas.

Is it legal to scrape Reddit?

Reddit content is public, but posts and comments are written by people, so treat them as personal data where relevant:

Collect only public posts, comments, and subreddits — no private or quarantined-access content.
Reddit's official API terms and rate limits exist; review them and your local law before commercial or AI-training use.
Minimize and protect any user-identifying data you store (usernames, etc.).
Don't republish content in ways that violate platform terms or user expectations.

This is not legal advice — see Is web scraping legal in 2026? for the public-vs-personal-data detail.

Option 1: DIY in Python (and why it no longer works)

For years you could hit Reddit's unauthenticated .json endpoints directly. That stopped working in May 2026 — Reddit deprecated unauthenticated .json access, and the call below now returns 403 Forbidden:

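import requests

# Legacy unauthenticated request, shown for reference only
resp = requests.get(
    "https://www.reddit.com/r/webdev/top.json",
    params={"limit": 25},
    headers={"User-Agent": "my-app/0.1"},
    timeout=10,
)
resp.raise_for_status()  # raises on the 403 instead of parsing an error page
data = resp.json()
# ...then handle 429s, pagination cursors, and comment trees yourself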

The recurring pain:

Unauthenticated .json is dead (2026). The call above now returns 403 — see why Reddit blocked it. DIY now means the authenticated OAuth API or a real logged-in browser session, with TLS-fingerprint and IP-reputation hurdles on top.
Auth overhead and cost — the official route means OAuth apps and tokens, and Reddit's Data API is paid beyond free non-commercial use (100 requests/minute): commercial access runs about $0.24 per 1,000 requests, and enterprise agreements start near $12,000/year.
Comment trees — nested replies require recursive traversal and careful flattening.
Pagination cursors — after tokens must be threaded through every call.
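The comment-tree item above is easier to judge with code in hand. This is a minimal sketch of the recursive flattening, assuming the nested Listing shape that Reddit's JSON has historically used (`kind` / `data` / `children`, with `replies` holding either an empty string or another Listing). The sample payload and the `flatten_comments` name are made up for illustration:

```python
from typing import Any, Iterator

def flatten_comments(listing: dict[str, Any], depth: int = 0) -> Iterator[dict[str, Any]]:
    """Yield one flat row per comment, depth-first, from a nested Listing."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" stubs and other non-comment nodes
            continue
        data = child["data"]
        yield {
            "id": data["id"],
            "parent_id": data.get("parent_id"),
            "author": data.get("author"),
            "body": data.get("body", ""),
            "depth": depth,
        }
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means "no replies"
            yield from flatten_comments(replies, depth + 1)

# Hypothetical sample payload, not real Reddit data.
sample = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {
            "id": "c1", "parent_id": "t3_abc123", "author": "alice", "body": "Top-level",
            "replies": {"kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {
                    "id": "c2", "parent_id": "t1_c1", "author": "bob",
                    "body": "Reply", "replies": ""}},
                {"kind": "more", "data": {"children": ["c9"]}},
            ]}},
        }},
    ]},
}

rows = list(flatten_comments(sample))
for row in rows:
    print("  " * row["depth"] + f'{row["author"]}: {row["body"]}')
```

Keeping `depth` and `parent_id` on every row means the flat output can still be re-nested later if needed.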

Option 2: No-code and ready-made tools

Browser tools can grab a thread, but social listening means watching many subreddits and queries over time and storing the history. That is a pipeline, which an API serves better than a manual export.

Option 3: A structured Reddit API

Crawlora's Reddit API exposes documented endpoints for search, individual posts, comment threads, and subreddit feeds, returning normalized JSON from one API key.

curl -G "https://api.crawlora.net/api/v1/reddit/search" \
  -H "x-api-key: $CRAWLORA_API_KEY" \
  --data-urlencode "q=web scraping" \
  --data-urlencode "sort=top" \
  --data-urlencode "limit=25"

import requests

resp = requests.get(
    "https://api.crawlora.net/api/v1/reddit/search",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"q": "web scraping", "sort": "top", "limit": 25},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on a bad key or exhausted quota
for post in resp.json()["data"]["posts"]:
    print(post["score"], post["subreddit"], post["title"])

A response is normalized JSON (fields are illustrative — confirm the schema in the docs):

{
  "code": 200,
  "msg": "OK",
  "data": {
    "posts": [
      {
        "id": "abc123",
        "subreddit": "webdev",
        "title": "Example post title",
        "author": "example_user",
        "score": 482,
        "num_comments": 73,
        "created_utc": 1780387665
      }
    ]
  }
}
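To make the shape above concrete, here is a small sketch that loads the illustrative response into typed records and converts `created_utc` (Unix seconds) to a timezone-aware datetime. The `Post` dataclass and `parse_posts` helper are hypothetical names, and the field list simply mirrors the example, so check it against the real schema before relying on it:

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Post:
    id: str
    subreddit: str
    title: str
    author: str
    score: int
    num_comments: int
    created: datetime

def parse_posts(payload: dict) -> list[Post]:
    """Turn the illustrative response envelope into Post records."""
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response: {payload.get('code')} {payload.get('msg')}")
    return [
        Post(
            id=p["id"],
            subreddit=p["subreddit"],
            title=p["title"],
            author=p["author"],
            score=int(p["score"]),
            num_comments=int(p["num_comments"]),
            # Unix seconds -> aware UTC datetime
            created=datetime.fromtimestamp(p["created_utc"], tz=timezone.utc),
        )
        for p in payload["data"]["posts"]
    ]

# Same values as the illustrative response.
raw = """{"code": 200, "msg": "OK", "data": {"posts": [{"id": "abc123",
  "subreddit": "webdev", "title": "Example post title", "author": "example_user",
  "score": 482, "num_comments": 73, "created_utc": 1780387665}]}}"""

posts = parse_posts(json.loads(raw))
print(posts[0].subreddit, posts[0].score, posts[0].created.isoformat())
```

Parsing into a fixed record type early makes a schema change surface as a `KeyError` at ingestion time rather than as a silent gap downstream.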

From a post id, pull its comments, or follow a community with the subreddit endpoint:

import requests

post_id = "abc123"
h = {"x-api-key": "YOUR_API_KEY"}
base = "https://api.crawlora.net/api/v1/reddit"

# Full response envelope for one post's comments
comments = requests.get(f"{base}/comments/{post_id}",
                        headers=h, params={"limit": 100}, timeout=30).json()
# Hot posts from r/webdev, unwrapped to the list of posts
hot = requests.get(f"{base}/subreddit/webdev/posts",
                   headers=h, params={"sort": "hot"}, timeout=30).json()["data"]["posts"]

The comments endpoint returns flat public comments (no recursive reply tree to traverse), and every list paginates with an after token. These are the same endpoints behind the r/SaaS case study referenced in our content and SEO strategy — store one row per post or comment and re-run on a schedule for ongoing social listening.
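A minimal sketch of the "one row per post" storage, using Python's built-in `sqlite3` with an upsert so that scheduled re-runs refresh scores instead of duplicating rows. The table layout, the `save_posts` name, and the in-memory database are assumptions for illustration; the post fields follow the illustrative response shown earlier:

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id           TEXT PRIMARY KEY,
    subreddit    TEXT NOT NULL,
    title        TEXT NOT NULL,
    author       TEXT,
    score        INTEGER,
    num_comments INTEGER,
    created_utc  INTEGER,
    last_seen    TEXT NOT NULL
)
"""

UPSERT = """
INSERT INTO posts (id, subreddit, title, author, score, num_comments, created_utc, last_seen)
VALUES (:id, :subreddit, :title, :author, :score, :num_comments, :created_utc, :last_seen)
ON CONFLICT(id) DO UPDATE SET
    score        = excluded.score,
    num_comments = excluded.num_comments,
    last_seen    = excluded.last_seen
"""

def save_posts(conn: sqlite3.Connection, posts: list[dict]) -> None:
    """Insert new posts and refresh the counters on ones already stored."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, [{**p, "last_seen": now} for p in posts])

conn = sqlite3.connect(":memory:")  # swap for a file path to keep history
conn.execute(SCHEMA)

post = {"id": "abc123", "subreddit": "webdev", "title": "Example post title",
        "author": "example_user", "score": 482, "num_comments": 73,
        "created_utc": 1780387665}
save_posts(conn, [post])
save_posts(conn, [{**post, "score": 510}])  # a later run sees a higher score

print(conn.execute("SELECT COUNT(*), MAX(score) FROM posts").fetchone())
```

This keeps only the latest score per post. If the score history itself matters, a second table keyed on post id plus a timestamp would be the natural extension.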

What you can collect

Search results across Reddit: title, subreddit, author, score, and comment count
Full post detail and flat public comments by id
Subreddit feeds sorted by hot, new, top, and rising
Pagination via after, plus time and sort filters

Limitations and common challenges

The official API is the expensive path. Free use is non-commercial only and capped at 100 requests/minute; commercial access runs about $0.24 per 1,000 requests, with enterprise agreements starting near $12,000/year. The free public .json endpoints teams used to fall back on are gone as of May 2026, leaving the paid API or a managed scraper.
Comments come back flat. Public comment data is a flat list, not a nested reply tree — simpler to store, but reconstruct threading from parent ids yourself if you need it.
Pagination. Lists page with an after token; thread it through every call rather than expecting everything at once.
Usernames are personal data. Treat authors and comment text as potentially personal under GDPR/CCPA — minimize what you store and have a lawful basis, especially for AI-training use.
Rate limits and blocks for DIY. Direct requests to Reddit run into 403s and 429s quickly; a structured API handles pacing, proxies, and retries behind one key.
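The pagination item above amounts to a simple loop: request a page, read the cursor, and repeat until the cursor is empty. In this sketch the fetch function is injected so the loop can be exercised offline. The cursor's location in the response (`data.after`) and the `FAKE_PAGES` fixture are assumptions, so confirm where the real cursor lives in the docs:

```python
from typing import Callable, Iterator, Optional

Page = dict  # one decoded JSON response

def iter_posts(fetch_page: Callable[[Optional[str]], Page],
               max_pages: int = 10) -> Iterator[dict]:
    """Yield posts page by page, threading the `after` cursor through each call."""
    after: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(after)
        data = page.get("data", {})
        yield from data.get("posts", [])
        after = data.get("after")  # assumed cursor location
        if not after:
            break

# Offline stand-in for an HTTP call: three hypothetical pages keyed by cursor.
FAKE_PAGES = {
    None: {"data": {"posts": [{"id": "p1"}, {"id": "p2"}], "after": "t3_p2"}},
    "t3_p2": {"data": {"posts": [{"id": "p3"}], "after": "t3_p3"}},
    "t3_p3": {"data": {"posts": [{"id": "p4"}], "after": None}},
}

def fake_fetch(after: Optional[str]) -> Page:
    return FAKE_PAGES[after]

ids = [post["id"] for post in iter_posts(fake_fetch)]
print(ids)
```

In real use, `fetch_page` would wrap an HTTP request that passes `after` as a query parameter; the loop itself does not change.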

Where this fits

Try it first, free: run any public URL through the Free Web Scraper, or check whether a site blocks bots with the Anti-Bot Checker — no signup.

Reddit data powers community research, brand and product sentiment, and grounding data for AI. Feed threads into a retrieval pipeline or agent with the MCP integration and the AI-agent web data workflow, or track brand mentions alongside review and reputation monitoring — and for structured business reviews to set beside Reddit's unstructured chatter, see how to scrape Trustpilot reviews. Where the subreddit is a finance one, pair that chatter with the underlying numbers — how to scrape Yahoo Finance covers quotes, history, and earnings dates. For a full comparison against other Reddit data APIs, see best Reddit scraper APIs in 2026. For the rest of the social stack, see how to scrape YouTube, how to scrape TikTok, how to scrape LinkedIn, and how to scrape Twitter/X; for the broader toolkit, see how to choose a web scraping API.

Get started by testing the endpoint in the Playground, reading the request and response schema in the API docs, and reviewing credit costs on the pricing page.

Frequently asked questions

Can I scrape Reddit without getting blocked?

Crawlora handles rate limiting, proxies, and retries behind the API, so you avoid the 429s and token management of hitting Reddit directly and get normalized JSON.

Does Reddit still have a free API?

Yes, but only for non-commercial use, capped at 100 requests/minute. The official Data API became paid and heavily rate-limited for everything else in 2023. Crawlora's Reddit endpoints return public posts, comments, and subreddit feeds as normalized JSON from a single Crawlora API key.

How much does the Reddit API cost?

Free for non-commercial use (100 requests/minute via OAuth/PRAW). Commercial access runs about $0.24 per 1,000 requests, and enterprise agreements start near $12,000/year. The free public .json endpoints were shut down in May 2026, which is why many teams now use a managed scraper API instead.
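For a rough sense of scale, the arithmetic on those figures can be sketched as below. It assumes flat per-request pricing with no tiers, minimums, or discounts, which is a simplification; the two constants are the approximate numbers quoted above, not an official price list, and `commercial_cost` is a hypothetical helper:

```python
from decimal import Decimal

RATE_PER_1K = Decimal("0.24")        # approximate commercial rate quoted above
ENTERPRISE_FLOOR = Decimal("12000")  # approximate annual enterprise starting point

def commercial_cost(requests_made: int) -> Decimal:
    """Estimated cost in USD for a number of API requests at the quoted rate."""
    return RATE_PER_1K * requests_made / 1000

monthly = commercial_cost(1_000_000)
# Yearly request volume at which the per-request rate equals the enterprise floor
break_even = int(ENTERPRISE_FLOOR / RATE_PER_1K * 1000)

print(f"1M requests/month: ${monthly:.2f}")
print(f"Requests per year that cost ${ENTERPRISE_FLOOR}: {break_even:,}")
```

Under these assumptions, one million requests a month costs about $240, and the per-request rate only reaches the enterprise starting figure at around 50 million requests a year.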

Can I get full Reddit comment threads?

Crawlora's comments endpoint returns flat public comments for a post (it accepts sort and depth for compatibility, but public comment data is flat). It is simpler to store; reconstruct threading from parent ids yourself if you need the reply tree.
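Rebuilding the reply tree from a flat list is a single pass over the comments. This sketch assumes each comment carries an `id` and a `parent_id`, and that parent ids may use Reddit-style fullname prefixes (`t1_` for a comment, `t3_` for the post). Those field names and the sample data are assumptions, so adjust them to the actual schema:

```python
def build_tree(comments: list[dict]) -> list[dict]:
    """Nest flat comments under their parents; return the top-level comments."""
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in nodes.values():
        # Strip the "t1_" comment prefix if present so the key matches a bare id.
        parent_key = (node.get("parent_id") or "").removeprefix("t1_")
        parent = nodes.get(parent_key)
        if parent is not None:
            parent["replies"].append(node)
        else:
            roots.append(node)  # parent is the post itself, or is missing
    return roots

def render(nodes: list[dict], depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per reply level."""
    for node in nodes:
        print("  " * depth + node["body"])
        render(node["replies"], depth + 1)

# Hypothetical flat comments, not real Reddit data.
flat = [
    {"id": "c1", "parent_id": "t3_abc123", "body": "Top-level"},
    {"id": "c2", "parent_id": "t1_c1", "body": "Reply"},
    {"id": "c3", "parent_id": "t1_c2", "body": "Nested reply"},
    {"id": "c4", "parent_id": "t3_abc123", "body": "Another top-level"},
]

tree = build_tree(flat)
render(tree)
```

Comments whose parent is absent from the list (deleted, or cut off by pagination) are promoted to top level rather than dropped, which keeps every stored row visible.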

Can I scrape private or personal data?

No. Only public posts, comments, and subreddits are in scope. Treat usernames and content as personal data, and review Reddit's terms and your local law before commercial or AI-training use.

What can I collect from Reddit?

Search results, full post detail, flat public comments, and subreddit feeds, with sort, time, limit, and after-cursor pagination.

Can I use Reddit data for AI or RAG?

You can feed public threads into retrieval pipelines and agents, for example via the MCP integration, subject to Reddit's terms and applicable law for your use case.

