Tony Wang | May 6, 2026 | Updated June 7, 2026 | 4 min read

How to Scrape YouTube in 2026 (API & Python)

Three ways to scrape YouTube in 2026 — DIY Python, ready-made tools, or a structured API for videos, search, comments, and transcripts — with the legal basics.

The fastest way to scrape YouTube in 2026 is to call a structured YouTube API that returns normalized JSON (video metadata, search results, comments, and transcripts) instead of parsing YouTube's JavaScript-heavy pages yourself. You can build a DIY scraper, work within the official Data API and its quotas, or export data with a ready-made tool, but a structured scraping API often gives you the fields you actually want with less friction. This guide covers three approaches (DIY Python, ready-made tools, and a structured API), what each returns, where each breaks, and the legal basics.

Is it legal to scrape YouTube?

Scraping public YouTube data (public video metadata, search results, public comments, captions) is generally treated as lower-risk public-web scraping, but YouTube's Terms of Service restrict automated access, and you should avoid personal data and anything gated. Stick to public, non-personal data, respect rate limits, and review YouTube's terms; see the guide on whether web scraping is legal. This is not legal advice.

Option 1: DIY in Python (and why it breaks)

There are really three DIY paths, and each has a catch:

The official YouTube Data API v3 is the sanctioned route, but it runs on a 10,000-unit daily quota — and a single search.list call costs 100 units, so a few deep searches exhaust the day in minutes. Comments cap out around 3,000–10,000/day, and the Data API returns no transcripts at all.
A headless browser or YouTube's internal /youtubei/v1/ endpoints give you what the frontend sees, but the embedded client JSON shifts without notice and trips bot checks.
Transcript libraries (e.g. youtube-transcript-api) are the easy win for captions:

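from youtube_transcript_api import YouTubeTranscriptApi

# youtube-transcript-api 1.x interface; releases before 1.0 exposed a static get_transcript() instead
fetched = YouTubeTranscriptApi().fetch("dQw4w9WgXcQ", languages=["en"])
# Each snippet carries .text, .start, and .duration; join the text into one string
text = " ".join(snippet.text for snippet in fetched)
print(text[:500])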

That one call works until the undocumented caption endpoint changes, the video has no captions, or your IP is rate-limited — and it gives you only the transcript, not the metadata, comments, or search you usually need alongside it. The real cost is stitching the quota-bound Data API (metadata), a browser (search), and a transcript library (captions) into one reliable pipeline — plus the proxies, retries, and monitoring to keep it alive.
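
The retry layer mentioned above can be sketched as a small wrapper. This is a minimal illustration under stated assumptions, not a production design: with_retries, its parameters, and the delay values are hypothetical names chosen for this example, and a real pipeline would catch only the specific exceptions its HTTP client or transcript library raises.

```python
import time


def with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Waits base_delay, then 2x, then 4x between failed attempts
    (exponential backoff). The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Out of attempts: let the caller see the original error.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a fake fetcher that fails twice, then succeeds.
calls = {"n": 0}
waits = []  # records the delays instead of actually sleeping


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated rate limit")
    return "transcript text"


result = with_retries(flaky_fetch, sleep=waits.append)
print(result, waits)
```

Passing the sleep function in as a parameter keeps the wrapper testable without real delays; in production the default time.sleep applies.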

Option 2: Ready-made tools

No-code extractors export CSV or JSON and are fine for one-off pulls, but they are less convenient for in-product pipelines that need predictable fields on every run.

Option 3: A structured YouTube API

For repeatable workflows, a YouTube scraping API returns normalized JSON from one key — no browser, no Data API quota. Search videos (use q, with optional sort_by, upload_date, and duration filters):

curl "https://api.crawlora.net/api/v1/youtube/search?q=langchain&sort_by=view_count" \
  -H "x-api-key: $CRAWLORA_API_KEY"

Then pull a video's metadata, comments, and transcript by id in Python:

import os
import requests

BASE = "https://api.crawlora.net/api/v1/youtube"
vid = "dQw4w9WgXcQ"
h = {"x-api-key": os.environ["CRAWLORA_API_KEY"]}  # same variable as the curl example

def get_data(path):
    # Fetch one endpoint and return its "data" payload; raise on HTTP errors
    r = requests.get(f"{BASE}/{path}", headers=h, timeout=30)
    r.raise_for_status()
    return r.json()["data"]

video = get_data(f"video/{vid}")
comments = get_data(f"comments/{vid}")["comments"]
transcript = get_data(f"transcript/{vid}")["segments"]

print(f'{video["title"]}: {len(comments)} comments, {len(transcript)} transcript segments')

Comments paginate with a continuation_token, and the transcript endpoint can return json, text, srt, or vtt (and even translate to another language) — no quota units to budget. A video response is normalized JSON you can store directly (fields are illustrative — check the docs):

{
  "code": 200,
  "msg": "OK",
  "data": {
    "id": "dQw4w9WgXcQ",
    "title": "Example video",
    "channel": "Example Channel",
    "view_count": 1840000,
    "like_count": 92000,
    "published": "2026-01-12"
  }
}
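
To store a response like the one above with types attached, one option is to map it onto a small dataclass. The sketch below assumes the illustrative field names from the sample; VideoRecord and parse_video are hypothetical names for this example, and the real schema should be checked in the docs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VideoRecord:
    id: str
    title: str
    channel: str
    view_count: int
    like_count: int
    published: date


def parse_video(response: dict) -> VideoRecord:
    """Turn a video response envelope into a typed record.

    Assumes the illustrative shape from the sample: a "code" of 200
    and a "data" object carrying these six fields.
    """
    if response.get("code") != 200:
        raise ValueError(f"unexpected response: {response.get('code')} {response.get('msg')}")
    data = response["data"]
    return VideoRecord(
        id=data["id"],
        title=data["title"],
        channel=data["channel"],
        view_count=int(data["view_count"]),
        like_count=int(data["like_count"]),
        published=date.fromisoformat(data["published"]),
    )


# The sample response from this section, as a Python dict.
sample = {
    "code": 200,
    "msg": "OK",
    "data": {
        "id": "dQw4w9WgXcQ",
        "title": "Example video",
        "channel": "Example Channel",
        "view_count": 1840000,
        "like_count": 92000,
        "published": "2026-01-12",
    },
}

record = parse_video(sample)
print(record.title, record.view_count, record.published.year)
```

Failing loudly on a non-200 code keeps malformed responses out of storage rather than writing partial rows.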

What you can collect

Where the public page exposes them, grouped by endpoint:

Video — id, title, description, channel, view/like/comment counts, duration, publish date, and captions, via /youtube/video/{id}.
Search — videos, channels, and playlists with sort_by, upload_date, and duration filters; paginate with continuation_token.
Comments — top-level comments and replies per video, paginated with continuation_token.
Transcripts — caption segments with timestamps in json, text, srt, or vtt, with optional translation.
Channels — channel videos, playlists, and shorts for creator research.

Limitations and common challenges

The Data API quota caps the official route (10,000 units/day; search.list is 100 units each), which is why high-volume search and comment collection moves off it — a structured scraper has no quota units to budget.
Transcripts aren't guaranteed — not every video has captions, auto-captions vary in quality, and language availability differs; request a lang and handle misses.
Comments paginate — follow the continuation_token rather than expecting every comment in one call.
DIY fragility — embedded client JSON and the internal /youtubei/v1/ endpoints shift without notice and trip bot checks; a structured API absorbs that behind one key.
Gated content — age- or region-restricted videos may be unavailable; collect only public, non-personal data.
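
The pagination point above comes down to a loop that keeps requesting pages until no token comes back. This guide does not say where the continuation_token sits in the response or which query parameter carries it back, so the sketch below hides both behind fetch_page, a hypothetical callable you would implement against the real endpoint; collect_all and max_pages are likewise names made up for illustration.

```python
def collect_all(fetch_page, max_pages=50):
    """Follow continuation tokens until the API stops returning one.

    fetch_page(token) must return (items, next_token); next_token is
    None or empty on the last page. max_pages is a safety cap so a
    misbehaving endpoint cannot loop forever.
    """
    items, token = [], None
    for _ in range(max_pages):
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if not token:
            break
    return items


# Stand-in for the comments endpoint: three fake pages keyed by token.
FAKE_PAGES = {
    None: (["c1", "c2"], "t1"),
    "t1": (["c3", "c4"], "t2"),
    "t2": (["c5"], None),
}

all_comments = collect_all(lambda token: FAKE_PAGES[token])
print(len(all_comments), all_comments)
```

The first call passes None as the token, which is the usual convention for "start from the first page"; adjust if the real endpoint expects something else.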

Where this gets used

Creator intelligence — research channels, videos, and comments. See YouTube creator intelligence.
Transcript extraction — pull captions for summarization and RAG. See YouTube transcript extraction.
AI agent context — feed structured video data and transcripts into agents.

Start collecting

Try it first, free: run any public URL through the Free Web Scraper, or check whether a site blocks bots with the Anti-Bot Checker — no signup.

Test the endpoints in the Playground, check the schema in the API docs, and review pricing. For a full comparison against other YouTube data APIs, see best YouTube scraper APIs in 2026. See also how to scrape Instagram, how to scrape TikTok, how to scrape Reddit, and how to scrape Twitter/X for the rest of the social stack. When the work is music rather than video (tracks, artists, and playlist placement beside YouTube's catalog), see how to scrape Spotify. For the broader toolkit, see how to choose a web scraping API.

Frequently asked questions

Can I scrape YouTube without getting blocked?

With a structured API, proxy routing and browser execution are handled behind the endpoint; a DIY scraper must manage shifting client JSON, the internal /youtubei/v1/ endpoints, and bot checks itself.

Can I get YouTube transcripts?

Yes. The transcript endpoint returns caption segments with timestamps by video id where captions exist, in json, text, srt, or vtt, with optional translation to another language. The official YouTube Data API does not return transcripts at all, which is why teams use a scraper or a library like youtube-transcript-api.
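
Because captions exist only for some videos and some languages, a caller usually tries a preferred language and falls back to others. The sketch below shows that pattern in isolation. It assumes a miss shows up as an empty or None result and that segments are dicts with a "text" key; both are assumptions rather than documented behavior, and first_available_transcript and fake_fetch are hypothetical names.

```python
def first_available_transcript(fetch, video_id, languages=("en", "es", "de")):
    """Try each language in order; return (lang, segments) or (None, [])."""
    for lang in languages:
        segments = fetch(video_id, lang)
        if segments:
            return lang, segments
    return None, []


# Stand-in for a transcript call: only Spanish captions exist for this id.
FAKE_CAPTIONS = {
    ("abc123", "es"): [
        {"start": 0.0, "text": "hola"},
        {"start": 1.5, "text": "mundo"},
    ],
}


def fake_fetch(video_id, lang):
    return FAKE_CAPTIONS.get((video_id, lang))


lang, segments = first_available_transcript(fake_fetch, "abc123")
text = " ".join(s["text"] for s in segments)
print(lang, text)

# A video with no captions at all yields an explicit miss instead of an error.
missing = first_available_transcript(fake_fetch, "no-captions")
print(missing)
```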

Can I scrape YouTube comments?

Yes. The comments endpoint returns top-level comments and replies for a video, paginated with a continuation_token. The official Data API can return comments too but its 10,000-unit daily quota limits you to roughly 3,000–10,000 per day.

Why is the official YouTube Data API so limited?

It runs on a 10,000-unit daily quota and a single search.list call costs 100 units, so deep search or comment collection exhausts the day quickly — and it returns no transcripts. That quota is the main reason teams move high-volume YouTube collection to a scraper.
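
The quota arithmetic in this answer can be checked in a few lines. The 10,000-unit quota and the 100-unit search cost come from the text; the ceiling of 50 results per page is the maxResults upper bound in the Data API reference, and search_budget is a hypothetical helper name.

```python
DAILY_QUOTA_UNITS = 10_000   # default daily quota cited in the text
SEARCH_LIST_COST = 100       # units per search.list call
MAX_RESULTS_PER_PAGE = 50    # search.list maxResults upper bound


def search_budget(quota=DAILY_QUOTA_UNITS, cost=SEARCH_LIST_COST,
                  page_size=MAX_RESULTS_PER_PAGE):
    """Return (calls per day, maximum search results per day)."""
    calls = quota // cost
    return calls, calls * page_size


calls, results = search_budget()
print(f"{calls} search.list calls/day, at most {results} results")
```

Under those numbers a key that spends its whole quota on search gets 100 calls a day, which is why paginating deep into even a handful of queries uses up the budget.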

Is this the official YouTube Data API?

No. It extracts public YouTube data and is independent of YouTube's official Data API v3 and its quota units.

What YouTube data can I collect?

Public video metadata and captions, search (videos/channels/playlists with sort and date filters), comments and replies, transcripts, and channel videos/playlists/shorts.

How often can I refresh?

Run scheduled snapshots within your plan and responsible-use limits.
