
YouTube video intelligence showcase

John Schulman on Reasoning, RLHF, and the Road to AGI

John Schulman explains how pre-training and post-training shape AI behavior, why long-horizon training may unlock more useful models, and what could still bottleneck progress toward AGI.

Dwarkesh Patel | AI | Programming | Podcasts | Pre-training and post-training | Long-horizon tasks | Generalization and robustness | 1 hr 35 min | May 15, 2024 | 6 comment sample


Build this with Crawlora

Video intelligence API workflow

Video ID: Wo95ob_s_NI
Available APIs: Transcript, Comments, Metadata



cURL

curl "https://api.crawlora.net/api/v1/youtube/transcript/Wo95ob_s_NI" \
  -H "x-api-key: $CRAWLORA_API_KEY"
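For server-side code, the same request can be sketched in Python using only the standard library. The base URL, path, and `x-api-key` header come from the cURL example above; the function names are illustrative, and because the response fields are not documented on this page, the sketch returns the decoded JSON unmodified rather than assuming a schema.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the cURL example above.
API_BASE = "https://api.crawlora.net/api/v1"


def build_transcript_request(video_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request shown in the cURL example."""
    safe_id = urllib.parse.quote(video_id, safe="")
    url = f"{API_BASE}/youtube/transcript/{safe_id}"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the body as JSON.

    Assumption: the endpoint answers with a JSON body. Its fields are not
    described on this page, so nothing is extracted from it here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a real key, so it is left as a comment):
#   import os
#   req = build_transcript_request("Wo95ob_s_NI", os.environ["CRAWLORA_API_KEY"])
#   data = fetch_json(req)
```

Keeping request construction separate from sending makes the URL and header easy to check without network access or a key.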

Video summary

John Schulman on reasoning, RLHF, and AGI progress

In this Dwarkesh Patel interview, OpenAI cofounder John Schulman discusses reasoning, RLHF-style post-training, and what it may take for models to handle longer, more complex tasks. The conversation covers coding agents, generalization, bottlenecks, and possible paths toward more capable AI systems.

Pre-training vs. post-training

Schulman explains how pre-training builds a broad web-trained model, while post-training narrows it into a helpful chat assistant.

Long-horizon task capability

He discusses how models may move from short chatbot responses to longer, more autonomous coding and planning tasks.

Generalization and robustness

The interview explores sample efficiency, recovery from errors, and how better generalization may help models get unstuck.

AI-friendly interfaces

He also touches on UI design, multimodal use, and why human websites may still work well for AI agents.

Topics

Pre-training and post-training

How pre-training learns from web-scale data and why post-training aims for a more helpful assistant persona.

Long-horizon tasks

Why future models may handle multi-file coding projects and other longer, more autonomous tasks.

Generalization and robustness

The role of generalization, sample efficiency, and recovering from errors when models get stuck.

Audience comments snapshot

What viewers are saying

Comments praise the depth of the interview and Dwarkesh Patel’s persistent follow-up questions. Several viewers highlight the discussion of pre-training vs. post-training, long-horizon tasks, and the possibility of near-term AGI timelines as especially memorable.

Sampled comments: 6
Visible likes: 56
Public replies: 0

Comment themes

AI training concepts

The conversation is widely appreciated for its clear explanation of how pre-training and post-training differ.

Long-horizon capability

The audience was especially interested in how models may progress toward longer, more coherent task execution.

Thoughtful interview dynamics

Listeners valued the interviewer’s persistent questioning to surface clearer answers.

Audience signals

Strong positive reception

Multiple comments call the episode great and engaging, with appreciation for the interview style.

AGI timeline discussion stood out

One sampled comment specifically mentions the discussion of dangerous AGI potentially emerging within a few years.

Notable takeaways were easy to follow

One comment summarizes key moments, including autonomous coding and long-horizon task ability.

Minor audio feedback

A viewer flags audio clipping and suggests lowering mic gain for cleaner sound.

Representative public comments

@vrai4913 (2024-05-30)

great episode, john schulman was interesting. i appreciated you pressing him on his view that dangerous AGI could emerge within "two or three years", at least with some likelihood where he found this topic worth discussing. i don't have enough info for a strong opinion on that myself, but i've noticed it's almost a...

24 likes, 0 replies

@ashh3051 (2024-05-30)

Great delving there. Thanks guys.

8 likes, 0 replies

@moonsonate5631 (2025-05-30)

00:30 Pre-training creates a model that can generate content from the web. Post-training targets a narrower range of behaviors like being a chat assistant. 03:44 Models evolving to perform complex coding tasks autonomously 10:29 Improvement in the ability to do long-horizon tasks is key to AI capabilities. 13:52 Mod...

16 likes, 0 replies

@justinrce (2025-05-30)

Great interview, appreciate the interviewer challenging and persistent line of good questions and follow-ups to get the best answers

1 like, 0 replies

@muntazirabidi (2024-05-30)

Another great episode. Thanks for such wonderful content.

4 likes, 0 replies

@peteyhayman (2024-05-30)

great interview! if you want cleaner audio try reducing mic gain to avoid clipping ( it can be normalized later to get full volume)

3 likes, 0 replies
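The snapshot totals reported earlier (6 sampled comments, 56 visible likes, 0 public replies) can be recomputed from the six representative comments. The sketch below is only a worked check: the `Comment` type and `snapshot` function are illustrative names, not part of any Crawlora response format, and each handle assumes that the last ten characters of the displayed author line are the date.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Comment:
    author: str   # handle as displayed, with the trailing date removed (assumption)
    likes: int    # visible like count
    replies: int  # public reply count


# Like and reply counts copied from the representative comments above.
SAMPLED: List[Comment] = [
    Comment("@vrai4913", 24, 0),
    Comment("@ashh3051", 8, 0),
    Comment("@moonsonate5631", 16, 0),
    Comment("@justinrce", 1, 0),
    Comment("@muntazirabidi", 4, 0),
    Comment("@peteyhayman", 3, 0),
]


def snapshot(comments: List[Comment]) -> Dict[str, int]:
    """Aggregate the three headline figures shown in the comments snapshot."""
    return {
        "sampled_comments": len(comments),
        "visible_likes": sum(c.likes for c in comments),
        "public_replies": sum(c.replies for c in comments),
    }


print(snapshot(SAMPLED))
# {'sampled_comments': 6, 'visible_likes': 56, 'public_replies': 0}
```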

Build with YouTube comments data

Use Crawlora's YouTube comments API with the video and transcript endpoints to collect viewer language, thread activity, and audience signals.


Build this workflow

1. Fetch video metadata

Start with the video endpoint to capture ID, channel, publish date, duration, and source context.

2. Fetch transcript

Pull timestamped transcript data for summarization, search, citation, and RAG preparation.

3. Fetch public comments

Collect visible audience comments to identify themes, objections, questions, and engagement signals.

4. Store, analyze, report

Persist structured JSON, run analysis, and publish dashboards, alerts, or research reports.
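The four steps above can be sketched as a small pipeline. Only the transcript endpoint URL appears on this page, so the sketch does not guess the metadata or comments URLs: the three fetchers are passed in as plain callables, and the stub fetchers in the demo return placeholder values, not real API responses. The record layout and file naming are likewise assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

# A fetcher takes a video ID and returns JSON-serializable data.
Fetcher = Callable[[str], Any]


def run_workflow(
    video_id: str,
    fetch_metadata: Fetcher,
    fetch_transcript: Fetcher,
    fetch_comments: Fetcher,
    out_dir: Path,
) -> Path:
    """Run steps 1-3 in order, then persist one JSON record (step 4)."""
    record = {
        "video_id": video_id,
        "metadata": fetch_metadata(video_id),      # step 1
        "transcript": fetch_transcript(video_id),  # step 2
        "comments": fetch_comments(video_id),      # step 3
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{video_id}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Demo with stub fetchers (placeholder data, no network access).
with tempfile.TemporaryDirectory() as tmp:
    saved_path = run_workflow(
        "Wo95ob_s_NI",
        fetch_metadata=lambda vid: {"placeholder": "metadata"},
        fetch_transcript=lambda vid: [],
        fetch_comments=lambda vid: [],
        out_dir=Path(tmp) / "videos",
    )
    saved = json.loads(saved_path.read_text(encoding="utf-8"))
```

Injecting the fetchers keeps the storage step testable on its own; analysis and reporting would read the saved JSON in a later stage.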

Public transcript excerpt

Transcript

Timestamped public transcript passages group captions into readable sections, making the video easier to scan, cite, and summarize.
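How the passages are actually grouped is not described here. As one possible approach, the sketch below merges captions into a passage until a fixed time window has elapsed and labels each passage with an `m:ss` timestamp like the excerpt's. The caption shape (`start` in seconds plus `text`), the 50-second window, and the individual caption times in the demo are all assumptions chosen for illustration.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render seconds as m:ss, or h:mm:ss once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def group_captions(captions: List[Dict], window_seconds: float = 50.0) -> List[Dict]:
    """Merge captions into passages that each span at most window_seconds."""
    passages: List[Dict] = []
    for cap in sorted(captions, key=lambda c: c["start"]):
        text = cap["text"].strip()
        if passages and cap["start"] - passages[-1]["start"] < window_seconds:
            # Still inside the current passage's window: append the text.
            passages[-1]["text"] += " " + text
        else:
            # Window exceeded (or first caption): start a new passage.
            passages.append(
                {
                    "start": cap["start"],
                    "timestamp": format_timestamp(cap["start"]),
                    "text": text,
                }
            )
    return passages


# Hypothetical caption timings; the text fragments come from the public excerpt.
demo = [
    {"start": 115, "text": "it can also assign probabilities to everything."},
    {"start": 120, "text": "The base model can effectively take on all of these different personas"},
    {"start": 166, "text": "Maybe I should take a step back and ask this."},
    {"start": 170, "text": "Right now we have these models that are pretty"},
]
passages = group_captions(demo)
print([p["timestamp"] for p in passages])  # ['1:55', '2:46']
```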

Public excerpt


1:55

it can also assign probabilities to everything. The base model can effectively take on all of these different personas or generate all different kinds of content. When we do post-training, we're usually targeting a narrower range of behaviors where we want the model to behave like a kind of chat assistant. It's a more specific persona where it's trying to be helpful. It's not trying to imitate a person. It's answering your questions or doing your tasks. We're optimizing on a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just imitating this raw content from the web.

2:46

Maybe I should take a step back and ask this. Right now we have these models that are pretty

Build with YouTube transcript data

Use Crawlora's YouTube transcript API to fetch fresh timestamped transcript data for your own server-side workflows.


Related Crawlora APIs & guides

Build YouTube data workflows with Crawlora

This showcase is built from Crawlora's public YouTube data APIs. Use the same endpoints and guides to build your own transcript, comment, and creator-intelligence workflows.


YouTube API

Transcript, comments, and video metadata endpoints that return normalized JSON.

YouTube transcript extraction

Build searchable, RAG-ready transcript pipelines from public videos.

YouTube creator intelligence

Monitor creators, audiences, and content trends across channels.

Podcast & audio intelligence

Turn long-form audio and podcasts into structured, analyzable data.

