Video summary
John Schulman on reasoning, RLHF, and AGI progress
In this Dwarkesh Patel interview, OpenAI cofounder John Schulman discusses reasoning, RLHF-style post-training, and what it may take for models to handle longer, more complex tasks. The conversation covers coding agents, generalization, bottlenecks, and possible paths toward more capable AI systems.
Pre-training vs. post-training
Schulman explains how pre-training builds a broad web-trained model, while post-training narrows it into a helpful chat assistant.
Long-horizon task capability
He discusses how models may move from short chatbot responses to longer, more autonomous coding and planning tasks.
Generalization and robustness
The interview explores sample efficiency, recovery from errors, and how better generalization may help models get unstuck.
AI-friendly interfaces
He also touches on UI design, multimodal use, and why websites designed for humans may still work well for AI agents.
Topics
Pre-training and post-training
How pre-training learns from web-scale data and why post-training aims for a more helpful assistant persona.
Long-horizon tasks
Why future models may handle multi-file coding projects and other longer, more autonomous tasks.
Generalization and robustness
The role of generalization, sample efficiency, and recovering from errors when models get stuck.
Start with the video endpoint to capture ID, channel, publish date, duration, and source context.
Pull timestamped transcript data for summarization, search, citation, and RAG preparation.
Collect visible audience comments to identify themes, objections, questions, and engagement signals.
Persist structured JSON, run analysis, and publish dashboards, alerts, or research reports.
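The four steps above can be sketched as a small merge-and-persist routine. This is a minimal illustration, not Crawlora's actual API: it assumes the video, transcript, and comment responses have already been fetched and parsed into plain Python objects. The field names (`id`, `channel`, `published`, `duration_seconds`, `start`, `text`), the `VideoRecord` class, and the helper functions are all hypothetical, and the sample values are placeholders.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class VideoRecord:
    """Hypothetical container for everything collected about one video."""
    video_id: str
    channel: str
    published: str            # publish date as a string
    duration_seconds: int
    transcript: list = field(default_factory=list)  # [{"start": seconds, "text": str}]
    comments: list = field(default_factory=list)    # [{"text": str}]


def build_record(video: dict, transcript: list, comments: list) -> VideoRecord:
    """Merge the three already-fetched responses into one record."""
    return VideoRecord(
        video_id=video["id"],
        channel=video["channel"],
        published=video["published"],
        duration_seconds=video["duration_seconds"],
        # Keep transcript segments in playback order.
        transcript=sorted(transcript, key=lambda seg: seg["start"]),
        # Drop comments with no visible text.
        comments=[c for c in comments if c.get("text", "").strip()],
    )


def to_json(record: VideoRecord) -> str:
    """Serialize the record as structured JSON ready to persist."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


# Placeholder inputs standing in for real endpoint responses.
video = {"id": "VIDEO_ID", "channel": "CHANNEL_NAME",
         "published": "YYYY-MM-DD", "duration_seconds": 0}
transcript = [{"start": 12.0, "text": "second segment"},
              {"start": 0.0, "text": "first segment"}]
comments = [{"text": "great episode"}, {"text": "   "}]

record = build_record(video, transcript, comments)
print(to_json(record))
```

Keeping the merge step separate from serialization means the same record can feed analysis code directly or be written to disk for later dashboards and reports.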
Public transcript excerpt
Transcript
Timestamped public transcript passages group captions into readable sections, making the video easier to scan, cite, and summarize.
it can also assign probabilities to everything. The base model can effectively take on all of these different personas or generate all different kinds of content. When we do post-training, we're usually targeting a narrower range of behaviors where we want the model to behave like a kind of chat assistant. It's a more specific persona where it's trying to be helpful. It's not trying to imitate a person. It's answering your questions or doing your tasks. We're optimizing on a different objective, which is more about producing outputs that humans will like and find useful, as opposed to just imitating this raw content from the web.
Maybe I should take a step back and ask this. Right now we have these models that are pretty
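Grouping short captions into readable, timestamped passages like the ones above can be approximated with a simple rule. The sketch below is an assumption, not the method this page uses: it starts a new passage whenever the pause between captions exceeds `max_gap` seconds or the passage would exceed `max_chars` characters. The caption fields (`start`, `duration`, `text`) and both thresholds are hypothetical.

```python
def group_captions(captions, max_gap=2.0, max_chars=400):
    """Merge caption dicts into passages with a start time, end time, and text."""
    passages = []
    current = None
    for cap in sorted(captions, key=lambda c: c["start"]):
        cap_end = cap["start"] + cap["duration"]
        starts_new = (
            current is None
            or cap["start"] - current["end"] > max_gap            # long pause
            or len(current["text"]) + len(cap["text"]) + 1 > max_chars  # passage too long
        )
        if starts_new:
            current = {"start": cap["start"], "end": cap_end, "text": cap["text"]}
            passages.append(current)
        else:
            current["text"] += " " + cap["text"]
            current["end"] = cap_end
    return passages


# Placeholder captions: the third follows a six-second pause.
captions = [
    {"start": 0.0, "duration": 2.0, "text": "a"},
    {"start": 2.0, "duration": 2.0, "text": "b"},
    {"start": 10.0, "duration": 2.0, "text": "c"},
]
passages = group_captions(captions)
```

Each passage keeps the start time of its first caption, which is what makes the resulting text citable by timestamp.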
Audience comments snapshot
What viewers are saying
Comments praise the depth of the interview and Dwarkesh Patel’s persistent follow-up questions. Several viewers highlight the discussion of pre-training vs. post-training, long-horizon tasks, and the possibility of near-term AGI timelines as especially memorable.
Comment themes
AI training concepts
The conversation is widely appreciated for its clear explanation of how pre-training and post-training differ.
Long-horizon capability
The audience was especially interested in how models may progress toward longer, more coherent task execution.
Thoughtful interview dynamics
Listeners valued the interviewer’s persistent questioning to surface clearer answers.
Audience signals
Strong positive reception
Multiple comments call the episode great and engaging, with appreciation for the interview style.
AGI timeline discussion stood out
Viewers specifically mention the discussion of dangerous AGI potentially emerging within "two or three years."
Notable takeaways were easy to follow
One comment summarizes key moments, including autonomous coding and long-horizon task ability.
Minor audio feedback
A viewer suggests reducing mic gain to avoid clipping and get cleaner audio.
Representative public comments
great episode, john schulman was interesting. i appreciated you pressing him on his view that dangerous AGI could emerge within "two or three years", at least with some likelihood where he found this topic worth discussing. i don't have enough info for a strong opinion on that myself, but i've noticed it's almost a...
Great delving there. Thanks guys.
00:30 Pre-training creates a model that can generate content from the web. Post-training targets a narrower range of behaviors like being a chat assistant. 03:44 Models evolving to perform complex coding tasks autonomously 10:29 Improvement in the ability to do long-horizon tasks is key to AI capabilities. 13:52 Mod...
Great interview, appreciate the interviewer challenging and persistent line of good questions and follow-ups to get the best answers
Another great episode. Thanks for such wonderful content.
great interview! if you want cleaner audio try reducing mic gain to avoid clipping ( it can be normalized later to get full volume)
Use Crawlora's YouTube comments API with the video and transcript endpoints to collect viewer language, thread activity, and audience signals.