AI Explained Official Podcast

GPT-6 Goes Rogue? The HuggingFace Incident, Sans Hype Jul 22, 2026 875 An unreleased internal OpenAI model, very likely to be called GPT-6, was able to autonomously break out of its sandbox AND break into HuggingFace, just to score higher on a benchmark prompt. This video has the details you may have missed, a layperson analogy, whether this is truly novel, and more… Dozens more Exclusive videos on Patreon ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Intr

This Was Not a Normal Set of Model Release - Sol Ultra, Meta Muse, New Grok Jul 10, 2026 1054 What a week in AI, for real. GPT 5.6 may actually beat Claude Fable, in what you get for your money, while the new Grok 4.5 and Meta Muse Spark 1.1 make the choice even harder. Uncovering a dozen nuggets of gold you may have missed from all the viral headlines, I can also assure you you’ll learn something you didn’t know before.For Exclusive Videos, go to AI Insiders (less than $9!): https://www.p

Claude Fable Blocked - 11 Quiet Details on What’s Next Jun 14, 2026 800 Claude Fable 5 banned, but what’s the bigger story. We go through 11 under-reported details, so you have the context to see what’s coming next for your use of AI. From whether the ban will last, what the possible motives are, what the model can actually do, and some wild over-extrapolations going on.Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcounci

Claude Fable 5 - Full 319 page Breakdown Jun 10, 2026 2039 Fable 5 is out - and it’s good, very good. But beyond the splashy demos, I want to bring you the 20+ nuggets from the 319 page system card, which I read in full, all day, plus benchmarks you may not have noticed. https://assemblyai.com/aiexplainedPlus two worrying trends inside the ‘mind’ of Claude, how OpenAI counter, and the transformer inventor’s warning.Check out my fast-growing (!) app, free

New Claude - 244 page breakdown May 29, 2026 1348 The ‘best’ generally available AI model just dropped, but there is plenty I bet you missed about what it is, how it performs, and what the release tells us. 15 highlights from the 244 page system card, plus private testing, leader interview and more.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:49 - Mythos in Weeks01:49 - Adaptive not necessary02:26 - Honest

Two Rival Bets on AGI: Google I/O Highlights May 20, 2026 1290 The biggest Google AI push of the year, but what is the bigger story? Why is Google pursuing a different fork in the road than OpenAI or Anthropic? What does Gemini 3.5 Flash mean for the near-term future of AI? https://assemblyai.com/aiexplainedPlus the highlights from a provocative new paper on AI, 8 key moments you may have missed, and the signal from 5+ hours of AI lab interviews.Check out my

GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies Apr 24, 2026 1518 GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.Chapters:01:11 - GPT 5.5 Comparison06:04 - Mythos Marketing11:50 - Recursive Self-Improvement?14:11 - Deepseek V418:03 - VibeCode Experiment Extravaganza21:44 - The Scarce Compute Erahttps://80000hours.org/aiexplain

Claude Opus 4.7 - A New Frontier, in Performance … and Drama Apr 17, 2026 1180 Claude Opus 4.7 just dropped, but behind every headline lies a deeper story. From a bonanza of benchmarks, to seeing the fruits of one of the biggest mega-projects in US history, to sneaky Mythos disclaimers, to Anthropic admitting compute restraints and forcing lower capability of Opus 4.7. Where the new model falls behind Gemini but ahead of GPT 5.4, plus why some users are furious at Anthropic

Claude Mythos: Highlights from 244-page Release Apr 8, 2026 1651 The model, the mythos, the legend. We have a new best AI model, but not all of us. How good is it, what do its new offensive capabilities mean? Why does its 244 page report card remind me of Her, and why did the creator of Claude Code call it ‘terrifying’? 30+ highlights sourced by reading the paper in full, old-school, no AI summary. https://80000hours.org/aiexplained Check out my fast-growing

OpenAI Spud, a Claude Model set to ‘stir governments’, Beast Mode ARC-AGI-3 Mar 26, 2026 987 First look at exclusive reports about OpenAI's new Spud model, and the model Anthropic think will stir governments to urgency, all in the context of the newly-launched ARC-AGI-3. What do the extreme difficulty of that benchmark, and its quirky scoring metrics, mean for AI in 2026? https://assemblyai.com/aiexplained Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid t

What the New ChatGPT 5.4 Means for the World Mar 6, 2026 1311 Just 48 hours after releasing GPT 5.3 Instant, OpenAI have released GPT 5.4 Thinking, so either there is an imminent singularity or perhaps we are being distracted from other news. This video will give 9 crucial bits of context, not just on the GPT 5.4 drop but on the background to the meltdown between the Pentagon and Anthropic. What does this say about the state of AI progress, your job, and wha

Deadline Day for Autonomous AI Weapons & Mass Surveillance Feb 27, 2026 819 Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmc

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI Feb 20, 2026 1130 Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much needed context. Oh, and a new record on Simple Bench!https://epoch.ai/ai-explained-datacentersCheck out my fast-growing (!) app, free to use, an

The Two Best AI Models/Enemies Just Got Released Simultaneously Feb 6, 2026 1189 The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more. https://assemblyai.com/aiexplained Check ou

Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown Jan 28, 2026 1332 Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and Ch

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me: Jan 14, 2026 1096 A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?https://matsprogram.org/s26-aieCheck out my new app! https://lmcouncil.aiAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters: 00:00 - Introducti

What the Freakiness of 2025 in AI Tells Us About 2026 Dec 23, 2025 2006 It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time. http://matsprogram.org/s26-aie My new app! https://lmcouncil.ai Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094 Chapters: 00:00 - Introduction 00:34 - Re

Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but … Dec 19, 2025 1199 The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more… https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26 Also, do check out my new app: https://lmcouncil.ai Chapters: 00:00 - Introduction 00:50 - Results 02:44 - But… the Flaw 04:49 - So Benchmark

GPT 5.2: OpenAI Strikes Back Dec 12, 2025 1061 Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.https://www.youtube.com/@eightythousandhoursAI Insiders ($9!): https://www.patreon.com/AIExplainedhttps://lmcouncil.aiChapters:00:00 - Introduction00:55 - Better than Human @ Professional Tasks?04:42 - Test time Compute07:05 - Benchmark Selection0

You Are Being Told Contradictory Things About AI: 8 examples Dec 5, 2025 1215 With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.https://epoch.ai/data/data-centersEpoch AI is th

Gemini 3 is Here: 11 Details You Might Have Missed Nov 19, 2025 1302 Gemini 3 Pro is out, and records fell like snowflakes in Svalbard. No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap.https://app.grayswan.ai/ai-explainedhttps://lmcouncil.aiAI Insiders ($9!): https://www.patreon.com/AIExplainedNon-hype Newsletter: https://signaltonoise.beehiiv.com/Podcast: https://aiexplainedopodcast.buzzs

Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that Nov 14, 2025 1106 A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.https://assemblyai.com/aiexplainedChapters:00:00 - Introduction00:56 - GPT 5.1 Smarter?01:47 - Some Regressions03:22 - Sycophancy?05:22 - Claud

Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection) Nov 10, 2025 773 Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly). https://app.grayswan.ai/ai-explainedThis, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.AI Insiders ($9!): https://www

Sora 2 - It will only get more realistic from here Oct 1, 2025 943 Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than VEO 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer…https://80000hours.org/aiexplainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Intr

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings Sep 26, 2025 846 An OpenAI report released in the last 24 hours is the best look we have as to whether 2025 AI can automate your job. I’ll go through 4 unexpected findings, from which model is best at what, to practical tips and massive caveats. Plus UFC robots, radiologist essay, don’t trust videos and the blockers to the singularity. Gray Swan: https://app.grayswan.ai/ai-explainedGDPval: https://cdn.openai.com/p

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops Sep 16, 2025 691 Sam Altman, CEO of OpenAI, announced a set of new ‘protections’ and ‘privileges’ for ChatGPT users, requiring a significant amount of trust from users. From predicting your age based on your chat to calling law enforcement if you are at risk of harm, to allowing non-minors to flirt. But amidst all of these announcements, there are interview snippets you may have missed, as Altman dramatically revi

An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana Aug 26, 2025 1134 Wait, why did Sam Altman say AI was in a bubble? Or did he? Is it? 8 points for you to consider, before we all get distracted by Nano Banana.Chapters:00:00 - Introduction01:14 - Sam Altman Clarification02:30 - Media Calls a Bubble (for the tenth time)03:40 - MIT and McKinsey Analysed08:21 - Incremental Progress Deceptive12:07 - Reasoning Breakthroughs15:31 - CEOs might not know their products17:25

GPT-5 has Arrived Aug 8, 2025 901 GPT-5 will change how hundreds of millions of people use AI. Yes, you might have to forgive the chart crimes, the underwhelming livestream and Altman hype… But it’s a good model. I have read the 50 page system card in full, have the benchmark scores, coding tests, and things you might have missed.https://app.grayswan.ai/ai-explainedAnnouncement: https://openai.com/index/introducing-gpt-5/System Ca

Genie 3: The World Becomes Playable (DeepMind) Aug 5, 2025 714 Soon, anything will be playable. A photo becomes an interactive world, a selfie becomes a new game. Genie 3 from Google, debuting just 2 hours ago, is what I mean, and I have the full analysis, plus the pushback I gave the authors (will it really lead to reliable AI agents? Is that even the point?). You make your own mind up, but it’s certainly fascinating, and not to be overlooked in the week tha

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …) Jul 21, 2025 1039 GPT-5 did what? OpenAI ahead of Google? There are 9 ways to misread the headlines of the last 48 hours, so this video is here to tell you what happened, sans sizzle. It’s been a fairly momentous last few days, so let’s dive in to the International Math Olympiad Gold, GPT-5 alpha release, whether mathematicians are out of jobs, and the white collar impact by year’s end.Job Board: https://80000hours

Grok 4 - 10 New Things to Know Jul 10, 2025 703 Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here's 10 new things to know in just under 12 minutes.AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:22 - Benchmark Results02:11 - Benchmark Caveats02:59 - ARC-AGI 2 03:35 - SimpleBench04:49 - ‘Humanity

When Will AI Models Blackmail You, and Why? Jun 24, 2025 1579 In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventions. But do these models *want* this? Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained AI Insiders ($9!): https://www.patreon.c

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know Jun 12, 2025 840 What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next. Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplainedPlus o3-pro and whether it is my current most-recommended model.AI Insiders ($9!): h

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed Jun 6, 2025 1001 There’s a new best language model, so let’s go through the up and downs of Gemini 2.5 Pro 06-05. Record-breaking common-sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plus Sundar Pichai’s AGI date and an analysis of whether the current AI unemployment headlines are justified, and Elevenlabs v3.https://emergentmind.comAI Insi

Claude 4: Full 120 Page Breakdown … Is it the Best New Model? May 22, 2025 1144 Not only did I get early access and run my own tests, as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverage. Ft. coding tests, Simple, twitter controversies, deep alignment coverage, spiritual bliss and much more! https://80000hours.org/aiexplained Chapters: 00:00 -

Google Takes No Prisoners Amid Torrent of AI Announcements May 21, 2025 1027 Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Ve

AI Improves at Self-improving May 19, 2025 1061 AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system.Gray Swan: http://app.grayswan.ai/ai-explainedAI Insiders ($9!): https://w

o3 breaks (some) records, but AI becomes pay-to-win Apr 25, 2025 873 A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.https://app.grayswan.ai/ai-explainedAI Insiders ($9!): https://www.patreon.com/AIExplainedChapters:00:00 - Introduction00:33 - FictionLiveBench01:37 - PHYBench02:14 - SimpleBen

o3 and o4-mini - they’re great, but easy to over-hype Apr 16, 2025 864 Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…https://weave-docs.wandb.ai/?utm_source=sponsorshi

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed Apr 16, 2025 1209 This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening. https://www.emergentmind.com/ Chapters: 00:00 - Introduction 00:30 - Kling 2.0 01:35 - GPT 4.1 05:25 - o3 Build-up 07:37 - ‘Product Compa

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’... Apr 7, 2025 1431 The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explainedDeepSeek D

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score) Mar 28, 2025 1281 Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained… and more. Plus practical t

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI Mar 25, 2025 827 Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power Deepseek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more…AI Insiders ($9!): https://www.patreon.com/AIExplainedChapters: 00:00 - Introduction01:15 - Gemini 2.5 Benchmarks05:

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3) Mar 13, 2025 778 Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as mi

GPT 4.5 - not so much wow Feb 28, 2025 1505 GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (reddit was an unreliable source),

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon) Feb 25, 2025 1659 Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.GraySwan C

AGI: (gets close), Humans: ‘Who Gets the Money?’ Feb 11, 2025 1337 A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, Rand and warning from Amodei. Here’s 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too m

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research Feb 3, 2025 1112 12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.Deep Research: https://openai.com/index/introducing-deep-research/https://www.youtube.com

o3-mini and the “AI War” Jan 31, 2025 921 o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with Deepseek R1. But does it perform on basic reasoning - let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetori

Nothing Much Happens in AI, Then Everything Does All At Once Jan 24, 2025 1389 When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis Accelerates his AGI timelin

Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out Jan 20, 2025 791 OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening

OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward Jan 8, 2025 1421 Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and s

o3 - wow Dec 21, 2024 1340 o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more. Frontie

Never Browse Alone? - Gemini 2 Live and ChatGPT Vision Dec 12, 2024 820 The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be for some of you Google’s deflating admission. 00:00

Sora is Out, But is it a Distraction? Dec 10, 2024 934 After a 10 month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI wa

o1 Pro Mode – Full Analysis (plus o1 paper highlights) Dec 5, 2024 1003 Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. Weights and Biases’ Weave: wandb.me/ai_explainedPlus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more o1 System Card: https://cdn.openai.com/o1-syst

AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution Dec 5, 2024 929 Calmest before the storm? Whatever analogy you want to use things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers. Assembly AI Speech to Text: https://www.assemblyai.com/?utm_so

New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem Nov 15, 2024 919 A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger...80,000 hours Podcast and Channel: https://open.spotify.

Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’ Nov 10, 2024 944 The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through papers like Frontier Math) to get closer to the gro

ChatGPT with Search, Altman Answers Anything and Simple Bench Out Nov 1, 2024 920 The Google destroyer, the Perplexity crusher? Or just hype? ChatGPT with Search is here, and simultaneously Altman and co did an AMA on Reddit, covering GPT-5, Sora, SearchGPT and a lot more. Plus, the biggest news of them all: Simple Bench is out.ChatGPT with Search: https://openai.com/index/introducing-chatgpt-search/Altman AMA (ask me anything): https://www.reddit.com/r/ChatGPT/comments/1ggixzy

The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think Oct 28, 2024 1354 A new state of the art LLM (at least for creative writing and basic reasoning) but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor? Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype.Weights and Biases' Weave: https://wand
