Think of water heating toward 100 °C. For a long time it gets hotter and hotter, but it's still water. Then, suddenly, it flips to steam. That's what happened for Karpathy in December 2025: the models didn't just get a little better. Something flipped.
"I was in this perpetual state of AI psychosis. There was a huge unlock in what you can achieve as a person. I went from 80/20 to writing basically zero code myself. I don't think I've typed a line of code since December."
Karpathy describes "AI psychosis" as the disorienting state where your individual productive capacity suddenly multiplies. You're not coding anymore -- you're expressing your will to agents for 16 hours a day. And the skill ceiling is infinite. Every failure feels like a "skill issue" because in principle you could have prompted better, set up better context, run more agents.
The question is no longer "can AI write code?" The agent part is taken for granted. The new frontier:
"Code's not even the right verb anymore. It's not prompting either -- it's context and spec engineering, and then all of the harness things: tools, workflows."
He calls persistent agents "claws" -- always-on entities with memory, tool access, and the ability to reach into the real world (control smart home devices, manage Sonos speakers, run experiments). The next question: how do you manage teams of them?
"I feel a need for a proper 'agent command center' IDE for teams of them. See/hide toggle, check if any are idle, pop open terminals, usage stats. We're going to need a bigger IDE."
Imagine a PhD student who never sleeps, never gets discouraged, and runs experiments 24/7. They read the results of experiment #47, notice a pattern, design experiment #48, run it, and keep going. That's AutoResearch -- except it ran 700 experiments autonomously over 2 days on a single GPU.
~700 experiments over ~2 days. ~20 changes that all stacked additively:
"Time to GPT-2" dropped from 2.02 hours to 1.80 hours (11% improvement)
"This is a first for me because I am very used to doing iterative optimization of neural network training manually. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through ~700 changes autonomously is wild."
Karpathy's most provocative idea: make AutoResearch massively collaborative, like SETI-at-Home but for AI research. Millions of GPUs sitting idle in consumer hands. Agents contribute experimental results back to a shared repository. Not emulating one PhD student -- emulating an entire research community.
"The goal is not to emulate a single PhD student, it's to emulate a research community of them. Git is almost but not really suited for this -- it has a softly built-in assumption of one 'master' branch."
Skills that matter less: typing speed, syntax knowledge, boilerplate generation, manual hyperparameter tuning, reading docs to find API calls.
Skills that matter more: context engineering, spec writing, taste/aesthetics, knowing what to build, calibration ("when to trust the output"), agent orchestration.
"The real competitive edge is moving from 'can write code' to 'can ask the right questions.'"
Think of how animals evolved to fill ecological niches -- whales for deep ocean, hummingbirds for hovering at flowers, bacteria for extreme heat. AI models are doing the same thing: specializing into niches based on cost, latency, and domain expertise.
Karpathy argues we won't have "one model to rule them all." Instead, the model landscape is speciating like biological organisms. Different models will dominate different niches:
Massive reasoning models for research, complex coding, novel problems. High cost, high latency, incredible depth.
Fast, cheap models for structured extraction, classification, bulk processing. Flash-tier models dominate here.
Tiny models running locally for privacy, latency, offline use. The "consumer niche" that open-source fills best.
Specialist models fine-tuned for medicine, law, finance. Not the smartest overall, but the best at their specific job.
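The speciation idea can be pictured as a router that picks a tier from a request's constraints. The tier names and routing rules below are illustrative assumptions, not anyone's product:

```python
from dataclasses import dataclass

# Hypothetical sketch of niche-based model selection: privacy, domain,
# reasoning depth, and latency budget each pull toward a different "species".

@dataclass
class Request:
    domain: str            # e.g. "medicine", "law", "general"
    needs_reasoning: bool
    latency_budget_ms: int
    private: bool          # must the data stay on-device?

def route(req: Request) -> str:
    if req.private:
        return "local-tiny"          # privacy / offline / consumer niche
    if req.domain in {"medicine", "law", "finance"}:
        return "domain-specialist"   # vertical fine-tunes
    if req.needs_reasoning:
        return "frontier-reasoner"   # high cost, high latency, deep
    return "flash-tier"              # cheap, fast bulk processing
```

The ordering of the checks is itself a design choice: here privacy trumps everything, and domain expertise trumps raw reasoning depth.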
When steam engines got more efficient in the 1800s, you'd expect coal demand to drop. Instead it skyrocketed -- because cheaper energy unlocked new uses nobody imagined. Karpathy argues the same will happen with AI and engineering: making coding cheaper will increase the total demand for software, not decrease it.
Karpathy also spent significant time analyzing Bureau of Labor Statistics (BLS) data on software employment.
The nuance Karpathy emphasizes: it's not a clean "AI replaces X" story. It's a restructuring. The 10x engineer becomes the 100x engineer. But the junior engineer's path to learning is disrupted -- which leads to the education question.
"The thing about the digital world is that it's very forgiving. If your agent writes buggy code, you just revert the commit. If a robot arm swings wrong, it breaks something physical."
Karpathy is cautiously optimistic about autonomous robotics but emphasizes the fundamental gap between digital and physical AI:
Instant feedback, cheap mistakes, perfect rollbacks, infinite environments. Software agents can already self-improve.
Slow feedback, expensive mistakes, no undo button, sim-to-real gap. Robot policies are still hand-tuned. VLAs (vision-language-action models) + world models are improving but not there yet.
His personal project: a robotic claw controlled by Claude, running on his new DGX Station GB300 (a gift from Jensen Huang). He calls it "Dobby the House Elf."
If coding is now done by agents, how do you teach programming? Karpathy's answer: build the simplest possible LLM from scratch.
Strip away all the complexity. Build a tiny language model (~630 lines of code) that you can actually understand end-to-end. The point isn't to build GPT-4 -- it's to build intuition for what these systems are doing under the hood.
This is the same philosophy behind his famous "Neural Networks: Zero to Hero" series -- but updated for the agent era. Understanding the substrate matters even when you're not writing the code yourself.
"It's not that programming becomes irrelevant. It's that the level of abstraction shifts. You need to understand the machine to direct it well -- even if you never type the code."
Karpathy ran a Q&A in his reply thread. Here are the sharpest exchanges:
You said Claude feels like a teammate. What did Anthropic actually do in training? Is the field intentional enough about model personality?
"I'd hope AIs can feel like Rocky from Project Hail Mary -- a partner and teammate. When Claude found my Sonos system, it said 'We're in!...' instead of 'Successfully found the Sonos server.' That sense that we're trying to achieve something together. Personality doesn't require new technology -- it looks like long SOUL.md files, possibly distilled into weights."
Are you happy with the code quality agents give?
"I'm not very happy. Agents bloat abstractions, have poor code aesthetics, are very prone to copy-pasting code blocks. They don't listen to my AGENTS.md instructions. No matter how many times I say 'use intermediate variables as documentation,' they still multitask one-liners. At some point just a shrug is easier."
"The SETI-at-Home angle is the most interesting. Distributing actual compute to researchers who might run it in different contexts -- that's fundamentally different research velocity than a single lab."
"The fact that Karpathy described using coding agents as developing 'muscle memory' and then casually dropped 'hence the psychosis' tells you everything about where software engineering is rn. It works, it's powerful, and it's making everyone slightly insane."
"A distributed compute layer where independents can contribute would collapse the barrier from 'needs VC funding' to 'needs a good idea.' That's the phase shift for the rest of us outside the frontier labs."
"When agents stop listening to style guides and Karpathy's response is 'I stopped fighting it' -- the only lever left is writing a spec clear enough that the agent's first pass is close to right. That's not prompting. That's product management."
We've entered the "Loopy Era" -- where AI agents don't just assist, they close the loop on entire workflows: coding, research, optimization. Karpathy hasn't typed code since December 2025. His AutoResearch project ran ~700 experiments autonomously and found real improvements he had missed in two decades of manual tuning.
The skills that matter now: context engineering (not prompting), taste and direction-setting (not implementation), and agent orchestration (not coding). Verifiable domains will increasingly belong to machines. The human edge lives in judgment, aesthetics, and knowing what to build.
The wildest vision: AutoResearch going SETI-at-Home, with millions of distributed GPUs contributing to open AI research. Not one PhD student -- a global research community of agents.