Field Notes · AI Development
Five Levels of AI Programming
A working framework for the current state of AI-assisted development, and a clear map of where the industry is heading.
Framework by Dan Shapiro
This five-level model originated with Dan Shapiro, CEO of Glowforge and Wharton Research Fellow, and was published in January 2026. It was inspired by the SAE autonomous driving levels. Read the original at danshapiro.com. Notes and commentary by Dan Grafham.
⚠ Not to be confused with OpenAI's "Five Levels". OpenAI published a separate five-level framework tracking progress toward AGI, measuring AI capability itself (Chatbots, Reasoners, Agents, Innovators, Autonomous Organizations). Shapiro's framework addresses an entirely different question: not how powerful is the AI, but how does a developer work with it. Same number. Completely different map.
Level 0 · Autocomplete
- You type, AI suggests the next line
- You accept or reject in real time
- GitHub Copilot in its original form
- Essentially a smarter Tab key
Reality check
The human is still writing every line. The AI is just filling in blanks. No architectural shift has occurred yet.
Level 1 · Task delegation
- Hand the AI a discrete, well-scoped task
- Write this function, build this component
- Refactor this model
- You review everything that comes back
- Human handles architecture, judgement, integration
The dynamic
AI handles the task. Human handles the thinking. This is what most vendors mean when they say their tool "writes code for you."
Level 2 · Multi-file changes
- Multi-file changes and codebase navigation
- Understands dependencies and context
- Builds features, expands modules
- You're still reading all the code
- 90% of "AI-native" developers live here
The self-delusion trap
Level 2 feels more advanced than it is. The AI touched five files at once. Impressive! But you're still doing all the thinking and reading every line. The AI is executing your ideas faster, not replacing your judgement. Calling that "AI-native" is like calling a calculator "math."
Level 3 · Feature-level review
- You direct, the AI implements
- Approve or reject at the feature or PR level
- The model submits PRs for your review
- Your job is now judgement, not execution
- Almost everyone tops out here currently
The psychological wall
For many developers, the code is how they think. Reading it is how they trust it. Giving that up feels like going blind. Those who break through tend to already think in systems, not lines. A useful reframe: a good manager doesn't read every email their team sends. They define the standard, then trust the outcome. Level 3 is learning to manage, not abdicate.
Level 4 · Specification-driven
- Write a specification, walk away
- Come back and check whether tests pass
- Code is a black box, you don't read it
- Eval quality determines everything
- Requires deep trust in system and in yourself
What is an "eval"?
An eval (evaluation) is an automated scorecard that judges outcomes from the outside. Not whether the code is clean, but whether the software behaves correctly. Does the login flow work end-to-end? Does it handle 500 concurrent users? Does the output match these 20 real-world examples? You're not testing logic, you're testing observable reality. Writing a truly complete eval means thinking through every way the software could succeed or fail before it exists, which is a product thinking skill, not a coding skill.
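A minimal sketch of what such an eval looks like in practice, in Python. The `slugify` function stands in for any AI-generated code under test, and the example cases are invented for illustration; the eval treats the implementation as a black box and only scores observable behavior:

```python
# An eval as an automated scorecard: score the software from the outside
# against real-world examples, without reading its internals.

def slugify(title: str) -> str:
    # Imagine this body was produced by the AI; the eval never looks inside.
    return "-".join(title.lower().split())

# Hypothetical real-world input/expected-output pairs.
EVAL_CASES = [
    ("Hello World", "hello-world"),
    ("  Spaces  everywhere ", "spaces-everywhere"),
    ("Already-slugged", "already-slugged"),
]

def run_eval(fn) -> float:
    """Return the fraction of examples on which the software behaves correctly."""
    passed = sum(1 for raw, want in EVAL_CASES if fn(raw) == want)
    return passed / len(EVAL_CASES)

print(f"eval score: {run_eval(slugify):.0%}")  # prints "eval score: 100%"
```

Writing `EVAL_CASES` is where the product thinking lives: enumerating, before the code exists, every way the software could succeed or fail.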
Level 5 · The dark factory
- No human writes code
- No human even reviews code
- Factory runs autonomously, lights off
- Specification in, working software out
- Almost nobody operating at this level yet
What this means for software
If specs go in and software comes out, the specification becomes the software. The real intellectual work shifts entirely upstream. Worth noting: this doesn't eliminate human value. It concentrates it. The person who can write a flawless spec for a complex system is extraordinarily rare and extraordinarily valuable. The dark factory needs an architect.
→ Full deep dive: The Dark Factory
Where is the industry right now?
Most developers are between Levels 1 and 3, treating AI like a junior developer. When a startup claims "agentic software development," they usually mean Level 2 or 3.
Context · Why Agentic, Why Now
We're breaking ground each step of the way.
This didn't happen all at once. First the models had to get good enough that it was even worth building serious coding tools around them. Then the tooling had to catch up. Then the workflows. We're still in the early stages of all three converging.
"Agentic" simply means the AI operates with autonomy over a sequence of steps. It takes actions, reads results, and decides what to do next without waiting for a human after each move. It's not a product feature. It's a mode of operation that only became practical once the underlying models got reliable enough to trust with a longer leash.
The five levels in this document map that leash, from Level 0 (pure autocomplete) to Level 5 (fully autonomous factory). Most of the industry is still figuring out Levels 2 and 3. This is a new frontier, and the map is still being drawn.
Decode the marketing: when a vendor says their tool "writes code for you," they usually mean Level 1. Adjust expectations accordingly.