Software gets built by adding things. Features, abstractions, integrations, configuration. Nobody ships a press release that says “we deleted 40% of the product this quarter.” But addition has a cost that compounds silently – every feature you add makes the next feature harder to add, every abstraction you introduce is an abstraction everyone has to understand, and every integration is an integration someone has to maintain forever.
I wanted to see what happens when you go the other direction. Start with a working codebase and subtract.
beads is a dependency-aware issue tracker for AI coding agents, backed by Dolt (a version-controlled SQL database). The core idea is good – agents create issues, track what blocks what, and bd ready returns unblocked work. A simple graph problem.
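The graph computation behind bd ready is simple enough to sketch. This is a toy version in Go, not bd's actual code or schema – the Issue fields and Ready function are invented for illustration:

```go
package main

import "fmt"

// Issue is a minimal stand-in for a bd issue; the field names are
// illustrative, not bd's actual schema.
type Issue struct {
	ID        string
	Closed    bool
	BlockedBy []string // IDs of issues that must close first
}

// Ready returns the open issues whose blockers are all closed –
// the essence of what bd ready computes over the dependency graph.
func Ready(issues map[string]*Issue) []string {
	var ready []string
	for id, is := range issues {
		if is.Closed {
			continue
		}
		unblocked := true
		for _, dep := range is.BlockedBy {
			if b, found := issues[dep]; found && !b.Closed {
				unblocked = false
				break
			}
		}
		if unblocked {
			ready = append(ready, id)
		}
	}
	return ready
}

func main() {
	issues := map[string]*Issue{
		"a": {ID: "a", Closed: true},
		"b": {ID: "b", BlockedBy: []string{"a"}}, // unblocked: a is closed
		"c": {ID: "c", BlockedBy: []string{"b"}}, // blocked: b is still open
	}
	fmt.Println(Ready(issues)) // prints: [b]
}
```

The real tool does this over a Dolt-backed table rather than an in-memory map, but the shape of the problem is the same.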
The implementation had grown to 210,000 lines of Go across 250 commands. A formula DSL with its own parser, evaluator, and AOP-style advice system. Six external tracker integrations. A “molecule” system for template instantiation with phase transitions (solid, liquid, vapor – somebody had fun with that metaphor). A 151-method storage interface. Eight OpenTelemetry modules. Two storage backends.
The core loop – create, track, close, sync – needed maybe 15 commands.
I pointed Claude at it and told it to strip the project to the bone. It analyzed the dependency graph, mapped which commands called which storage methods, traced the import tree, and came back with a 6-phase plan.
Each phase left the binary compilable and tests passing. The whole thing took about three hours.
| | before | after |
|---|---|---|
| source lines | 135,127 | 46,500 |
| source files | 523 | 218 |
| commands | 250 | ~20 |
| storage interface methods | 151 | 0 (concrete type) |
| storage backends | 2 | 1 |
| external integrations | 6 | 0 |
| net lines deleted | | 201,412 |
Two-thirds of the codebase gone. 973 files changed.
The fork doesn’t do more than the original. It does strictly less. But “less” is the point.
Faster to understand. A new contributor reads 46k lines instead of 135k. There’s one storage backend instead of two. There’s no formula DSL to learn. The storage layer is a concrete type with methods on it, not a 151-method interface composed of 12 sub-interfaces with a decorator pattern and type assertions at every call site.
Smaller attack surface. Six fewer HTTP client libraries talking to external services. No OpenTelemetry dependency tree. Fewer transitive dependencies means fewer CVEs to track.
Easier to change. Every abstraction you remove is an abstraction that no longer constrains you. The storage interface existed to support two backends. With one backend, you can change the storage layer without negotiating with an interface contract. Adding a column to the issues table touches one file instead of two implementations plus an interface definition.
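To make that concrete, here is a sketch of what a post-deletion storage layer can look like – a plain concrete type. The names are invented for illustration, not bd's actual code; the point is that adding a field like EstimateMinutes touches this one type and its methods, with no interface definition or second backend to keep in sync:

```go
package main

import "fmt"

// Issue is an illustrative record, not bd's real schema. Adding the
// EstimateMinutes field below is a one-file change: update the struct
// and whichever methods use it. With two backends behind a shared
// interface, the same change would touch both implementations plus
// the interface contract.
type Issue struct {
	ID              string
	Title           string
	EstimateMinutes int // the hypothetical new column
}

// Store is a concrete storage type – no interface, no decorators,
// no type assertions at call sites.
type Store struct {
	issues map[string]Issue
}

func NewStore() *Store { return &Store{issues: map[string]Issue{}} }

func (s *Store) Create(is Issue) { s.issues[is.ID] = is }

func (s *Store) Get(id string) (Issue, bool) {
	is, ok := s.issues[id]
	return is, ok
}

func main() {
	s := NewStore()
	s.Create(Issue{ID: "bd-1", Title: "example", EstimateMinutes: 30})
	is, _ := s.Get("bd-1")
	fmt.Println(is.Title, is.EstimateMinutes) // prints: example 30
}
```

If a second backend ever returns, an interface can be extracted from the concrete type at that point – interfaces are cheap to add in Go when a real second implementation demands one.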
Faster builds, faster tests. Less code compiles faster. Fewer tests run faster. The feedback loop tightens.
Less confusing for agents. This is the ironic part. beads is a tool for AI agents, but 250 commands is a lot of surface area for an agent to navigate. Which command do I use to track a dependency – dep add, relate, link, or supersede? What’s the difference between mol seed, cook, pour, and wisp create? An agent working with bd sees 20 commands and the answer is obvious. The tool built for AI agents was actually harder for AI agents to use because it had too many options.
Less context window eaten. Every bd --help an agent runs, every bd ready output that includes molecule metadata and gate status and wisp lifecycle info, every error message that suggests 6 possible commands – that’s tokens. Agents pay for complexity with their most precious resource. A 250-command tool dumps a wall of text into the context window every time the agent asks for help. A 20-command tool gives it what it needs and gets out of the way.
None of this is controversial. Everyone knows simpler is better. The problem is that when code is cheap to write, we lose the discipline to say no. Why not add the formula DSL? Claude can write it in a day. Why not support six trackers? The integration code practically writes itself. Each individual feature seems free at the moment of creation. The formula DSL isn’t slowing down bd create. The Jira integration doesn’t affect the dependency graph. But collectively, 200,000 lines of optional code make every change to the core harder than it needs to be.
The obvious question: if 20 commands is all you need, why not just write a fresh issue tracker in an afternoon?
Because the 46k lines that survived aren’t simple. Embedded Dolt integration with schema migrations, conflict resolution, and version-controlled commits. Hash-based ID generation that prevents collisions across agents and branches. A dependency graph engine that correctly resolves transitive blockers for bd ready. Backup and restore. Config management. Git hook integration. Partial ID resolution. Hundreds of edge cases in tests that represent real bugs someone already found and fixed.
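Partial ID resolution is a good example of a small feature hiding real edge cases. A sketch of prefix resolution in Go (not bd's actual rules, which may differ): the one behavior it must get right is refusing to guess when a prefix matches more than one issue.

```go
package main

import (
	"fmt"
	"strings"
)

// ResolvePartial expands an ID prefix to a full issue ID, the way a
// CLI lets you type "bd close a1f" instead of a full hash-based ID.
// It fails on ambiguity rather than guessing – exactly the kind of
// edge case a battle-tested codebase has already found and fixed.
func ResolvePartial(prefix string, ids []string) (string, error) {
	var matches []string
	for _, id := range ids {
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no issue matches %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("%q is ambiguous: %v", prefix, matches)
	}
}

func main() {
	ids := []string{"a1f9c2", "a1b774", "c03e11"}

	full, err := ResolvePartial("a1f", ids)
	fmt.Println(full, err) // prints: a1f9c2 <nil>

	_, err = ResolvePartial("a1", ids)
	fmt.Println(err) // ambiguous: the prefix matches two IDs
}
```

Trivial to rewrite, easy to get subtly wrong on the first pass – multiply that by hash-based ID generation, conflict resolution, and transitive blocker traversal, and the value of the surviving 46k lines becomes clearer.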
beads has been battle-tested across millions of agent sessions. Starting from scratch means re-discovering all of that. Subtracting from a working codebase means you keep the hard-won correctness and throw away the accretion. The 184 test files that survived aren’t testing deleted features – they’re testing the core loop, and they pass.
The interesting part of this exercise wasn’t that AI can delete code. It’s that we’ve created a tool that makes code so cheap to produce that bloat is inevitable – and then realized we can use that same tool to radically simplify.
AI has made feature development nearly frictionless. Need a formula DSL? Claude can write one in an afternoon. Want six tracker integrations? Here’s the boilerplate, the auth flow, the sync logic. A molecule system with phase transitions? Why not, it’s just tokens and time. The marginal cost of adding features has dropped to almost zero, so we add everything. Every idea becomes code. Every “what if” becomes a command.
The result is 210,000 lines where 46,000 would do. Not because anyone planned it that way, but because when code is cheap, you stop asking whether you need it. You just build it.
But here’s the thing: the same AI that makes it trivial to add features makes it equally trivial to remove them. Claude analyzed the codebase, identified that 94% of the commands were optional, and proposed cutting them. It mapped dependencies, traced call graphs, and systematically deleted 200,000 lines while keeping every test green.
The AI that enabled the bloat can cure it. Not by being smarter about what to build – that’s still a human decision – but by making subtraction as cheap as addition. Removing a feature used to mean hours of archaeology, finding every reference, understanding every dependency. Now it’s a prompt: “Remove the formula DSL and everything that depends on it.” Three hours later, it’s gone.
Claude couldn’t tell me which features had users who would scream. It could map the dependency graph and tell me what was structurally necessary, but not what people actually rely on. I made the call that the formula DSL, molecule system, and tracker integrations weren’t worth their weight. That’s a product decision, not a technical one.
The tool is good at answering “what can we cut?” The human still has to answer “what should we cut?”
The flip side: if you cut something you actually needed, the original code is right there in the upstream repo. Adding a single feature back to a 46k-line codebase is straightforward. Removing a single feature from a 135k-line codebase is the hard direction. Subtraction first, selective addition later, is an easier workflow than trying to simplify something that’s already complex.
We’re entering an era where code is so cheap to produce that the default state of software is bloated. AI makes it trivial to add features, so we add them. The friction that used to keep codebases lean – the effort required to implement ideas – is gone. Every codebase is one enthusiastic afternoon away from doubling in size.
But the same tool that creates the problem can solve it. AI makes subtraction as cheap as addition. You can point Claude at a million-line codebase and say “strip this down to its essential purpose” and it will. Not perfectly – you still need to decide what’s essential – but systematically, comprehensively, and without the exhaustion that makes humans give up halfway through.
Most software would benefit from this kind of radical simplification. Not a refactor – a deletion. Take a codebase, identify the core loop, and cut everything else. The result is a codebase that’s faster to understand, easier to change, and more honest about what it actually is.
github.com/signalnine/bd. 20 commands. One database. 46k lines.