Two Speeds of Building a Platform
Last month I opened a pull request that contained zero lines of code. Five markdown files, a decision log, a fact-check appendix, and a roadmap that explicitly admits which three decisions could still be wrong. That’s it.
If you’d asked me a year ago whether I’d ship a “PR” like that, I’d have laughed. Designs were something you scribbled in a notebook on the way to the keyboard. The work was the code. Everything else was overhead.
I don’t think that anymore. The reason I changed my mind is the same reason everyone in our industry is suddenly arguing about whether to slow down: AI made the keyboard part cheap, and that quietly inverted which work is actually expensive.
Two Speeds of Thought
Daniel Kahneman’s framing keeps coming up in LLM conversations for a reason. System 1 is fast, automatic, pattern-matching. System 2 is slow, deliberate, analytical. Andrej Karpathy described LLMs in his Dwarkesh interview as ghosts of statistical pattern-matching — a System 1 distillation of human text. They are extraordinarily good at the fast kind of thinking and merely competent at the slow kind.
Building a platform has the same two speeds. The fast phase is “write the function, wire the route, add the test.” The slow phase is “decide what the tenant boundary is, what the budget invariant must guarantee, which decisions we’re allowed to be wrong about.” AI collapses the cost of the first phase to near zero. It does almost nothing for the second.
So if your build feels gloriously fast and you’re suspicious it shouldn’t, the question to ask yourself isn’t “should I use AI?” — it’s “which phase am I in?”
The Illusion of Speed
llm-workers v1 was built fast. A Slack bot, an orchestration layer, a few agents, ship it. It worked. It also accumulated a tenancy model that didn’t really exist, a budget gate that ran after the LLM call instead of before, and a secret broker that hadn’t been threat-modeled against the OWASP LLM list. None of those bugs were visible from the keyboard. They were visible only from a chair, away from the keyboard, asking what the system was actually doing.
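The budget-gate bug is worth a moment, because the fix is tiny once you see it. Here’s a minimal sketch of the debit-before-dispatch shape — names like `BudgetGate` and `dispatch` are illustrative, not the project’s real API:

```python
class BudgetExceeded(Exception):
    pass

class BudgetGate:
    """Debit the tenant's budget *before* dispatching the LLM call.

    v1 debited after the call returned, so an exhausted tenant could
    still trigger one more paid completion. Same code, wrong order.
    """
    def __init__(self, balances: dict[str, float]):
        self.balances = balances  # tenant_id -> remaining budget, in dollars

    def debit(self, tenant_id: str, estimated_cost: float) -> None:
        remaining = self.balances.get(tenant_id, 0.0)
        if remaining < estimated_cost:
            raise BudgetExceeded(f"{tenant_id}: {remaining:.2f} < {estimated_cost:.2f}")
        self.balances[tenant_id] = remaining - estimated_cost

def call_llm(gate: BudgetGate, tenant_id: str, prompt: str) -> str:
    estimated = 0.01  # illustrative flat estimate; real code prices tokens
    gate.debit(tenant_id, estimated)  # the invariant: debit happens first
    return dispatch(prompt)           # only reached if the debit succeeded

def dispatch(prompt: str) -> str:
    # stand-in for the real provider call
    return f"response to: {prompt}"
```

Nothing about this is hard to write. What was hard was noticing, away from the keyboard, that the debit belonged on the other side of the call.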
When I sat down to plan v2, I made a deliberate choice to stay in the slow phase until I was confident I knew what to build. The output of that phase is the PR I mentioned: five documents that say things like “the tenant is the org, not the workspace” and “every LLM call must be attributable to a four-tuple or the span is rejected at the collector.” Boring. Declarative. Slow.
Here’s what the slow phase caught that the fast phase would have shipped:
A wrong fact about Anthropic prompt caching. The first draft assumed a March 2026 TTL change from one hour to five minutes. The fact-check loop pulled the actual changelog. The change wasn’t a TTL regression — it was a cache isolation improvement, from org-level to workspace-level. The plan inverted from “we have to design around a shorter TTL” to “we can lean on the cache more, scoped per workspace.” Two weeks of imagined refactoring, gone, before it was written.
A wrong assumption about Firecracker. The first plan budgeted for bare-metal AWS instances because that’s what every Firecracker tutorial written before February 2026 says you need. Then AWS quietly added nested virtualization to standard C8i/M8i/R8i tiers and the cost dropped from roughly nine dollars an hour to under one. Same architecture, ten-x cost delta, hidden in a release note nobody had reason to check.
A Slack API that doesn’t exist. The human-in-the-loop design called hook.resume() on a Slack Bolt app to wake a paused workflow. Bolt has no such method. The idiomatic pattern is action_id handlers plus an external workflow signal. Catching that on paper meant it never became a class hierarchy.
A protocol bet I would have made too early. Generative-UI protocols were churning when I started; AG-UI looked like one of three plausible winners. By April it had clearly won. If I’d built against it in February, I’d have rebuilt twice. By being explicit that the bet was for next year, I bought myself the right to wait.
None of these were code bugs. They were assumption bugs. Code review wouldn’t have caught them. Tests wouldn’t have caught them. The only thing that catches an assumption bug is sitting with the assumption long enough to disagree with it.
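The Slack fix, for what it’s worth, is small once it’s on paper. A sketch of the actual pattern — an acknowledged action handler that signals the paused workflow externally — with the Bolt machinery reduced to a comment so the resume logic stands alone (`pending` and the button-value convention are illustrative):

```python
import threading

# In a real app this handler is registered with Slack Bolt as:
#   @app.action("approve_step")
#   def handle_approval(ack, body): ...
# There is no hook.resume(); the wake-up is an external signal.

pending: dict[str, threading.Event] = {}  # workflow_id -> wake-up signal

def pause_for_approval(workflow_id: str) -> threading.Event:
    """Called by the workflow engine when it needs a human decision."""
    event = threading.Event()
    pending[workflow_id] = event
    return event

def handle_approval(ack, body) -> None:
    """Bolt-style action handler: ack the interaction, then signal."""
    ack()  # Bolt requires acknowledging the interaction promptly
    workflow_id = body["actions"][0]["value"]  # id carried in the button value
    event = pending.pop(workflow_id, None)
    if event is not None:
        event.set()  # the external signal that wakes the paused workflow
```

In production the `threading.Event` would be a durable workflow signal rather than an in-process primitive, but the shape is the same: the handler and the workflow never call each other directly.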
When Slowness Paid Off
The v2 plan does three things I don’t think I’d have managed at the keyboard, no matter how good the autocomplete.
Invariants pinned by CI, not by hope. The plan declares five invariants the platform must hold — no cross-tenant reads, every LLM call attributable to a four-tuple, budget debited before dispatch, untrusted content never reaching a tool-calling LLM without a spotlighting boundary, durable state surviving kill -9. Each one has a CI test that fails the release if the invariant breaks. Without that list, the rules would live in my head, which is the worst place to keep them.
A decision log with rejected alternatives. Fourteen named decisions, each with the option I picked and the options I didn’t. Temporal vs DBOS, Pinecone vs pgvector, Langfuse vs Grafana-plus-Postgres. The point isn’t that I picked right; the point is that “why isn’t this Temporal?” is a five-minute conversation in six months instead of a week of archaeology.
A migration that ships value at every step. Five phases, strangler-fig. Tenancy retrofit ships first, because retrofitting org_id onto a hot schema later is the kind of mistake that ends platforms. Each phase is independently shippable; nothing in phase three depends on phase four existing. The plan is explicit about which phases could be reordered if the business changes its mind, and which dependencies that would break.
You’ll notice none of this is code. It’s all decisions about code I haven’t written yet. The shape of the work is “what would future me thank past me for nailing down before there’s a backlog of stakeholders to argue with?”
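Eventually, of course, each invariant does become code: one CI test apiece. A sketch of the shape for the no-cross-tenant-reads invariant — `TenantScopedStore` is a stand-in for the real store, but the point survives the simplification: the rule lives in a test that fails the release, not in anyone’s head:

```python
class TenantScopedStore:
    """Illustrative store: every read must carry the caller's org_id."""
    def __init__(self):
        self._rows: list[tuple[str, str]] = []  # (org_id, payload)

    def insert(self, org_id: str, payload: str) -> None:
        self._rows.append((org_id, payload))

    def read_all(self, org_id: str) -> list[str]:
        # The invariant is enforced here: filtering by tenant is not optional.
        return [p for (o, p) in self._rows if o == org_id]

def test_no_cross_tenant_reads():
    store = TenantScopedStore()
    store.insert("org-a", "secret-a")
    store.insert("org-b", "secret-b")
    # If either assertion ever fails, CI fails the release.
    assert store.read_all("org-a") == ["secret-a"]
    assert "secret-a" not in store.read_all("org-b")
```

The real test runs against the actual schema and the actual query layer, which is exactly why it catches the regressions a design document can’t.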
Your Turn
You probably don’t need to write a fact-checked design appendix. But if you’re shipping anything non-trivial with AI in the loop, four habits seem worth stealing from the slow phase:
Say which phase you’re in. “I’m still figuring out the constraints” is a different sentence than “I’m implementing the design.” Conflating them is how you end up with a beautiful implementation of the wrong thing. Even on a small task, name the phase out loud at the top of the day.
Write down what would have to be true for this to be wrong. This is the cheapest pre-mortem you can run. For the v2 plan, the question surfaced the Anthropic-cache-TTL mistake within an hour. AI is unusually good at generating these “what would have to be true” lists if you ask for them — that’s a System 1 task at heart.
Keep a decision log, even informally. A list of “I picked X over Y because Z” entries, written at the moment of decision, is worth more than any retrospective document written after the fact. You will not remember why you chose DBOS over Temporal in eight months. Past you will.
If the project is significant, consider a fact-check pass before you start building. Pull the primary sources for every load-bearing claim. The first time you do this, you’ll find at least one wrong fact you’d have built around. It’s almost embarrassing how reliably it works.
Wrapping Up
The thing I keep coming back to is that AI didn’t make the slow phase optional. It made the fast phase cheap, and the gap between the two phases got more expensive to misjudge.
A pull request with no code, written carefully, can be the highest-leverage thing you ship in a quarter. Not because the document is precious — most of it will be wrong eventually. Because writing it forces the disagreements to happen on paper, where they cost nothing, instead of in production, where they cost everything.
Until next time.