sqlite-leap is a multi-language rewrite of SQLite generated entirely from a single language-neutral specification. No engineer wrote a line of application code by hand. The spec drove everything. Seven days. Two Claude Max usage limits burned through. Five engines. This post explains how we did it, what we learned, and why it matters.
What We Built
We took SQLite — one of the most deployed software libraries in the world, a 24-year-old C codebase with byte-level on-disk invariants — and rewrote it from scratch across five programming languages using AI agents working from a single structured spec.
The spec lives under parts/ in the repository: ~36,000 lines of language-neutral prose and schema. The agents generated ~208,000 lines of engine code across the five language trees (~249,000 with per-target test and benchmark harnesses). The five targets: C, Rust, Zig, Go, and Python, plus a WASM build from the C engine. That's a 5.8× leverage ratio — 36K lines of spec in, 208K lines of working engine code out.
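Spelled out, the leverage figure is generated engine lines divided by spec lines. Counting the per-target test and benchmark harnesses as well gives a higher ratio:

$$
\frac{208{,}000}{36{,}000} \approx 5.78 \approx 5.8\times
\qquad
\frac{249{,}000}{36{,}000} \approx 6.9\times
$$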
The repository is public: github.com/safitudo/sqlite-leap. Engine source is gitignored; only the spec and the tests are checked in. That's deliberate: code is regenerable, and the spec is the artifact you keep.
Why SQLite
We needed a proof that spec-driven development works at production scale, not just on toy examples. SQLite is the ideal test case for three reasons:
- It is genuinely hard. SQLite handles parsing, query planning, B-tree storage, transactions, and on-disk byte invariants that have held stable for two decades. This is not a TODO app.
- It has a definitive correctness gauntlet. The upstream sqllogictest corpus is 622 files that SQLite uses to test itself. There is no arguing with those numbers.
- It has byte-level observability. Two implementations are either byte-identical on the same input or they aren't. No fudging the results.
If spec-driven development holds for SQLite, it holds for your CRM, your auth service, your billing engine, your legacy system nobody wants to touch. SQLite was the hardest reasonable target we could pick.
The Specification
The spec is entirely language-neutral. It describes what SQLite does, not how any particular implementation should do it. It is organized into discrete files under parts/ covering:
- SQL parsing rules. Supported statements, syntax, and semantics.
- Storage engine behavior. B-tree page format, overflow handling, free page management.
- Transaction semantics. ACID properties, journal modes, locking protocol.
- Type system. SQLite's dynamic typing rules, type affinity, and coercion behavior.
- Query execution. How SELECT, INSERT, UPDATE, DELETE, and JOIN operations execute against the storage engine.
No code. No language references. Just behavior and acceptance criteria. Alongside the test suite, these 36K lines are the most valuable thing in the repo: they survive every regeneration, every model upgrade, every language port.
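To make "behavior and acceptance criteria" concrete, here is one rule of the kind the type-system part has to pin down: SQLite's documented column-affinity rules, applied in order to a column's declared type name. This is a minimal Python sketch for illustration, not text or code from parts/; the function name `column_affinity` is our own.

```python
from typing import Optional


def column_affinity(declared_type: Optional[str]) -> str:
    """Return the type affinity SQLite assigns to a column's declared type.

    The five rules are checked in order and the first match wins
    (hypothetical helper; rules follow SQLite's datatype documentation).
    """
    t = (declared_type or "").upper()
    if "INT" in t:
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:
        return "TEXT"
    if "BLOB" in t or not t.strip():  # no declared type also means BLOB
        return "BLOB"
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return "REAL"
    return "NUMERIC"
```

The order matters: a declared type of `FLOATING POINT` contains `INT`, so it gets INTEGER affinity rather than REAL. That's exactly the kind of edge case a spec has to state explicitly for five independent implementations to agree.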
The Process
Step 1: Write the spec
We worked from the SQLite documentation, the file format spec, and the source code itself. Every behavior got translated into a discrete acceptance criterion in parts/. This was the most time-intensive part of the week — writing a spec precise enough for an agent to execute correctly requires understanding the domain deeply enough to define it without ambiguity.
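As an illustration of turning the file-format spec into checkable criteria, the sketch below decodes a few fields of the 100-byte database header (the magic string, the page size at offset 16, the in-header page count at offset 28, and the text encoding at offset 56) from a file written by Python's bundled `sqlite3` module. It is a hypothetical check written for this post, not part of the repository's harness; `parse_header` and `write_demo_db` are our own names.

```python
import sqlite3
import struct
import tempfile
from pathlib import Path

MAGIC = b"SQLite format 3\x00"  # required bytes 0..15 of every database file
ENCODINGS = {1: "UTF-8", 2: "UTF-16le", 3: "UTF-16be"}


def parse_header(raw: bytes) -> dict:
    """Decode a few header fields; all multi-byte integers are big-endian."""
    if len(raw) < 100 or raw[:16] != MAGIC:
        raise ValueError("not an SQLite 3 database file")
    (page_size,) = struct.unpack_from(">H", raw, 16)   # 2 bytes at offset 16
    (page_count,) = struct.unpack_from(">I", raw, 28)  # 4 bytes at offset 28
    (encoding,) = struct.unpack_from(">I", raw, 56)    # 4 bytes at offset 56
    return {
        "page_size": 65536 if page_size == 1 else page_size,  # 1 encodes 65536
        "page_count": page_count,
        "text_encoding": ENCODINGS.get(encoding, f"unknown({encoding})"),
    }


def write_demo_db(path: Path) -> bytes:
    """Create a one-table database with 4096-byte pages and return its bytes."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA page_size = 4096")  # must run before the first write
    con.execute("CREATE TABLE t(x)")
    con.commit()
    con.close()
    return path.read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(parse_header(write_demo_db(Path(d) / "demo.db")))
```

A criterion like "the in-header page count times the page size equals the file length" is easy to state in prose and just as easy to assert in any of the five target languages.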
Step 2: Generate implementations
Each language implementation was generated by an AI agent working from the same spec. The agent received the spec as context and produced application code in the target language plus a per-target test harness. The agents worked independently — no implementation was used as a reference for another. Each one was generated from the spec alone.
Step 3: Validate against sqllogictest
The correctness bar was SQLite's own upstream test corpus: 622 files, every statement counted. We ran all five targets against the full suite on Linux x86_64 and compared record-level pass rates against mainline SQLite. No partial credit, no skipping files that seemed hard.
Step 4: Verify byte identity
We ran two test fixtures — a small one (270 rows, one B-tree page split) and a large one (5,000 rows, deep tree splits) — through mainline SQLite and all five leap engines, then compared SHA1 hashes on the output .db files. All six implementations. Same hash. This was the finding we didn't expect.
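The hash comparison itself is simple. Here is a minimal Python sketch, assuming each engine has already written its output `.db` file to disk; `sha1_of_file` and `compare_outputs` are hypothetical helpers, not the repository's tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(reference: Path, candidates: dict) -> dict:
    """Map each candidate engine name to whether its file hashes like the reference."""
    expected = sha1_of_file(reference)
    return {name: sha1_of_file(path) == expected for name, path in candidates.items()}


if __name__ == "__main__":
    # Demo on throwaway files; real runs would point at each engine's fixture output.
    with tempfile.TemporaryDirectory() as d:
        ref, same = Path(d, "ref.db"), Path(d, "same.db")
        ref.write_bytes(b"abc")
        same.write_bytes(b"abc")
        print(compare_outputs(ref, {"same": same}))  # {'same': True}
```

A match requires identical bytes everywhere in the file, including header fields such as the file change counter and the library version number stored at offset 96, so an engine has to reproduce those too, not just the B-tree pages.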
Results
| Metric | Value |
|---|---|
| Lines of specification (parts/) | ~36,000 |
| Languages generated | C, Rust, Zig, Go, Python + WASM |
| Lines of generated engine code | ~208,000 (~249K with harness) |
| Spec → code leverage | ~5.8× |
| Hand-written application code | 0 lines |
| Byte identity vs. mainline SQLite | All 5 engines — identical SHA1 on both fixtures |
| Crashes (compiled targets) | 0 |
sqllogictest correctness on the 622-file upstream corpus, measured per record:
| Target | Pass rate (excl. SKIP) | Coverage (executed / mainline surface) |
|---|---|---|
| sqlite-mainline | 100.00% | 100.00% |
| sqlite-leap-rust | 99.98% | 99.96% |
| sqlite-leap-c | 99.97% | 99.96% |
| sqlite-leap-python | 99.98% | 98.94% |
| sqlite-leap-go | 99.92% | 99.93% |
| sqlite-leap-zig | 98.88% | 99.96% |
The four compiled targets (Rust, C, Go, Zig) each attempt 99.93–99.96% of mainline's record surface with zero crashes. leap-python covers 98.94% — the pure-Python interpreter still times out on some of the heavier random/* files.
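The table's two columns can be read as two ratios over per-record outcomes. The sketch below encodes one plausible reading, which is our assumption rather than the definitions in PUBLICATION.md: the first column is passed records over executed records (skipped records left out), and the second is the target's executed-record count over mainline's. Under that reading, a record that times out counts as not executed, which is how a slow interpreter can lower coverage without lowering its pass rate. `Tally` and both functions are hypothetical names, and the numbers in the usage lines are illustrative, not taken from the CSVs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Per-record outcomes for one target over the whole corpus (hypothetical shape)."""
    passed: int
    failed: int
    skipped: int  # SKIP records plus, by assumption, timeouts

    @property
    def executed(self) -> int:
        return self.passed + self.failed


def pass_rate_excl_skip(t: Tally) -> float:
    """Percent of executed records that passed; skipped records are left out."""
    return 100.0 * t.passed / t.executed


def coverage(target: Tally, mainline: Tally) -> float:
    """Target's executed-record count as a percent of mainline's."""
    return 100.0 * target.executed / mainline.executed


if __name__ == "__main__":
    mainline = Tally(passed=10_000, failed=0, skipped=0)
    target = Tally(passed=9_990, failed=4, skipped=6)
    print(f"{pass_rate_excl_skip(target):.2f}%  {coverage(target, mainline):.2f}%")
    # prints: 99.96%  99.94%
```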
What We Learned
1. The spec is harder to write than the code
Writing a specification precise enough for an AI agent to execute correctly is significantly harder than writing the code yourself. But the spec is reusable. The code is disposable. A good spec can generate new implementations in any language at any time — we proved that across five languages from a single source.
2. Byte identity was a surprise
We expected the implementations to be functionally correct. We didn't expect their output to be byte-for-byte identical to that of a 24-year-old C codebase whose file format has been held stable for two decades. Five languages, five fresh implementations, same SHA1. That's what a precise spec does: it forces convergence on the standard, not just on something that passes tests.
3. The test suite is the most valuable artifact
The code can be regenerated from the spec. The spec can be refined. But the test suite (622 files of SQLite's own correctness gauntlet) is what tells you whether you're right. Like the spec, it outlasts any particular generation of the code, any model, any target language.
4. Agent-agnostic specs produce agent-agnostic results
Because the spec is language-neutral and tool-neutral, we can regenerate implementations using any AI model. The spec works across Claude, GPT, Gemini. This is what we mean by agent-agnostic delivery: the spec is the contract, the model is interchangeable.
What We're Not Claiming
Being honest about the scope matters:
- Not faster than SQLite. We lost INSERT throughput to mainline on every apples-to-apples comparison. SQLite is 24 years of hand-tuning; we had seven days. Use SQLite for production.
- Not one-button regenerable end-to-end. Leaf parts up to ~3,000 LOC regenerate cleanly per target. The large monolithic compiler files were agent-emitted once and are now maintained as source. The cross-target convergence doesn't require full regen — it requires the spec being neutral enough that five languages implement it equivalently. That part holds.
- Not for MVPs or prototypes. LEAP earns its keep when the codebase is going to outlive its first author — production infrastructure, multi-language ports, systems someone will be maintaining in two years. Vibe-code your prototype with whatever agent you want. Use LEAP when you're ready to charge money.
- Not a benchmark you'd cite in a paper. The numbers are engineering artifacts, with every caveat documented in PUBLICATION.md and every figure backed by a CSV.
Why This Matters
sqlite-leap is not a product. It is a proof. The mental model it makes obsolete:
- "This is written in Python — it's stuck in Python forever." No it isn't.
- "We can't rewrite the legacy system, it'd take 18 months." It might take a week.
- "Cross-language ports are exotic projects." They're a Tuesday-through-Friday now.
- "The senior who wrote this is gone, we can't touch it." You don't need them. You need the tests.
The artifact you keep isn't the code. It's the spec and the tests. Code is regenerable, in any language, by any agent. That's the thesis behind everything Leap Agentic does.
Explore the Project
The full repository — spec, tests, raw CSVs, and the deep writeup — is public:
github.com/safitudo/sqlite-leap
Emitted source for all five engines is published as src-{c,rust,zig,go,python}.tar.gz in the v0.1.1 release with SHA256SUMS, if you want to browse, build, or audit without spinning up an agent.
Next step: If you want to see how this methodology applies to your engineering team, book the $500 diagnostic. We'll assess your current process across 12 dimensions and show you where spec-driven development would have the highest impact.