Why We Built LLMDex
A short story about how an internal model-tracking spreadsheet became a public site, and what we learned along the way.
Every project that ships in public has an origin story, and most of them are less mysterious than they sound. LLMDex started as a spreadsheet. This is the short version of how it became a website that catalogues 80 LLMs, 60 AI tools, 60 use-case guides, and over 1,800 head-to-head comparisons.
If you're building anything in public (a database, a directory, a curated reference), the lessons here might save you a few months.
The spreadsheet phase
In early 2024, our small team kept a Notion table titled "Models." Every time a new model launched, we'd add a row: name, provider, release date, context window, pricing, our subjective takes. The intent was internal: we needed a quick reference for "what should we use for this workload" decisions, and our memories couldn't keep up with the launch cadence.
The table was useful. It also drifted constantly. Pricing changes weren't reflected for weeks. New benchmark numbers got added inconsistently. We'd cite the table in meetings and then realize the number was a month stale. Internal-only tools have little incentive to stay accurate: the cost of a single wrong cell is small, while the discipline to keep every cell right is expensive.
By mid-2024 the table had ~40 rows and was wrong in maybe a third of its cells.
The "why isn't this public" moment
Around June 2024 we noticed three things:
- We searched for "X model API pricing" on Google several times a week, every week. The official pages were sometimes hard to find, sometimes inconsistent with each other, sometimes missing key details (context window, knowledge cutoff).
- The first results on Google were either provider PR pages or AI-spam content farms. Neither was authoritative. Neither was up to date.
- Artificial Analysis existed and was great, but it focused on benchmarks rather than pricing/specs/comparisons. There was a gap.
If our internal table, even partially out of date, was more useful than the top Google results, the table belonged on the internet. The leap was less "let's build a website" and more "the internet is missing this; we accidentally have what's needed; let's stop hoarding it."
The honest version of why
We could pretend LLMDex was started purely as a public good. The honest version: programmatic SEO is a viable business model for niche reference content, and the AI tooling space in 2024 was a category where (a) the gap was real, (b) the audience was high-CPM and willing to click affiliate links, and (c) the data layer is the moat.
We built it for the same reason most great reference sites get built: a combination of "this annoys us" and "this could pay for itself."
The first version
The first public LLMDex shipped in late 2024 with 25 models, 20 tools, and a hand-coded Next.js site that took maybe two weeks to build. Three things were intentional:
- No fabricated data. Every benchmark on the site traced to a public source. Where we couldn't verify, we left blanks. (We've written separately about this policy.)
- Comparisons are programmatic, not LLM-written. Every /compare/[a-vs-b] page renders a verdict synthesized from data deltas, not generic AI prose. The page shape is identical; the content is genuinely different per pair.
- Affiliate links are visibly labeled. rel="sponsored" on every link, a "sponsored" badge in the UI. We never adjust rankings based on commercial relationships.
The early traffic was slow. The early users were friends and people who found us via Hacker News on launch day.
The pivot to programmatic SEO
Around early 2025 we realized the comparison pages, auto-generated from the dataset, were doing something the spec sheets and tool pages weren't: ranking on long-tail "X vs Y" searches that no other site indexed comprehensively. Within four months of starting to ship comparisons, those pages drove 70% of our traffic.
The lesson: in a fast-moving space with thousands of viable model pairs, programmatic SEO works if and only if content quality holds at scale. Spam comparisons ("here's why X is better than Y," with no actual data) get penalized fast. Real comparisons (with actual data deltas, real benchmark differences, contextual prose) rank.
This shaped the rest of the build. We invested heavily in the comparison-generation logic. Every compare page on the site today is the output of lib/verdict.ts, a deterministic function that produces meaningfully different text per pair because the input data differs.
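To make "deterministic, data-driven verdicts" concrete, here's a stripped-down sketch of the idea. The field names, thresholds, and wording are simplified for illustration; the real module weighs many more dimensions:

```ts
// A stripped-down sketch of a deterministic verdict: prose derived purely
// from deltas between two spec objects. Field names and thresholds are
// illustrative, not the real lib/verdict.ts.
type ModelSpec = {
  name: string;
  inputPricePerMTok: number; // USD per million input tokens
  contextWindow: number;     // tokens
  mmlu?: number;             // 0-100; left undefined when we can't verify it
};

export function verdict(a: ModelSpec, b: ModelSpec): string {
  const parts: string[] = [];

  // Price: only mention it when the gap is meaningful.
  const ratio = a.inputPricePerMTok / b.inputPricePerMTok;
  if (ratio < 0.8) {
    parts.push(`${a.name} is roughly ${Math.round((1 - ratio) * 100)}% cheaper on input tokens.`);
  } else if (ratio > 1.25) {
    parts.push(`${b.name} is roughly ${Math.round((1 - 1 / ratio) * 100)}% cheaper on input tokens.`);
  }

  // Context window: report the larger one.
  if (a.contextWindow !== b.contextWindow) {
    const bigger = a.contextWindow > b.contextWindow ? a : b;
    parts.push(`${bigger.name} offers the larger context window (${bigger.contextWindow.toLocaleString()} tokens).`);
  }

  // Benchmarks: only compared when both sides have verified numbers.
  if (a.mmlu !== undefined && b.mmlu !== undefined && a.mmlu !== b.mmlu) {
    const [leader, trailer] = a.mmlu > b.mmlu ? [a, b] : [b, a];
    parts.push(`${leader.name} leads on MMLU (${leader.mmlu} vs ${trailer.mmlu}).`);
  }

  return parts.length > 0
    ? parts.join(" ")
    : `${a.name} and ${b.name} are close on the specs we track; the right pick depends on your workload.`;
}
```

Because the output is a pure function of the input specs, a dataset correction regenerates every affected comparison page automatically, and the prose for a pair only changes when its underlying data does.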
What we got wrong early
Three mistakes worth flagging:
1. Too much focus on the homepage
For three months in 2024 we obsessed over the home page. New users land on the home page, or so the thinking went. Our analytics showed something different: users almost never landed on the home page. They landed on a model spec or compare page from a search result. The home page was for repeat users, not first-time visitors. We should have invested in the spec pages first.
2. Over-engineered the sitemap
We tried to be clever about which pages to include in the sitemap based on "expected search demand." This was both speculative and fragile. We'd flip pages in and out of the sitemap, Google would re-crawl, traffic would yo-yo. The right answer was simple: sitemap everything that's indexable, let Google figure out demand.
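For reference, here's roughly what the simple approach looks like on an app-router Next.js site. The data loaders, field names, and URL paths below are stand-ins for however the dataset is stored and routed; MetadataRoute.Sitemap is the standard Next.js API:

```ts
// app/sitemap.ts — a simplified sketch of "sitemap everything indexable".
import type { MetadataRoute } from "next";

const BASE_URL = "https://example.com"; // placeholder origin

type Entry = { slug: string; updatedAt: Date };

// Stand-in loaders; in practice these read the real dataset.
async function getAllModels(): Promise<Entry[]> {
  return [];
}
async function getAllComparisons(): Promise<Entry[]> {
  return [];
}

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const [models, comparisons] = await Promise.all([
    getAllModels(),
    getAllComparisons(),
  ]);

  // Every indexable page goes in. No demand heuristics, no flipping pages
  // in and out between crawls.
  return [
    { url: BASE_URL, lastModified: new Date() },
    ...models.map((m) => ({
      url: `${BASE_URL}/models/${m.slug}`,
      lastModified: m.updatedAt,
    })),
    ...comparisons.map((c) => ({
      url: `${BASE_URL}/compare/${c.slug}`,
      lastModified: c.updatedAt,
    })),
  ];
}
```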
3. Underinvested in trust signals
Our first version had no methodology page, no clearly marked sponsored links, no last-updated stamps, no "about" page worth reading. Trust signals are slow to build but compound. We retrofitted them in early 2025. They're the table stakes we'd build first if we redid the project.
What we got right
Three things we'd do the same:
- Started with a real dataset. A working spreadsheet with 25 entries was more useful than 1,000 entries scraped poorly. Quality before quantity.
- Avoided fabrication from day one. "Benchmark not yet available" was a deliberate choice. Trust earned slowly compounds; trust lost is hard to rebuild.
- Shipped publicly early. The version we launched was embarrassing by 2026 standards. It was real, it got users, and the feedback loop made the product better than the perfect version we'd have shipped six months later.
What 2026 looks like
LLMDex covers:
- 80 LLMs with full spec pages
- 60 AI tools with alternatives pages
- 60 use-case guides
- 1,800+ head-to-head comparisons (auto-generated, programmatically verified)
- A long-form blog (this article is one of its posts)
- A Friday newsletter
The site is profitable from sponsorships and affiliate revenue. It's still a small team. The maintenance burden is roughly 8 hours per week, most of which goes into keeping the dataset current as new models ship.
The audience is what we'd hoped for: developers, AI engineers, technical founders making real decisions about what to deploy. The feedback we get is technical and specific (here's a benchmark you got wrong, here's a model you're missing), which is exactly the feedback that makes a reference site better.
Why we keep at it
Three reasons:
- The domain is interesting. LLMs and AI tools are one of the most rapidly evolving fields in software. Cataloguing it well is genuinely useful work.
- The economics support it. Programmatic SEO is a real model for reference content; we're proving it out on a niche we know.
- The trust frame matters. "Honest, audited reference data" is a position we believe in. Defending it long-term is its own reward.
If you're considering building something similar (a directory, a database, a curated reference), the playbook is straightforward: pick a domain you know, ship a small honest version, invest in the data layer, optimize for trust signals, let SEO compound. The rest is patience.
Further reading
- The Five LLM Myths That Won't Die
Reasoning models hallucinate too. Open-weight is not always cheaper. And three more myths the AI Twitter consensus needs to retire.
- The Two Rules of Honest AI Data
Don't fabricate. Don't omit context. The full editorial standard behind LLMDex's data and how to apply it to your own work.
- How to Read an AI Benchmark: A Skeptical Reader's Guide
MMLU, HumanEval, SWE-bench, GPQA, what they actually measure, how providers game them, and how to think about benchmark numbers in 2026.