
What We Got Wrong Building LLMDex (and What We'd Do Differently)

An honest postmortem from 18 months of building a programmatic SEO site for AI tools. The architectural mistakes, the editorial misjudgments, and what we'd do differently.

By LLMDex Editorial

LLMDex started as a Notion table in early 2024. By April 2026 it tracks 80 LLMs, 60 AI tools, 60 use-case guides, and ships ~2,400 indexable pages. It's profitable on affiliate revenue and growing on Google search traffic. By the standards of indie content sites, that's a success.

It's also built on a long list of mistakes we made along the way, things we'd do differently if we started over today. This article is the honest postmortem. Some of it will be useful if you're building anything programmatic-SEO-shaped. Some of it is just things we wish someone had told us 18 months ago.

Mistake #1: Started with too few models

The first public version of LLMDex shipped with 25 models in the dataset. We'd reasoned that a small, curated, accurate set was better than a sprawling list with fabricated data. That logic was right, but the threshold was wrong. Twenty-five was below the minimum for the programmatic-SEO play to work.

The issue: comparison pages need roughly 50 models to produce enough indexable URLs (~1,200 head-to-head pages) to register meaningfully in search. Below that, you're producing a handful of compare pages that get lost in the long tail. Above that, you're producing enough density that one model's pages reinforce another's via internal linking.
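The math here is plain combinatorics: with one compare page per unordered pair of models, page count grows quadratically with model count. A quick sketch (TypeScript; this assumes one page per unordered pair, which is roughly how a compare-URL scheme works):

    // Number of head-to-head compare pages for n models,
    // counting each unordered pair {A, B} once.
    function comparePages(n: number): number {
      return (n * (n - 1)) / 2;
    }

    console.log(comparePages(25)); // 300  -> lost in the long tail
    console.log(comparePages(50)); // 1225 -> the ~1,200 threshold
    console.log(comparePages(60)); // 1770 -> where traffic compounded

Going from 25 to 60 models didn't double the compare pages; it nearly sextupled them.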

We hit serious traffic only after we got to ~60 models. The 25-to-60 expansion was the single biggest growth lever we pulled. In retrospect, we should have started with ~60. The marginal effort to add models 26-60 was small; the traffic difference was huge.

Lesson for anyone building programmatic SEO: scale matters more than you think for the long-tail SEO play. Don't ship with too small a dataset.

Mistake #2: Spent too long on the homepage

For roughly three months in mid-2024, our team was iterating on the homepage. New hero copy, new sections, new visualizations. The reasoning: new visitors land on the homepage, so the homepage drives growth.

Analytics told a completely different story. New visitors didn't land on the homepage; they landed on model spec pages and compare pages from Google searches. The homepage was for repeat users, who were already converted. We were optimizing the wrong page.

In retrospect, we should have invested those three months in the spec pages. Better spec-page design, better internal linking, better CTA placement: all would have moved metrics that mattered. Homepage iteration was status-game work, not value-creating work.

Lesson: optimize for entry pages, not for the page you happen to land on when you visit your own site. Look at GA4 landing-page data weekly.
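Pulling that landing-page view programmatically is cheap to automate. A minimal sketch using the GA4 Data API's Node client (assuming the @google-analytics/data package, default credentials, and a placeholder property ID):

    import { BetaAnalyticsDataClient } from '@google-analytics/data';

    const client = new BetaAnalyticsDataClient();

    // Top 20 landing pages by sessions over the last 7 days.
    async function landingPageReport(): Promise<void> {
      const [response] = await client.runReport({
        property: 'properties/123456789', // placeholder property ID
        dateRanges: [{ startDate: '7daysAgo', endDate: 'today' }],
        dimensions: [{ name: 'landingPage' }],
        metrics: [{ name: 'sessions' }],
        orderBys: [{ metric: { metricName: 'sessions' }, desc: true }],
        limit: 20,
      });
      for (const row of response.rows ?? []) {
        console.log(row.dimensionValues?.[0]?.value, row.metricValues?.[0]?.value);
      }
    }

If the top entries are spec and compare pages rather than the homepage, that's where the design effort belongs.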

Mistake #3: Underinvested in trust signals early

The first version of LLMDex didn't have a methodology page. It didn't have a clearly-marked sponsored-link disclosure. It didn't have last-updated stamps on content pages. It didn't have an about page worth reading.

These all seemed like nice-to-haves at the time. They're not. They're table stakes, and Google's helpful-content updates increasingly weight them in rankings. Sites that visibly explain their data sources, openly disclose monetization, and stamp content with currency information rank better.

We retrofitted these in early 2025 and saw measurable ranking improvements within weeks. In retrospect, we should have shipped them in v1. The cost was small (a weekend of writing). The compounding benefit was large.
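One cheap, concrete form a last-updated stamp can take is Article structured data with a dateModified field. A minimal sketch (the schema.org Article type is standard; the field values here are placeholders):

    // Minimal JSON-LD for a content page's currency and authorship signals.
    // Values are placeholders.
    function articleJsonLd(title: string, published: string, modified: string): string {
      return JSON.stringify({
        '@context': 'https://schema.org',
        '@type': 'Article',
        headline: title,
        datePublished: published, // ISO 8601, e.g. '2024-06-01'
        dateModified: modified,   // backs the visible "last updated" stamp
        author: { '@type': 'Organization', name: 'LLMDex Editorial' },
      });
    }

    // Rendered into the page head inside
    // <script type="application/ld+json">...</script>

The visible stamp and the structured-data stamp should agree.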

Lesson: trust signals are not optional features. They're SEO and conversion requirements. Ship them with the first public version, not the third.

Mistake #4: Over-engineered the data layer

Early on we built a sophisticated TypeScript data pipeline: strict schemas, automated cross-reference validation, and programmatic test coverage of every field. The reasoning: this is a data-quality moat, so the data should be rigorously typed.

The cost of that decision became visible after about three months. Adding a new model required updating a TypeScript type, running the validator, fixing any cross-references, etc. The "right way" took 30 minutes per model. The "shortcut" of just editing JSON took 3 minutes.

Six months in, we'd added maybe 15 new models. We should have added 50+. The friction was killing throughput.

In retrospect, we should have started with looser schemas and tightened them as the dataset matured. A wrong entry from a quick JSON edit is easy to fix after the fact; friction-of-correctness that prevents entries from being added at all is not.
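To make "loose first, tighten later" concrete, here's a hypothetical sketch using Zod (the field names are illustrative, not our actual schema):

    import { z } from 'zod';

    // Phase 1: loose. Only identity fields are required, everything else
    // is optional, and unknown keys pass through. Adding a model takes
    // minutes instead of a type-system negotiation.
    const LooseModel = z.object({
      id: z.string(),
      name: z.string(),
      contextWindow: z.number().optional(),
      inputPricePerMTok: z.number().optional(),
    }).passthrough();

    // Phase 2, once the dataset matures: promote fields to required and
    // reject unknown keys, so drift is caught at validation time.
    const StrictModel = z.object({
      id: z.string(),
      name: z.string(),
      contextWindow: z.number().int().positive(),
      inputPricePerMTok: z.number().nonnegative(),
    }).strict();

The migration path is mechanical: run the strict schema in warn-only mode, burn down the violations, then flip it to blocking.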

Lesson: optimize for throughput in the early phase. Data you can't add because of friction is worth less than slightly imperfect data you can add fast.

Mistake #5: Wrote AI-generated filler content for early articles

Our first attempt at blog content used a fairly aggressive AI-generation pipeline. Pick a topic, generate 800 words with GPT-4, do a light edit, ship. The output was technically grammatical and topically relevant. It was also boring, formulaic, and forgettable.

That content didn't rank. Google's helpful-content updates of late 2024 specifically targeted this pattern. We deindexed the worst of it in early 2025 and replaced it with hand-written long-form articles. The hand-written articles ranked dramatically better.

In retrospect, we should never have used AI-generated filler content. The savings (faster content shipping) were swamped by the costs (SEO penalty, brand erosion, having to redo the work).

Lesson: AI-generated content is a trap for SEO-driven sites. Hand-written, opinionated, specific content is the only thing that ranks durably in 2026.

Mistake #6: Treated affiliate links as an afterthought

Affiliate revenue is now the single largest revenue stream for LLMDex. In v1, affiliate links were a footnote: embedded in spec pages with no clear CTA, no rate-card tracking, and no per-partner reporting.

The result: we were leaving meaningful revenue on the table. A clear, visible "Try [Tool]" button in the right places on high-traffic pages converted at 3-5x the rate of inline mentions. Per-partner reporting showed us which partnerships were performing and which weren't.

In retrospect, we should have treated affiliate revenue as a primary revenue stream from day one. This means clear CTAs in clear places, per-partner conversion tracking, and rate-card management as a real ops function rather than a once-a-quarter thing.
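The tracking half of this doesn't need to be elaborate. A sketch of per-partner link tagging (the partner fields and parameter names are illustrative, not our real rate-card setup):

    // Build affiliate URLs with per-partner tracking parameters so
    // clicks can be attributed in reporting. Names are illustrative.
    interface Partner {
      id: string;      // e.g. 'acme-ai'
      baseUrl: string; // partner-provided affiliate landing URL
    }

    function affiliateUrl(partner: Partner, pageSlug: string): string {
      const url = new URL(partner.baseUrl);
      url.searchParams.set('ref', 'llmdex');
      url.searchParams.set('partner', partner.id);
      url.searchParams.set('src', pageSlug); // which page drove the click
      return url.toString();
    }

Every "Try [Tool]" button links through a URL like this, and per-partner reporting becomes a group-by on the partner parameter.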

Lesson: if affiliate revenue is part of your business model, it deserves operational attention from v1. Not a feature you bolt on later, a real strategy.

Mistake #7: Underweighted internal linking

For most of v1, our internal linking was ad-hoc. Spec pages linked to a few related compare pages but not consistently. Compare pages didn't always link back to both spec pages cleanly. Best-for pages didn't always link to model spec pages.

This was bad for SEO. Internal linking is one of the strongest signals Google uses to understand site structure and topic clustering. Sparse linking makes the site look like a pile of disconnected pages rather than a coherent reference.

We fixed this in early 2025 by mandating "every page links to ≥5 related pages" as a hard constraint. Implementation was straightforward (programmatic relations between spec/compare/best-for/alternatives). The ranking improvements were noticeable within weeks.
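When the relations live in the data model, the ≥5-links constraint is just a derivation. A simplified sketch (the entity shape and URL patterns are illustrative):

    // Derive related-page links from the data model so dense internal
    // linking is generated, not hand-maintained. Names are illustrative.
    interface Model {
      id: string;
      useCases: string[]; // e.g. ['coding', 'long-context']
    }

    function relatedLinks(model: Model, allModels: Model[]): string[] {
      const links: string[] = [];
      // Compare pages against models that share a use case.
      for (const other of allModels) {
        if (other.id !== model.id &&
            other.useCases.some(u => model.useCases.includes(u))) {
          links.push(`/compare/${model.id}-vs-${other.id}`);
        }
      }
      // Best-for pages for each of the model's use cases.
      for (const useCase of model.useCases) {
        links.push(`/best-llm-for/${useCase}`);
      }
      return links.slice(0, 10); // cap the list; enforce >= 5 upstream
    }

Because the links are computed from shared attributes, adding one model automatically densifies the linking of every related page.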

In retrospect, internal linking should have been part of the spec for v1 architecture. It's basically free if you design for it; expensive to retrofit.

Lesson: design programmatic SEO sites for dense internal linking. Build the relations into the data model.

Mistake #8: Wrote content for ourselves rather than the audience

Several early blog posts were written for the AI-builder community ("here's an interesting architectural pattern we tried"). They got modest traffic from technical readers but didn't rank for any serious queries.

The articles that grew traffic were the ones written for the audience that was actually searching: "Best LLM for coding," "GPT-5 vs Claude," "What is GPT-5.5." These titles aren't beautiful. They are what people search for. The content has to follow.

In retrospect, we should have started with search-intent-driven content. Write the article that answers a real query. Make the title contain the query. Don't write for our own taste.

Lesson: SEO content lives or dies by search-intent matching. Write to the queries, not to your own interests.

What we did right

Lest this read as pure self-flagellation, three things we got right:

Data sourcing discipline. From day one, no fabricated benchmarks. Where we couldn't source a number, the field was blank. This editorial decision compounds: we've had researchers, journalists, and competitors check our numbers, and the trust we've built is durable.

Programmatic compare pages. The compare-page architecture (rendering a verdict synthesized from data deltas, not LLM-written prose) was the central architectural decision that made the SEO play work. Same shape, genuinely different content per page.
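To give a flavor of what "verdict synthesized from data deltas" means in practice, here's a deliberately simplified sketch (the real logic covers far more fields; the spec shape here is illustrative):

    // Deterministic verdict prose rendered from spec deltas. Because the
    // deltas differ per pair, every compare page gets different content.
    interface Spec {
      name: string;
      contextWindow: number;      // tokens
      inputPricePerMTok: number;  // USD per million input tokens
    }

    function verdict(a: Spec, b: Spec): string {
      const parts: string[] = [];
      if (a.contextWindow !== b.contextWindow) {
        const [big, small] = a.contextWindow > b.contextWindow ? [a, b] : [b, a];
        parts.push(`${big.name} offers the larger context window ` +
          `(${big.contextWindow.toLocaleString()} vs ${small.contextWindow.toLocaleString()} tokens).`);
      }
      if (a.inputPricePerMTok !== b.inputPricePerMTok) {
        const [cheap, dear] = a.inputPricePerMTok < b.inputPricePerMTok ? [a, b] : [b, a];
        parts.push(`${cheap.name} is cheaper on input ` +
          `($${cheap.inputPricePerMTok} vs $${dear.inputPricePerMTok} per million tokens).`);
      }
      return parts.join(' ') ||
        `${a.name} and ${b.name} are closely matched on these specs.`;
    }

The output is boring by design, but it's accurate, unique per page, and regenerates for free whenever the underlying data changes.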

Long-form blog as anchor content. The journal pages don't drive most of our traffic; the spec/compare/best-for pages do. But the journal pages drive the highest-quality referrals (newsletter signups, sponsorship inquiries, partnership proposals). Investing in long-form was correct even though the immediate SEO ROI was lower.

What we'd do differently from scratch

If we restarted LLMDex tomorrow:

  1. Ship 60 models in v1, not 25. Density matters.
  2. Trust signals (methodology, sponsored disclosure, last-updated, about) in v1. Not v3.
  3. Loose schemas for the first 6 months, tighten as dataset matures. Throughput first.
  4. No AI-generated filler content. Hand-written from day 1.
  5. Affiliate links as a primary revenue stream with clear CTAs and per-partner tracking from v1.
  6. Dense internal linking baked into the data model.
  7. Search-intent-driven content from day 1. No vanity articles.
  8. Optimize the spec-page experience, not the homepage, until you have meaningful repeat traffic.

Each of these decisions is small in isolation. Compounded over 18 months they're the difference between a site that grows and a site that doesn't.

The deeper takeaway

Building a programmatic SEO site is mostly about discipline. The rules aren't secret; Google publishes most of them. The hard part is shipping every page with the discipline applied, every time, for two years. Most sites that fail in this category fail not because the strategy was wrong but because the execution was inconsistent.

LLMDex isn't done. We're still iterating. But after 18 months of building, the lessons above are the ones we actually believe.
