LLM·Dex
80 models · 60 tools · 60 guides

The definitive intelligence index.

An opulent, real-time ledger of parameters, context windows, and deployment economics across 80+ Large Language Models. No fabricated benchmarks. No marketing fluff.



The leaderboard

Top 25 models, ranked by tier & recency

See all 80 models
Model · Type · Context · MMLU · Out / 1M · Provider

GPT-5.5 · Proprietary · 400K · — · — · OpenAI
OpenAI's mid-cycle GPT-5 refresh, improved reasoning, tool use, and multimodal grounding over the 2025 launch.

Claude Opus 4.7 · Proprietary · 500K · — · — · Anthropic
Anthropic's mid-2026 flagship, ahead on SWE-bench, agent reliability, and writing quality.

o4 · Proprietary · 200K · — · — · OpenAI
OpenAI's late-2025 standalone reasoning model, an evolution of o3 with deeper chain-of-thought and stronger multimodal reasoning.

Gemini 3 Pro · Proprietary · 1.0M · 91.8 · — · Google
Google's late-2025 flagship, set new benchmarks on long-context, vision, and reasoning at competitive pricing.

GPT-5 · Proprietary · 400K · 91.4 · $10.00 · OpenAI
OpenAI's unified flagship combining GPT-line breadth with built-in reasoning, replacing both GPT-4o and the o-series for most users.

Grok 4 · Proprietary · 256K · — · $15.00 · xAI
xAI's mid-2025 flagship, top scores on Humanity's Last Exam at launch, with native real-time X integration.

Claude Opus 4 · Proprietary · 200K · — · $75.00 · Anthropic
Anthropic's mid-2025 flagship, the model that established Claude's lead on coding agents and SWE-bench.

o3 · Proprietary · 200K · — · $8.00 · OpenAI
OpenAI's flagship reasoning model, set the bar for hard math, GPQA, and agent benchmarks in 2025.

Gemini 2.5 Pro · Proprietary · 2.1M · 86.0 · $10.00 · Google
Google's mid-2025 flagship, the model that brought Gemini decisively back to parity with the OpenAI and Anthropic frontier.

Grok 3 · Proprietary · 128K · — · $15.00 · xAI
xAI's first frontier-tier release, established the company's Colossus-trained model line.

Sonar Pro · Proprietary · 200K · — · $15.00 · Perplexity
Perplexity's premium answer model, deeper search, more sources, longer answers.

Mistral Large 2 · Open · 128K · 84.0 · $6.00 · Mistral
Mistral's flagship API model, strong on code and reasoning, EU-friendly hosting.

Claude Sonnet 4.6 · Proprietary · 200K · — · — · Anthropic
Anthropic's mid-tier 4.6 release, the workhorse model behind most production Anthropic deployments.

Gemini 3 Flash · Proprietary · 1.0M · — · — · Google
Google's high-speed, low-cost mid-tier with the same massive context window, popular for high-volume RAG.

GPT-5 mini · Proprietary · 400K · — · $2.00 · OpenAI
GPT-5's mid-tier sibling, most of the quality at a fraction of the price, ideal for high-volume production workloads.

Claude Sonnet 4 · Proprietary · 200K · — · $15.00 · Anthropic
Mid-2025 mid-tier Claude, the predecessor workhorse to Sonnet 4.6 and still common in production.

o4-mini · Proprietary · 200K · — · $4.40 · OpenAI
Smaller, faster, cheaper member of OpenAI's reasoning-model family, great latency-cost balance for hard tasks.

GPT-4.1 · Proprietary · 1.0M · 86.2 · $8.00 · OpenAI
OpenAI's 2025 GPT-4.x refresh, long-context, fast, still widely deployed even after GPT-5.

Gemini 2.5 Flash · Proprietary · 1.0M · — · $0.30 · Google
Mid-2025 fast tier, set the bar for cost-efficient long-context generation.

Claude 3.7 Sonnet · Proprietary · 200K · — · $15.00 · Anthropic
The first Claude with an extended-thinking mode, ushered the reasoning-model paradigm into Anthropic's lineup.

Gemini 2.0 Flash · Proprietary · 1.0M · — · $0.40 · Google
Early-2025 fast Gemini, first model with full 1M-token context at the Flash price point.

o3-mini · Proprietary · 200K · — · $4.40 · OpenAI
Smaller, faster reasoning model, popular as the budget thinking-model option throughout 2025.

Codestral 2 · Open · 256K · — · $0.90 · Mistral
Mistral's code-specialized model, fast inline completion and strong fill-in-the-middle support.

Mistral Medium · Proprietary · 128K · — · $2.00 · Mistral
Mistral's mid-tier balanced model, production-ready at competitive pricing.

Amazon Nova Pro · Proprietary · 300K · — · $3.20 · Other
Amazon's mid-tier multimodal, competitive pricing, deep AWS integration.

Pricing reflects each provider's public rates at the time of writing; "Out / 1M" is the output price per million tokens. MMLU scores come from official model cards or Artificial Analysis. Benchmarks are left blank (—) when not independently verified; see the methodology.
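Turning a per-million output rate into a per-request cost is simple division; a minimal sketch (the function name is ours, and the rates are the table's listed values, not live pricing):

```python
def output_cost_usd(output_tokens: int, rate_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a per-1M-token output rate."""
    return output_tokens / 1_000_000 * rate_per_million_usd

# Example: 250K output tokens on GPT-5 at its listed $10.00 / 1M rate.
print(output_cost_usd(250_000, 10.00))  # → 2.5 (USD)
```

Input-token pricing is typically quoted the same way (per 1M tokens) but at a different rate, so a full request cost sums the two.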

Top by category

Find the right model for the job

All 60 guides

Honest data, never fabricated

We only publish benchmark numbers we can source. Where a model has no public score, we say so instead of guessing.

Updated weekly

A new release on Monday is on LLMDex by Friday. The dataset lives in version control, so every change is auditable.

Comparisons that compare

Every compare page is a programmatic synthesis of real data deltas, not generic AI filler. Read the methodology.

Friday digest

Intelligence, distilled weekly.

One short email every Friday: new model launches, leaderboard moves, and pricing drops. Curated by hand. Free, no spam.