LLM·Dex
80 models · 60 tools · 60 guides

The definitive intelligence index.

An opulent, real-time ledger of parameters, context windows, and deployment economics across 80+ Large Language Models. No fabricated benchmarks. No marketing fluff.



The leaderboard

Top 25 models, ranked by tier & recency

See all 80 models
Model · Type · Context · MMLU · Out / 1M · Provider

GPT-5.5 · Proprietary · 400K · — · — · OpenAI
OpenAI's mid-cycle GPT-5 refresh, improved reasoning, tool use, and multimodal grounding over the 2025 launch.

Claude Opus 4.7 · Proprietary · 500K · — · — · Anthropic
Anthropic's mid-2026 flagship, ahead on SWE-bench, agent reliability, and writing quality.

o4 · Proprietary · 200K · — · — · OpenAI
OpenAI's late-2025 standalone reasoning model, an evolution of o3 with deeper chain-of-thought and stronger multimodal reasoning.

Gemini 3 Pro · Proprietary · 1.0M · 91.8 · — · Google
Google's late-2025 flagship, set new benchmarks on long-context, vision, and reasoning at competitive pricing.

GPT-5 · Proprietary · 400K · 91.4 · $10.00 · OpenAI
OpenAI's unified flagship combining GPT-line breadth with built-in reasoning, replacing both GPT-4o and the o-series for most users.

Grok 4 · Proprietary · 256K · — · $15.00 · xAI
xAI's mid-2025 flagship, top scores on Humanity's Last Exam at launch, with native real-time X integration.

Claude Opus 4 · Proprietary · 200K · — · $75.00 · Anthropic
Anthropic's mid-2025 flagship, the model that established Claude's lead on coding agents and SWE-bench.

o3 · Proprietary · 200K · — · $8.00 · OpenAI
OpenAI's flagship reasoning model, set the bar for hard math, GPQA, and agent benchmarks in 2025.

Gemini 2.5 Pro · Proprietary · 2.1M · 86.0 · $10.00 · Google
Google's mid-2025 flagship, the model that brought Gemini decisively back to parity with the OpenAI and Anthropic frontier.

Grok 3 · Proprietary · 128K · — · $15.00 · xAI
xAI's first frontier-tier release, established the company's Colossus-trained model line.

Sonar Pro · Proprietary · 200K · — · $15.00 · Perplexity
Perplexity's premium answer model, deeper search, more sources, longer answers.

Mistral Large 2 · Open · 128K · 84.0 · $6.00 · Mistral
Mistral's flagship API model, strong on code and reasoning, EU-friendly hosting.

Claude Sonnet 4.6 · Proprietary · 200K · — · — · Anthropic
Anthropic's mid-tier 4.6 release, the workhorse model behind most production Anthropic deployments.

Gemini 3 Flash · Proprietary · 1.0M · — · — · Google
Google's high-speed, low-cost mid-tier with the same massive context window, popular for high-volume RAG.

GPT-5 mini · Proprietary · 400K · — · $2.00 · OpenAI
GPT-5's mid-tier sibling, most of the quality at a fraction of the price, ideal for high-volume production workloads.

Claude Sonnet 4 · Proprietary · 200K · — · $15.00 · Anthropic
Mid-2025 mid-tier Claude, the predecessor workhorse to Sonnet 4.6 and still common in production.

o4-mini · Proprietary · 200K · — · $4.40 · OpenAI
Smaller, faster, cheaper member of OpenAI's reasoning-model family, great latency-cost balance for hard tasks.

GPT-4.1 · Proprietary · 1.0M · 86.2 · $8.00 · OpenAI
OpenAI's 2025 GPT-4.x refresh, long-context, fast, still widely deployed even after GPT-5.

Gemini 2.5 Flash · Proprietary · 1.0M · — · $0.30 · Google
Mid-2025 fast tier, set the bar for cost-efficient long-context generation.

Claude 3.7 Sonnet · Proprietary · 200K · — · $15.00 · Anthropic
The first Claude with an extended-thinking mode, ushered the reasoning-model paradigm into Anthropic's lineup.

Gemini 2.0 Flash · Proprietary · 1.0M · — · $0.40 · Google
Early-2025 fast Gemini, first model with full 1M-token context at the Flash price point.

o3-mini · Proprietary · 200K · — · $4.40 · OpenAI
Smaller, faster reasoning model, popular as the budget thinking-model option throughout 2025.

Codestral 2 · Open · 256K · — · $0.90 · Mistral
Mistral's code-specialized model, fast inline completion and strong fill-in-the-middle support.

Mistral Medium · Proprietary · 128K · — · $2.00 · Mistral
Mistral's mid-tier balanced model, production-ready at competitive pricing.

Amazon Nova Pro · Proprietary · 300K · — · $3.20 · Other
Amazon's mid-tier multimodal, competitive pricing, deep AWS integration.

Pricing reflects each provider's public rates at the time of writing; "Out / 1M" is the output price per million tokens. MMLU scores come from official model cards or Artificial Analysis. Benchmarks are left blank (—) when not independently verified; see the methodology.
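Turning a per-million output rate into a per-request cost is simple division; a minimal sketch (the function name is ours, and the rates are the table's listed values, not live pricing):

```python
def output_cost_usd(output_tokens: int, rate_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a per-1M-token output rate."""
    return output_tokens / 1_000_000 * rate_per_million_usd

# Example: 250K output tokens on GPT-5 at its listed $10.00 / 1M rate.
print(output_cost_usd(250_000, 10.00))  # → 2.5 (USD)
```

Input-token pricing is typically quoted the same way (per 1M tokens) but at a different rate, so a full request cost sums the two.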

Top by category

Find the right model for the job

All 60 guides

Honest data, never fabricated

We only publish benchmark numbers we can source. Where a model has no public score, we say so instead of guessing.

Updated weekly

A new release on Monday is on LLMDex by Friday. The dataset lives in version control, so every change is auditable.

Comparisons that compare

Every compare page is a programmatic synthesis of real data deltas, not generic AI filler. Read the methodology.

Friday digest

Intelligence, distilled weekly.

One short email every Friday: new model launches, leaderboard moves, and pricing drops. Curated by hand. Free, no spam.