The definitive intelligence index.
An opulent, real-time ledger of parameters, context windows, and deployment economics across 80+ Large Language Models. No fabricated benchmarks. No marketing fluff.
Top 25 models, ranked by tier & recency
| Model | Type | Context | MMLU | Out · $/1M | Provider |
|---|---|---|---|---|---|
| **GPT-5.5** OpenAI's mid-cycle GPT-5 refresh: improved reasoning, tool use, and multimodal grounding over the 2025 launch. | Proprietary | 400K | — | — | OpenAI |
| **Claude Opus 4.7** Anthropic's mid-2026 flagship: ahead on SWE-bench, agent reliability, and writing quality. | Proprietary | 500K | — | — | Anthropic |
| **o4** OpenAI's late-2025 standalone reasoning model: an evolution of o3 with deeper chain-of-thought and stronger multimodal reasoning. | Proprietary | 200K | — | — | OpenAI |
| **Gemini 3 Pro** Google's late-2025 flagship: set new benchmarks on long-context, vision, and reasoning at competitive pricing. | Proprietary | 1.0M | 91.8 | — | Google |
| **GPT-5** OpenAI's unified flagship combining GPT-line breadth with built-in reasoning, replacing both GPT-4o and the o-series for most users. | Proprietary | 400K | 91.4 | $10.00 | OpenAI |
| **Grok 4** xAI's mid-2025 flagship: top scores on Humanity's Last Exam at launch, with native real-time X integration. | Proprietary | 256K | — | $15.00 | xAI |
| **Claude Opus 4** Anthropic's mid-2025 flagship: the model that established Claude's lead on coding agents and SWE-bench. | Proprietary | 200K | — | $75.00 | Anthropic |
| **o3** OpenAI's flagship reasoning model: set the bar for hard math, GPQA, and agent benchmarks in 2025. | Proprietary | 200K | — | $8.00 | OpenAI |
| **Gemini 2.5 Pro** Google's mid-2025 flagship: the model that brought Gemini decisively back to parity with the OpenAI and Anthropic frontier. | Proprietary | 2.1M | 86.0 | $10.00 | Google |
| **Grok 3** xAI's first frontier-tier release: established the company's Colossus-trained model line. | Proprietary | 128K | — | $15.00 | xAI |
| **Sonar Pro** Perplexity's premium answer model: deeper search, more sources, longer answers. | Proprietary | 200K | — | $15.00 | Perplexity |
| **Mistral Large 2** Mistral's flagship API model: strong on code and reasoning, EU-friendly hosting. | Open | 128K | 84.0 | $6.00 | Mistral |
| **Claude Sonnet 4.6** Anthropic's mid-tier 4.6 release: the workhorse model behind most production Anthropic deployments. | Proprietary | 200K | — | — | Anthropic |
| **Gemini 3 Flash** Google's high-speed, low-cost mid-tier with the same massive context window, popular for high-volume RAG. | Proprietary | 1.0M | — | — | Google |
| **GPT-5 mini** GPT-5's mid-tier sibling: most of the quality at a fraction of the price, ideal for high-volume production workloads. | Proprietary | 400K | — | $2.00 | OpenAI |
| **Claude Sonnet 4** Mid-2025 mid-tier Claude: the predecessor workhorse to Sonnet 4.6 and still common in production. | Proprietary | 200K | — | $15.00 | Anthropic |
| **o4-mini** Smaller, faster, cheaper member of OpenAI's reasoning-model family: a great latency-cost balance for hard tasks. | Proprietary | 200K | — | $4.40 | OpenAI |
| **GPT-4.1** OpenAI's 2025 GPT-4.x refresh: long-context, fast, still widely deployed even after GPT-5. | Proprietary | 1.0M | 86.2 | $8.00 | OpenAI |
| **Gemini 2.5 Flash** Mid-2025 fast tier: set the bar for cost-efficient long-context generation. | Proprietary | 1.0M | — | $0.30 | Google |
| **Claude 3.7 Sonnet** The first Claude with an extended-thinking mode: ushered the reasoning-model paradigm into Anthropic's lineup. | Proprietary | 200K | — | $15.00 | Anthropic |
| **Gemini 2.0 Flash** Early-2025 fast Gemini: the first model with full 1M-token context at the Flash price point. | Proprietary | 1.0M | — | $0.40 | Google |
| **o3-mini** Smaller, faster reasoning model: popular as the budget thinking-model option throughout 2025. | Proprietary | 200K | — | $4.40 | OpenAI |
| **Codestral 2** Mistral's code-specialized model: fast inline completion and strong fill-in-the-middle support. | Open | 256K | — | $0.90 | Mistral |
| **Mistral Medium** Mistral's mid-tier balanced model: production-ready at competitive pricing. | Proprietary | 128K | — | $2.00 | Mistral |
| **Amazon Nova Pro** Amazon's mid-tier multimodal: competitive pricing, deep AWS integration. | Proprietary | 300K | — | $3.20 | Other |
Pricing reflects each provider's public per-token output rates at the time of writing. MMLU scores come from official model cards or Artificial Analysis. Benchmarks are left blank (—) when not independently verified; see the methodology.
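To turn a per-1M-token rate from the table into a per-request cost, multiply the rate by the token count and divide by one million. A minimal sketch (the helper name is ours, not part of the dataset):

```python
def output_cost(rate_per_million: float, tokens: int) -> float:
    """Dollar cost of generating `tokens` output tokens at `rate_per_million` $/1M."""
    return rate_per_million * tokens / 1_000_000

# GPT-5 lists $10.00 per 1M output tokens, so a 50K-token response costs:
print(output_cost(10.00, 50_000))  # → 0.5
```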
The head-to-heads everyone's asking about
Find the right model for the job
- **Best LLM for Coding**: For developers searching for the most capable model to write, edit, or refactor code in real codebases.
- **Best LLM for Reasoning**: Multi-step logical reasoning, planning, and inference.
- **Best LLM for RAG**: Retrieval-augmented generation pipelines; answering questions from a doc corpus.
- **Best LLM for Vision**: Image understanding, chart analysis, OCR, screenshot reasoning, photo description.
- **Cheapest LLMs**: The lowest dollar-per-million-token models that still meet quality bars.
- **Best Open-Source LLMs**: The top open-weight models for self-hosting or fine-tuning.
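A "cheapest" ranking like the one above is just a sort over the listed output rates, skipping models whose price is not independently verified. A small sketch using rates from the table:

```python
# Listed output prices in $/1M tokens, taken from the table above.
models = {
    "Gemini 2.5 Flash": 0.30,
    "Gemini 2.0 Flash": 0.40,
    "Codestral 2": 0.90,
    "GPT-5 mini": 2.00,
    "Mistral Medium": 2.00,
}

# Sort ascending by price; ties keep insertion order (Python's sort is stable).
cheapest = sorted(models.items(), key=lambda kv: kv[1])
print(cheapest[0])  # → ('Gemini 2.5 Flash', 0.3)
```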
Honest data, never fabricated
We only publish benchmark numbers we can source. Where a model has no public score, we say so instead of guessing.
Updated weekly
A new release on Monday is on LLMDex by Friday. The dataset lives in version control, so every change is auditable.
Comparisons that compare
Every compare page is a programmatic synthesis of real data deltas, not generic AI filler. Read the methodology.
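A "data delta" between two models can be sketched as a field-by-field diff of their records, with missing (unverified) values propagated rather than guessed. The field names and record shape below are illustrative, not the site's actual schema; the values come from the table:

```python
def model_delta(a: dict, b: dict, fields=("context", "mmlu", "out_per_1m")) -> dict:
    """Field-by-field difference (a - b); None when either side is unverified."""
    return {
        f: (None if a.get(f) is None or b.get(f) is None else a[f] - b[f])
        for f in fields
    }

gpt5 = {"context": 400_000, "mmlu": 91.4, "out_per_1m": 10.00}
gemini3_pro = {"context": 1_000_000, "mmlu": 91.8, "out_per_1m": None}  # price unverified

delta = model_delta(gpt5, gemini3_pro)
# context: -600_000, mmlu: ~-0.4, out_per_1m: None (no fabricated number)
```

Propagating `None` instead of substituting a guess is what keeps a generated comparison honest when one side lacks verified data.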
Intelligence, distilled weekly.
One short email every Friday: new model launches, leaderboard moves, and pricing drops. Curated by hand. Free, no spam.