The Complete History of GPT: From GPT-1 to GPT-5.5 (2018–2026)
How OpenAI's GPT line evolved from a 117M-parameter research model in 2018 to GPT-5.5: pricing, parameters, capabilities, and the moments that mattered.
If you started using ChatGPT in late 2022, you might assume GPT was always this useful. It wasn't. The line that produced GPT-5 and GPT-5.5 began in 2018 with a research paper most people outside NLP never read, and it took eight straight years of compounding scale, architectural refinement, and reinforcement learning before "GPT" became shorthand for "AI." This is the full timeline.
We've written this for engineers, founders, and curious users who want the actual chronology, not the vibes-based one. Every model, every release date, every meaningful capability shift, in order.
GPT-1 (June 2018): The unsupervised pretraining bet
The original paper was titled "Improving Language Understanding by Generative Pre-Training," and it was almost a footnote at the time. Google's BERT had stolen the thunder of 2018's NLP cycle. GPT-1 was a 117-million-parameter transformer trained on the BookCorpus dataset, and its central thesis was simple but at the time unproven: train a single model on huge volumes of unlabeled text, then fine-tune it for downstream tasks instead of training a fresh model from scratch each time.
The model itself was unimpressive by 2026 standards: answers were often gibberish, instruction-following was nonexistent, and there was no chat interface. But the architecture and training recipe became the template every later GPT inherited. If you want to read one original paper to understand the line, this is it.
GPT-2 (February 2019): The "too dangerous to release" cycle
GPT-2 was a scale jump: 1.5 billion parameters, trained on 40GB of curated web text. It was the first model in the line that could write coherent paragraphs, hold a topic across a few hundred words, and respond plausibly to arbitrary prompts.
OpenAI's decision not to release the largest model immediately, citing misuse concerns, was the moment the AI safety conversation went mainstream. The 1.5B model eventually shipped in November 2019. In retrospect, the worry was overblown for a model of that size, but the precedent of staged release became standard industry practice.
Its capabilities were narrow by modern standards, but two things were genuinely new: it could continue arbitrary text fluently, and it sometimes invented surprisingly coherent characters or arguments. Researchers started calling these emergent behaviors.
GPT-3 (June 2020): The capability inflection
GPT-3 was 175 billion parameters, more than 100× GPT-2, and trained on roughly 300 billion tokens drawn from a corpus of nearly 500 billion. It was the model that made "in-context learning" a household phrase: you could give it three or four examples in a prompt and it would generalize to new instances of the pattern, without fine-tuning.
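In-context learning is just prompt construction: labeled examples are concatenated ahead of the unlabeled query, and the model infers the pattern. A minimal sketch, with an invented sentiment-labeling task for illustration:

```python
# A few-shot prompt: the model generalizes from the examples alone,
# with no fine-tuning. The task and examples are invented for illustration.
examples = [
    ("cheerful", "positive"),
    ("dreadful", "negative"),
    ("delightful", "positive"),
]

def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples, then leave the query's label blank."""
    lines = [f"Word: {word}\nSentiment: {label}" for word, label in examples]
    lines.append(f"Word: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "awful")
# The model's completion of the trailing "Sentiment:" is the prediction.
```

The same scheme works for translation, extraction, and formatting tasks; the examples carry the task definition.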
The API launched in private beta with a waitlist. Costs were high (the original Davinci model ran several cents per thousand tokens), but the capability ceiling was something nobody had publicly demonstrated before. Copywriters, coders, and researchers all found uses within weeks.
GPT-3's biggest weaknesses were also real: it hallucinated confidently, its training data stopped in late 2019, and it struggled with multi-step reasoning. But for the first time, the failure modes felt addressable rather than fundamental. The model that arrived two years later would prove that out.
InstructGPT and ChatGPT (March 2022 / November 2022): RLHF and the consumer breakthrough
The single biggest gap between GPT-3 and what users wanted was instruction-following. GPT-3 could continue text, but if you asked "summarize this," it might just continue the article in the same voice. InstructGPT, released in March 2022, solved this by post-training the model on human feedback (RLHF: Reinforcement Learning from Human Feedback). It became dramatically more useful for ad-hoc instructions even though its base capabilities hadn't changed.
ChatGPT in November 2022 was InstructGPT plus a chat-specific fine-tune plus a free web interface. Within five days it had a million users. Within two months it had 100 million. The product was a step-change for distribution, not for raw model quality, but it was the moment GPT escaped the research community.
GPT-4 (March 2023): Multimodal, more reliable, much larger
GPT-4 was the line's first model where most users could no longer tell the difference between "AI wrote this" and "human wrote this" on routine text tasks. It also introduced multimodal vision: you could paste a screenshot or a chart and ask questions about it. Training details remained closed (parameter count, training data, compute budget), and the company made a deliberate move toward less openness, a decision that drew sharp criticism from researchers but became the new norm.
For developers, GPT-4 brought function-calling (later called tool use), JSON-mode for structured output, and a wider context window (8K, then 32K, then 128K with GPT-4 Turbo). It was the model that made "AI agents" feel viable rather than merely demoable.
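Function-calling means the developer describes tools as JSON schemas and the model replies with a structured call instead of prose. A sketch of the tool-definition shape used by the Chat Completions API; the get_weather function itself is a made-up example:

```python
# A made-up example tool. The shape follows the Chat Completions
# function-calling format: a name, a description, and JSON Schema parameters.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Passed to the API as tools=[get_weather_tool]. When the model decides the
# tool is relevant, it returns the function name plus JSON arguments, which
# your code executes and feeds back as a follow-up message.
```

This loop (model proposes a call, your code runs it, results go back in) is the primitive that agent frameworks are built on.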
GPT-4's economics improved over time as OpenAI shipped GPT-4 Turbo (November 2023) and GPT-4o (May 2024). GPT-4o was the line's first natively multimodal release: text, vision, and audio in one network. It powered the original Realtime voice API.
GPT-4o, GPT-4.1, and the Nov 2023–Apr 2025 cycle
This stretch was less about capability jumps and more about consolidation. GPT-4o (May 2024) made multimodality the default. GPT-4o-mini (July 2024) brought price-per-token down dramatically and became the workhorse model behind most production deployments. GPT-4.1 (April 2025) extended context to 1 million tokens and shipped meaningful speed improvements over GPT-4o.
The o-series (o1 preview in Sep 2024, full release in Dec 2024; o3-mini in Jan 2025; o3 and o4-mini in April 2025) was the first time OpenAI released models that did explicit chain-of-thought "thinking" before answering. These models set new records on public reasoning benchmarks at release. They were slow and expensive but proved that spending more compute at inference time was a genuine quality lever, not a research curiosity.
GPT-5 (August 2025): Unification
GPT-5 launched in August 2025 as OpenAI's first unified model: one API, one model name, with reasoning routed automatically per query. Behind the scenes the architecture combined what was previously the GPT line and the o-series: easy questions got fast answers; hard questions got internal reasoning tokens before the visible response.
Pricing came in lower than GPT-4-class models on equivalent quality, with three tiers: full GPT-5 ($1.25/$10 per million input/output tokens), GPT-5 mini ($0.25/$2), and GPT-5 nano ($0.05/$0.40). Context window stayed at 400K. The Responses API replaced the older Chat Completions surface for new builds and became the backbone for OpenAI's agent products.
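At those rates, cost per request is simple arithmetic. A sketch using the tier prices quoted above (the exact model-identifier strings are illustrative; check the API's model list):

```python
# Per-million-token prices from the tiers above, as (input, output) in USD.
# Key strings are illustrative identifiers, not confirmed API model IDs.
PRICES = {
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25,  2.00),
    "gpt-5-nano": (0.05,  0.40),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a RAG call with a 20K-token context and a 1K-token answer:
cost_full = request_cost("gpt-5", 20_000, 1_000)       # 0.025 + 0.010 = $0.035
cost_mini = request_cost("gpt-5-mini", 20_000, 1_000)  # 0.005 + 0.002 = $0.007
```

Running the numbers like this for your own traffic shape is usually more informative than comparing headline prices, since input-heavy workloads (RAG, agents) weight the input rate far more than the output rate.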
For developers, the practical effect was that you could pick one model name and not worry about whether your task needed reasoning. For most production workloads, GPT-5 mini became the new default: it cleared the quality bar at a fifth the cost of full GPT-5.
Read the full GPT-5 spec on LLMDex.
GPT-5.5 (March 2026): The mid-cycle refresh
GPT-5.5 shipped in March 2026 as a mid-cycle quality bump rather than an architectural reset. The headline upgrades:
- Stronger agent performance. SWE-bench Verified scores climbed several points, and tool-use error recovery is noticeably better on long agent loops. Cursor, Cline, and Claude Code all showed benchmark gains within the first week of release.
- Better long-context recall. The 400K-token context window has measurably improved multi-needle retrieval, meaning the model is better at reasoning over long documents, not just remembering them.
- Sharper multimodal grounding. Screenshot debugging, chart analysis, and document AI all leveled up. For many users, GPT-5.5 is now the first chat model that meaningfully replaces dedicated OCR for clean print.
- Same API surface. No code changes for existing GPT-5 integrations: swap the model name string and you're done.
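Because the API surface is unchanged, migration is a one-line delta. A hypothetical sketch; the exact identifier string "gpt-5.5" is an assumption, so check the model list your account exposes:

```python
# Hypothetical migration helper: the request shape is identical for both
# models, so the only delta is the model string. "gpt-5.5" as an identifier
# is an assumption, not a confirmed API model ID.
def build_request_params(prompt, model="gpt-5"):
    """Request parameters for the Responses API; identical except `model`."""
    return {"model": model, "input": prompt}

before = build_request_params("triage this support ticket")
after = build_request_params("triage this support ticket", model="gpt-5.5")
# The two dicts differ only in the "model" key.
```

Gating the new string behind a config flag or environment variable makes it trivial to A/B the two models on live traffic before committing.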
GPT-5.5 doesn't replace GPT-5 in OpenAI's pricing tables (yet); it sits alongside it. Most teams should migrate selectively: agent loops and document-AI workflows benefit immediately, while routine chat and structured-extraction workloads see less of a gap.
Read the full GPT-5.5 spec on LLMDex.
What you should actually pick in 2026
For most production workloads, GPT-5 mini is the right default. It's fast, cheap enough to be invisible at scale, and clears the quality bar for chat, RAG, customer support, and most agent workflows. Reach for GPT-5.5 when you're hitting a real wall: hard agent tasks, long-doc reasoning, or vision-heavy work where the upgrade is worth the cost.
For reasoning-bound workloads (math, hard science, multi-step planning), the o-series remains a credible alternative. For everything else, GPT-5.x's routed reasoning is sufficient.
If you're cost-sensitive and the quality bar is loose, GPT-5 nano at $0.05/$0.40 per 1M tokens is one of the cheapest serious models in the market.
The pattern across eight years
Three things have stayed constant since GPT-1:
- Scaling works longer than people expect. Every release in the line beat the prior one on quality at roughly the cost the prior one charged a year earlier. The "scaling is dead" claim has been made about every generation and has been wrong every time.
- Post-training matters more than people credit. GPT-3 → InstructGPT was a smaller compute investment than GPT-2 → GPT-3 but a bigger user-facing improvement. Every subsequent generation has poured more into RLHF and reasoning training.
- Distribution wins. ChatGPT was a free chat interface around an existing model. It's still the largest user base in the line, and it's why GPT became the verb.
GPT-6 will ship. GPT-6.5 will ship. The pattern won't break unless training data or compute hits a hard wall, and neither has so far.
Further reading
- GPT-5.5 Deep Dive: Everything OpenAI's Mid-Cycle Refresh Brings
Pricing, benchmarks, agent performance, and migration notes for OpenAI's GPT-5.5. What changed from GPT-5, what didn't, and when you should upgrade.
- How OpenAI Does Pricing: A Tour Through Five Years of Per-Token Economics
From $0.06 / 1K tokens on GPT-3 to $0.05 / 1M tokens on GPT-5 nano. The full pricing history, the architectural shifts behind the cuts, and what they tell us about 2026.
- What GPT-6 Will Probably Look Like (and When)
OpenAI hasn't announced GPT-6 yet. Based on the patterns from GPT-3 → 4 → 5, here's what to expect: capabilities, timing, pricing, and what it means for builders.