LLM API Pricing 2026: Full Comparison Table (Updated Weekly)

LLM API prices have collapsed. GPT-4-level intelligence cost $30 per 1M tokens in early 2023 — the same capability now runs under $1. As of April 2026, twelve production-ready models span a 2,400× price range, from $0.075/1M (Gemini 2.0 Flash-Lite input) to $180/1M (GPT-5.4 Pro output). This post gives you the complete, current pricing table across all major providers, a machine-readable JSON block, and a practical breakdown of which model wins for each workload type.


TL;DR

  • Cheapest production option: Gemini 2.0 Flash-Lite at $0.075 input / $0.30 output per 1M tokens
  • Best quality-to-cost ratio: Claude Sonnet 4.6 ($3.00/$15.00) or DeepSeek V3.2 ($0.14/$0.28) for cost-sensitive workloads
  • Premium tier: Claude Opus 4.6 and GPT-5.4 Pro deliver the highest quality scores but cost 10–60× more than mid-tier
  • Key cost lever: prompt caching cuts repeated-context costs by up to 90% on Claude and Gemini
  • Pricing trend: prices continue to fall 40–60% annually — lock in commitments carefully

How LLM API Pricing Works

All LLM APIs charge separately for input tokens (your prompt + context) and output tokens (the model’s response). Output is always more expensive — typically 4–6× the input rate — because generation requires sequential computation that cannot be parallelized.

Prompt caching changes this calculus significantly. When the same large context (system prompt, document, codebase) is reused across calls, cached tokens are billed at 10–25% of the standard input rate. For RAG pipelines or long-session chatbots, this can reduce costs by 70–90%.

Batch API discounts apply when results are not needed in real time. Both Anthropic and OpenAI offer 50% off for asynchronous batch jobs, making overnight data-processing pipelines significantly cheaper. Factor these into your architecture before choosing a model on sticker price alone.
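The three mechanics above combine into a simple per-request cost model. A minimal sketch in Python — the function name and the 25% default cache rate are illustrative assumptions, not any provider's billing API:

```python
def request_cost_usd(input_tokens, output_tokens, input_rate, output_rate,
                     cached_tokens=0, cache_discount=0.25, batch=False):
    """Estimate one API call's cost; rates are USD per 1M tokens.

    cache_discount is the fraction of the input rate billed for cached
    tokens (10-25% depending on provider); batch=True applies the 50%
    asynchronous-batch discount described above.
    """
    uncached = input_tokens - cached_tokens
    cost = (uncached * input_rate
            + cached_tokens * input_rate * cache_discount
            + output_tokens * output_rate) / 1_000_000
    return cost * 0.5 if batch else cost

# An 8K-token prompt (6K of it cached) producing 1K tokens on
# Claude Sonnet 4.6 ($3.00 in / $15.00 out):
print(request_cost_usd(8_000, 1_000, 3.00, 15.00, cached_tokens=6_000))  # 0.0255
```

Note how the output term dominates even though output tokens are a fraction of the total — the reason output rate matters most when comparing models for generation-heavy workloads.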


Full LLM API Pricing Table — April 2026

| Model | Provider | Input ($/1M) | Output ($/1M) | Context | Quality Score | Value Tier |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 128K | 8.6 | Mid-tier |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 128K | 9.5 | Ultra-premium |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K | 9.5 | Premium |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | 8.8 | Mid-tier |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | 7.5 | Budget |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 2M | 9.0 | Mid-tier |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | 7.8 | Budget |
| Gemini 2.0 Flash-Lite | Google | $0.075 | $0.30 | 1M | 7.2 | Budget |
| DeepSeek V3.2 | DeepSeek | $0.14 | $0.28 | 64K | 8.3 | Budget |
| Grok 4 | xAI | $2.00 | $15.00 | 128K | 8.5 | Mid-tier |
| Mistral Large 3 | Mistral | $0.20 | $0.60 | 128K | 8.0 | Budget |
| Llama 4 Maverick (hosted) | Meta/OpenRouter | $0.15 | $0.60 | 128K | 7.8 | Budget |

Machine-Readable JSON — April 2026

```json
{
  "data_source": "jsonhouse.com",
  "data_updated": "2026-04-09",
  "unit": "USD per 1M tokens",
  "models": [
    { "model": "GPT-5.4", "provider": "OpenAI", "input_per_1m": 2.50, "output_per_1m": 15.00, "context_window_k": 128, "quality_score": 8.6, "value_tier": "mid-tier" },
    { "model": "GPT-5.4 Pro", "provider": "OpenAI", "input_per_1m": 30.00, "output_per_1m": 180.00, "context_window_k": 128, "quality_score": 9.5, "value_tier": "ultra-premium" },
    { "model": "Claude Opus 4.6", "provider": "Anthropic", "input_per_1m": 5.00, "output_per_1m": 25.00, "context_window_k": 200, "quality_score": 9.5, "value_tier": "premium" },
    { "model": "Claude Sonnet 4.6", "provider": "Anthropic", "input_per_1m": 3.00, "output_per_1m": 15.00, "context_window_k": 200, "quality_score": 8.8, "value_tier": "mid-tier" },
    { "model": "Claude Haiku 4.5", "provider": "Anthropic", "input_per_1m": 1.00, "output_per_1m": 5.00, "context_window_k": 200, "quality_score": 7.5, "value_tier": "budget" },
    { "model": "Gemini 3.1 Pro", "provider": "Google", "input_per_1m": 2.00, "output_per_1m": 12.00, "context_window_k": 2000, "quality_score": 9.0, "value_tier": "mid-tier" },
    { "model": "Gemini 2.5 Flash", "provider": "Google", "input_per_1m": 0.30, "output_per_1m": 2.50, "context_window_k": 1000, "quality_score": 7.8, "value_tier": "budget" },
    { "model": "Gemini 2.0 Flash-Lite", "provider": "Google", "input_per_1m": 0.075, "output_per_1m": 0.30, "context_window_k": 1000, "quality_score": 7.2, "value_tier": "budget" },
    { "model": "DeepSeek V3.2", "provider": "DeepSeek", "input_per_1m": 0.14, "output_per_1m": 0.28, "context_window_k": 64, "quality_score": 8.3, "value_tier": "budget" },
    { "model": "Grok 4", "provider": "xAI", "input_per_1m": 2.00, "output_per_1m": 15.00, "context_window_k": 128, "quality_score": 8.5, "value_tier": "mid-tier" },
    { "model": "Mistral Large 3", "provider": "Mistral", "input_per_1m": 0.20, "output_per_1m": 0.60, "context_window_k": 128, "quality_score": 8.0, "value_tier": "budget" },
    { "model": "Llama 4 Maverick (hosted)", "provider": "Meta/OpenRouter", "input_per_1m": 0.15, "output_per_1m": 0.60, "context_window_k": 128, "quality_score": 7.8, "value_tier": "budget" }
  ]
}
```
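The JSON block is meant to be consumed programmatically. A minimal sketch — the embedded data is a trimmed sample of the table above, and the 3:1 input:output mix inside `blended_rate` is an assumption of this example, not part of the published data:

```python
import json

# Trimmed sample of the post's JSON block; the full block parses the same way.
data = json.loads("""
{"unit": "USD per 1M tokens", "models": [
  {"model": "Claude Sonnet 4.6", "input_per_1m": 3.00, "output_per_1m": 15.00, "quality_score": 8.8},
  {"model": "DeepSeek V3.2", "input_per_1m": 0.14, "output_per_1m": 0.28, "quality_score": 8.3},
  {"model": "GPT-5.4 Pro", "input_per_1m": 30.00, "output_per_1m": 180.00, "quality_score": 9.5}
]}
""")

def blended_rate(model, input_share=0.75):
    # Assumes a 3:1 input:output token mix, typical of chat workloads.
    return (input_share * model["input_per_1m"]
            + (1 - input_share) * model["output_per_1m"])

# Quality points per blended dollar, best value first
ranked = sorted(data["models"],
                key=lambda m: m["quality_score"] / blended_rate(m),
                reverse=True)
for m in ranked:
    print(f'{m["model"]:25} {m["quality_score"] / blended_rate(m):8.2f}')
```

On this metric the budget tier wins by an order of magnitude, which is exactly why the routing strategies later in this post pay off.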

Best Picks by Use Case

Best for Cost-Sensitive / High-Volume Workloads

DeepSeek V3.2 delivers the most striking value at the budget tier: $0.14 input / $0.28 output, with a quality score of 8.3 that outperforms models costing 10× more. Processing 10M output tokens per day — typical for a mid-scale API product — costs $2.80/day with DeepSeek versus $50/day with Claude Haiku 4.5.

Gemini 2.0 Flash-Lite is the floor for structured data extraction, classification, and simple generation tasks. At $0.075/$0.30, running 100M tokens/day costs $30 in output — less than a typical SaaS seat license. Gemini 2.5 Flash steps up with meaningfully better reasoning at $0.30/$2.50, still just $2.50/day for 1M daily output tokens.

Best Balance of Quality and Cost

Claude Sonnet 4.6 at $3.00/$15.00 punches above its price point, scoring 8.8 — within 0.7 points of Opus 4.6 at 40% lower output cost ($15.00 versus $25.00 per 1M). For most production applications requiring nuanced understanding, instruction-following, and code generation, Sonnet 4.6 is the rational default.

GPT-5.4 at $2.50/$15.00 matches Sonnet 4.6’s output price with a slightly lower quality score (8.6). Its advantage is ecosystem depth: tight OpenAI SDK integration and broad tooling support. At identical output pricing, the choice between these two comes down to your existing stack, not cost.

Best for Long Documents and RAG

Gemini 3.1 Pro is the only production model with a 2M token context window at a competitive price ($2.00/$12.00). Processing a 500K-token document costs $1.00 input per pass — viable for large-codebase analysis, legal document review, or whole-book Q&A. No other model in this tier reaches beyond 200K context at comparable quality (score: 9.0).

For RAG pipelines where you’re injecting large, repeated context chunks, Gemini’s native caching makes the effective input cost considerably lower than the sticker rate.

Best for Premium / Frontier Work

Claude Opus 4.6 and GPT-5.4 Pro both score 9.5 but serve different needs. Opus 4.6 at $5.00/$25.00 is the economical frontier choice — strong for agentic tasks, complex reasoning, and long-context work. GPT-5.4 Pro at $30.00/$180.00 exists for use cases where the marginal quality gain justifies 7× the cost: high-stakes legal, financial, and scientific generation. At $180/1M output, processing just 1M output tokens/day costs $180 — budget this carefully.


Cost-Saving Strategies

Prompt Caching

Claude and Gemini both offer prompt caching at 10–25% of standard input rates. If your system prompt is 10,000 tokens and you make 100,000 calls per day, that is 1B input tokens daily, and caching reduces the fixed cost by 75–90%. On Claude Opus 4.6 at $5.00/1M, that means paying $0.50–$1.25/1M instead — cutting a roughly $150,000/month input bill to $15,000–$37,500.
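That arithmetic is easy to check directly. The figures below come from the caching paragraph above; the 10% and 25% rates bracket the cached-token range the providers quote:

```python
# Workload: a 10,000-token system prompt reused across 100,000 calls/day
# on Claude Opus 4.6 ($5.00 per 1M input tokens).
tokens_per_day = 10_000 * 100_000                  # 1B cached input tokens/day
monthly_full = tokens_per_day / 1e6 * 5.00 * 30    # uncached monthly input bill

for cache_fraction in (0.10, 0.25):                # cached tokens billed at 10-25%
    monthly_cached = monthly_full * cache_fraction
    print(f"cache billed at {cache_fraction:.0%}: "
          f"${monthly_cached:,.0f}/month vs ${monthly_full:,.0f} uncached")
```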

Batch API

OpenAI and Anthropic both offer 50% discounts for asynchronous batch processing. Classification, summarization, embedding generation, and any other non-real-time task qualify. A flat half-price discount is the single largest cost reduction available after model selection itself. Build your pipeline to separate synchronous (user-facing) from asynchronous (background) calls from day one.

Model Tiering and Routing

Route by task complexity, not by default. Use a small model (Gemini 2.0 Flash-Lite, DeepSeek V3.2) for intent classification, slot filling, and simple generation. Reserve mid-tier or premium models for tasks that actually require them: multi-step reasoning, nuanced judgment calls, complex code generation. A tiered router reduces average cost per call by 60–80% in production systems with mixed workloads.
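A tiered router can be as small as a lookup. A hypothetical sketch — the model names come from the table above, but the task taxonomy and routing rule are illustrative, not any real SDK:

```python
# Hypothetical two-tier router; thresholds and task names are illustrative.
BUDGET_MODEL = "gemini-2.0-flash-lite"
MID_TIER_MODEL = "claude-sonnet-4.6"

SIMPLE_TASKS = {"classify", "extract", "slot_fill", "short_summary"}

def pick_model(task_type: str, needs_reasoning: bool = False) -> str:
    """Send simple, high-volume tasks to the budget model; reserve the
    mid-tier model for multi-step reasoning or complex generation."""
    if task_type in SIMPLE_TASKS and not needs_reasoning:
        return BUDGET_MODEL
    return MID_TIER_MODEL

print(pick_model("classify"))                             # budget tier
print(pick_model("code_review", needs_reasoning=True))    # mid tier
```

Production routers often add a confidence check — if the budget model's answer fails validation, retry on the mid-tier model — which preserves quality while keeping the average cost near the budget rate.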

Free Tiers and Open Access

Google Gemini offers a free API tier (rate-limited) suitable for development and low-volume production. DeepSeek provides a generous free quota on its API. OpenRouter hosts Llama 4 Maverick with no API key required below a usage threshold. None of these are zero-cost at scale, but they materially extend runway during development.


Price Trend Context

GPT-4-level intelligence cost $30/1M tokens at launch in mid-2023. By early 2025, comparable capability was available under $3/1M. As of April 2026, it runs under $0.30/1M at the budget tier. That is a 99% price reduction in under three years.

The inflection point was the DeepSeek effect. When DeepSeek V3 demonstrated GPT-4-class performance at open-source cost efficiency in January 2025, every major provider cut prices within 90 days. Google dropped Gemini Flash pricing 60%. Anthropic introduced Haiku 4.5 at $1.00/$5.00. The price war became structural, not promotional.

Looking 6–12 months forward, the pattern continues. Model distillation, speculative decoding, and inference hardware improvements are each contributing 30–50% annual cost reductions independent of competition. Developers building cost-sensitive systems today should architect for models priced 40–60% lower than current rates by Q1 2027 — avoid long-term pricing commitments that lock in 2026 rates.

For raw capability benchmarks across these models, see our LLM benchmark comparison.


FAQ

Q: What is the cheapest LLM API in 2026?

Gemini 2.0 Flash-Lite holds the lowest price floor at $0.075 input / $0.30 output per 1M tokens as of April 2026. For tasks requiring stronger reasoning at still-low cost, DeepSeek V3.2 ($0.14/$0.28) offers a dramatically better quality score (8.3 vs 7.2) at nearly the same price point.

Q: Is DeepSeek API safe to use for production?

DeepSeek’s API processes data on servers located in China, which creates data residency and compliance considerations for regulated industries. EU GDPR, HIPAA, and SOC 2 workloads require careful legal review before routing sensitive data through DeepSeek’s hosted endpoint. For non-sensitive production workloads outside regulated verticals, the API has demonstrated stable uptime and performance. Self-hosting the open-weight model on your own infrastructure eliminates data residency concerns entirely.

Q: How much does it cost to run an AI chatbot with 10,000 users per day?

Assuming 10,000 users each send 5 messages with 200 input tokens and receive 300 output tokens per turn: that is 10M input tokens and 15M output tokens per day. On Claude Sonnet 4.6 ($3.00/$15.00), daily API cost is $30 input + $225 output = $255/day ($7,650/month). Switching to Gemini 2.5 Flash ($0.30/$2.50) drops that to $3 + $37.50 = $40.50/day ($1,215/month) — an 84% reduction with a modest quality trade-off.
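The estimate above generalizes into a one-line cost function. A sketch using the same assumptions (per-user message counts and token sizes are the hypothetical workload stated in the answer, not measured data):

```python
def daily_chat_cost(users, msgs_per_user, in_tok, out_tok, in_rate, out_rate):
    # Rates in USD per 1M tokens; token counts are per message.
    input_millions = users * msgs_per_user * in_tok / 1e6
    output_millions = users * msgs_per_user * out_tok / 1e6
    return input_millions * in_rate + output_millions * out_rate

sonnet = daily_chat_cost(10_000, 5, 200, 300, 3.00, 15.00)
flash = daily_chat_cost(10_000, 5, 200, 300, 0.30, 2.50)
print(f"${sonnet:.2f}/day vs ${flash:.2f}/day "
      f"({1 - flash / sonnet:.0%} cheaper)")  # $255.00/day vs $40.50/day (84% cheaper)
```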


The market has moved fast enough that any pricing spreadsheet from six months ago is already outdated. Bookmark this page — it is updated weekly. For a full breakdown of which models win on coding, reasoning, and instruction-following benchmarks, see our AI developer tools comparison.

This post is licensed under CC BY 4.0 by the author.