AI Pricing Explained: Tokens, Credits, and What You're Actually Paying For
Guides · 13 min read · December 7, 2025

Confused by AI pricing? This guide explains tokens, input vs output costs, context windows, and real-world pricing examples to help you understand what you're actually paying for.

AI pricing is confusing by design. Tokens, credits, context windows, rate limits—it can feel like providers are deliberately making it hard to compare costs. This guide cuts through the complexity and helps you understand what you're actually paying for.

The Core Concept

AI models charge based on how much text you send them (input) and how much they generate (output). This is measured in "tokens"—roughly ¾ of a word. Everything else in AI pricing flows from this basic principle.

What Are Tokens?

Tokens are the fundamental unit of AI pricing. They're not quite words, not quite characters—they're chunks of text that the model processes.

Token Examples

Hello = 1 token

Hello, world! = 4 tokens

Artificial intelligence = 2 tokens

supercalifragilistic = 5 tokens

Quick Estimates

1 token ≈ 4 characters

1 token ≈ ¾ of a word

100 tokens ≈ 75 words

1,000 tokens ≈ 750 words

1 page of text ≈ 400-500 tokens

Why tokens instead of words? AI models don't "read" text like humans. They break text into pieces that make mathematical sense for their architecture. Common words are often single tokens, while unusual words get split into multiple pieces. This is why the same sentence can have different token counts depending on the words used.

💡 Practical Tip

For rough estimates, assume 1,000 tokens ≈ 750 words. A typical ChatGPT conversation might use 500-2,000 tokens per exchange. A long document analysis could use 10,000-50,000 tokens.
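These rules of thumb are easy to wrap in a small helper. This is a budgeting heuristic only, not a real tokenizer (the function names here are my own); actual counts depend on each model's tokenizer and can differ noticeably.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.
    Real counts vary by model and tokenizer; use only for budgeting."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(word_count: int) -> int:
    """Alternative estimate: 1 token is roughly 3/4 of a word."""
    return round(word_count / 0.75)
```

So 750 words comes out at roughly 1,000 tokens, matching the quick estimates above. For exact counts, use the tokenizer published by your provider.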

Input vs Output Tokens

This is where pricing gets interesting. AI providers charge differently for:

Input Tokens

What you send to the model: your prompt, any context, documents you're asking about, conversation history. Generally cheaper.

Output Tokens

What the model generates in response. The AI's answer, generated content, code it writes. Usually 2-5x more expensive than input.

Why is output more expensive? Generating text requires more computation than processing it. The model has to "think" about each word it produces, making predictions and selecting from possibilities. Reading your input is comparatively simple.

Current Pricing: The Real Numbers

Here's what major providers actually charge (as of late 2025). Prices are per 1 million tokens to make comparison easier.

| Model | Input / 1M tokens | Output / 1M tokens | Tier |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Flagship |
| GPT-4o Mini | $0.15 | $0.60 | Budget |
| Claude Opus 4 | $15.00 | $75.00 | Premium |
| Claude Sonnet 4 | $3.00 | $15.00 | Flagship |
| Claude Haiku 3.5 | $0.80 | $4.00 | Budget |
| Gemini 1.5 Pro | $1.25 | $5.00 | Flagship |
| Gemini 2.0 Flash | $0.10 | $0.40 | Budget |

What does this mean in practice?

A typical conversation (1,000 input + 500 output tokens) costs:

  • GPT-4o Mini: $0.0005 (basically nothing)
  • GPT-4o: $0.0075
  • Claude Sonnet 4: $0.0105
  • Claude Opus 4: $0.0525
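Those per-exchange figures are simple arithmetic: tokens times the per-million rate. A minimal sketch (the `PRICES` dictionary just mirrors the table above; verify current rates on each provider's pricing page before budgeting):

```python
# Per-1M-token prices in USD (input, output), from the table above.
# Prices change often; treat these as a snapshot, not a source of truth.
PRICES = {
    "gpt-4o-mini":     (0.15, 0.60),
    "gpt-4o":          (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4":   (15.00, 75.00),
}

def conversation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one exchange at per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, `conversation_cost("gpt-4o", 1000, 500)` reproduces the $0.0075 figure above.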

Context Windows: Why Size Matters

The "context window" is how much text a model can consider at once—both your input and its output combined.

  • 8K tokens ≈ 6,000 words — basic models
  • 128K tokens ≈ 96,000 words — GPT-4o
  • 200K tokens ≈ 150,000 words — Claude
  • 1M+ tokens ≈ 750,000 words — Gemini 1.5

Why does context window matter?

  • Longer documents: A 200K context window can process an entire book at once
  • Better memory: Longer conversations without the AI "forgetting" earlier parts
  • More complex tasks: Can consider more information when making decisions

The catch: Using a large context window costs more. If you send 100,000 tokens of context with every query, you're paying for those input tokens every single time.

Subscription vs API: Two Ways to Pay

There are fundamentally two pricing models for AI services:

📱 Subscription (ChatGPT Plus, Claude Pro)

How it works: Fixed monthly fee for access through the web/app interface.

Cost: £15-25/month

Best for: Individual users, moderate usage, no coding needed

Limits: Usually has usage caps (messages per hour/day)

⚡ API (Pay-as-you-go)

How it works: Pay per token used. For building apps or high-volume usage.

Cost: Varies by model and usage

Best for: Developers, businesses, automation, high volume

Limits: Rate limits (requests per minute), but no hard usage caps

💰 When API Becomes Cheaper

ChatGPT Plus costs $20/month. At GPT-4o's API pricing, that buys you roughly:

  • ~2,600 typical conversations, or
  • ~8 million input tokens, or
  • ~2 million output tokens

For most individual users, the subscription is better value. API pricing makes sense when you're building applications or processing large volumes automatically.
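The break-even point is easy to compute yourself. A sketch, assuming a "typical" exchange of 1,000 input + 500 output tokens at GPT-4o's rates (about $0.0075 each, per the earlier table):

```python
def api_breakeven_conversations(subscription_usd: float,
                                cost_per_conversation: float) -> int:
    """How many conversations per month you'd need before a flat
    subscription beats pay-as-you-go API pricing."""
    return int(subscription_usd / cost_per_conversation)

# A typical GPT-4o exchange (1,000 in + 500 out) costs about $0.0075.
breakeven = api_breakeven_conversations(20.00, 0.0075)  # ~2,600/month
```

That's roughly 85 conversations a day — far above most individual usage, which is why the subscription usually wins for people and the API wins for software.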

Hidden Costs and Gotchas

1. Conversation History Adds Up

Every time you continue a conversation, the AI re-reads the entire history. A 50-message conversation might have 20,000+ tokens of history being re-sent with each new message.

The Compounding Problem

Message 1: 500 tokens input → Message 2: 1,200 tokens input → Message 3: 2,100 tokens input → ... → Message 20: 15,000+ tokens input

This is why long conversations get expensive. Start fresh conversations for new topics.
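The compounding pattern above can be modelled in a few lines. This is a simplification (my own toy model: each turn adds a fixed number of new tokens, and the full history is re-sent every time — real turns vary in length):

```python
def conversation_history_tokens(new_tokens_per_message: int,
                                messages: int) -> list[int]:
    """Input tokens sent for each successive message when the full
    conversation history is re-sent with every request."""
    sent, total = [], 0
    for _ in range(messages):
        total += new_tokens_per_message  # each turn grows the history
        sent.append(total)               # and the whole history is re-sent
    return sent
```

With 700 new tokens per turn, message 20 re-sends 14,000 tokens of history, and the cumulative input billed across all 20 messages is 147,000 tokens — over ten times the visible conversation length.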

2. System Prompts Count

If you're using custom instructions or system prompts, those tokens are included in every single request. A 500-token system prompt across 1,000 API calls = 500,000 extra input tokens.

3. Failed Requests Still Cost Money

If the model starts generating a response but you cancel it, or if there's an error partway through, you still pay for what was processed.

4. Caching Can Save Money

Some providers offer "prompt caching"—if you send the same prefix repeatedly, you pay reduced rates. Anthropic's prompt caching, for example, reduces input costs by up to 90% for cached content.
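The savings are easy to estimate. A sketch assuming a flat 90% discount on cached input, mirroring the "up to 90%" figure above (real providers price cache writes and reads differently, and impose minimum cacheable sizes — check the docs):

```python
def cached_input_cost(total_input_tokens: int, cached_fraction: float,
                      rate_per_million: float,
                      cache_discount: float = 0.90) -> float:
    """Input cost in USD when a fraction of the prompt hits the cache.
    cache_discount=0.90 assumes the 'up to 90% cheaper' figure; actual
    rates and cache-write surcharges vary by provider."""
    cached = total_input_tokens * cached_fraction
    fresh = total_input_tokens - cached
    return (fresh * rate_per_million
            + cached * rate_per_million * (1 - cache_discount)) / 1_000_000
```

A 10,000-token prompt that is 80% cached at Claude Sonnet's $3/1M rate costs about $0.0084 instead of $0.03 — a 72% saving on that request.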

Real-World Cost Examples

Let's calculate actual costs for common use cases:

📧 Email Assistant (100 emails/day)

~300 input + 200 output tokens per email

GPT-4o Mini: ~$0.02/day (~$0.50/month) | GPT-4o: ~$0.28/day (~$8.25/month)

📝 Blog Writing (10 articles/month)

~500 input + 2,000 output tokens per article

GPT-4o Mini: $0.01/month | Claude Sonnet: $0.32/month

💬 Customer Support Bot (1,000 conversations/day)

~800 input + 400 output tokens per conversation

GPT-4o Mini: $0.36/day (~$11/month) | GPT-4o: $6/day (~$180/month)

📚 Document Analysis (50-page report)

~25,000 input + 1,000 output tokens

Claude Sonnet: $0.09/report | Claude Opus: $0.45/report
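All of these examples reduce to one formula: per-request cost times request volume. A reusable sketch (rates are per 1M tokens, from the pricing table; adjust `days` for your billing period):

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float, days: int = 30) -> float:
    """Monthly USD cost for a recurring workload, given per-1M-token rates."""
    per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * days

# Customer support bot on GPT-4o Mini: 1,000 conversations/day,
# 800 input + 400 output tokens each -> about $11/month.
support = monthly_cost(1000, 800, 400, 0.15, 0.60)
```

Plugging in your own token counts and volumes is the fastest way to sanity-check a model choice before committing to it.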

Choosing the Right Model for Your Budget

The most expensive model isn't always the best choice. Here's a practical framework:

Decision Framework

Use budget models (GPT-4o Mini, Gemini Flash, Haiku) when:

  • Simple tasks: classification, extraction, basic Q&A
  • High volume: thousands of requests per day
  • Speed matters more than nuance
  • You can accept occasional errors

Use flagship models (GPT-4o, Sonnet, Gemini Pro) when:

  • Complex reasoning or analysis required
  • Quality directly impacts outcomes
  • Nuanced writing or creative work
  • Moderate volume with higher stakes

Use premium models (Opus, o1) when:

  • Most complex tasks requiring deep reasoning
  • Research or analysis where accuracy is critical
  • Low volume, high-value outputs
  • Tasks where you'd spend hours doing it yourself

Cost Optimisation Strategies

1. Start Small, Scale Up

Try the cheapest model first. Only upgrade if quality isn't good enough. You might be surprised how capable budget models are.

2. Keep Prompts Concise

Every unnecessary word costs money. Be clear and direct. Remove examples and context that aren't needed.

3. Limit Output Length

Ask for "concise" responses or specify maximum length. Output tokens cost more, so shorter responses save money.

4. Use Caching

If your prompts have repeated elements (system prompts, context), use providers that offer prompt caching for significant savings.

5. Batch Similar Requests

Process multiple items in one request when possible. "Categorise these 10 emails" is cheaper than 10 separate "categorise this email" calls.

6. Start Fresh Conversations

Don't continue old conversations for new topics. The history accumulates and you pay for it with every message.

Free Tiers: What You Actually Get

Most providers offer free access with limitations:

| Provider | Free Offering | Limitations |
|---|---|---|
| ChatGPT | GPT-4o Mini access | Limited GPT-4o messages, slower at peak times |
| Claude | Sonnet access | Daily message limits, no Opus |
| Gemini | Gemini Pro access | Rate limits, some features restricted |
| API credits | $5-18 in free credits (new accounts) | One-time; typically expire after 3 months |

The Bottom Line

Key Takeaways

Tokens = ~¾ of a word. You pay for input (cheap) and output (expensive)

Subscriptions (~£20/month) are best for individuals; APIs are best for builders and high volume

Budget models are surprisingly capable—start there

Context window = how much text the model can consider at once

Conversation history is the hidden cost killer—start fresh when topics change

AI is remarkably cheap for what it delivers. A task that might take you an hour can often be done for pennies. The key is matching the right model to your needs—and not paying premium prices for tasks that budget models handle just fine.
