How to Count AI Tokens: A Complete Guide for ChatGPT, Claude & Gemini (2026)
Tokens control your AI costs and context limits. Learn exactly what tokens are, how different models count them, why non-English text costs more, and how to optimise your token usage.
What Is an AI Token?
A token is the basic unit of text that a large language model (LLM) processes. It is not simply a word or a character — it is a chunk of text determined by the model's tokeniser algorithm. The most widely used approach is Byte-Pair Encoding (BPE), which splits text into frequently recurring sequences of characters.
In practice, for standard English text:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
- 1,000 words ≈ 1,333 tokens
Why Tokens Matter
Tokens are the unit of measurement for two critical limits in AI work:
1. Context Window
Every AI model has a maximum number of tokens it can process in a single conversation — this is the context window. If your input plus the model's output exceeds this limit, older parts of the conversation get cut off, or the request fails entirely.
- GPT-4o: 128,000 tokens
- Claude 3.5 Sonnet: 200,000 tokens
- Gemini 1.5 Pro: 1,000,000 tokens
- GPT-3.5 Turbo: 16,385 tokens
When building AI-powered applications, keeping close track of token usage prevents silent truncation — one of the most common causes of confusing or incorrect AI responses.
2. API Cost
Commercial AI APIs charge per 1,000 tokens (or per 1 million tokens in newer pricing). Both input tokens (what you send) and output tokens (what the model returns) are billed separately, with output typically costing 3–5× more than input.
Example: A workflow that sends a 500-token system prompt + 200-token user message and receives a 300-token response uses 700 input tokens + 300 output tokens = 1,000 tokens total per request. At scale, this adds up quickly.
How Tokenisation Works: BPE Explained
Byte-Pair Encoding starts with individual characters and repeatedly merges the most frequently occurring pairs. After training on vast text corpora, the result is a vocabulary of 50,000–100,000 token pieces.
This means:
- Common words like "the", "is", "and" are usually a single token
- Longer or rarer words get split: "tokenisation" → ["token", "isation"] = 2 tokens
- Punctuation often becomes its own token: "Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
- Numbers can be split digit-by-digit: "12345" → ["123", "45"] or ["1", "2", "3", "4", "5"]
- Code is often tokenised differently from prose — identifiers and symbols may cost more
→ ["Hello", ",", " how", " are", " you", " today", "?"] = 7 tokens
Tokens Across Different Models
Each model family uses its own tokeniser, so the same text can produce different token counts on different models:
- OpenAI models (GPT-3.5, GPT-4, GPT-4o): Use the
cl100k_baseoro200k_basetokeniser (via the open-source tiktoken library) - Anthropic Claude: Uses a proprietary tokeniser; token counts are typically close to OpenAI's but not identical — especially for code and special characters
- Google Gemini: Uses SentencePiece tokenisation; generally produces slightly higher counts than GPT-4 for the same English text
- Meta Llama: Uses BPE via SentencePiece with a 32,000-token vocabulary — typically produces more tokens per text than GPT-4o
For production applications, always measure token counts using the target model's actual tokeniser, not an approximation.
Why Non-English Text Uses More Tokens
AI tokenisers were primarily trained on English text, so non-Latin scripts and languages with rich morphology are significantly less efficient:
- Spanish, French, German: ~1.1–1.3× more tokens than English
- Turkish, Finnish, Hungarian: ~1.5–2× — agglutinative languages combine many meanings into one word, forcing the tokeniser to split it
- Arabic, Hebrew: ~1.5–2× — right-to-left scripts with complex morphology
- Chinese, Japanese: ~1.5–2.5× per character — CJK characters often tokenise individually
- Korean: ~2–3× — Hangul syllable blocks split differently by each tokeniser
- Thai: ~2–4× — no spaces between words makes segmentation expensive
Tokens in Code vs. Prose
Programming code is tokenised differently from natural language, and the efficiency varies by language:
- Python: Moderate — indentation whitespace consumes tokens, but keywords are common and well-represented
- JavaScript/TypeScript: Similar to Python; common keywords are single tokens
- SQL: Efficient for standard keywords (SELECT, FROM, WHERE), but table/column names may split
- Regex: Expensive — special characters often tokenise individually, making complex patterns very token-heavy
- JSON: Moderately expensive due to repeated punctuation (curly braces, quotes, colons) each consuming tokens
When sending code to an AI, consider stripping comments and whitespace you do not need — this can reduce token count by 15–30% for verbose codebases.
Practical Tips to Reduce Token Usage
1. Trim System Prompts
System prompts run on every API call. A 500-token system prompt across 10,000 requests costs 5 million input tokens. Review them regularly — remove redundant instructions, combine duplicate rules, and use shorter phrasing.
2. Summarise Long Conversations
As a chat conversation grows, the full history is re-sent with each message. After 10–15 exchanges, summarise the conversation so far and replace the raw history with the summary. This can cut token usage by 60–80% in long sessions.
3. Use Retrieval Instead of Stuffing Context
Instead of pasting entire documents into the prompt, use a retrieval-augmented generation (RAG) approach: retrieve only the relevant paragraphs and inject those. For a 100-page PDF, this reduces input from ~80,000 tokens to ~1,000–2,000 tokens per query.
4. Request Concise Outputs
Explicitly ask for shorter responses when detail is not needed: "Answer in 2–3 sentences", "Give a bullet-point summary only", "Skip the explanation and give just the final answer." Output tokens are typically more expensive than input, so shorter outputs meaningfully reduce cost.
5. Strip Unnecessary Formatting
When sending data (logs, JSON payloads, HTML), remove indentation, extra newlines, and redundant whitespace before sending. For JSON, minify it. This alone can cut 10–25% of tokens from data-heavy prompts.
6. Choose the Right Model for the Task
Smaller, faster models (GPT-4o mini, Claude Haiku, Gemini Flash) cost 10–20× less per token than their full-size counterparts. For simple classification, keyword extraction, or formatting tasks, the smaller model usually performs equally well. Reserve the large model for complex reasoning and generation.
How to Calculate AI API Costs
Use this formula for any OpenAI-compatible API:
Cost = (Input Tokens / 1,000,000 × Input Price) + (Output Tokens / 1,000,000 × Output Price)
Example — GPT-4o (May 2026 pricing: $2.50/M input, $10/M output):
Request: 800 input tokens + 400 output tokens
Input cost: 800 / 1,000,000 × $2.50 = $0.000002
Output cost: 400 / 1,000,000 × $10 = $0.000004
Total per request: $0.000006
At 100,000 requests/month: $0.60/month
Costs seem trivial per request but scale with volume. At 10 million requests per month with a verbose 2,000-token prompt, the input cost alone would exceed $50,000.
Checking Token Counts Before You Send
The best practice is to count tokens before making an API call — especially for long inputs. This lets you:
- Verify you are within the model's context window
- Estimate the cost before committing to an expensive call
- Trim the input if it is unexpectedly long
- Compare how the same text performs across different models
You can use OpenAI's tiktoken library in Python (pip install tiktoken), or use a browser-based token counter for quick checks without any setup.
Count Your AI Tokens for Free
Paste any text and instantly see token count for GPT-4, Claude, and Gemini models — with cost estimates and character breakdown.