Guide 07 May 2026 9 min read

How to Count AI Tokens: A Complete Guide for ChatGPT, Claude & Gemini (2026)

Tokens control your AI costs and context limits. Learn exactly what tokens are, how different models count them, why non-English text costs more, and how to optimise your token usage.

What Is an AI Token?

A token is the basic unit of text that a large language model (LLM) processes. It is not simply a word or a character — it is a chunk of text determined by the model's tokeniser algorithm. The most widely used approach is Byte-Pair Encoding (BPE), which splits text into frequently recurring sequences of characters.

In practice, for standard English text:

1 token ≈ 4 characters
1 token ≈ 0.75 words
100 tokens ≈ 75 words
1,000 words ≈ 1,333 tokens

Quick rule of thumb: Take your word count, multiply by 1.33, and you have a rough token estimate for English text. For non-English languages, multiply by 1.5–3× instead.

Why Tokens Matter

Tokens are the unit of measurement for two critical limits in AI work:

1. Context Window

Every AI model has a maximum number of tokens it can process in a single conversation — this is the context window. If your input plus the model's output exceeds this limit, older parts of the conversation get cut off, or the request fails entirely.

GPT-4o: 128,000 tokens
Claude 3.5 Sonnet: 200,000 tokens
Gemini 1.5 Pro: 1,000,000 tokens
GPT-3.5 Turbo: 16,385 tokens

When building AI-powered applications, keeping close track of token usage prevents silent truncation — one of the most common causes of confusing or incorrect AI responses.

2. API Cost

Commercial AI APIs charge per 1,000 tokens (or per 1 million tokens in newer pricing). Both input tokens (what you send) and output tokens (what the model returns) are billed separately, with output typically costing 3–5× more than input.

Example: A workflow that sends a 500-token system prompt + 200-token user message and receives a 300-token response uses 700 input tokens + 300 output tokens = 1,000 tokens total per request. At scale, this adds up quickly.

How Tokenisation Works: BPE Explained

Byte-Pair Encoding starts with individual characters and repeatedly merges the most frequently occurring pairs. After training on vast text corpora, the result is a vocabulary of 50,000–100,000 token pieces.

This means:

Common words like "the", "is", "and" are usually a single token
Longer or rarer words get split: "tokenisation" → ["token", "isation"] = 2 tokens
Punctuation often becomes its own token: "Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
Numbers can be split digit-by-digit: "12345" → ["123", "45"] or ["1", "2", "3", "4", "5"]
Code is often tokenised differently from prose — identifiers and symbols may cost more

Example tokenisation of "Hello, how are you today?"
→ ["Hello", ",", " how", " are", " you", " today", "?"] = 7 tokens

Tokens Across Different Models

Each model family uses its own tokeniser, so the same text can produce different token counts on different models:

OpenAI models (GPT-3.5, GPT-4, GPT-4o): Use the cl100k_base or o200k_base tokeniser (via the open-source tiktoken library)
Anthropic Claude: Uses a proprietary tokeniser; token counts are typically close to OpenAI's but not identical — especially for code and special characters
Google Gemini: Uses SentencePiece tokenisation; generally produces slightly higher counts than GPT-4 for the same English text
Meta Llama: Uses BPE via SentencePiece with a 32,000-token vocabulary — typically produces more tokens per text than GPT-4o

For production applications, always measure token counts using the target model's actual tokeniser, not an approximation.

Why Non-English Text Uses More Tokens

AI tokenisers were primarily trained on English text, so non-Latin scripts and languages with rich morphology are significantly less efficient:

Spanish, French, German: ~1.1–1.3× more tokens than English
Turkish, Finnish, Hungarian: ~1.5–2× — agglutinative languages combine many meanings into one word, forcing the tokeniser to split it
Arabic, Hebrew: ~1.5–2× — right-to-left scripts with complex morphology
Chinese, Japanese: ~1.5–2.5× per character — CJK characters often tokenise individually
Korean: ~2–3× — Hangul syllable blocks split differently by each tokeniser
Thai: ~2–4× — no spaces between words makes segmentation expensive

Practical impact: If you build a multilingual chatbot and budget based on English token usage, your actual costs for Turkish, Arabic, or Thai users could be 2–3× higher. Always benchmark with real translated content.

Tokens in Code vs. Prose

Programming code is tokenised differently from natural language, and the efficiency varies by language:

Python: Moderate — indentation whitespace consumes tokens, but keywords are common and well-represented
JavaScript/TypeScript: Similar to Python; common keywords are single tokens
SQL: Efficient for standard keywords (SELECT, FROM, WHERE), but table/column names may split
Regex: Expensive — special characters often tokenise individually, making complex patterns very token-heavy
JSON: Moderately expensive due to repeated punctuation (curly braces, quotes, colons) each consuming tokens

When sending code to an AI, consider stripping comments and whitespace you do not need — this can reduce token count by 15–30% for verbose codebases.

Practical Tips to Reduce Token Usage

1. Trim System Prompts

System prompts run on every API call. A 500-token system prompt across 10,000 requests costs 5 million input tokens. Review them regularly — remove redundant instructions, combine duplicate rules, and use shorter phrasing.

2. Summarise Long Conversations

As a chat conversation grows, the full history is re-sent with each message. After 10–15 exchanges, summarise the conversation so far and replace the raw history with the summary. This can cut token usage by 60–80% in long sessions.

3. Use Retrieval Instead of Stuffing Context

Instead of pasting entire documents into the prompt, use a retrieval-augmented generation (RAG) approach: retrieve only the relevant paragraphs and inject those. For a 100-page PDF, this reduces input from ~80,000 tokens to ~1,000–2,000 tokens per query.

4. Request Concise Outputs

Explicitly ask for shorter responses when detail is not needed: "Answer in 2–3 sentences", "Give a bullet-point summary only", "Skip the explanation and give just the final answer." Output tokens are typically more expensive than input, so shorter outputs meaningfully reduce cost.

5. Strip Unnecessary Formatting

When sending data (logs, JSON payloads, HTML), remove indentation, extra newlines, and redundant whitespace before sending. For JSON, minify it. This alone can cut 10–25% of tokens from data-heavy prompts.

6. Choose the Right Model for the Task

Smaller, faster models (GPT-4o mini, Claude Haiku, Gemini Flash) cost 10–20× less per token than their full-size counterparts. For simple classification, keyword extraction, or formatting tasks, the smaller model usually performs equally well. Reserve the large model for complex reasoning and generation.

How to Calculate AI API Costs

Use this formula for any OpenAI-compatible API:

Cost = (Input Tokens / 1,000,000 × Input Price) + (Output Tokens / 1,000,000 × Output Price)

Example — GPT-4o (May 2026 pricing: $2.50/M input, $10/M output):

Request: 800 input tokens + 400 output tokens

Input cost:  800 / 1,000,000 × $2.50 = $0.000002
Output cost: 400 / 1,000,000 × $10   = $0.000004
Total per request: $0.000006

At 100,000 requests/month: $0.60/month

Costs seem trivial per request but scale with volume. At 10 million requests per month with a verbose 2,000-token prompt, the input cost alone would exceed $50,000.

Checking Token Counts Before You Send

The best practice is to count tokens before making an API call — especially for long inputs. This lets you:

Verify you are within the model's context window
Estimate the cost before committing to an expensive call
Trim the input if it is unexpectedly long
Compare how the same text performs across different models

You can use OpenAI's tiktoken library in Python (pip install tiktoken), or use a browser-based token counter for quick checks without any setup.

Count Your AI Tokens for Free

Paste any text and instantly see token count for GPT-4, Claude, and Gemini models — with cost estimates and character breakdown.

Open AI Token Counter

English Türkçe Español Deutsch Français Italiano 日本語 Bahasa Melayu ภาษาไทย Русский