Getting Started with Tokens: A Complete Guide to the Basic Unit of Text in the AI Era

You may have run into this: you wrote a prompt of only a few hundred words, yet the AI did not give you a proper reply. Or you opened your API bill at the end of the month and wondered how you could possibly have used so much. All of these questions point to one concept you need to understand: the token.

A token is the basic unit an AI model uses to understand, process, and generate text. Once you understand tokens, you can estimate API costs accurately, write more efficient prompts, and avoid mysterious truncation. Put bluntly, if you don't understand tokens, you are fumbling in the dark whenever you use AI.

This guide starts from scratch and covers where tokens come from, how they are counted, how prices differ across models, and practical money-saving techniques. By the end, you will be able to calculate how many tokens each call consumes, and you will have techniques that can cut your costs roughly in half.

1. What you will learn

After completing this tutorial, you will be able to do the following on your own:

Accurately count the tokens in any text: whether it is a Chinese or an English document, you will know how many tokens it consumes
Compare token prices across AI models: know the differences between GPT-4, Claude, and Chinese domestic models, and why the same content can cost several times more on one model than another
Understand the logic behind API bills: no more confusion over unexpected charges; you can calculate the cost of each call yourself
Write prompts that use fewer tokens: get the same result with fewer tokens and save real money
Handle the edge cases of long text: know what a context window is, what to do when it fills up, and how to split long documents into chunks

2. Prerequisites

Before reading this article, you should be familiar with the following:

Basic programming concepts: know what variables, functions, and API calls are. You don't need to be able to write code, as long as reading it doesn't intimidate you.
Experience with ChatGPT or a similar AI chat tool: know what AI can do and have a basic idea of what a "large model" is.
Junior-high-level English: token counting involves some English terms, but the core concepts are explained thoroughly here.

If you are a complete beginner, try ChatGPT or Claude before reading this article; the concepts will make much more sense.

3. Setting up the environment

Before you start counting tokens, get your tools ready. I recommend tiktoken, the official library from OpenAI; it gives exact token counts for OpenAI models.

Required software:

Python 3.8+: the programming environment; required. If you are not sure which version you have, open a terminal and run python3 --version.
pip: the Python package manager, usually installed along with Python. Check it with: pip --version

Installation steps:

Open a terminal (Terminal on macOS; PowerShell or Command Prompt on Windows)
Create a folder for practice:

mkdir token-practice && cd token-practice

Create a Python virtual environment (optional but recommended):

python3 -m venv venv

Activate the virtual environment:

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

Once activated, (venv) appears at the start of your terminal prompt.

Install the tiktoken library:

pip install tiktoken

Expected output (abbreviated; version numbers and file names vary by platform):

Collecting tiktoken
  Downloading tiktoken-0.7.0-cp311-cp311-macosx_10_9_x86_64.whl (xxx kB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.7.0

Optionally, install langchain as an auxiliary tool for further experiments (none of the examples in this article require it):

pip install langchain

After installation, run python -c "import tiktoken; print('tiktoken OK')" to verify it. If you see tiktoken OK, everything is working.

4. What exactly is a token?

To understand tokens, you first need to understand how an AI model handles text.

AI models do not "read" text directly the way the human brain does. A model first breaks the text into small pieces, processes those pieces, and then assembles its output from pieces of the same kind. These small pieces are tokens.

A simple analogy: when you read an article, your eyes take in groups of letters at a glance rather than one letter at a time. An AI model processes text in a similar way, except that its "grouping rules" are derived from statistical patterns in language.

English token rules:

Tokenization in English is relatively simple and crude. The usual rules of thumb are:

1 token ≈ 4 characters
1 English word ≈ 1-2 tokens

For example, 'hello' is 1 token and 'a' is 1 token, while a longer word such as 'Artificial' may be split into 2 tokens.

Chinese token rules:

Chinese is a little different. It has no spaces separating words the way English does, so it is handled differently:

1 Chinese character ≈ 1-2 tokens
Punctuation marks also count as tokens
1 Chinese word ≈ 1-3 tokens

More precisely, the token count of Chinese text depends on the specific tokenizer and model. OpenAI's GPT series uses the BPE (Byte Pair Encoding) algorithm, which produces 1-2 tokens per Chinese character on average.

Why is tokenization so complicated?

Because AI models do not process language in units of "characters" or "words" but in "subwords." The word 'Transformer' may be split into two tokens, 'Trans' + 'former', because both parts often appear in other words. Splitting words this way lets the model handle new words it has never seen before.
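To make subword splitting concrete, here is a toy splitter. It is a minimal sketch, assuming a tiny hand-made vocabulary (`VOCAB` and `greedy_subword_split` are hypothetical names), and it uses WordPiece-style greedy longest-match splitting rather than the BPE merges GPT models actually apply. The point is only that a known word breaks into reusable pieces and an unseen word can still be covered.

```python
# A tiny, made-up subword vocabulary (hypothetical; real tokenizers learn ~100K entries)
VOCAB = {"Trans", "former", "form", "er", "T", "r", "a", "n", "s", "f", "o", "m", "e"}


def greedy_subword_split(word, vocab):
    """Split a word by repeatedly taking the longest vocabulary entry that
    matches at the current position (WordPiece-style greedy longest match)."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no vocabulary entry matched at position i
            raise ValueError(f"cannot tokenize {word[i:]!r} with this vocabulary")
    return pieces


print(greedy_subword_split("Transformer", VOCAB))  # ['Trans', 'former']
print(greedy_subword_split("Transform", VOCAB))    # ['Trans', 'form']
print(greedy_subword_split("Transfer", VOCAB))     # ['Trans', 'f', 'er']
```

"Transfer" never appears in the vocabulary as a whole, yet it is still covered by smaller pieces; that is the property that lets a model cope with new words.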

Remember this key point: a token is not a word count but the smallest unit the model processes. For the same content, Chinese can use fewer tokens than English (because each Chinese character carries a lot of information), but this is not guaranteed; it depends on the specific content and tokenizer.

5. How are tokens counted?

Now that you know what a token is, you need to know how to count them.

Official counting rules (using OpenAI as an example):

OpenAI uses a tokenization algorithm called BPE (Byte Pair Encoding). Put simply: it counts which adjacent character combinations occur most often in a training corpus and merges them into new tokens, repeating the process many times. The more merges, the longer the resulting tokens can be.
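The merge loop can be sketched in a few lines. This is a toy, character-level version of BPE training on a small made-up corpus (`learn_bpe` and its helpers are hypothetical names). Real GPT tokenizers work on bytes, pre-split text with a regular expression, and learn on the order of 100,000 merges, so treat it as an illustration of the idea only.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word, replacing each occurrence of `pair` with one merged symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (ties: first seen)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges, words = learn_bpe(corpus, num_merges=2)
print(merges)                              # [('e', 's'), ('es', 't')]
print(words[("n", "e", "w", "est")])       # 6
```

After two merges, the frequent suffix "est" has become a single symbol, so "newest" and "widest" need fewer pieces; further merges keep building longer tokens the same way.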

The cl100k_base encoding used by GPT-4 has a vocabulary of about 100,000 different tokens.

Practical counting method:

For OpenAI models, the tiktoken library gives the most accurate counts:

import tiktoken

# Select the encoding; GPT-4 uses cl100k_base
encoding = tiktoken.get_encoding("cl100k_base")

text = "Hello, world! Hello world! "

# encode() returns a list of token IDs
token_ids = encoding.encode(text)

print(f"Text length: {len(text)} characters")
print(f"Token count: {len(token_ids)}")
print(f"Token IDs: {token_ids}")

Expected output (the ID list is abbreviated here):

Text length: 27 characters
Token count: 8
Token IDs: [...]

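See? In this English example, 27 characters become 8 tokens, about 3.4 characters per token, which matches the English rule of thumb. Punctuation and spaces take tokens of their own, and Chinese text uses far more tokens per character.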

Rough token estimates for different kinds of text:

Text type	Estimation rule	Example
Short English sentence	About 1 token per 4 characters	"Hello world" → 2 tokens
Long English paragraph	About 1 token per 4 characters, or about 0.75 words per token	100-word article → about 130 tokens
Short Chinese sentence	1-1.5 tokens per character	"你好世界" ("hello world", 4 characters) → 4-6 tokens
Long Chinese paragraph	1-1.5 tokens per character	100 Chinese characters → about 120-150 tokens
Code	Varies widely: simple code is close to English, complex code uses more tokens	1 line of code → 5-20 tokens
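These rules of thumb can be turned into a quick offline estimator for when tiktoken is not at hand. A minimal sketch, assuming about 1.5 tokens per Chinese character (the upper end of the table's range) and about 4 characters per token for everything else; `estimate_tokens` is a hypothetical helper, not part of any library, and it only detects Chinese characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import math


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text, cjk_per_char=1.5, chars_per_token=4):
    """Rough token estimate from the rules of thumb above (not a real tokenizer).

    Chinese characters count cjk_per_char tokens each; all other characters
    count 1/chars_per_token of a token. The result is rounded up.
    """
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    return math.ceil(cjk * cjk_per_char + other / chars_per_token)


print(estimate_tokens("Hello world"))  # 3 (11 characters / 4, rounded up)
print(estimate_tokens("你好世界"))       # 6 (4 characters x 1.5)
```

Because it rounds up and uses the upper end of the Chinese range, it tends to overestimate slightly, which is the safer direction for budgeting.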

Recommended counting tools:

Besides tiktoken, there are online tools you can use directly:

OpenAI Tokenizer (platform.openai.com/tokenizer): the official tool and the most accurate, but it only supports GPT-series encodings
Anthropic Token Counter (anthropic.com): a calculator dedicated to Claude
tiktokenizer (tiktokenizer.vercel.app): an online tiktoken playground with good visualization

Bookmark OpenAI Tokenizer and paste your text into it whenever you are unsure.

6. Token price comparison of mainstream AI models

Once you know how to count tokens, price is the next concern, and the differences between models are far from small.

Token prices of mainstream models (data as of 2024; prices are estimates, so always check the latest official pricing):

Model	Input price	Output price	Context window	Notes
GPT-4o	$5.00 / 1M tokens	$15.00 / 1M tokens	128K	The most cost-effective multimodal model
GPT-4 Turbo	$10.00 / 1M tokens	$30.00 / 1M tokens	128K	Faster version of GPT-4
GPT-4 (8K)	$30.00 / 1M tokens	$60.00 / 1M tokens	8K	Older version, more expensive
GPT-3.5 Turbo	$0.50 / 1M tokens	$1.50 / 1M tokens	16K	Cheapest GPT model
Claude 3.5 Sonnet	$3.00 / 1M tokens	$15.00 / 1M tokens	200K	Strongest at long-text processing
Claude 3 Opus	$15.00 / 1M tokens	$75.00 / 1M tokens	200K	Flagship model, more expensive
Claude 3 Haiku	$0.25 / 1M tokens	$1.25 / 1M tokens	200K	Cheapest Claude model
Tongyi Qianwen Qwen-Max	¥0.04 / 1K tokens	¥0.12 / 1K tokens	8K	Low-cost Chinese option
ERNIE Bot (Wenxin Yiyan) 4.0	¥0.12 / 1K tokens	¥0.12 / 1K tokens	4K	Baidu; strict rate limits
Kimi (Moonshot)	¥0.012 / 1K tokens	¥0.012 / 1K tokens	128K	Best value for long context

Reasons for the price differences:

You may ask: why is Claude 3.5 Sonnet cheaper than GPT-4o on input, yet offers a much larger context window? There are several reasons:

Model capabilities differ: GPT-4o is stronger at reasoning, multi-turn dialogue, and multimodal understanding.
Training costs differ: the larger the model and the more training data, the higher the cost.
Market positioning: Chinese domestic models usually adopt low-price strategies to win market share.
Context window: processing longer text requires more GPU memory and compute.

Worked cost example:

Suppose you want to process a Chinese document of about 50,000 tokens (roughly 33,000-50,000 Chinese characters by the estimates above) and expect about 20,000 tokens of output. The cost with different models:

Model	Input cost (50,000 tokens)	Output cost (20,000 tokens)	Total cost
GPT-4o	$0.25	$0.30	$0.55
GPT-3.5 Turbo	$0.025	$0.03	$0.055
Claude 3.5 Sonnet	$0.15	$0.30	$0.45
Kimi	¥0.60	¥0.24	¥0.84

If you process 10 of these documents a day (30 days a month), that comes to about $165 per month with GPT-4o versus about $16.50 with GPT-3.5 Turbo. The gap is significant.
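The arithmetic behind the table and the monthly figures is just tokens ÷ 1,000,000 × price per 1M tokens, summed over input and output. A small sketch, assuming the 2024 USD prices from the price table above and a 30-day month (`call_cost` is a hypothetical helper):

```python
# USD prices per 1M tokens (input, output), from the 2024 price table above (estimates)
PRICES = {
    "GPT-4o": (5.00, 15.00),
    "GPT-3.5 Turbo": (0.50, 1.50),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

DOCS_PER_DAY = 10
DAYS_PER_MONTH = 30


def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one call, given prices per 1M tokens."""
    return input_tokens / 1_000_000 * input_price + output_tokens / 1_000_000 * output_price


for model, (input_price, output_price) in PRICES.items():
    per_doc = call_cost(50_000, 20_000, input_price, output_price)
    per_month = per_doc * DOCS_PER_DAY * DAYS_PER_MONTH
    print(f"{model:18s} per document: ${per_doc:.3f}   per month: ${per_month:.2f}")
```

With these inputs it gives $0.55, $0.055, and $0.45 per document, or about $165, $16.50, and $135 per month.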

7. Hands-on 1: Count the tokens in any text

Now for the hands-on part. In this section, we will write a complete token-counting script.

Goal: write a Python script that counts the tokens in any text and gives a rough cost estimate.

Full code:

#!/usr/bin/env python3
"""
Token calculator
Purpose: count the tokens in a text and estimate the cost of API calls
"""

import sys

import tiktoken

# tiktoken encoding names for OpenAI models.
# Claude is not listed: tiktoken does not include Anthropic's tokenizer.
ENCODERS = {
    "gpt-4": "cl100k_base",
    "gpt-3.5": "cl100k_base",
    "gpt-4o": "o200k_base",
}

# Price list (USD per 1M tokens). Source: each vendor's official pricing (estimates)
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-haiku": {"input": 0.25, "output": 1.25},
}

def count_tokens(text, model="gpt-4"):
    """Count the tokens in text using the model's encoding (cl100k_base if unknown)"""
    encoding = tiktoken.get_encoding(ENCODERS.get(model, "cl100k_base"))
    return len(encoding.encode(text))

def estimate_cost(token_count, model="gpt-4o", is_output=False):
    """Estimate the API cost in USD; returns None for models not in PRICES"""
    if model not in PRICES:
        return None
    price = PRICES[model]["output"] if is_output else PRICES[model]["input"]
    return (token_count / 1_000_000) * price

def main():
    print("=" * 50)
    print("Token Calculator v1.0")
    print("=" * 50)

    # Read the input text
    if len(sys.argv) > 1:
        # From command-line arguments
        text = " ".join(sys.argv[1:])
    else:
        # Interactive input
        print("\nEnter the text to count (finish with Ctrl+D, or Ctrl+Z then Enter on Windows):")
        text = sys.stdin.read()

    if not text.strip():
        print("Error: no text entered")
        sys.exit(1)

    # Count tokens with cl100k_base (GPT-4 / GPT-3.5). GPT-4o and Claude
    # tokenize differently, so their cost figures below are estimates.
    token_count = count_tokens(text)
    char_count = len(text)

    print("\n[Calculation Results]")
    print(f"Number of characters: {char_count}")
    print(f"Number of tokens: {token_count}")
    print(f"Token/character ratio: {token_count / char_count:.2f}")

    # Estimate the cost for each model
    print("\n[Cost estimate (assuming 1000 output tokens)]")
    for model in PRICES:
        input_cost = estimate_cost(token_count, model, is_output=False)
        output_cost = estimate_cost(1000, model, is_output=True)
        total = input_cost + output_cost
        print(f"{model:25s} | Input: ${input_cost:.6f} | Output: ${output_cost:.6f} | Total: ${total:.6f}")

if __name__ == "__main__":
    main()

How to run it:

Save the code above as token_counter.py.
Method 1: count a single line of text

python token_counter.py "Hello, how are you today? "

Expected output:

==================================================
Token Calculator v1.0
==================================================

[Calculation Results]
Number of characters: 26
Number of tokens: 8
Token/character ratio: 0.31

[Cost estimate (assuming 1000 output tokens)]
gpt-4o                    | Input: $0.000040 | Output: $0.015000 | Total: $0.015040
gpt-4-turbo               | Input: $0.000080 | Output: $0.030000 | Total: $0.030080
gpt-3.5-turbo             | Input: $0.000004 | Output: $0.001500 | Total: $0.001504
claude-3.5-sonnet         | Input: $0.000024 | Output: $0.015000 | Total: $0.015024
claude-3-haiku            | Input: $0.000002 | Output: $0.001250 | Total: $0.001252

Method 2: count multi-line text (interactive mode)

python token_counter.py

Then type your text (multiple lines are fine) and press Ctrl+D (macOS/Linux) or Ctrl+Z followed by Enter (Windows) to end the input.

If you get an error:

ModuleNotFoundError: No module named 'tiktoken': run pip install tiktoken (with your virtual environment activated)
UnicodeEncodeError: probably a terminal encoding problem; try setting export PYTHONIOENCODING=utf-8
Chinese input shows up garbled: make sure the terminal uses UTF-8. Windows users may need to run chcp 65001

8. Hands-on 2: Optimize prompts to reduce token consumption

Once you can count tokens, the next step is saving money. Prompt optimization is the most cost-effective lever: the same result for about half the cost.

Example scenario: you need a prompt that asks the AI to review some code.

Before optimization (verbose version):

Please help me review the code below. I will give you a piece of Python code that you need to review from the following aspects:

1. Correctness of the code: Check whether the code works properly and whether there are any syntax errors or logic errors
2. Code security: Check for security vulnerabilities, such as SQL injection, XSS, etc.
3. Code performance: Check for obvious performance issues, such as unnecessary loops, inefficient algorithms, etc.
4. Readability of the code: Check whether the code is easy to understand, whether the variable names are clear, and whether the comments are sufficient
5. Code style: Check whether it conforms to the PEP 8 style guide and whether naming is consistent

Please provide a detailed analysis of each aspect. If a problem is found, please point out the specific line number and problem description.
Finally, please give an overall evaluation, on a scale of 1-10, and explain the reasons.

Here is the code:
def calculate_sum(numbers):
    result = 0
    for i in range(len(numbers)):
        result = result + numbers[i]
    return result

Please review it carefully.

After optimization (concise version):

Review this Python code for correctness, security, performance, readability, and style. Point out specific problems and give an overall score from 1 to 10.

Code:
def calculate_sum(numbers):
    result = 0
    for i in range(len(numbers)):
        result = result + numbers[i]
    return result

Comparison (approximate; exact counts depend on the tokenizer and the exact wording):

Version	Characters	Tokens	Savings
Before optimization	482	142	-
After optimization	147	52	63%

Summary of optimization techniques:

Cut the filler: the AI understands concise instructions; polite phrases such as "please" and "please review it carefully" are unnecessary
Consolidate repeated requirements: the five aspects were each spelled out at length, but five keywords after "Review" convey the same thing
Drop detailed format instructions: if you don't need a strict output format, don't describe it in detail
Get straight to the point: "Review this code" is much shorter than "Help me review the code from the following aspects"
Use abbreviations and compact notation: for example, "1-10" is cheaper than "on a scale from one to ten"

Another practical case: optimizing structured output

Suppose you want the AI to return data in JSON format:

Before optimization:

Please help me generate a user record, including username, email address, age, and mobile phone number, in JSON format. Please make sure that the email format is correct, that the age is a number, and that the mobile phone number is 11 digits. Please let me know if any of the fields do not meet the requirements.

After optimization:

Generate user JSON: {"name":"","email":"","age":0,"phone":""}. The email must be validly formatted, age must be a number, and phone must be 11 digits.

The token count dropped from 68 to 31, a 54% saving, with almost the same effect.
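You can make this before/after comparison a repeatable check. A minimal sketch: `compare_prompts` and `heuristic_count` are hypothetical helpers, and the default counter is only the rough 4-characters-per-token rule, so its numbers will not match the table above; pass a tiktoken-based counter for exact OpenAI counts.

```python
import math


def heuristic_count(text):
    """Rough token estimate: about 4 characters per token (the English rule of thumb)."""
    return math.ceil(len(text) / 4)


def compare_prompts(before, after, count=heuristic_count):
    """Return (tokens_before, tokens_after, percent_saved) for two prompt versions."""
    b, a = count(before), count(after)
    saved = (1 - a / b) * 100 if b else 0.0
    return b, a, saved


before = ("Please help me generate a user record, including username, email address, "
          "age, and mobile phone number, in JSON format. Please make sure that the email "
          "format is correct, that the age is a number, and that the mobile phone number "
          "is 11 digits. Please let me know if any of the fields do not meet the requirements.")
after = ('Generate user JSON: {"name":"","email":"","age":0,"phone":""}. '
         "The email must be validly formatted, age must be a number, and phone must be 11 digits.")

b, a, saved = compare_prompts(before, after)
print(f"before: {b} tokens, after: {a} tokens, saved: {saved:.0f}%")

# For exact OpenAI counts, pass a tiktoken-based counter instead, e.g.:
# import tiktoken
# enc = tiktoken.get_encoding("cl100k_base")
# compare_prompts(before, after, count=lambda t: len(enc.encode(t)))
```

Keeping the counter as a parameter lets the same comparison run offline in tests and with the real tokenizer when you need exact figures.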

9. Key configuration: context windows and token limits

Beyond the token consumption of a single call, there is another important concept: the context window.

What is a context window?

The context window is the upper limit on the total number of tokens the model can handle at once. It includes the prompt you enter, all previous replies in the conversation, and the current reply.

For example, GPT-4o's context window is 128K tokens, which means that in a single conversation, all the content combined cannot exceed 128,000 tokens.

Why does this limit matter?

Because AI does not have unlimited memory. When your conversation grows too long and exceeds the context window, one of two things happens:

Old content is "squeezed out" (the earliest messages are discarded)
Or the request is rejected with an error and you cannot continue
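The first behavior, squeezing out the oldest messages, is easy to implement yourself before sending a request. A minimal sketch, assuming OpenAI-style message dicts with `role` and `content` keys; `trim_history` is a hypothetical helper, and the token counter is passed in so you can plug in tiktoken.

```python
def trim_history(messages, max_tokens, count):
    """Drop the oldest non-system messages until the conversation fits in max_tokens.

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    count:    function mapping a string to its token count (e.g. a tiktoken-based counter).
    Per-message formatting overhead is ignored, so leave some margin in max_tokens.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count(m["content"]) for m in msgs)

    while rest and total(system + rest) > max_tokens:
        rest.pop(0)  # "squeeze out" the earliest message first
    return system + rest


def word_count(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "one two three four"},
    {"role": "assistant", "content": "five six"},
    {"role": "user", "content": "seven eight nine"},
]
trimmed = trim_history(history, max_tokens=9, count=word_count)
print([m["content"] for m in trimmed])
# ['You are helpful.', 'five six', 'seven eight nine']
```

The system message is always kept, since it usually carries the instructions the rest of the conversation depends on.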

Context windows of different models:

Model	Context window	Approximate Chinese characters
GPT-4o	128K	About 100,000
GPT-3.5 Turbo	16K	About 12,000
Claude 3.5 Sonnet	200K	About 150,000
Claude 3 Opus	200K	About 150,000
Kimi	128K	About 100,000

What if you exceed the limit?

Scenario: you want the AI to summarize a 200,000-character article, but the model's context window is only 128,000 tokens.

Solution: process it in chunks

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def chunk_text(text, max_tokens=3000, overlap=100):
    """
    Split long text into chunks of at most max_tokens tokens each.
    overlap is the number of tokens shared by consecutive chunks, to keep context continuous.
    Note: cutting at token boundaries can split a multi-byte Chinese character;
    decode() then inserts a replacement character at the cut.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = encoding.encode(text)

    chunks = []
    start = 0

    while start < len(tokens):
        end = start + max_tokens
        piece = encoding.decode(tokens[start:end])
        chunks.append(piece)
        if end >= len(tokens):
            break  # this chunk already reaches the end of the text
        # Slide the window forward, stepping back by `overlap` tokens
        start = end - overlap

    return chunks

# Usage example
long_text = """Here is the content of a very long document...
(200,000 characters omitted)"""

chunks = chunk_text(long_text, max_tokens=3000, overlap=200)
print(f"The document was split into {len(chunks)} chunks")

# Process chunk by chunk
for i, chunk in enumerate(chunks):
    print(f"\nProcessing chunk {i + 1} ({len(encoding.encode(chunk))} tokens)...")
    # Call the AI API here to process each chunk

Practical tips:

When processing long documents, leave a 20% margin in each chunk's token budget (for example, if the limit is 3000, actually use 2400)
Use overlap so that content at chunk boundaries isn't cut apart
Prepend a short summary of the previous chunks to each chunk so the AI knows the context
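The last tip, carrying a summary forward from chunk to chunk, can be sketched as a simple loop. This assumes a `summarize` callable that wraps your actual API call; `process_with_running_summary` and `fake_summarize` are hypothetical names used only for illustration.

```python
def process_with_running_summary(chunks, summarize):
    """Process chunks in order, prefixing each prompt with a summary of the earlier chunks.

    summarize: function taking a prompt string and returning the model's summary
               (a stand-in for a real API call).
    Returns (final_summary, list_of_prompts_sent).
    """
    summary = ""
    prompts = []
    for i, chunk in enumerate(chunks, start=1):
        prefix = f"Summary of previous parts: {summary}\n\n" if summary else ""
        prompt = f"{prefix}Part {i}:\n{chunk}\n\nUpdate the summary to cover everything so far."
        prompts.append(prompt)
        summary = summarize(prompt)  # the new summary covers parts 1..i
    return summary, prompts


def fake_summarize(prompt):
    """Fake summarizer so the flow runs without an API key: reports the latest part number."""
    part = prompt.split("Part ")[-1].split(":")[0]
    return f"summary through part {part}"


final, sent = process_with_running_summary(["chunk A", "chunk B", "chunk C"], fake_summarize)
print(final)  # summary through part 3
```

Each call only ever sees one chunk plus a short summary, so the prompt stays well inside the context window no matter how long the document is.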

10. Common problems and troubleshooting

This section lists several pitfalls that are easy to run into in practice.

Problem 1: tiktoken's count differs from the count returned by the API

Cause: API billing also counts the message formatting (role, content, and other fields), not just the text content itself
Solution: to match API billing exactly, read the actual token counts from response.usage in the API response instead of calculating them yourself
Code example:

from openai import OpenAI  # openai Python library v1.x

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "your question"}],
)

# Get the exact token counts from the API response
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
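If you need an estimate before sending the request, the OpenAI Cookbook's token-counting recipe adds a small fixed overhead per message plus a few tokens that prime the reply. A sketch assuming those overhead values (3 per message, 1 extra for a `name` field, 3 for reply priming), which the Cookbook gives for several GPT-3.5/GPT-4 snapshots and which may differ for other models; `estimate_chat_tokens` is a hypothetical name.

```python
def estimate_chat_tokens(messages, count, tokens_per_message=3, tokens_per_name=1, reply_priming=3):
    """Estimate prompt tokens for a chat request, including per-message formatting overhead.

    count: function mapping a string to its token count (e.g. via tiktoken).
    The default overhead values are assumptions; check them against response.usage.
    """
    total = 0
    for message in messages:
        total += tokens_per_message      # wrapper tokens around each message
        for key, value in message.items():
            total += count(value)        # both the role string and the content are encoded
            if key == "name":
                total += tokens_per_name
    return total + reply_priming         # tokens that prime the assistant's reply


def word_count(text):
    """Crude stand-in tokenizer: one token per word (swap in tiktoken for real numbers)."""
    return len(text.split())


print(estimate_chat_tokens([{"role": "user", "content": "hello there"}], word_count))  # 9
```

This explains why the API's prompt_tokens is always a little higher than counting the message text alone.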

Problem 2: Claude's token counts differ from GPT's

Cause: Claude uses a different tokenizer (tiktoken does not support Claude's encoding)
Solution: use Anthropic's official tools, or estimate (roughly 3-4 characters per token for English text)
Estimation formula: Claude tokens ≈ characters × 0.75 (Chinese) or characters ÷ 4 (English)

Problem 3: The prompt is too long and exceeds the limit

Error message: This model's maximum context length is 128000 tokens
Cause: prompt + conversation history + expected output exceeds the model's context limit
Solution:
1. Shorten the prompt (use the optimization techniques in Section 8 of this article)
2. Clean up the conversation history and delete unimportant messages
3. If a single call still exceeds the limit, process the input in chunks (see Section 9)

Problem 4: The API returns a 400 Bad Request or 413 error

Cause: the request exceeds the maximum number of tokens the model allows
Solution: check the prompt length so that prompt tokens plus expected output stay below 90% of the context window limit (a 10% safety margin)
Debugging code:

import tiktoken

def count_tokens(text):
    """Count tokens with cl100k_base (same idea as the Section 7 script)"""
    return len(tiktoken.get_encoding("cl100k_base").encode(text))

def check_limit(prompt, model_max_tokens, output_estimate=500):
    """Check whether prompt + expected output fits within 90% of the context window"""
    prompt_tokens = count_tokens(prompt)
    total_needed = prompt_tokens + output_estimate
    budget = int(model_max_tokens * 0.9)  # keep a 10% safety margin

    if total_needed > budget:
        print(f"Warning: {total_needed} tokens required; the limit may be exceeded")
        print(f"Suggestion: cut about {total_needed - budget} tokens")
        return False
    print(f"OK: using {total_needed} tokens, {budget - total_needed} remaining")
    return True

Problem 5: The output is unexpectedly truncated

Cause: the output is too long and hits the model's output limit (not the context limit)
Solution: explicitly ask for concise or segmented output in the prompt, or set the max_tokens parameter deliberately to control output length (and raise it if your own setting is what cut the reply off)
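To detect truncation in code rather than by eye, check why generation stopped. A minimal sketch assuming the OpenAI Chat Completions response shape, where `finish_reason` is "length" when the token limit cut the reply off and "stop" when the model finished on its own; `was_truncated` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real responses.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it ran out of tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in objects shaped like the SDK's response (for illustration only)
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])
print(was_truncated(cut_off), was_truncated(complete))  # True False
```

With a real client you would pass the object returned by client.chat.completions.create(...); when it returns True, raise max_tokens or ask the model to continue from where it stopped.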

11. Advanced directions

Once you have mastered the basics, you can continue exploring in these directions:

Direction 1: Build an automated token-optimization pipeline

Integrate token counting into your prompt development workflow so that token consumption and cost are checked automatically every time a prompt changes. For example, write a script that raises an alert when the token count exceeds a threshold.
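A threshold check like that can be a few lines. A minimal sketch: `find_oversized_prompts` is a hypothetical helper, the prompt set is made up, and the default counter is only the rough 4-characters-per-token rule; swap in a tiktoken-based counter for exact numbers.

```python
import math


def heuristic_count(text):
    """Rough estimate: about 4 characters per token. Swap in tiktoken for exact counts."""
    return math.ceil(len(text) / 4)


def find_oversized_prompts(prompts, threshold, count=heuristic_count):
    """Return (name, tokens) for every prompt whose token count exceeds threshold."""
    oversized = []
    for name, text in prompts.items():
        tokens = count(text)
        if tokens > threshold:
            oversized.append((name, tokens))
    return oversized


# Hypothetical prompt set; a real pipeline would load these from your prompt files
prompts = {
    "code_review": "Review this Python code for correctness, security, performance, "
                   "readability, and style. Point out specific problems and give an "
                   "overall score from 1 to 10.",
    "greeting": "Say hello.",
}

for name, tokens in find_oversized_prompts(prompts, threshold=30):
    print(f"ALERT: prompt {name!r} uses about {tokens} tokens (threshold: 30)")
```

In a CI job you could exit with a non-zero status whenever the returned list is non-empty, so an oversized prompt fails the build.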

Direction 2: Learn how different models tokenize text

If you are interested in the technical details, study tokenization algorithms such as BPE, WordPiece, and SentencePiece. Understanding how they work helps you optimize prompts more precisely.

Direction 3: Explore the latest techniques for long-context optimization

RAG (Retrieval-Augmented Generation), the MapReduce pattern, the reflection pattern, and similar approaches are advanced techniques for processing long documents. They let you handle very long texts at lower cost.
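Of these, the MapReduce pattern is the easiest to sketch: summarize each chunk independently (map), then summarize the partial summaries (reduce). A minimal sketch assuming a `summarize` callable that wraps your LLM call; `map_reduce_summarize` and the toy `first_sentence` summarizer are hypothetical stand-ins.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize a long document with the MapReduce pattern.

    Map:    summarize every chunk independently (these calls can run in parallel).
    Reduce: summarize the concatenated partial summaries into one final summary.
    summarize: function taking text and returning a summary (stand-in for an LLM call).
    """
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = "\n".join(partial_summaries)
    return summarize(combined)                                  # reduce step


def first_sentence(text):
    """Toy 'summarizer' that keeps only the first sentence of its input."""
    return text.split(".")[0].strip() + "."


chunks = ["Tokens are units of text. They vary by model.",
          "Prices are per token. Output costs more."]
print(map_reduce_summarize(chunks, first_sentence))  # Tokens are units of text.
```

If the combined partial summaries are themselves too long for the context window, the reduce step can be applied again to groups of summaries.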

Direction 4: Cost monitoring and budget management

In production, build a monitoring dashboard for token consumption. Record the cost of every API call and set budget alerts so your bill doesn't explode at the end of the month.
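A minimal in-process sketch of the record-and-alert part, assuming you feed it token counts from each response's usage field; `CostTracker`, its 80% alert threshold, and the price tuples are illustrative assumptions, and a real dashboard would persist these records rather than keep them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulates per-call API costs and flags when a budget is nearly used up."""
    budget_usd: float
    prices: dict  # model -> (input USD per 1M tokens, output USD per 1M tokens)
    records: list = field(default_factory=list)

    def record(self, model, input_tokens, output_tokens):
        """Log one call and return its cost in USD."""
        input_price, output_price = self.prices[model]
        cost = input_tokens / 1_000_000 * input_price + output_tokens / 1_000_000 * output_price
        self.records.append((model, input_tokens, output_tokens, cost))
        return cost

    @property
    def total(self):
        return sum(record[3] for record in self.records)

    def over_budget(self, alert_ratio=0.8):
        """True once spending reaches alert_ratio of the budget (80% by default)."""
        return self.total >= self.budget_usd * alert_ratio


tracker = CostTracker(budget_usd=1.00, prices={"gpt-4o": (5.00, 15.00)})
tracker.record("gpt-4o", 50_000, 20_000)  # in practice: values from response.usage
print(f"spent ${tracker.total:.2f}, alert: {tracker.over_budget()}")  # spent $0.55, alert: False
```

Hooking record() into the same wrapper that calls the API keeps the bookkeeping in one place.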

Tokens may seem like a small concept, but they are the foundation of how AI processes text. Understand them, and you can use AI far more efficiently at a much lower cost. I hope this tutorial helps.

If anything is unclear, or you run into other problems in practice, feel free to discuss it in the comments.
