Everything about tokens in ChatGPT

Have you ever received error messages in ChatGPT, or noticed that your chat increasingly resembles a conversation with a very forgetful person? It all comes down to tokens. In this blog, we explain what tokens are and why they matter.

What are tokens?

Tokens are the fundamental units of text used by language models like ChatGPT to process and analyze text. Tokens can be single characters, words, or parts of words, depending on the tokenizer used and the language.

In the context of natural language processing (NLP) and language models, a text is first broken down into tokens before it is analyzed. The process of breaking down text into tokens is called tokenization. A tokenizer is an algorithm trained to split text in a meaningful way so that the language model can understand the structure and meaning of the text.

For example, the sentence “ChatGPT helps answer questions.” can be broken down into the following tokens: [“ChatGPT”, “helps”, “answer”, “questions”, “.”]. Note that punctuation marks, such as the period at the end of the sentence, are also considered separate tokens.
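
You can try this yourself with tiktoken, OpenAI's open-source tokenizer library. A minimal sketch; the exact token boundaries depend on the encoding you load:

```python
import tiktoken  # pip install tiktoken

# Load the encoding used by recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

# Encode the example sentence into token IDs, then show which
# piece of text each ID stands for.
for token_id in enc.encode("ChatGPT helps answer questions."):
    print(token_id, repr(enc.decode([token_id])))
```

A real tokenizer rarely splits exactly on word boundaries; “ChatGPT”, for instance, may itself be broken into several sub-word tokens.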

In short, tokens are the basic units of text that enable language models to analyze, understand, and generate text.

What happens when tokens in a session run out?

ChatGPT does not have a strict limit on the total number of tokens in a single session. However, there is a hard constraint on how many tokens the model can process at once: its context window. For OpenAI models such as GPT-3.5, input and output combined may not exceed 4,096 tokens; the base GPT-4 model has a limit of 8,192 tokens.

If the combination of input tokens and output tokens exceeds the maximum token limit, there are various approaches you can consider:

  1. Shorten the text: Trim the conversation history or the question to keep the total number of tokens below the limit.


  2. Split the text: Divide the text into smaller parts and input them separately.


When you reach the limit while generating a response, the model can no longer produce meaningful or complete answers, and the oldest parts of the conversation fall outside the context window, where the model can no longer see them. That is why the chat appears to ‘forget’ things. It is essential to manage the input and output text so the token count stays within the model’s limits and you keep getting effective, accurate results.
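
As a sketch of the first approach, here is one way to keep a conversation history within a token budget using tiktoken. The count_tokens and trim_history helpers and the 4,096-token budget are illustrative assumptions, not an official API:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages):
    # Rough count: sum the tokens of each message's text.
    # (Chat APIs add a few tokens of per-message overhead on top.)
    return sum(len(enc.encode(m["content"])) for m in messages)

def trim_history(messages, budget=4096):
    # Drop the oldest messages until the history fits the budget.
    trimmed = list(messages)
    while trimmed and count_tokens(trimmed) > budget:
        trimmed.pop(0)
    return trimmed
```

Dropping the oldest messages first mirrors what ChatGPT itself does when the context window fills up: the start of the conversation is the first thing to be ‘forgotten’.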

How tokens work in ChatGPT

  1. Splitting text: ChatGPT breaks the input text into tokens. In some cases, a token is as small as one character, but it can also be a word or a part of a word.


  2. Token IDs: Each token is associated with a unique number, the so-called token ID. These IDs are stored in a vocabulary, which is a large list of all tokens the model knows.


  3. Input: The token IDs are arranged in a sequence that reflects the original order of the text. This sequence is then fed into the model.


  4. Processing: ChatGPT consists of layers of neural networks that process the token IDs. In each layer, the relationships between the tokens are evaluated to understand meaning and context.


  5. Output: After passing through the neural network layers, ChatGPT generates a sequence of token IDs as output, one token at a time. These IDs represent the most likely continuation of the text, based on the input and the knowledge the model gained during training.


  6. Back to text: Finally, the token IDs from the output are converted into their corresponding textual tokens and merged to form the final text response.
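
Steps 1, 2, 3, and 6 can be reproduced directly with tiktoken; steps 4 and 5 happen inside the model itself. A small sketch:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Step 2: the vocabulary is a fixed list of all tokens the model knows.
print(enc.n_vocab)  # size of the cl100k_base vocabulary

# Steps 1 and 3: text becomes a sequence of token IDs in original order.
ids = enc.encode("Tokens are the basic units of text.")
print(ids)

# Step 6: decode() inverts encode(), turning IDs back into text.
print(enc.decode(ids))
```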


A helpful rule of thumb is that one token generally corresponds to ~4 characters of common English text, or roughly ¾ of a word (so 100 tokens ≈ 75 words). Want to see how tokens work for yourself? Using the following site, you can see that this article consists of 852 tokens and 3,738 characters:

OpenAI Tokenizer
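
If you only need a rough estimate without running a tokenizer, the rule of thumb translates into a one-liner; the estimate_tokens helper below is a hypothetical convenience, not a library function:

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: ~4 characters per token for common English text.
    return max(1, len(text) // 4)

print(estimate_tokens("ChatGPT helps answer questions."))  # prints 7
```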
