Bluecoders

Token (AI)


In the context of LLMs, a token is the basic unit the model operates on: a chunk of text (often a word fragment, sometimes a short whole word or a single character) produced by a tokenizer before inference.
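To make the idea concrete, here is a minimal sketch of greedy longest-match subword tokenization. The vocabulary is hypothetical and hand-picked for the example; real tokenizers (BPE, SentencePiece) learn vocabularies of tens of thousands of entries from data.

```python
# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"token", "ization", "izer", "iz", "a", " ", "un", "it", "s"}

def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split text into the longest matching vocabulary entries, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("tokenization"))  # ['token', 'ization']
print(tokenize("tokenizer"))     # ['token', 'izer']
```

Note how related words share the `token` prefix token: this reuse of frequent fragments is what keeps learned vocabularies compact.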

A French text of 1,000 characters typically represents between 250 and 350 tokens. LLM providers bill usage based on the number of input and output tokens, and a model's context window is also expressed in tokens.

The choice of tokenizer (BPE, SentencePiece, tiktoken…) affects token efficiency on non-English languages: a tokenizer poorly optimised for French can consume many more tokens per character than a well-suited one, raising both cost and effective context usage for the same text.
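This efficiency gap can be sketched with the same greedy matcher and two hypothetical vocabularies: one containing common French subwords, one without. Both vocabularies are made up for the example; real differences come from what each tokenizer's training corpus contained.

```python
def greedy_tokens(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization with single-character fallback."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                out.append(text[i:j])
                i = j
                break
        else:
            out.append(text[i])  # unknown span: one token per character
            i += 1
    return out

# Hypothetical vocabularies, for illustration only.
french_aware = {"bon", "jour", "le", "mon", "de", " "}
english_only = {"the", "and", "ing", " "}

text = "bonjour le monde"
a = greedy_tokens(text, french_aware)
b = greedy_tokens(text, english_only)
print(len(a), len(b))  # 7 16 - the ill-suited vocabulary needs ~2x the tokens
```

For identical input text, the vocabulary lacking French fragments falls back to single characters, which is exactly the tokens-per-character penalty the paragraph above describes.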
