
AI Engineering

How LLMs actually work: a five-step mechanism

When you prompt an AI, it isn’t “thinking.” It’s following a five-step mechanism to guess what comes next.

1. Tokenisation

The model breaks your text into small pieces called “tokens”: often whole words, sometimes word fragments or punctuation. This happens before the model itself even starts processing.
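A minimal sketch of the idea, using a tiny made-up vocabulary and greedy longest-match splitting. Real tokenisers use learned subword schemes such as byte-pair encoding, so both the vocabulary and the matching rule here are purely illustrative:

```python
# Toy tokeniser: greedily match the longest vocabulary entry at each
# position. The vocabulary below is invented for this example.
VOCAB = ["token", "isation", "ise", "is", "a", " "]

def tokenise(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

print(tokenise("tokenisation"))  # ['token', 'isation']
```

Note that a word you think of as one unit can become several tokens, which is why models sometimes struggle with tasks like counting letters.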

2. Embeddings

These pieces are turned into vectors (long lists of numbers) that act as coordinates in a “meaning space.” This is how the model knows that “Python” the language is related to “JavaScript” but not to snakes.
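You can see how “coordinates in meaning space” behave with a few invented vectors and cosine similarity. Real embeddings have thousands of dimensions and are learned from data; the three-dimensional numbers below are made up to show the geometry:

```python
import math

# Invented 3-d "meaning space" coordinates, for illustration only.
embeddings = {
    "python":     [0.9, 0.8, 0.1],   # the programming language
    "javascript": [0.8, 0.9, 0.0],
    "snake":      [0.1, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine(embeddings["python"], embeddings["javascript"]))  # high: nearby
print(cosine(embeddings["python"], embeddings["snake"]))       # low: far apart
```

Words used in similar contexts end up near each other, which is how relatedness falls out of geometry rather than dictionary definitions.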

3. Attention

Think of this like a spotlight. For each token, the model weighs how relevant every other token in the prompt is, deciding which parts of the context matter most.
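The spotlight can be sketched as scaled dot-product attention for a single query. The 2-d vectors below are invented; real models use learned, high-dimensional queries, keys and values:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # the "spotlight": non-negative, sums to 1
    # Blend the value vectors in proportion to their attention weights.
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# Invented vectors for three tokens; the query most resembles key 0,
# so most of the attention weight lands there.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
blended, weights = attention([1.0, 0.2], keys, values)
print([round(w, 3) for w in weights])
```

The weights always sum to 1, so attention is a budget: focusing harder on one token necessarily means focusing less on the others.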

4. Probabilities

The model doesn’t “know” the answer yet. It generates a probability score for every possible next token in its vocabulary, which can be over 100,000 options.
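Turning raw scores into a probability for every candidate token is the softmax function. The logits below are invented, and the vocabulary is cut down to five tokens to keep it readable:

```python
import math

# Hypothetical raw scores (logits) for a tiny five-token vocabulary;
# real vocabularies have 100,000+ entries.
logits = {"cat": 2.1, "dog": 1.8, "car": 0.3, "the": -0.5, "run": -1.2}

def softmax(scores):
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>4}  {p:.3f}")
# The probabilities sum to 1, and every token gets some share, however small.
```

That last point matters: even wildly wrong tokens always have non-zero probability, which is part of why hallucinations are possible at all.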

5. Sampling and looping

One token is selected based on your temperature setting, appended to the text, and then the entire process starts over for the next token.
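The whole loop fits in a few lines. One big simplification in this sketch: the logits here are fixed, whereas a real model recomputes them from the entire text so far on every step. The vocabulary and scores are invented:

```python
import math
import random

def sample(logits, temperature=1.0):
    # Divide logits by temperature before softmax: low temperature sharpens
    # the distribution, high temperature flattens it.
    scaled = {tok: s / temperature for tok, s in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Toy generation loop: pick a token, append it, go again.
logits = {"the": 1.5, "a": 1.0, "cat": 0.5, ".": 0.0}
text = []
for _ in range(5):
    text.append(sample(logits, temperature=0.8))
print(" ".join(text))
```

Run it twice and you will likely get different output: that run-to-run variation is sampling, not the model “changing its mind.”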

Why this matters

The biggest takeaway? The model has no idea what it is about to say. There is no planned sentence or hidden script. It is simply making a probabilistic guess, one piece at a time, based on everything that came before it.

Understanding this mechanism explains a lot, from why models hallucinate to why they have context limits, and it makes you a much better user of the technology.

Hallucinations aren’t bugs. They’re a natural consequence of a system that generates the most probable next token without any ground truth to check against. Context limits exist because the attention mechanism has a fixed window. Temperature controls the randomness of token selection, which is why lower temperatures produce more predictable output and higher temperatures produce more creative (and less reliable) results.
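The temperature effect is easy to see directly: apply different temperatures to the same (invented) logits and watch the distribution sharpen or flatten:

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# Low temperature concentrates probability on the top token (predictable);
# high temperature spreads it across the alternatives (varied, less reliable).
```

Temperature never changes which token the model ranks highest; it only changes how often the runners-up get picked.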

Once you understand the mechanism, you stop treating AI tools like magic and start treating them like what they are: sophisticated pattern-matching engines that you can steer with better inputs.