Transformer
The transformer is the neural-network architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need", and it underpins practically every modern generative AI model: LLMs, image, code, and multimodal models.
Its key innovation is the attention mechanism: at each step, the model dynamically weighs the relevance of each input element against the others, without depending on a sequential traversal like the older RNNs/LSTMs. This enables massive parallelism during training.
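The core of that mechanism can be sketched in a few lines. The following is a minimal illustration of scaled dot-product attention (the variant used in the original paper) in NumPy; the function and variable names are illustrative, not from any particular library. Note that the relevance weights for all positions are computed in one matrix product, which is what makes the computation parallel rather than sequential:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, values.
    # All pairwise relevance scores are computed at once --
    # no sequential traversal as in an RNN/LSTM.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted mix of values

# Toy example: 4 positions, 8-dimensional representations.
rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_k))
out, w = scaled_dot_product_attention(Q, K, V)
```

The (seq_len, seq_len) weight matrix is also where the quadratic cost in context length mentioned below comes from: every position attends to every other position.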
GPT, BERT, Llama, Claude, Gemini, Mistral, and Stable Diffusion are all built on transformer variants. Alternative architectures (Mamba, RWKV, state space models) are emerging to address its quadratic cost in context length, but the transformer remains dominant in 2026.
