Transformer
The transformer is the neural-network architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need", and it underpins practically every modern generative AI model: LLMs, image, code, and multimodal models.
Its key innovation is the attention mechanism: at each step, the model dynamically weighs the relevance of each input element against the others, without depending on a sequential traversal like the older RNNs/LSTMs. This enables massive parallelism during training.
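The core of that mechanism can be sketched in a few lines. The following is a minimal illustration of scaled dot-product attention (the variant used in the original paper) in NumPy; the function and variable names are illustrative, not from any particular library. Note that the relevance weights for all positions are computed in one matrix product, which is what makes the computation parallel rather than sequential:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, values.
    # All pairwise relevance scores are computed at once --
    # no sequential traversal as in an RNN/LSTM.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) relevance scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted mix of values

# Toy example: 4 positions, 8-dimensional representations.
rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_k))
out, w = scaled_dot_product_attention(Q, K, V)
```

The (seq_len, seq_len) weight matrix is also where the quadratic cost in context length mentioned below comes from: every position attends to every other position.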
GPT, BERT, Llama, Claude, Gemini, Mistral, and Stable Diffusion are all built on transformer variants. Alternative architectures (Mamba, RWKV, state space models) are emerging to address its quadratic cost in context length, but the transformer remains dominant in 2026.
