AI Basics

Transformer

The architecture behind modern AI

TL;DR

Not the movie robots. It's the math breakthrough from 2017 that made ChatGPT, Claude, and basically all modern AI possible. The engine inside the car.

The Plain English Version

In 2017, a team at Google published a paper called "Attention Is All You Need." Sounds like a Beatles song, but it quietly changed the world. They invented the transformer — a new way for AI to process information — and everything since has been built on top of it.

Before transformers, AI read text like you'd read through a tiny keyhole — one word at a time, left to right, slowly losing track of what came before. Transformers changed the game by letting AI look at ALL the words at once and figure out which ones matter most for understanding the meaning. It's like the difference between reading a book one word at a time versus being able to see the whole page and instantly know which sentences are important.

The T in GPT stands for Transformer. Claude uses transformers. Google's AI uses transformers. Pretty much every impressive AI you've seen in the last few years? Transformers. It's the single most important architectural invention in modern AI — the equivalent of the internal combustion engine for cars.

Why Should You Care?

Because when people say "AI got good suddenly," this is why. One architectural breakthrough in 2017 unlocked everything — chatbots, image generators, code assistants, translation, all of it. Understanding transformers means understanding why AI seemed to go from "meh" to "holy crap" seemingly overnight.

The Nerd Version (if you dare)

Transformers use self-attention mechanisms to compute relationships between all positions in a sequence simultaneously, avoiding the sequential processing bottleneck of RNNs and LSTMs. The architecture consists of encoder and decoder stacks with multi-head attention, positional encoding, and feed-forward layers. Variants include encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5) models. Scaling laws have shown predictable performance improvements with increased parameters and training data.

Related terms

GPT LLM Neural Network

Like this? Get one every week.

Every Tuesday, one AI concept explained in plain English. Free forever.

Want all 75 terms in one PDF? Grab the SpeakNerd Cheat Sheet — $9