
Parameter

The dials inside an AI's brain

TL;DR

The internal 'dials' that get tuned during training. GPT-3 has 175 billion of them, and newer frontier models are widely believed to have even more. More parameters generally means smarter (and way more expensive).

The Plain English Version

Imagine a massive mixing board in a recording studio — thousands of dials and sliders that shape the sound. Each one controls something slightly different. Together, they determine what the final music sounds like. An AI model's parameters are those dials.

When an AI trains, it's essentially adjusting billions of these tiny numerical dials until the output sounds right. "Turn this one up a bit. Turn that one down." No single parameter holds a fact on its own; what the model learns is spread across enormous numbers of them. One parameter alone means nothing, but billions of them working together can understand language, generate images, and write code.
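To make "turn this one up, turn that one down" concrete, here's a minimal sketch of a model with just two dials: a weight `w` and a bias `b`. It uses plain gradient descent on mean squared error, with the gradients worked out by hand. The function name `train_two_dials` and the toy data (which secretly follow y = 2x + 1) are made up for illustration; real models do the same kind of nudging, just with billions of dials and automatically computed gradients.

```python
def train_two_dials(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b with plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # both dials start at zero
    n = len(xs)
    for _ in range(steps):
        # How far off each prediction is right now.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients: how much the average squared error changes per dial.
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        # "Turn this one up a bit. Turn that one down." Move each dial
        # a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data that follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = train_two_dials(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # w = 2.000, b = 1.000
```

Nobody told the model "multiply by 2 and add 1." It found those values by repeatedly nudging its two dials to shrink the error.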

The numbers are wild. GPT-3 has 175 billion parameters. OpenAI hasn't disclosed GPT-4's size, but it's widely rumored to be over a trillion. Claude and other frontier models don't have published parameter counts either, though they're generally assumed to be in a similar range. That's a big part of why training these models costs millions of dollars: you're tuning billions of dials across massive amounts of data, and that takes enormous computing power.
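Here's a rough back-of-the-envelope sketch of that computing bill. It assumes the common rule of thumb that training takes about 6 floating-point operations (FLOPs) per parameter per training token, and uses GPT-3's reported figures of 175 billion parameters and roughly 300 billion training tokens. The cluster size and per-chip speed below are hypothetical numbers chosen for illustration, not any real company's setup.

```python
def training_flops(num_params, num_tokens):
    """Approximate training compute with the common 6 * N * D rule of thumb
    (about 2 FLOPs per parameter per token forward, about 4 backward)."""
    return 6 * num_params * num_tokens


def training_days(total_flops, num_chips, flops_per_chip):
    """Wall-clock days, assuming every chip sustains flops_per_chip FLOP/s."""
    seconds = total_flops / (num_chips * flops_per_chip)
    return seconds / 86_400  # seconds per day


# GPT-3: 175 billion parameters, trained on roughly 300 billion tokens.
flops = training_flops(175e9, 300e9)
print(f"{flops:.2e} FLOPs")  # 3.15e+23 FLOPs

# Hypothetical cluster: 1,000 chips, each sustaining 100 TFLOP/s.
print(f"{training_days(flops, 1_000, 100e12):.1f} days")  # 36.5 days
```

Double the parameters (with the same data) and the compute doubles too, which is why bigger models cost so much more to train.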

Why Should You Care?

Because parameter count is one of the first things people mention when comparing AI models, and it helps you understand the cost conversation. More parameters generally means more capable (up to a point) but also more expensive to train and run. When a company says "we trained a 70 billion parameter model," you now know that means 70 billion internal dials that were tuned to capture knowledge.
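Parameter count also tells you roughly how much memory a model needs just to hold its dials. This sketch assumes every parameter is stored at one uniform precision and counts the weights only; actually running a model needs extra memory on top (for example, for intermediate activations), and the 4-bit figure ignores the small overhead real quantization schemes add. The names `BYTES_PER_PARAM` and `weights_gb` are made up for illustration.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16 / bfloat16": 2,
    "int8": 1,
    "4-bit": 0.5,
}


def weights_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"70B parameters in {precision}: ~{weights_gb(70e9, nbytes):.0f} GB")
```

That works out to about 280, 140, 70, and 35 GB. So a "70 billion parameter model" in 16-bit precision needs around 140 GB just for its weights, more than a single typical consumer GPU holds.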

The Nerd Version (if you dare)

Parameters are the learnable weights and biases in a neural network, adjusted during training by gradient descent (with gradients computed via backpropagation) to minimize the loss function. In transformer models, parameters live in the embedding layers, the attention weight matrices (the Q, K, V and output projections), the feed-forward layers, and layer normalization. Scaling laws from the Chinchilla work suggest that, for a fixed compute budget, parameter count and training tokens should grow in roughly equal proportion (on the order of 20 training tokens per parameter). Parameter-efficient fine-tuning methods (LoRA, QLoRA, adapters) freeze the original weights and train only a small number of parameters, typically small new matrices or modules added to the model, to adapt it to downstream tasks.
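To see where the parameters actually live, here's a sketch that counts them for a GPT-2-style decoder-only transformer. It assumes biases on every linear layer, two LayerNorms per block plus a final one, learned position embeddings, and an output layer that shares weights with the token embeddings; other architectures (no biases, different norms, untied output layers) will give different totals. It also counts what LoRA would train for one attention projection, assuming LoRA's usual pair of low-rank matrices. The function names are hypothetical.

```python
def transformer_params(d_model, n_layers, vocab_size, context_len, d_ff):
    """Count parameters in a GPT-2-style decoder-only transformer."""
    d = d_model
    attention = 4 * d * d + 4 * d           # Q, K, V and output projections (+ biases)
    feed_forward = 2 * d * d_ff + d_ff + d  # two linear layers (+ biases)
    layer_norms = 2 * (2 * d)               # two LayerNorms, each with gain and bias
    per_block = attention + feed_forward + layer_norms

    embeddings = vocab_size * d + context_len * d  # token + position embeddings
    final_norm = 2 * d                             # LayerNorm after the last block
    # Output layer is tied to the token embeddings, so it adds nothing here.
    return n_layers * per_block + embeddings + final_norm


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_in x d_out weight matrix:
    a (d_in x rank) matrix plus a (rank x d_out) matrix."""
    return rank * (d_in + d_out)


# GPT-2 small configuration.
total = transformer_params(d_model=768, n_layers=12, vocab_size=50257,
                           context_len=1024, d_ff=3072)
print(f"{total:,}")  # 124,439,808

full = 768 * 768                      # one attention projection, kept frozen
lora = lora_params(768, 768, rank=8)  # what LoRA trains instead
print(f"{lora:,} vs {full:,} ({lora / full:.1%})")  # 12,288 vs 589,824 (2.1%)
```

Two things stand out: the feed-forward layers hold roughly two-thirds of each block's parameters, and a rank-8 LoRA update touches only about 2% as many numbers as the matrix it adapts.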

Related terms

LLM, Model, Training Data
