Alignment
Making AI do what we actually want
TL;DR
The challenge of making sure AI does what humans actually MEANT, not just what they technically said. Like the monkey's paw — you got your wish, just not the way you wanted.
The Plain English Version
Ever given someone instructions and they did EXACTLY what you asked — but it was completely wrong? "Clean up the living room" and they shoved everything into a closet? Technically they cleaned up. But that's not what you meant.
That's the alignment problem in AI, and it's one of the biggest challenges in the field. How do you make sure an AI system does what you actually WANT, not just a technically correct interpretation of what you said? Tell an AI to "maximize user engagement" and it might learn that outrage keeps people scrolling — so it starts showing increasingly extreme content. It did what you asked. It just ruined society in the process.
The stakes go up as AI gets more powerful. A dumb AI that's misaligned is just annoying. A superintelligent AI that's misaligned is potentially catastrophic. That's why some of the smartest people in AI spend their careers on this problem — not building new capabilities, but making sure existing ones actually serve human values.
Why Should You Care?
Because alignment determines whether AI is a tool that helps humanity or one that accidentally harms it. Every time a social media algorithm radicalizes someone, or a chatbot gives dangerous medical advice, or an AI system discriminates — that's an alignment failure. Understanding alignment helps you think critically about who's building AI and what values they're building into it.
The Nerd Version (if you dare)
AI alignment research focuses on ensuring that AI systems' objectives and behaviors match human intentions and values. Techniques include RLHF (Reinforcement Learning from Human Feedback), Constitutional AI, interpretability research, red-teaming, and scalable oversight. Key challenges include Goodhart's Law (once a proxy metric becomes the optimization target, it stops tracking the thing you actually care about), specification gaming, reward hacking, and the difficulty of formally specifying human values. Leading research organizations include Anthropic, OpenAI's alignment team, DeepMind's safety team, and MIRI (the Machine Intelligence Research Institute).
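If Goodhart's Law and reward hacking feel abstract, here's a minimal toy sketch in Python (the item names and scores are made up for illustration, not data from any real system): the recommender can only measure a proxy for what we care about (engagement), and greedily optimizing that proxy quietly tanks the true objective, user wellbeing.

```python
# Toy illustration of Goodhart's Law / reward hacking.
# All numbers and item names are hypothetical, purely for illustration.
#
# The "true" objective is user wellbeing, but the system can only measure a
# proxy: engagement. Optimizing the proxy drifts toward outrage content,
# even though that hurts the thing we actually care about.

from dataclasses import dataclass


@dataclass
class ContentItem:
    name: str
    engagement: float  # proxy metric the system can measure (e.g. clicks/minute)
    wellbeing: float   # true objective it cannot measure directly


# Hypothetical catalog: outrage content happens to score highest on the proxy.
CATALOG = [
    ContentItem("calm explainer video", engagement=2.0, wellbeing=+1.0),
    ContentItem("funny cat compilation", engagement=3.5, wellbeing=+0.5),
    ContentItem("mildly clickbaity list", engagement=5.0, wellbeing=-0.2),
    ContentItem("outrage-bait conspiracy", engagement=9.0, wellbeing=-2.0),
]


def recommend(catalog, steps=10):
    """Greedy recommender aligned to the proxy: always pick max engagement."""
    return [max(catalog, key=lambda item: item.engagement) for _ in range(steps)]


if __name__ == "__main__":
    feed = recommend(CATALOG)
    total_engagement = sum(item.engagement for item in feed)
    total_wellbeing = sum(item.wellbeing for item in feed)
    print(f"Feed: {feed[0].name!r} x {len(feed)}")
    print(f"Proxy metric (engagement): {total_engagement:.1f}  <- looks great")
    print(f"True objective (wellbeing): {total_wellbeing:.1f}  <- quietly tanked")
```

Alignment techniques like RLHF try to narrow exactly this gap: pulling the objective the system actually optimizes closer to what people value, rather than whatever happens to be easiest to measure.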
Related terms
Like this? Get one every week.
Every Tuesday, one AI concept explained in plain English. Free forever.
Want all 75 terms in one PDF? Grab the SpeakNerd Cheat Sheet — $9