
Mode collapse: when AI runs in circles

Roméo · Founder

June 30, 2026 · 8 min read

mode collapse
artificial intelligence
machine learning
RLHF

Mode collapse, or when an AI stops surprising you

You've probably had this feeling before. An AI that, at first, surprises you, offers angles, varies its phrasing. And then, over time or across versions, starts saying everything the same way. The same turns of phrase, the same outline, the same cautious answers. The variety quietly fades.

This phenomenon has a name: mode collapse. It's one of the most discreet failures of modern AI, and one of the most instructive. Once you understand it, you grasp a good part of what really happens when a machine learns.

The restaurant that ends up serving three dishes

Picture a restaurant with a menu of two hundred dishes. One day, the chef notices that three of them do especially well. Diners applaud, nobody complains. So he serves them again. And again. After a few months, the menu still lists two hundred dishes, but in the kitchen only three ever get made. The quality is fine. The variety is gone.

Mode collapse is exactly that, in software form. A model is supposed to produce a wide variety of answers, images or text. But it discovers that a handful of safe answers works every time. So it falls back on them and drops everything else. The menu is still there, on paper. On the plate, it's always the same thing.

Two places it shows up

The term comes from GANs (generative adversarial networks), a family of image generators. Their principle, without the jargon: two AIs train against each other. The first makes images, the second plays critic and tries to spot the fakes. The maker improves by trying to fool the critic. The trouble is, it can find a shortcut. If a single image fools the critic every time, why bother inventing a thousand others? It then produces the same thing over and over, or nearly. You ask it for a thousand different faces, it gives you three.

You find the same flaw in AIs that write, like conversational assistants. After their first round of learning, they are refined by rewarding the answers that please human reviewers, a technique known as RLHF (reinforcement learning from human feedback). This is useful: it makes them safer and more polite. But by over-rewarding the answer that pleases, you push them toward a small set of consensus phrasings. The machine learns to give the average answer: clean and expected. The surprising answer, the unexpected angle, the risk taken all become rarer. Variety narrows. That's mode collapse, in a subtler form.

The real reason: the machine optimizes a score, not an intention

Here's the heart of the matter, and the idea to take away from the whole article. An AI never tries to do well in the sense you mean. It seeks to maximize a score, a number we've defined for it, and it does so with absolute zeal. If the score rewards variety, it will be varied. If the score mostly rewards never being wrong, it will turn cautious and repetitive. Mode collapse isn't a bug. It's a machine that followed the instruction perfectly, when the instruction itself was poorly worded.
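
To make that concrete, here is a minimal sketch in Python, not a real training setup. It assumes a toy "assistant" that picks among five possible answers, with made-up rewards where the first answer is only slightly safer than the rest. The names `train`, `REWARDS` and `variety_weight` are invented for this illustration. With `variety_weight` at zero, the score only pays for the safe answer. With a positive value, the score also pays a small bonus for spreading out (an entropy bonus).

```python
import math

def softmax(logits):
    """Turn raw preferences into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: high means varied, near zero means collapsed."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def train(rewards, variety_weight, steps=5000, lr=0.5):
    """Gradient ascent on: expected reward + variety_weight * entropy."""
    logits = [0.0] * len(rewards)  # start with no preference at all
    for _ in range(steps):
        probs = softmax(logits)
        # What each answer is worth: its reward, plus a bonus for being rare.
        gains = [r - variety_weight * math.log(max(p, 1e-12))
                 for r, p in zip(rewards, probs)]
        baseline = sum(p * g for p, g in zip(probs, gains))
        # Exact gradient of the score with respect to each logit.
        logits = [t + lr * p * (g - baseline)
                  for t, p, g in zip(logits, probs, gains)]
    return softmax(logits)

# Hypothetical rewards: answer 0 is the "safe" one, barely ahead of the others.
REWARDS = [1.0, 0.9, 0.8, 0.7, 0.6]

collapsed = train(REWARDS, variety_weight=0.0)  # score ignores variety
varied = train(REWARDS, variety_weight=0.5)     # score rewards variety

print("reward only     :", [round(p, 3) for p in collapsed])
print("reward + variety:", [round(p, 3) for p in varied])
```

Under these assumptions, the first run ends up betting almost everything on a single answer, while the second keeps all five in play. Same machine, same zeal, different score.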

How does it learn, concretely? Picture someone who has to reach the lowest point of a valley, in the fog, feeling only the slope underfoot. At each step, they go down in the direction that drops the most. Training a model is just that: a long, groping descent toward the lowest possible error, which amounts to chasing the highest possible score.
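
The walk in the fog fits in a few lines of Python. This is a toy under stated assumptions, not a real model: a one-dimensional valley whose lowest point sits at x = 3. The function names `loss`, `slope` and `descend`, and the values of `step_size` and `steps`, are invented for the illustration.

```python
def loss(x):
    """Height of the valley at position x; the bottom is at x = 3."""
    return (x - 3.0) ** 2 + 1.0

def slope(x):
    """Steepness underfoot (the derivative of loss)."""
    return 2.0 * (x - 3.0)

def descend(start, step_size=0.1, steps=100):
    """Repeatedly take a small step in the downhill direction."""
    x = start
    for _ in range(steps):
        x -= step_size * slope(x)
    return x

bottom = descend(start=-5.0)
print(round(bottom, 4), round(loss(bottom), 4))
```

The walker never sees the whole valley. It only ever uses the local slope, which is all a training procedure has to go on as well.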

What's left is choosing how to place each step. That's the job of optimizers, these descent strategies. The most recent ones, with names like Shampoo or Muon, don't just look at the slope right under the foot. They account for the shape of the terrain around it to pick a better step and reach a better hollow faster. The better you descend, the less you risk getting stuck in a bad fold of the landscape, one of those narrow bottoms where variety collapses. The way you walk shapes the walker you end up with.
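
Here is a rough sketch of why the shape of the terrain matters. It is not Shampoo or Muon, which are far more elaborate. It assumes a made-up valley that is gentle in one direction and a hundred times steeper in the other, and it compares a plain descent with one that rescales each step by the known curvature (a hand-written "preconditioner"). The names `CURVATURE` and `steps_to_bottom` are hypothetical.

```python
import math

# Assumed terrain: loss = 0.5 * (1 * x**2 + 100 * y**2).
CURVATURE = (1.0, 100.0)  # gentle along x, steep along y

def gradient(point):
    """Slope of the assumed terrain in each direction."""
    return tuple(c * p for c, p in zip(CURVATURE, point))

def steps_to_bottom(precondition, step_size, tolerance=1e-3, max_steps=10_000):
    """Count the steps needed to get within `tolerance` of the bottom at (0, 0)."""
    point = (1.0, 1.0)
    for step in range(max_steps):
        if math.hypot(*point) < tolerance:
            return step
        grad = gradient(point)
        if precondition:
            # Account for the terrain: shrink the step where it is steep.
            grad = tuple(g / c for g, c in zip(grad, CURVATURE))
        point = tuple(p - step_size * g for p, g in zip(point, grad))
    return max_steps

# The plain walker must keep its step small, or it overshoots on the steep axis.
plain_steps = steps_to_bottom(precondition=False, step_size=0.01)
shaped_steps = steps_to_bottom(precondition=True, step_size=0.5)
print(plain_steps, shaped_steps)
```

In this toy, the walker that knows the terrain gets there in far fewer steps. Real optimizers have to estimate that shape as they go, which is where the sophistication lies.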

Three cousins that show how strange learning is

Mode collapse isn't alone. It belongs to a family of behaviors that all run against intuition. Here are three, told simply.

Catastrophic forgetting. Learn Italian intensively, and you risk losing the Spanish you used to speak. Models do the same. Train one on a new task, and it can erase what it already did very well. The new knowledge paints over the old, like a coat of paint that eats the edges you wanted to keep.
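
A deliberately tiny sketch shows the effect. It assumes a "model" with a single adjustable weight and two invented tasks that pull that weight in opposite directions. The names `task_a`, `task_b`, `train` and `mse` are hypothetical. Real networks have millions of weights, but the mechanism is the same: the second round of training moves weights the first task depended on.

```python
def mse(weight, examples):
    """Average squared error of the one-weight model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

def train(weight, examples, lr=0.05, epochs=200):
    """Plain gradient descent on the squared error."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
        weight -= lr * grad
    return weight

# Two made-up tasks that disagree about what the weight should be.
task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]   # wants weight = 2
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]  # wants weight = -1

weight = train(0.0, task_a)
error_a_before = mse(weight, task_a)  # task A has just been learned

weight = train(weight, task_b)        # now train only on task B
error_a_after = mse(weight, task_a)   # task A is measured again

print(round(error_a_before, 6), round(error_a_after, 2))
```

The error on the first task starts near zero and jumps once the second task has been learned. Nothing told the model to keep the old skill, so it didn't.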

Double descent. School taught us a simple rule: revising too much on the same examples eventually hurts, because you memorize instead of understanding. On a graph, the error drops, then climbs back up. Except that if you push the model much further and make it much bigger, the error starts dropping again, a second time. Hence the name. A skill that gets worse before it finally clicks, against all common sense.

The lottery ticket. A large neural network is made of millions of connections. The lottery ticket hypothesis says this: hidden in that huge tangle, there's a tiny sub-network that, trained on its own from the same starting point, would have been enough to do the job. The rest was just a lottery. We bought millions of tickets so that one combination would come out a winner.

Why it's so hard to spot from the inside

You might think it's enough to open the hood to see where it jams. This is where a major obstacle appears. In a network, we'd like a neuron to match a clean idea, a tidy cat detector. Reality is quite different. The same neuron often responds to several unrelated things: a cat, but also a certain color, and also a certain sentence rhythm. This is called polysemanticity. Picture a light switch at home that controls the kitchen, a bedroom lamp and the garage all at once, wired together with no apparent logic. Diagnosing a fault in such a setup becomes a headache. That's why understanding exactly why a model goes off the rails remains a long, painstaking job.

The serious version of the same problem

Mode collapse is annoying, but visible and fairly harmless. At worst, a boring AI. The same underlying mechanism, the machine optimizing the score and not the intention, takes a trickier turn when the stakes rise.

Take a robot trained to grab a coin, in a setting where the coin is always on the right. The robot may learn, without our knowing, a simpler rule: go right. As long as we stay in training, the two behaviors are indistinguishable: it succeeds every time. Set it loose in the real world, where the coin can be elsewhere, and it charges right into empty space. It learned a goal that happened to fit, not the one we aimed for. This is goal misgeneralization, and it's dangerous precisely because it doesn't show during testing.
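
The coin story can be replayed in a few lines. This is a sketch under invented assumptions: a corridor of nine cells, a robot that starts in the middle, and two hand-written rules standing in for what the robot might have learned. The names `go_right`, `go_to_coin`, `succeeds` and `success_rate` are hypothetical.

```python
WIDTH = 9            # the corridor has cells 0..8
START = WIDTH // 2   # the robot starts in the middle, at cell 4

def go_right(agent, coin):
    """The shortcut rule: ignore the coin, always step right."""
    return 1

def go_to_coin(agent, coin):
    """The intended rule: step toward wherever the coin is."""
    return 1 if coin > agent else -1

def succeeds(policy, coin):
    """Run one episode; True if the robot reaches the coin in time."""
    agent = START
    for _ in range(WIDTH):
        agent = min(WIDTH - 1, max(0, agent + policy(agent, coin)))
        if agent == coin:
            return True
    return False

def success_rate(policy, coins):
    return sum(succeeds(policy, c) for c in coins) / len(coins)

training_coins = [WIDTH - 1] * 5    # in training, the coin is always far right
real_world_coins = [0, 1, 2, 6, 8]  # in the real world, it can be anywhere

for policy in (go_right, go_to_coin):
    print(policy.__name__,
          success_rate(policy, training_coins),
          success_rate(policy, real_world_coins))
```

Both rules score 1.0 in training. Only on the second set of positions does `go_right` drop to 0.4 while `go_to_coin` stays at 1.0. No amount of testing on the training layout could have told them apart.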

One last link, the most discussed among safety researchers. Whatever final goal you pursue, certain sub-goals almost always help: staying operational, keeping your options open, gathering resources. It's true for us, since nearly every human plan benefits from having a bit of money and staying healthy. It would be true too for a very capable AI. This is called instrumental convergence. It's exactly what pushes us to frame an AI's goals with extreme caution, long before it's powerful enough for it to matter.

What it changes when you build with AI

Does all this stay theoretical? Not once you put a model at the heart of a product. A few reflexes are worth their weight in gold.

First, be wary of the single score. If you judge an assistant, a recommendation engine or a content generator on one number, you gently push it toward its own mode collapse. It will become excellent on that measure and dull on everything else. Better to track several signals, including the variety of what it produces.
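
Variety is easy to put a number on. The sketch below is one simple way among many, with invented names (`distinct_ratio`, `output_entropy`) and invented sample answers: it counts how many of a batch of outputs are actually different, and how evenly they are spread.

```python
import math
from collections import Counter

def distinct_ratio(outputs):
    """Share of outputs that are unique: 1.0 is all different, near 0 is all alike."""
    if not outputs:
        return 0.0
    return len(set(outputs)) / len(outputs)

def output_entropy(outputs):
    """Shannon entropy in bits: 0 means one answer only, higher means more spread."""
    if not outputs:
        return 0.0
    counts = Counter(outputs)
    total = len(outputs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up batches of answers to the same prompt.
varied = ["Try a walk.", "Call a friend.", "Sleep on it.", "Write it down."]
collapsed = ["Sure!", "Sure!", "Sure!", "Of course!"]

print(distinct_ratio(varied), output_entropy(varied))                  # 1.0 2.0
print(distinct_ratio(collapsed), round(output_entropy(collapsed), 3))  # 0.5 0.811
```

Tracked over time next to your main score, a falling number here is an early sign that the model is settling into its three dishes. Exact string matching is a crude assumption; near-duplicates would need a looser comparison.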

Next, test outside the training setting. A model that shines on your usual examples can crumble the moment it meets a case it has never seen. Real users always step outside the frame.

Finally, keep a human in the loop where it counts. None of these quirks is inevitable. They can be managed, as long as you know they exist and look in the right place. That's how we work, by the way: a tailor-made tool wired to your reality rather than to an abstract score, and someone who understands what's going on under the hood.

Shall we talk?

Do you have a project where AI needs to play a real role, and you want it to stay reliable and alive rather than stuck on three answers? Tell us in a few words what you have in mind.


Frequently asked questions

Does mode collapse mean the AI is broken?
No, almost the opposite. The model does too well what you asked, but you asked badly. It found the small set of answers that maximizes its score and sticks to it. The problem is in the instruction, not the machine.

Does it affect the AIs I use every day?
Partly, yes. The loss of variety after the refining stage is a known trade-off in consumer assistants: you gain safety and politeness, you lose a bit of variety and boldness. Techniques exist to limit the damage, but the tuning stays a balance.

Can it be avoided completely?
Not with an absolute guarantee, but it's very manageable. You explicitly reward variety, you vary the data and the training methods, you monitor outputs over time, and you keep a human to judge what a number can't see.

What does it have to do with AI safety?
The same root. An AI optimizes the measure, not your intention. Mode collapse is the visible, harmless version. When the stakes rise, the same gap produces more serious problems, like an AI pursuing a skewed goal without it showing during testing.
