The AI Snake Won't Eat Its Own Tail

The fear is that AI-generated content poisons future training data. The fear is wrong. The real problem is worse.

There's a tidy hypothesis repeated at every AI panel: ChatGPT outputs flood the web, future models train on that flood, quality degrades, the snake eats its tail, the end. It's a clean story. It rhymes. It's also mostly wrong, and being wrong about it is causing people to ignore the actual problem.

Here's what the model-collapse worry gets right. If you train a generative model exclusively on earlier model outputs, with no human-generated grounding, the new model degrades. This is a real result. Shumailov, Shumaylov, Zhao et al. published "The Curse of Recursion: Training on Generated Data Makes Models Forget" in 2023, showing that the tails of the distribution collapse over successive generations of model-on-model training. The paper is good. Model collapse is real.
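If you want to feel the mechanism rather than take it on faith, here is a minimal sketch. It is a toy, not the paper's experiment: the "model" is just a table of token frequencies, the Zipf-style starting distribution is an assumption, and every function name is made up for illustration. Each generation samples from the previous table and refits by counting. Once a rare token draws zero samples, it has zero probability forever, so the tail can only shrink.

```python
import random
from collections import Counter


def zipf_distribution(vocab_size):
    """Toy 'human' distribution (assumed): token i has weight 1/(i+1)."""
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    total = sum(weights)
    return {token: w / total for token, w in enumerate(weights)}


def next_generation(probs, n_samples, rng):
    """Sample from the current model, then refit by counting frequencies."""
    tokens = list(probs)
    draws = rng.choices(tokens, weights=[probs[t] for t in tokens], k=n_samples)
    counts = Counter(draws)
    # Tokens that were never drawn are absent here, i.e. probability zero.
    return {token: count / n_samples for token, count in counts.items()}


def support_sizes(vocab_size=100, n_samples=200, n_generations=30, seed=0):
    """Number of tokens with nonzero probability at each generation."""
    rng = random.Random(seed)
    probs = zipf_distribution(vocab_size)
    sizes = [len(probs)]
    for _ in range(n_generations):
        probs = next_generation(probs, n_samples, rng)
        sizes.append(len(probs))
    return sizes


if __name__ == "__main__":
    # Prints a non-increasing list: the vocabulary the model still "knows".
    print(support_sizes())
```

The point of the toy is the one-way door: nothing in a pure model-on-model loop can bring a lost token back.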

Now look at what the frontier labs actually do. They don't collect AI outputs and train on them unfiltered. They scrape the web, license data, pay humans to write high-quality content, and run synthetic data pipelines where a model generates problems and a different model or a verifier checks the work. The synthetic data used in training is curated, filtered, and graded, not pasted in raw. The cycle that would cause collapse requires a level of negligence the labs are not displaying.
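The generate-then-verify shape is easy to sketch. What follows is an illustration under loud assumptions, not any lab's actual pipeline: the "generator" is a random stand-in that sometimes gets addition wrong, the "verifier" simply recomputes the sum, and all names (Candidate, noisy_generator, curate) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    a: int
    b: int
    claimed_sum: int


def noisy_generator(n, error_rate, rng):
    """Stand-in for a model proposing problems with answers; some are wrong."""
    candidates = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        answer = a + b
        if rng.random() < error_rate:
            answer += rng.choice([-2, -1, 1, 2])  # inject a wrong answer
        candidates.append(Candidate(a, b, answer))
    return candidates


def verifier(candidate):
    """Independent check: recompute the answer instead of trusting the claim."""
    return candidate.a + candidate.b == candidate.claimed_sum


def curate(candidates):
    """Keep only the synthetic examples that pass verification."""
    return [c for c in candidates if verifier(c)]


if __name__ == "__main__":
    rng = random.Random(0)
    raw = noisy_generator(1000, error_rate=0.3, rng=rng)
    kept = curate(raw)
    print(f"kept {len(kept)} of {len(raw)} synthetic examples")
```

Real verifiers are harder than recomputing a sum, but the design choice is the same: the check has to be independent of the thing that produced the answer.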

The web isn't becoming AI-only. People keep writing. Engineers keep filing tickets, doctors keep dictating notes, Reddit keeps being Reddit. The new training corpora are a blend, not a monoculture. When you train on a blend with appropriate weighting, the distribution doesn't drift toward the synthetic mode. It tracks whatever the curation and weighting favor, because the loss rewards matching the data you chose to keep. Keep the good data in the mix and the model stays anchored to it.
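The blend argument can be poked at with a toy simulation. This is a sketch under stated assumptions, not evidence about real corpora: the "model" is a Gaussian refit to its training pool each generation, "human" data is assumed to come from a fixed standard normal, and run_generations and real_fraction are illustrative names.

```python
import random
import statistics


def run_generations(n_generations, n_samples, real_fraction, seed=0):
    """Refit a Gaussian each generation on a mix of real and synthetic samples.

    Returns the fitted standard deviation after the last generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation-zero model matches the assumed real data
    n_real = round(n_samples * real_fraction)
    n_synth = n_samples - n_real
    for _ in range(n_generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]    # fresh human data
        synth = [rng.gauss(mu, sigma) for _ in range(n_synth)]  # previous model's output
        pool = real + synth
        mu = statistics.fmean(pool)
        sigma = statistics.pstdev(pool)  # maximum-likelihood fit
    return sigma


if __name__ == "__main__":
    pure = run_generations(500, 50, real_fraction=0.0)
    blend = run_generations(500, 50, real_fraction=0.5)
    print(f"synthetic only: sigma = {pure:.3g}")
    print(f"half real data: sigma = {blend:.3g}")
```

In this toy, the synthetic-only run shrinks toward a point while the blended run stays close to the real spread, because fresh samples keep pulling the fit back. How much real data is enough in an actual training mix is a separate, empirical question.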

So the snake-eats-tail story is a fun graph for a TED talk, and it's not what's going to break AI.

Here's what is.

The real problem is that the web is becoming worse to read, not worse to train on. Google's search results are full of regenerated SEO sludge that ranks because it answers the question shallowly and fast. Bloggers who used to write thoughtfully are now feeding a wordcount target into an LLM at 3am. Recipes are 1500 words of "growing up in Tuscany" before three lines of "boil water." The signal-to-noise ratio for humans browsing the web is collapsing, fast, and that's the part nobody at the AI labs has any incentive to fix.

The labs are fine. Their models will keep getting better. They have proprietary data, RLHF pipelines, and increasingly large in-house annotation teams. What's degrading is the open commons — the thing that made the web worth indexing in the first place. The doomsday isn't that the models eat themselves. It's that the open web stops being a place worth reading, because the cheapest possible writer is no longer human.

[Chart not reproduced. Directional estimate. Sources include the Originality.AI study of top-ranked Google results and Graphite's 2025 analysis. Definitions of "AI-generated" vary.]

That has a few consequences worth taking seriously.

Original research, primary sources, and first-person operator writing become more valuable, not less. If you build software, ship real systems, debug real production incidents — write about it. There is now infinite content describing how things should work, and almost none describing what actually broke and how someone fixed it. Be the second kind.

Curation becomes a product category. The reason newsletters are having a moment is that a human picking ten interesting things is now cheaper to read than the algorithm's hundred. Substack, hand-curated link lists, recommendation graphs built on actual trust — these are growing because the open web stopped being browsable. This is a market signal, not nostalgia.

Search will keep degrading until something replaces it. Google has the wrong shape of incentive to fix this. The fix is going to come from somewhere else — maybe Perplexity, maybe something not yet built, maybe a model that just refuses to surface low-effort regenerated content. Whatever it is, the company that solves "show me content actually written by a human who has done the thing" wins a lot.

So no, the snake isn't going to eat itself. The training data is fine. The labs are fine. The thing that's breaking is the part of the internet you and I used to enjoy reading on a Sunday morning, and the people pretending otherwise are looking in the wrong direction.

Worry about the right thing.