Tuesday, June 17, 2025

ChatGPT Has Already Polluted the Internet So Badly That It’s Hobbling Future AI Development



The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

As AI-generated data crowds out the human creations these models so heavily depend on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation.

Repeat this process enough, and AI development begins to resemble a maximalist game of telephone: not only does the quality of the output diminish, resembling less and less the human work it was meant to replace, but the participants actively become stupider. The industry describes this scenario as AI “model collapse.”
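The feedback loop can be illustrated with a toy simulation. In this sketch, each “model” is just a Gaussian fitted to its training data, and each generation trains only on the previous generation’s output; the two-sigma cutoff in the generator is an assumption standing in for generative models underweighting rare, tail-end data. None of this is any lab’s actual method, just a minimal illustration of the telephone-game dynamic:

```python
import random
import statistics

def fit(data):
    # "Train" a toy model: estimate the mean and spread of its training data.
    return statistics.mean(data), statistics.stdev(data)

def generate(mu, sigma, n, rng):
    # "Generate" from the model, but never emit rare tail values (beyond
    # two sigma) -- a stand-in for models underweighting unusual data.
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= 2 * sigma:
            out.append(x)
    return out

rng = random.Random(0)
human_data = [rng.gauss(0.0, 1.0) for _ in range(500)]

data = human_data
for generation in range(10):
    mu, sigma = fit(data)
    # Each new model trains only on the previous model's output.
    data = generate(mu, sigma, 500, rng)

# The spread of what the models produce shrinks generation after generation:
# the lineage steadily "forgets" the diversity of the original human data.
print(statistics.stdev(human_data), statistics.stdev(data))
```

Running it shows the final generation’s spread collapsing to a fraction of the original data’s, which is the statistical core of the “stupider participants” problem.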

As a consequence, the finite amount of data predating ChatGPT’s rise becomes extremely valuable. In a new feature, The Register likens this to the demand for “low-background steel,” or steel that was produced before the detonation of the first nuclear bombs, starting in July 1945 with the US’s Trinity test. 

Just as the explosion of AI chatbots has irreversibly polluted the internet, so did the detonation of the atom bomb release radionuclides and other particulates that have seeped into virtually all steel produced thereafter. That makes modern metals unsuitable for use in some highly sensitive scientific and medical equipment. And so, what’s old is new: a major source of low-background steel, even today, is WWI- and WWII-era battleships, including a huge naval fleet scuttled by German Admiral Ludwig von Reuter in 1919.

Maurice Chiodo, a research associate at the Centre for the Study of Existential Risk at the University of Cambridge, called the admiral’s actions the “greatest contribution to nuclear medicine in the world.”

“That enabled us to have this almost infinite supply of low-background steel. If it weren’t for that, we’d be kind of stuck,” he told The Register. “So the analogy works here because you need something that happened before a certain date.”

“But if you’re collecting data before 2022 you’re fairly confident that it has minimal, if any, contamination from generative AI,” he added. “Everything before the date is ‘safe, fine, clean,’ everything after that is ‘dirty.’”

In 2024, Chiodo co-authored a paper arguing that there needs to be a source of “clean” data not only to stave off model collapse, but to ensure fair competition between AI developers. Otherwise, the early pioneers of the tech, after ruining the internet for everyone else with their AI’s refuse, would boast a massive advantage by being the only ones that benefited from a purer source of training data.

Whether model collapse, particularly as a result of contaminated data, is an imminent threat is a matter of some debate. But many researchers have been sounding the alarm for years now, including Chiodo.

“Now, it’s not clear to what extent model collapse will be a problem, but if it is a problem, and we’ve contaminated this data environment, cleaning is going to be prohibitively expensive, probably impossible,” he told The Register.

One area where the issue has already reared its head is retrieval-augmented generation (RAG), a technique AI models use to supplement their dated training data with information pulled from the internet in real time. But this new data isn’t guaranteed to be free of AI tampering, and some research has shown that this results in the chatbots producing far more “unsafe” responses.
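To see why RAG inherits the web’s contamination, here is a minimal sketch of the technique. The tiny corpus, the word-overlap scoring, and the prompt format are all illustrative assumptions, not any vendor’s actual pipeline; the point is simply that whatever the retriever fetches is pasted into the model’s context verbatim:

```python
def retrieve(query, corpus, k=2):
    # Score each document by naive word overlap with the query,
    # and return the k best matches.
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    # Prepend the retrieved passages so the model answers from fresh
    # context rather than from its dated training data. If the corpus
    # itself is AI-generated slop, that contamination flows straight
    # into the prompt -- and the answer.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Low-background steel predates the 1945 Trinity test.",
    "Scuttled WWI battleships are a source of low-background steel.",
    "Totally unrelated recipe for banana bread.",
]
print(build_prompt("Where does low-background steel come from?", corpus))
```

The retriever has no way of knowing whether a document was written by a human or a model, which is exactly the vulnerability the research on “unsafe” RAG responses points at.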

The dilemma is also reflective of the broader debate around scaling, or improving AI models by adding more data and processing power. After OpenAI and other developers reported diminishing returns with their newest models in late 2024, some experts proclaimed that scaling had hit a “wall.” And if that data is increasingly slop-laden, the wall would become that much more impassable.

Chiodo speculates that stronger regulations like labeling AI content could help “clean up” some of this pollution, but this would be difficult to enforce. In this regard, the AI industry, which has cried foul at any government interference, may be its own worst enemy.

“Currently we are in a first phase of regulation where we are shying away a bit from regulation because we think we have to be innovative,” Rupprecht Podszun, professor of civil and competition law at Heinrich Heine University Düsseldorf, who co-authored the 2024 paper with Chiodo, told The Register. “And this is very typical for whatever innovation we come up with. So AI is the big thing, let it go and fine.”

More on AI: Sam Altman Says “Significant Fraction” of Earth’s Total Electricity Should Go to Running AI

