
Emergence of ‘Model Collapse’ Highlights Need for AI Data Governance

21 June, 2023

BY PYMNTS | JUNE 21, 2023
Despite its alleged “societal-scale” risk, artificial intelligence’s (AI’s) biggest threat may be to itself.

Not to humanity.

Copies of copies tend to get worse, and if the training data used to power future generative AI engines, including large language models (LLMs), Gaussian mixture models (GMMs) and variational autoencoders (VAEs), continues to be scraped from the internet, those engines will inevitably be trained on content that was produced by today’s generative AI tools.

And that isn’t good news for the reliability and usability of those AI models.

A group of leading academic researchers has published a paper entitled “The Curse of Recursion: Training on Generated Data Makes Models Forget” showing that the content generated by various AI models progressively degrades and loses intelligibility when those models are successively trained on data produced by other models.

“[T]he use of LLMs at scale to publish content on the internet will pollute the collection of data to train them,” the paper stated.

“Furthermore, we show that this process is inevitable,” it added.

The researchers have termed this phenomenon “model collapse.”

It holds far-reaching implications for organizations attempting to leverage generative AI’s revolutionary business capabilities. It also underscores the importance of data sovereignty and future-fit governance processes as they relate to AI-powered integrations across corporate operations.


The Danger of Dubious Data Foundations
As the research paper noted, nearly all the material stored online was originally produced and curated by humans.

But the internet revolutionized how information could be shared, and it created new modes of communication, allowing text to be analyzed, modified and surfaced by search platforms and other layered-on solutions.

Now, generative AI, so called because it can independently generate content, is building another information layer atop the online landscape.

“At a high level, generative AI has the potential to create a new data layer, like when HTTP was created and gave rise to the internet beginning in the 1990s. As with any new data layer or protocol, governance, rules and standards must apply,” Shaunt Sarkissian, founder and CEO of AI-ID, told PYMNTS last month.

At the center of many business concerns around the integration of generative AI solutions lie ongoing questions about the integrity of the data and information fed to the AI models, as well as the provenance and security of those data inputs.

The finding that training AI models on data produced by other AI models results in degenerative processes, in which the models over time forget the true underlying data distribution and produce increasingly simplistic outputs, only emphasizes the importance of using good data when developing AI tools meant for an enterprise setting.
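The degenerative loop can be illustrated with a toy simulation. The sketch below is not the researchers' experiment; it is a minimal illustration that assumes a one-dimensional Gaussian stands in for the "true" human data, and that each new model sees only a small sample drawn from the model before it. The function name `simulate_collapse` and its parameters are hypothetical, chosen for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Fit a Gaussian to samples drawn from the previous generation's fit.

    Returns the fitted standard deviation of every generation, so the
    narrowing of the learned distribution can be inspected.
    """
    rng = random.Random(seed)
    # Assumed starting point: "human" data follows a standard normal.
    mean, std = 0.0, 1.0
    history = []
    for _ in range(generations + 1):
        # Each generation trains only on output of the one before it.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation}: fitted std = {history[generation]:.3g}")
```

Because every fit is made from a finite sample, estimation error compounds from one generation to the next, and in this setup the fitted spread tends to shrink toward zero: the tails of the original distribution are the first thing lost. Mixing fresh samples from the original distribution into each generation would be one way to explore how access to genuine data changes the outcome.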

After all, achieving the type of competitive advantage in today’s operating environment that can translate to sustainable business success frequently boils down to a firm’s ability to access and leverage best-in-class data.


First Mover Advantage
“We’re about to fill the internet with blah,” wrote Ross Anderson, one of the Cambridge scientists behind the research paper and the founder of the Foundation for Information Policy Research, on his personal blog. “This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that…”

“LLMs are like fire — a useful tool, but one that pollutes the environment,” Anderson added. “How will we cope with it?”

To avoid model collapse, access to genuine human-generated content is essential, as are effective governance standards that provide a firm go-forward infrastructure for AI training.

“The future [of AI model development] will be … more of a continual dance where there is active learning and reinforcement learning where multiple, highly-trained experts are part of the workflow to continually improve a model,” Erik Duhaime, co-founder and CEO of Centaur Labs, told PYMNTS in May.

The world is now at a tipping point where the sweeping digitization of the business ecosystem has gifted firms with untold terabytes of proprietary data about their customers and their operations, providing a fertile foundation for AI models that avoids the need to scrape the internet for inputs.

But before looking to tap wholly-owned company data whose provenance is never in question, businesses must first ensure that they have the appropriate data infrastructure in place to propel their business processes into the 21st century while avoiding its foundational pitfalls.
