[Vienna, September 25, 2024] A new study published in PNAS Nexus reports that the widespread adoption of large language models (LLMs), such as ChatGPT, has led to a significant decline in public knowledge sharing on platforms like Stack Overflow. The study estimates a 25% reduction in user activity on the popular programming Q&A site within six months of ChatGPT's release, relative to similar platforms where access to ChatGPT is restricted.
“LLMs are so powerful, have such a high value, and make a huge impact on the world. One begins to wonder about their future,” says first author Maria del Rio-Chanona, an associate faculty member at the Complexity Science Hub (CSH).
“Our study hypothesized that instead of posting questions and receiving answers on public platforms like Stack Overflow, where everybody can see them and learn from them, people are asking privately on ChatGPT instead. However, LLMs like ChatGPT are also trained on this open and public data, which they are replacing in some way. So what's going to happen?” adds Del Rio-Chanona, who is also an assistant professor at University College London and an associate researcher at the Institute for New Economic Thinking at the Oxford Martin School and at the Bennett Institute for Public Policy, University of Cambridge.
Major Implications
“In our findings, we noticed fewer and fewer questions and answers on Stack Overflow after ChatGPT was released. This has quite big implications. This means there may not be enough public data to train models in the future,” warns Del Rio-Chanona. She conducted the study with Nadzeya Laurentsyeva of Ludwig Maximilian University of Munich and Johannes Wachs, a faculty member at CSH and professor at Corvinus University of Budapest.
“Stack Overflow is an immensely valuable knowledge database accessible to anyone with an internet connection. People all over the world learn from questions and answers that other people post,” says Wachs. In fact, even AI models like ChatGPT are trained on human-generated content such as Stack Overflow posts. Ironically, the displacement of human content creation by AI could make it more difficult to train future AI models. Models trained on AI-generated data are generally thought to perform poorly, a process likened to making a photocopy of a photocopy.
A Shift from Public to Private
The findings also point to consequences that extend beyond technological change to the fabric of our economic and social structures. As users interact more with LLMs like ChatGPT, they may become less inclined to contribute to open knowledge platforms, shifting valuable data from public repositories to privately owned AI systems, explain Del Rio-Chanona and colleagues.
“This represents a significant shift of knowledge from public to private domains,” argue the researchers. According to them, this could also deepen the competitive advantage of early movers in AI, further concentrating knowledge and economic power.
All Experience and Quality Levels
Del Rio-Chanona and her colleagues found that the decline in content creation on Stack Overflow affected users of all experience levels, from novices to experts. They also observed that the quality of posts, as measured by user feedback, did not decrease significantly, indicating that both low- and high-quality contributions are being displaced by LLMs.
In addition, the study showed that posting activity in some programming languages, such as Python and JavaScript, dropped significantly more than the platform’s average. “The results suggest that people are indeed asking questions about Python and JavaScript, two of the most commonly used programming languages, on ChatGPT rather than Stack Overflow,” says Del Rio-Chanona.
About the Study
This research, titled "Large Language Models Reduce Public Knowledge Sharing on Online Q&A Platforms," by R Maria del Rio-Chanona, Nadzeya Laurentsyeva, and Johannes Wachs, was published in PNAS Nexus and is available online.
About CSH
The Complexity Science Hub (CSH) is Europe’s research center for the study of complex systems. We derive meaning from data from a range of disciplines — economics, medicine, ecology, and the social sciences — as a basis for actionable solutions for a better world. Established in 2015, we have grown to over 70 researchers, driven by the increasing demand to gain a genuine understanding of the networks that underlie society, from healthcare to supply chains. Through our complexity science approaches linking physics, mathematics, and computational modeling with data and network science, we develop the capacity to address today’s and tomorrow’s challenges.
Journal: PNAS Nexus
Method of Research: Data/statistical analysis
Subject of Research: People
Article Title: Large language models reduce public knowledge sharing on online Q&A platforms
Article Publication Date: 11-Sep-2024
COI Statement: The authors declare no competing interests.