Ask a large language model (LLM) such as ChatGPT to summarize what people are saying about a topic, and although the model may summarize the facts efficiently, it can give a false impression of how people feel about that topic. LLMs play an increasingly large role in research, but rather than providing a transparent window onto the world, they can present and summarize content with a different tone and emphasis than the original data, potentially skewing research results. Yi Ding and colleagues compared a climate dataset of 18,896,054 tweets mentioning "climate change," posted from January 2019 to December 2021, with versions of the same tweets rephrased by LLMs. The authors found that the LLM-rephrased tweets tended to display a more neutral sentiment than the original texts, a blunting effect that occurred irrespective of the prompts employed or the sophistication of the LLMs. A similar effect occurred when LLMs were asked to rephrase Amazon reviews. Possible mitigation strategies include using predictive models to retroactively adjust sentiment levels. According to the authors, if it is not known whether a text was written by a human or an LLM, it would be more useful to work with an LLM that has been fine-tuned not to blunt the emotional content it summarizes.
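To illustrate how such blunting could be quantified, the sketch below scores original and rephrased texts with an off-the-shelf polarity model and compares how far each set sits from neutral. This is a minimal sketch, not the authors' pipeline: the toy text pairs and the `mean_abs_sentiment` helper are hypothetical, and a lexicon-based scorer (NLTK's VADER) stands in for whatever sentiment models the study actually used.

```python
# Minimal sketch of measuring sentiment "blunting": compare how far
# original texts and their LLM-rephrased counterparts sit from neutral.
# Hypothetical example; not the authors' dataset, models, or pipeline.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon required by VADER
sia = SentimentIntensityAnalyzer()


def mean_abs_sentiment(texts):
    """Average distance from neutral (0) using VADER's compound score."""
    return sum(abs(sia.polarity_scores(t)["compound"]) for t in texts) / len(texts)


# Toy stand-ins for (original tweet, LLM-rephrased tweet) pairs.
originals = [
    "Climate change is terrifying and our leaders are doing NOTHING!",
    "Absolutely thrilled to see young activists demanding real climate action!",
]
rephrased = [
    "Some people are concerned that political leaders are not addressing climate change.",
    "Young activists are calling for stronger climate policies.",
]

print(f"original intensity:  {mean_abs_sentiment(originals):.3f}")
print(f"rephrased intensity: {mean_abs_sentiment(rephrased):.3f}")
# A lower rephrased intensity would be consistent with the blunting
# effect described in the study (sentiment pulled toward neutral).
```

A gap between the two intensities, with the rephrased texts closer to zero, is the kind of signal a predictive correction model could then be trained to offset.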
Journal: PNAS Nexus
Article Title: Echoes of authenticity: Reclaiming human sentiment in the large language model era
Article Publication Date: 25-Feb-2025