Social media has supercharged the spread of information, and of misinformation, making it increasingly difficult to distinguish fact from fiction on platforms like Twitter.
One of the most prolific, widely shared, and highly scrutinized Twitter accounts of the past several years belonged to former U.S. President Donald Trump. In the final year of his presidency, Trump tweeted, on average, more than 33 times each day. These tweets ranged from easily verifiable statements of fact to comments that were demonstrably false.
The sheer volume of Trump’s tweets, together with their thorough analysis by fact-checkers, allowed a team of researchers to compare the words he chose when sharing true information with those he chose when sharing false information.
The results of this study, published in the journal Psychological Science, show that Trump’s word choices differed in clear and predictable ways when he shared information that he knew to be factually incorrect, and that a model built on these differences could predict whether a single tweet was factually correct or incorrect. Similar personalized linguistic models may eventually help detect lies in other real-world settings.
“We created a personalized language model that could predict which statements from the former president were correct and which potentially deceitful,” said Sophie van der Zee, a researcher at Erasmus University in Rotterdam and first author on the paper. “His language was so consistent that in about three quarters of the cases, our model could correctly predict if Trump’s tweets were factual or not based solely on his word use.”
For their analysis, the researchers collected two separate data sets, each containing 3 months’ worth of presidential tweets sent by the @realDonaldTrump Twitter account. They then cross-referenced these tweets against fact-checks compiled by the Washington Post to determine whether each tweet was correct or incorrect.
To keep the data focused on Trump’s own writing, the researchers removed all tweets that did not reflect his own language use, such as retweets and long quotes.
The first data set revealed large differences in language use between Trump’s factually correct and incorrect tweets. Van der Zee and her colleagues then used this information to create a model to predict whether an individual tweet was factual.
“Using this model, we could predict how truthful Trump was in three out of four tweets,” said van der Zee. “We also compared our new personalized model with other similar detection models and found it outperformed them by at least 5 percentage points.”
Given these results, the researchers speculate that their personalized model could help distinguish fact from fiction in Trump’s future communications. Similar models could also be made for other politicians who are systematically fact-checked.
“Our paper also constitutes a warning for all people sharing information online,” said van der Zee. “It was already known that information people post online can be used against them. We now show, using only publicly available data, that the words people use when sharing information online can reveal sensitive information about the sender, including an indication of their trustworthiness.”
Article Title
A personal model of Trumpery: Linguistic deception detection in a real-world high-stakes setting