News Release

Exploring the use of 'stretchable' words in social media

Analysis of 100 billion tweets provides new insights into linguistic patterns

Peer-Reviewed Publication

PLOS


image: The tree of laughter. This spelling tree for stretched versions of the word 'ha' shows many of the different ways these words get spelled as they get stretched. The patterns of the tree represent the spellings of the words, with the initial 'h' at the root, and the following letters branching right for an 'a' and left for an 'h'. Thicker paths represent more dominant patterns, with many words stopping at an internal node after a few branchings. A few of the longer patterns reaching a terminal node are annotated with stars. The inset plot shows how frequent different stretched versions of 'ha' are based on how long they are stretched. A few points are annotated with example stretched versions of that length, but the point represents all stretched versions of that length. Points for an even number of characters tend to be higher because of the tendency to perfectly alternate 'h' and 'a' as in 'hahaha...'.

Credit: Gray et al., 2020

An investigation of Twitter messages reveals new insights and tools for studying how people use stretched words, such as "duuuuude," "heyyyyy," or "noooooooo." Tyler Gray and colleagues at the University of Vermont in Burlington present these findings in the open-access journal PLOS ONE on May 27, 2020.

In spoken and written language, stretched words can modify the meaning of a word. For instance, "suuuuure" can imply sarcasm, while "yeeessss" may indicate excitement. Stretched words are rare in formal writing, but the rise of social media has opened up new opportunities to study them.

Gray and colleagues have now completed the most comprehensive study to date of "stretchable" words in social media. They developed a new, more thorough strategy for identifying stretched words in tweets and used it to analyze a randomly selected dataset of about 10 percent of all tweets generated between September 2008 and December 2016, totaling about 100 billion tweets.

The researchers identified thousands of "stretchable" words in the tweets, including "ha" (e.g., "hahaha" or "haaahaha"), "awesome" (e.g., "awesssssommmmmeeeeee") and "goal" (e.g., "ggggoooooaaaaallllll").
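
To make the idea of a stretched word concrete, here is a minimal Python sketch that collapses runs of repeated letters to recover a candidate base word. It is an illustration only, not the authors' identification strategy: the function names and the `min_run` threshold are assumptions, and the approach ignores two-letter patterns such as "haha" as well as legitimate double letters (it would turn "good" into "god").

```python
from itertools import groupby

def run_lengths(word):
    """Return (letter, run length) pairs for each run of identical letters."""
    return [(ch, len(list(group))) for ch, group in groupby(word.lower())]

def collapse(word):
    """Collapse every run of a repeated letter to a single letter."""
    return "".join(ch for ch, _ in run_lengths(word))

def looks_stretched(word, min_run=3):
    """Heuristic (assumed threshold): some letter repeats min_run or more times in a row."""
    return any(n >= min_run for _, n in run_lengths(word))

if __name__ == "__main__":
    for example in ["ggggoooooaaaaallllll", "awesssssommmmmeeeeee", "hello"]:
        print(example, collapse(example), looks_stretched(example))
```

A real pipeline would need a dictionary or frequency check to decide which collapsed form is the intended word.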

They also identified two key ways of measuring the characteristics of stretchable words: balance and stretch. Balance refers to the degree to which different letters tend to be repeated. For instance, "ha" has a high degree of balance because when it is stretched, the "h" and the "a" tend to be repeated just about equally. "Goal" is less balanced, with "o" repeated more than any other letter in the word.
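
The release does not give a formula for balance. One plausible way to quantify it, shown here purely as an assumption, is the normalized Shannon entropy of how repeated letters are spread across the letter positions of a word: a value near 1 means every letter is repeated about equally, and lower values mean one letter dominates. The function names are hypothetical, and the sketch only handles single-letter runs, so alternating patterns like "hahaha" would need a different tokenization.

```python
import math
from itertools import groupby

def run_counts(token):
    """Length of each run of identical letters, e.g. 'gooal' -> [1, 2, 1, 1]."""
    return [len(list(group)) for _, group in groupby(token.lower())]

def balance(tokens):
    """Normalized entropy (0 to 1) of letter repeats across positions.

    Illustrative measure only; tokens are assumed to be stretched
    versions of the same base word.
    """
    totals = []
    for token in tokens:
        counts = run_counts(token)
        if not totals:
            totals = [0] * len(counts)
        if len(counts) != len(totals):
            continue  # skip tokens whose letter pattern differs from the first
        for i, count in enumerate(counts):
            totals[i] += count
    n = len(totals)
    if n < 2:
        return 1.0  # a single position is trivially balanced
    grand_total = sum(totals)
    probs = [t / grand_total for t in totals]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n)

if __name__ == "__main__":
    print(balance(["ggooaall"]))   # every letter doubled
    print(balance(["gooooooal"]))  # only the 'o' is stretched
```

Under this measure, a token where every letter is doubled scores higher than one where only the "o" is stretched, matching the description of "goal" as less balanced when "o" dominates.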

Stretch refers to how long a word tends to be stretched. For instance, short words or sounds like "ha" have a high degree of stretch because people often repeat them many times (e.g., "hahahahahahahaha"). Meanwhile, regular words like "infinity" have lower stretch, often with just one letter repeated: "infinityyyy."
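
As a rough illustration of stretch, the sketch below tallies how many stretched tokens occur at each length and computes the average number of extra characters beyond the base word. This is a simplified stand-in chosen for clarity, not the measure used in the study; the function names and sample tokens are hypothetical.

```python
from collections import Counter

def length_distribution(tokens):
    """Map each token length to the number of tokens with that length."""
    return Counter(len(token) for token in tokens)

def mean_extra_length(tokens, base_word):
    """Average number of characters beyond the unstretched base word."""
    if not tokens:
        return 0.0
    return sum(len(token) - len(base_word) for token in tokens) / len(tokens)

if __name__ == "__main__":
    ha_tokens = ["haha", "hahaha", "hahahaha", "haha"]  # made-up sample
    print(length_distribution(ha_tokens))
    print(mean_extra_length(ha_tokens, "ha"))
    print(mean_extra_length(["infinityyyy"], "infinity"))
```

A length tally of this kind is also the sort of data summarized in the inset plot of the spelling-tree image.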

For this analysis, the researchers developed various tools and methods that could be used in future research on stretchable words, such as investigations of mistypings and misspellings. The tools could also be applied to improve natural language processing, search engines, and spam filters.

The authors add: "We were able to comprehensively collect and count stretched words like 'gooooooaaaalll' and 'hahahaha', and map them across the two dimensions of overall stretchiness and balance of stretch, while developing new tools that will also aid in their continued linguistic study, and in other areas, such as language processing, augmenting dictionaries, improving search engines, analyzing the construction of sequences, and more."

###

Citation: Gray TJ, Danforth CM, Dodds PS (2020) Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings. PLoS ONE 15(5): e0232938. https://doi.org/10.1371/journal.pone.0232938

Funding: CMD and PSD were supported by National Science Foundation Grant Number IIS-1447634, and TJG, CMD, and PSD were supported by a gift from MassMutual. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: We have the following interests: TJG, CMD, and PSD were supported by a gift from MassMutual. There are no patents, products in development, or marketed products to declare. This does not alter our adherence to all of the PLOS ONE policies on sharing data and materials.

In your coverage please use this URL to provide access to the freely available article in PLOS ONE: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232938

