The Secret Life of Pronouns by James Pennebaker is on my ‘must read as soon as possible’ list. It promises to tell me what those least dramatic of words – pronouns and words that cement our sentences or utterances together – reveal about our personalities and social connections. Its method is one that is coming of age as Google digitises the world’s books and we constantly document our every movement on Twitter and Facebook.
The researchers count words, categorise them into dictionaries and then ask volunteers to suggest emotional values for each. It’s an enormous task, enabled by software that aggregates the content and then counts the words. It’s also a team effort, as Pennebaker recognises when he details the lengthy development of the software, LIWC (Linguistic Inquiry and Word Count), by his research assistant. Once in place, though, this particular programme can process enormous amounts of input and report the degree to which people use different categories of words.
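To make the counting idea concrete, here is a minimal sketch in Python of dictionary-based word counting. It is not LIWC itself: the category names and word lists below are toy examples of my own, and the real programme's dictionaries are far larger. It only shows the general shape of the technique, which is to split text into words, look each one up in a set of category dictionaries, and report each category as a percentage of all words.

```python
import re
from collections import Counter

# Toy category dictionaries (hypothetical; NOT LIWC's actual word lists).
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "first_person_plural": {"we", "us", "our", "ours", "ourselves"},
    "articles": {"a", "an", "the"},
}


def category_rates(text):
    """Return, for each category, the percentage of words in `text` that belong to it."""
    # Lower-case the text and pull out runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}


sample = "I think we lost my keys on the way to our car."
rates = category_rates(sample)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of words")
```

Reporting percentages rather than raw counts is what lets a short text be compared with a long one, which matters when the input ranges from a handful of tweets to a whole book.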
Here’s a simple example from an offshoot of LIWC, the free-to-use AnalyzeWords. It gives a flavour of what the full-blown programme can (presumably) do.
AnalyzeWords helps reveal your personality by looking at how you use words. It is based on good scientific research connecting word use to who people are. So go to town – enter your Twitter name or the handles of friends, lovers, or Hollywood celebrities to learn about their emotions, social styles, and the ways they think.
My own tweets are insufficiently prodigious to offer any results (I must be quiet and reserved!) so here is the result of the programme analysing the tweets of Tweeter extraordinaire, Stephen Fry:
You can try it out yourself and can, by doing so, decide how accurate you think the programme is. I need to look into it in greater detail (which I will when I read the book), but one of the problems that I already have is understanding what might be meant by the key words describing emotional states – what does upbeat actually signify? Being worried is an ‘average’ state? And surely depressed is a complex syndrome on a wide continuum from mildly brassed off to clinically catatonic.
The other problem with research using these kinds of methods is the over-generalised way in which it is often reported. To be fair, this happens more often when the research is picked up by the media and made into a ‘news’ item. But there is, I think, a natural temptation when dealing with such enormous data sets (really enormous) to think they are more representative than they really are.
Here’s a study by Isabel Kloumann, reported in Wired Science in August 2011, which sought an answer to the question of whether positive ideas spread more quickly than negative ideas, or vice versa.
the researchers decided to approach the question with overwhelming mathematical force. They analyzed four enormous textual databases — 361 billion words in 3.29 million books on Google Books, 9 billion words in 821 million tweets issued between 2008 and 2010, 1 billion words in 1.8 million New York Times articles published from 1987 to 2007, and 58.6 million words from the lyrics of 295,000 popular songs — and compiled for each a list of the 5,000 most-used words.
This produced a list of 10,122 words. The researchers then used Amazon’s Mechanical Turk labor-outsourcing service to obtain 50 separate evaluations of each word, which were scored from negative to positive on a scale of 1 to 9. (“Terrorist,” for example, received an average score of 1.30, while “laughter” merited an 8.50, the highest of any word.)
Altogether, positive-inflected words outnumbered the negative, and were used more frequently. The findings “suggest that a positivity bias is universal,” wrote Kloumann and colleagues. “In our stories and writings we tend toward pro-social communication.”
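The procedure described in that report can be sketched in a few lines of Python. This is only my reading of the quoted description, not the researchers' code. The two tiny corpora and the ratings are invented for illustration, and the cut-off of 5 as the neutral midpoint of the 1 to 9 scale is my assumption. The sketch takes the most-used words from each corpus, merges the lists, averages the ratings for each word, and then compares positive and negative words both by how many there are and by how often they are used.

```python
from collections import Counter
from statistics import mean

NEUTRAL = 5  # assumed midpoint of the 1-to-9 rating scale


def top_words(tokens, n):
    """Return the set of the n most frequent words in a list of tokens."""
    return {word for word, _ in Counter(tokens).most_common(n)}


# Invented toy corpora standing in for the four real databases.
corpora = {
    "books": "home home home war war love".split(),
    "tweets": "lol lol lol love love hate".split(),
}

# Invented ratings standing in for the 50 evaluations collected per word.
ratings = {
    "home": [7, 8, 7],
    "war": [2, 1, 2],
    "lol": [6, 7, 6],
    "love": [9, 8, 9],
}

# Merge the per-corpus top lists; words shared between lists are kept once.
merged = set().union(*(top_words(tokens, 2) for tokens in corpora.values()))

# Average the ratings for each word on the merged list.
mean_score = {word: mean(ratings[word]) for word in merged}
positive = {word for word, score in mean_score.items() if score > NEUTRAL}
negative = {word for word, score in mean_score.items() if score < NEUTRAL}

# Count how often each listed word is actually used across all corpora.
usage = Counter()
for tokens in corpora.values():
    usage.update(token for token in tokens if token in merged)

positive_usage = sum(usage[word] for word in positive)
negative_usage = sum(usage[word] for word in negative)

print(f"{len(positive)} positive vs {len(negative)} negative words")
print(f"used {positive_usage} vs {negative_usage} times")
```

The merging step is also why four lists of 5,000 words produce 10,122 words rather than 20,000: a word that appears on more than one list is only counted once.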
Whilst this is fascinating, I’m not sure whether it really is the emotional pulse of the planet. Those producing the content counted in these databases are a relatively small group. Even with the so-called democratisation of user-generated content under Web 2.0, not that many people, in relation to populations as a whole, are actively productive.
In general, only 21 percent of respondents [of a recent study into user-generated content] produced some type of online content daily or weekly, 31 percent produced content on a monthly basis or less, and 45 percent said they had never produced any. These low numbers are not very surprising, as previous studies have shown that user-generated content online is produced by a relatively small group. Reported by Aleks Krotoski.
So, perhaps the pulse of the writers … in English. Still interesting but not quite as earth-shattering.