PROJECTS

Prospects for incorporating social media content into social research

For at least a decade there has been considerable enthusiasm about exploiting social media content for social research. Social media data are potentially timelier and less expensive than traditional survey measures, and they are especially intriguing now that low participation rates have called the relevance of surveys into question. This project, conducted with Johann Gagnon-Bartsch (Assistant Professor, Statistics) and Robyn Ferg (Doctoral Candidate, Statistics), is investigating the extent to which the sentiment of tweets containing certain keywords tells the same story as responses to certain survey questions. We have primarily focused on the possible relationship between tweets containing the word “jobs” and consumer sentiment as measured in the Surveys of Consumers (SCA), because earlier research found a reasonably high correlation between the two through 2009 but not beyond about 2011.
We have taken a number of measures to see whether the relationship continues to exist but is concealed, either by changes in the content of “jobs tweets” or by use of the wrong measure of association (correlation rather than a measure less sensitive to outliers, such as co-movement). None of these efforts has succeeded, which has led us to investigate whether the earlier reported relationship might in fact have been spurious. The evidence points in that direction. We sorted jobs tweets into meaningful categories based on their content: “personal” (my job), “news and politics” (e.g., unemployment statistics), “advertisements” (primarily from #tweetmyjobs), “junk” (e.g., tweets about “Steve Jobs” or Apple more generally, “nut jobs,” jobs of a sexual nature), and “other” (links to articles or lists posted online). Doing so does not strengthen the correlations between the sentiment of tweets within these categories and consumer sentiment. In fact, the highest correlation is between the sentiment of tweets in the “junk” category and consumer sentiment.
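The contrast drawn above between correlation and co-movement can be illustrated with a minimal sketch. It assumes one simple reading of co-movement, namely the share of periods in which two series change in the same direction; this definition, the function names, and the toy numbers are illustrative assumptions, not the measure or data used in the project.

```python
import numpy as np


def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


def co_movement(x, y):
    """Share of period-to-period changes that have the same sign.

    Assumed definition for illustration only: it looks at the direction
    of each change and ignores its size, so one extreme value can alter
    at most the changes adjacent to it.
    """
    dx = np.sign(np.diff(x))
    dy = np.sign(np.diff(y))
    return float(np.mean(dx == dy))


# Hypothetical series: they rise together until a single outlier at the end.
tweet_sentiment = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
survey_index = np.array([1, 2, 3, 4, 5, 6, 7, -50], dtype=float)

print("Pearson correlation:", pearson(tweet_sentiment, survey_index))
print("Co-movement:", co_movement(tweet_sentiment, survey_index))
```

In this toy case the single outlier drives the Pearson correlation negative, while the two series still move in the same direction in six of the seven periods.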
Other analyses are consistent with the idea that comparing the sentiment of tweets containing certain keywords to survey responses on a plausibly related topic is not a promising approach. Instead, weighting the tweets based on inferred user characteristics so that they more closely resemble the US adult population (as represented by the SCA sample), and taking into account how the psychology of survey response differs from that of social media posting, may enable social media, specifically tweets, to supplement if not replace survey data.
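The weighting idea can be sketched with simple post-stratification, in which each tweet is weighted by the ratio of its group's population share to its share of the tweet sample. This is a minimal sketch under stated assumptions: the group labels, population shares, and sentiment scores are hypothetical, and inferring user characteristics from tweets, which is the hard part, is not shown.

```python
from collections import Counter


def poststratification_weights(groups, population_shares):
    """Weight each item by population share / sample share of its group."""
    counts = Counter(groups)
    n = len(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: younger users are overrepresented among tweeters.
inferred_group = ["young", "young", "young", "old"]
sentiment = [1.0, 1.0, 1.0, -1.0]
population_shares = {"young": 0.5, "old": 0.5}  # assumed target shares

weights = poststratification_weights(inferred_group, population_shares)
print("Unweighted mean sentiment:", sum(sentiment) / len(sentiment))
print("Weighted mean sentiment:", weighted_mean(sentiment, weights))
```

Weighting of this kind corrects only for differences in the measured characteristics; it does not address differences in why people post versus why they answer survey questions.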