ISR Awards

New Approaches to Analyzing Social Media Content for Enhancing Census Bureau Data

There has been substantial interest in whether and how analyses of social media data can add value to social research and the production of official statistics. Some hope that survey findings can be augmented with social media content; others hope that costs can be reduced and timeliness improved by replacing at least some survey data (e.g., some variables, or data from some waves of longitudinal data collection) with social media analyses. Skeptics worry that there is no evident rationale for expecting this approach to succeed and that the purported alignment between analyses of social media posts and survey responses is not compelling. The proposed research has four objectives, each with associated research activities:
1. Explore the conditions under which alignment between surveys and social media is most likely to occur. The research activities will initially analyze the correspondence between answers to a single question on the Census tracking poll and Twitter posts as a calibration exercise, and then extend this analysis to other questions from the tracking poll, an additional social media platform (Reddit), and other Census surveys that address other topics, e.g., the American Community Survey.
2. Mine social media for qualitative insights. To address this objective, the researchers will apply natural language processing (NLP) techniques such as topic modeling to social media corpora to explore the promise of this content for developing survey instruments, much as focus groups are currently used (e.g., to identify vocabulary used by target groups), but automated and on a much larger scale.
3. Improve statistical products by including social media analytics. Much as political scientists have shown that incorporating certain variables derived from social media use into survey data sets can improve the predictive ability of models, we will explore the possibility of providing variables derived from social media in Census statistical products.
4. Exploit the interconnections in social media. In most attempts so far to identify relationships between social media posts and survey data, the posts are treated as independent texts roughly analogous to survey responses. Yet social media posts are at least potentially social, in the sense that they are threaded exchanges in which prior posts lead to subsequent posts. Using NLP and discourse analytic techniques designed to extract meaning from extended texts, we will explore whether there is a connection between survey data and social media threads that is not detected when analyzing isolated posts.

As results emerge, they will be discussed with Census Bureau partners so that the research can be as valuable as possible to the agency's goals and mission.