University of Cambridge, NLIP Seminar Series

Language and Demographics on Twitter: Inferring Latent User Attributes from Streaming Communications


If you have a question about this talk, please contact Ekaterina Kochmar.

Content shared locally within a user’s social network can reveal latent attributes of that user. However, not all attributes are equally pronounced given similar amounts of content: some are harder to predict than others. We explore various network structures on Twitter for predicting attributes of varying difficulty (gender, age, and political beliefs), examining the impact of graph type and the amount of available content. We show that even when limited or no self-authored data is available, language from neighbor communications provides sufficient evidence for prediction. We find that a friend graph leads to the highest accuracy for gender, a follower graph is preferred for age, and a retweet graph is best for political belief classification.

However, the above models for social media personal analytics assume access to thousands of messages per user, even though most users author content only sporadically over time. Given this sparsity, we: (i) leverage content from the local neighborhood of a user and (ii) estimate the amount of time and the number of tweets required for a dynamic model to predict user preferences. When updating our dynamic models over time, we find that political beliefs can often be predicted from roughly 100 tweets, depending on the context of user selection. Collecting that many tweets could take hours or weeks, depending on the author’s tweeting frequency.
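As a rough illustration of what a dynamic model updated over a tweet stream might look like, the sketch below keeps a running posterior for a binary attribute and reports how many tweets were consumed before it became confident. It is not the speaker's model: the naive Bayes formulation, the class labels "A" and "B", the function names, the toy training data, and the 0.95 threshold are all assumptions made for the example. The stream could equally hold neighbor tweets when a user has authored little content.

```python
import math
from collections import Counter


def train_word_log_odds(labeled_tweets, smoothing=1.0):
    """Per-word log-likelihood ratios log P(w|A) - log P(w|B) (hypothetical helper)."""
    counts = {"A": Counter(), "B": Counter()}
    for label, text in labeled_tweets:
        counts[label].update(text.lower().split())
    vocab = set(counts["A"]) | set(counts["B"])
    # Add-one (Laplace) smoothing so words seen in one class only stay finite.
    totals = {c: sum(counts[c].values()) + smoothing * len(vocab) for c in counts}
    return {
        w: math.log((counts["A"][w] + smoothing) / totals["A"])
        - math.log((counts["B"][w] + smoothing) / totals["B"])
        for w in vocab
    }


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tweets_until_confident(stream, log_odds, threshold=0.95, prior=0.5):
    """Update P(class A) one tweet at a time; return (tweets_used, posterior).

    Stops as soon as the posterior for either class reaches `threshold`.
    """
    logit = math.log(prior / (1.0 - prior))
    posterior = prior
    used = 0
    for used, text in enumerate(stream, start=1):
        # Words never seen in training contribute no evidence.
        logit += sum(log_odds.get(w, 0.0) for w in text.lower().split())
        posterior = sigmoid(logit)
        if posterior >= threshold or posterior <= 1.0 - threshold:
            break
    return used, posterior


# Toy labeled data (invented for illustration only).
labeled = [
    ("A", "vote blue policy"),
    ("A", "blue rally"),
    ("B", "vote red policy"),
    ("B", "red rally"),
]
log_odds = train_word_log_odds(labeled)
used, posterior = tweets_until_confident(
    ["blue rally today", "blue blue", "policy"], log_odds
)
```

Dividing the number of tweets consumed by a user's tweeting rate gives the corresponding waiting time, which is why the same tweet budget can translate to hours for a frequent author and weeks for a sporadic one.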

This talk is part of the NLIP Seminar Series.


