A person’s language use reveals much about their social identity, which is based on the social categories they belong to, including age and gender. We discuss the development of TweetGenie, a computer program that predicts the age of Twitter users based on their language use. We explore age prediction in three ways: classifying users into age categories, classifying them by life stage, and predicting their exact age. An automatic system achieves better performance than humans on these tasks. Both humans and the automatic system tend to underpredict the age of older people. We find that most linguistic changes occur when people are young, and that after around age 30 the studied variables show little change, making it difficult to predict the ages of older Twitter users.
- Social Media