In the widely used message platform Twitter, about 2% of the tweets contains the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research is looking at the determination of a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the users’ timezone, the user’s language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and shared properties between countries such as shared timezone. A feature analysis was performed in order to see the effect of different features. The features timezone and parsed user location were the most informative features.
|CTIT Technical Report Series
|University of Twente, Centre for Telematics and Information Technology (CTIT)