Determine the User Country of a Tweet

Han van der Veen, Djoerd Hiemstra, Tijs Adriaan van den Broek, Michel Léon Ehrenhard, Ariana Need

Research output: Book/Report › Report › Professional

Abstract

On the widely used messaging platform Twitter, about 2% of tweets contain a geographical location in the form of exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research examines how to determine the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user’s timezone, the user’s language, and the parsed user location. The classifier performs well for countries with many active Twitter users, such as the Netherlands and the United Kingdom. An error analysis shows that mistakes were caused by limited information and by properties shared between countries, such as a shared timezone. A feature analysis was performed to measure the effect of the different features; timezone and parsed user location were the most informative.
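
The kind of classifier the abstract describes can be illustrated with a minimal sketch of a categorical Naive Bayes model over the three named features. This is not the authors' implementation: the function names `train` and `predict`, the add-one smoothing, and the four toy training rows are all assumptions made up for illustration, and the feature values are invented rather than taken from the report's data.

```python
from collections import Counter, defaultdict
import math


def train(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(labels)
    # value_counts[(feature, country)][value] = number of training users
    value_counts = defaultdict(Counter)
    vocab = defaultdict(set)  # distinct values seen for each feature
    for row, country in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, country)][value] += 1
            vocab[feature].add(value)
    return class_counts, value_counts, vocab, alpha


def predict(model, row):
    """Return the country with the highest posterior log-probability."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_country, best_score = None, -math.inf
    for country, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for feature, value in row.items():
            counts = value_counts[(feature, country)]
            # The extra +1 reserves probability mass for unseen values.
            denom = sum(counts.values()) + alpha * (len(vocab[feature]) + 1)
            score += math.log((counts[value] + alpha) / denom)
        if score > best_score:
            best_country, best_score = country, score
    return best_country


# Hypothetical toy data: invented profiles, not from the report.
rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "Enschede"},
    {"timezone": "Amsterdam", "language": "nl", "location": "Amsterdam"},
    {"timezone": "London", "language": "en", "location": "London"},
    {"timezone": "London", "language": "en", "location": "Manchester"},
]
labels = ["NL", "NL", "GB", "GB"]
model = train(rows, labels)

print(predict(model, {"timezone": "Amsterdam", "language": "nl", "location": "Utrecht"}))
```

Because the model multiplies independent per-feature likelihoods, two countries that share a timezone can only be separated by the remaining features, which is consistent with the error pattern the abstract reports.
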
Original language: Undefined
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 12
Publication status: Published - 7 Aug 2015

Publication series

Name: CTIT Technical Report Series
Publisher: University of Twente, Centre for Telematics and Information Technology (CTIT)
No.: TR-CTIT-15-05
ISSN (Print): 1381-3625

Keywords

  • METIS-312685
  • IR-97046
  • EWI-26182

Cite this

van der Veen, H., Hiemstra, D., van den Broek, T. A., Ehrenhard, M. L., & Need, A. (2015). Determine the User Country of a Tweet. (CTIT Technical Report Series; No. TR-CTIT-15-05). Enschede: Centre for Telematics and Information Technology (CTIT).