Abstract

On the widely used messaging platform Twitter, about 2% of tweets contain a geographical location in the form of exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research investigates how to determine the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's time zone, the user's language, and the parsed user location. The classifier performs well on countries that are active on Twitter, such as the Netherlands and the United Kingdom. An analysis of the classifier's errors shows that mistakes stem from limited information and from properties shared between countries, such as a common time zone. A feature analysis was performed to assess the effect of the different features; time zone and parsed user location were the most informative.
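As a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a categorical Naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the class name, the feature keys (`timezone`, `language`, `location`), and the four training rows are all invented for illustration, and the report's actual feature extraction and training data are not reproduced here.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # counts[(feature, label)][value] = number of training rows with that value
        self.counts = defaultdict(Counter)
        # values[feature] = distinct values seen for that feature in training
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.counts[(feature, label)][value] += 1
                self.values[feature].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the country
            for feature, value in row.items():
                seen = self.counts[(feature, label)][value]
                observed = sum(self.counts[(feature, label)].values())
                # One extra slot reserves probability mass for unseen values.
                vocab = len(self.values[feature]) + 1
                score += math.log((seen + 1) / (observed + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: profile metadata labelled with a country code.
train_rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "Enschede"},
    {"timezone": "Amsterdam", "language": "nl", "location": "Utrecht"},
    {"timezone": "London", "language": "en", "location": "London"},
    {"timezone": "London", "language": "en", "location": "Leeds"},
]
train_labels = ["NL", "NL", "GB", "GB"]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)
mixed_profile = {"timezone": "Amsterdam", "language": "en", "location": "Utrecht"}
print(model.predict(mixed_profile))
```

In this toy setup the time zone and location outweigh the English interface language, which loosely mirrors the abstract's finding that those two features are the most informative. Two countries that share a time zone would receive the same likelihood for that feature, which is the kind of confusion the error analysis mentions.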
Original language: Undefined
Place of Publication: Enschede
Publisher: Centre for Telematics and Information Technology (CTIT)
Number of pages: 12
State: Published - 7 Aug 2015

Publication series

Name: CTIT Technical Report Series
Publisher: University of Twente, Centre for Telematics and Information Technology (CTIT)
No.: TR-CTIT-15-05
ISSN (Print): 1381-3625

Keywords

  • METIS-312685
  • IR-97046
  • EWI-26182

Cite this

van der Veen, H., Hiemstra, D., van den Broek, T. A., Ehrenhard, M. L., & Need, A. (2015). Determine the User Country of a Tweet. (CTIT Technical Report Series; No. TR-CTIT-15-05). Enschede: Centre for Telematics and Information Technology (CTIT).

van der Veen, Han; Hiemstra, Djoerd; van den Broek, Tijs Adriaan; Ehrenhard, Michel Léon; Need, Ariana / Determine the User Country of a Tweet.

Enschede : Centre for Telematics and Information Technology (CTIT), 2015. 12 p. (CTIT Technical Report Series; No. TR-CTIT-15-05).

Research output: Professional / Report

@book{fe48554cdae8492cbb47665e0af298da,
title = "Determine the User Country of a Tweet",
abstract = "In the widely used message platform Twitter, about 2% of the tweets contains the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research is looking at the determination of a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the users’ timezone, the user’s language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and shared properties between countries such as shared timezone. A feature analysis was performed in order to see the effect of different features. The features timezone and parsed user location were the most informative features.",
keywords = "METIS-312685, IR-97046, EWI-26182",
author = "{van der Veen}, Han and Djoerd Hiemstra and {van den Broek}, {Tijs Adriaan} and Ehrenhard, {Michel Léon} and Ariana Need",
year = "2015",
month = "8",
series = "CTIT Technical Report Series",
publisher = "Centre for Telematics and Information Technology (CTIT)",
number = "TR-CTIT-15-05",
address = "Netherlands",

}

van der Veen, H, Hiemstra, D, van den Broek, TA, Ehrenhard, ML & Need, A 2015, Determine the User Country of a Tweet. CTIT Technical Report Series, no. TR-CTIT-15-05, Centre for Telematics and Information Technology (CTIT), Enschede.

Determine the User Country of a Tweet. / van der Veen, Han; Hiemstra, Djoerd; van den Broek, Tijs Adriaan; Ehrenhard, Michel Léon; Need, Ariana.

Enschede : Centre for Telematics and Information Technology (CTIT), 2015. 12 p. (CTIT Technical Report Series; No. TR-CTIT-15-05).

TY - BOOK

T1 - Determine the User Country of a Tweet

AU - van der Veen, Han

AU - Hiemstra, Djoerd

AU - van den Broek, Tijs Adriaan

AU - Ehrenhard, Michel Léon

AU - Need, Ariana

PY - 2015/8/7

Y1 - 2015/8/7

N2 - In the widely used message platform Twitter, about 2% of the tweets contains the geographical location through exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research is looking at the determination of a location for tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the users’ timezone, the user’s language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and shared properties between countries such as shared timezone. A feature analysis was performed in order to see the effect of different features. The features timezone and parsed user location were the most informative features.

KW - METIS-312685

KW - IR-97046

KW - EWI-26182

M3 - Report

T3 - CTIT Technical Report Series

BT - Determine the User Country of a Tweet

PB - Centre for Telematics and Information Technology (CTIT)

ER -

van der Veen H, Hiemstra D, van den Broek TA, Ehrenhard ML, Need A. Determine the User Country of a Tweet. Enschede: Centre for Telematics and Information Technology (CTIT), 2015. 12 p. (CTIT Technical Report Series; TR-CTIT-15-05).