Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision

F.O. Ostermann, Gustavo Adolfo Garcia Chapeton, R. Zurita-Milla, M.J. Kraak

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

This paper describes a pilot study that implements a novel approach to validate data mining tasks by using the crowd to train a classifier. This hybrid approach to processing successfully addresses challenges faced during human curation or machine processing of user-generated geographic content (UGGC), namely quality control, reproducibility, sustainability, scaling, data quality, overfitting, and training costs. We test the approach on mining UGGC to derive information on local places as humans perceive them. Specifically, we retrieve Flickr image metadata, enrich it semantically by building term vectors using a controlled vocabulary, cluster it spatially, let online participants rate those clusters, classify them into noise and places by using both semantic and cluster characteristics, let online participants supervise the classification by annotating the results, and use their feedback to improve clustering and revise the trained model. The results show that the approach is feasible and suggest future studies to improve it, while also indicating that mining places from UGGC requires more than a single source.
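
The semantic enrichment step mentioned in the abstract (building term vectors from a controlled vocabulary) could look roughly like the sketch below. This is only an illustration under stated assumptions: the vocabulary terms, the function name `term_vector`, and the sample tags are hypothetical and are not taken from the paper, whose actual vocabulary and weighting scheme are not given in this record.

```python
from collections import Counter

# Hypothetical controlled vocabulary; the thesaurus used in the paper is not listed here.
VOCABULARY = ("park", "bridge", "church", "market")


def term_vector(tags, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs among a photo's tags.

    Tags outside the vocabulary are ignored; matching is case-insensitive.
    """
    counts = Counter(tag.lower() for tag in tags)
    return [counts[term] for term in vocabulary]


# Made-up tags standing in for one Flickr image's metadata.
photo_tags = ["Park", "sunset", "bridge", "park"]
print(term_vector(photo_tags))  # [2, 1, 0, 0]
```

Vectors of this kind could then be combined with cluster characteristics as classifier features, which is the general idea the abstract describes.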
Original language: English
Title of host publication: Societal Geo-Innovation
Subtitle of host publication: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands
Editors: A. Bergt, T. Sarjakoski, R. van Lammeren, F. Rip
Place of Publication: Wageningen
Publisher: Wageningen University & Research Centre
Number of pages: 6
ISBN (Print): 978-90-816960-7-4
Publication status: Published - 2017
Event: 20th AGILE Conference on Geographic Information Science, AGILE 2017 - Wageningen, Netherlands
Duration: 9 May 2017 to 12 May 2017
Conference number: 20
https://agile-online.org/index.php/conference/proceedings/proceedings-2017

Conference

Conference: 20th AGILE Conference on Geographic Information Science, AGILE 2017
Abbreviated title: AGILE
Country: Netherlands
City: Wageningen
Period: 9/05/17 to 12/05/17

Fingerprint

Thesauri
Processing
Metadata
Quality control
Data mining
Sustainable development
Classifiers
Semantics
Feedback
Costs

Cite this

Ostermann, F. O., Garcia Chapeton, G. A., Zurita-Milla, R., & Kraak, M. J. (2017). Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision. In A. Bergt, T. Sarjakoski, R. van Lammeren, & F. Rip (Eds.), Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands. Wageningen: Wageningen University & Research Centre.
Ostermann, F.O. ; Garcia Chapeton, Gustavo Adolfo ; Zurita-Milla, R. ; Kraak, M.J. / Mining user-generated geographic content : an interactive, crowdsourced approach to validation and supervision. Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands. editor / A. Bergt ; T. Sarjakoski ; R. van Lammeren ; F. Rip. Wageningen : Wageningen University & Research Centre, 2017.
@inproceedings{b105c50b81164684b1f56d487802adc7,
title = "Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision",
abstract = "This paper describes a pilot study that implements a novel approach to validate data mining tasks by using the crowd to train a classifier. This hybrid approach to processing successfully addresses challenges faced during human curation or machine processing of user-generated geographic content (UGGC), namely quality control, reproducibility, sustainability, scaling, data quality, overfitting, and training costs. We test the approach on mining UGGC to derive information on local places as humans perceive them. Specifically, we retrieve Flickr image metadata, enrich it semantically by building term vectors using a controlled vocabulary, cluster it spatially, let online participants rate those clusters, classify them into noise and places by using both semantic and cluster characteristics, let online participants supervise the classification by annotating the results, and use their feedback to improve clustering and revise the trained model. The results show that the approach is feasible and suggest future studies to improve it, while also indicating that mining places from UGGC requires more than a single source.",
author = "F.O. Ostermann and {Garcia Chapeton}, {Gustavo Adolfo} and R. Zurita-Milla and M.J. Kraak",
year = "2017",
language = "English",
isbn = "978-90-816960-7-4",
editor = "A. Bergt and T. Sarjakoski and {van Lammeren}, R. and F. Rip",
booktitle = "Societal Geo-Innovation",
publisher = "Wageningen University \& Research Centre",

}

Ostermann, FO, Garcia Chapeton, GA, Zurita-Milla, R & Kraak, MJ 2017, Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision. in A Bergt, T Sarjakoski, R van Lammeren & F Rip (eds), Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands. Wageningen University & Research Centre, Wageningen, 20th AGILE Conference on Geographic Information Science, AGILE 2017, Wageningen, Netherlands, 9/05/17.

Mining user-generated geographic content : an interactive, crowdsourced approach to validation and supervision. / Ostermann, F.O.; Garcia Chapeton, Gustavo Adolfo; Zurita-Milla, R.; Kraak, M.J.

Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands. ed. / A. Bergt; T. Sarjakoski; R. van Lammeren; F. Rip. Wageningen : Wageningen University & Research Centre, 2017.

TY - GEN

T1 - Mining user-generated geographic content

T2 - an interactive, crowdsourced approach to validation and supervision

AU - Ostermann, F.O.

AU - Garcia Chapeton, Gustavo Adolfo

AU - Zurita-Milla, R.

AU - Kraak, M.J.

PY - 2017

Y1 - 2017

N2 - This paper describes a pilot study that implements a novel approach to validate data mining tasks by using the crowd to train a classifier. This hybrid approach to processing successfully addresses challenges faced during human curation or machine processing of user-generated geographic content (UGGC), namely quality control, reproducibility, sustainability, scaling, data quality, overfitting, and training costs. We test the approach on mining UGGC to derive information on local places as humans perceive them. Specifically, we retrieve Flickr image metadata, enrich it semantically by building term vectors using a controlled vocabulary, cluster it spatially, let online participants rate those clusters, classify them into noise and places by using both semantic and cluster characteristics, let online participants supervise the classification by annotating the results, and use their feedback to improve clustering and revise the trained model. The results show that the approach is feasible and suggest future studies to improve it, while also indicating that mining places from UGGC requires more than a single source.

UR - http://ezproxy.utwente.nl:2048/login?url=https://webapps.itc.utwente.nl/library/2017/conf/ostermann_min.pdf

M3 - Conference contribution

SN - 978-90-816960-7-4

BT - Societal Geo-Innovation

A2 - Bergt, A.

A2 - Sarjakoski, T.

A2 - van Lammeren, R.

A2 - Rip, F.

PB - Wageningen University & Research Centre

CY - Wageningen

ER -

Ostermann FO, Garcia Chapeton GA, Zurita-Milla R, Kraak MJ. Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision. In Bergt A, Sarjakoski T, van Lammeren R, Rip F, editors, Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science, 9-12 May 2017, Wageningen, the Netherlands. Wageningen: Wageningen University & Research Centre. 2017