Automated semantic annotation of species names in handwritten texts

Lise Stork, Andreas Weber, Jaap van den Herik, Aske Plaat, Fons Verbeek, Katherine Wolstencroft

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this paper, scientific species names in images of handwritten species observations are automatically recognised and annotated with semantic concepts, so that they can be used for document retrieval and faceted search. Until now, automated semantic annotation of such named entities had only been applied to printed or digital text. We employ a two-step approach. First, word images are classified to identify the elements of scientific species names (genus, species, and author) using (i) visual structural features, (ii) position, and (iii) context. Second, the identified species names are semantically annotated according to the NHC-Ontology, an ontology that describes species observations. Internationalised Resource Identifiers (IRIs) are assigned to the elements so that individual researchers can link and disambiguate them at a later stage. For the identification of scientific species names, we achieve an average F1 score of 0.86. Moreover, we discuss how our method will function in a semi-automated annotation process whose main objective is a fruitful dialogue between system and user.
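The second step can be illustrated with a minimal Python sketch. It assumes step 1 has already produced one label per word image (Genus, species, author, or other) and shows one possible way to group labelled words into species names and mint an IRI for each name and each of its elements, serialised as N-Triples. The namespace `http://example.org/nhc-sketch/`, the class and property names (`ScientificName`, `hasPart`, `wordImage`), the `WordImage` type, and the grouping rule are hypothetical illustrations; they are not the NHC-Ontology or the authors' implementation.

```python
from dataclasses import dataclass

# HYPOTHETICAL namespace and class/property names: placeholders only, NOT the
# real NHC-Ontology IRIs, which the abstract does not give.
EX = "http://example.org/nhc-sketch/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Word-level labels assumed to come out of step 1 (word-image classification).
NAME_PARTS = {"Genus", "species", "author"}


@dataclass
class WordImage:
    image_ref: str  # hypothetical reference to the word image (file or region id)
    label: str      # one of NAME_PARTS, or "other"


def group_names(words):
    """Group consecutive labelled words into species names.

    Assumed rule: a name starts at a 'Genus' word and may be followed by
    'species' and 'author' words; any other label closes the current name.
    """
    names, current = [], []
    for w in words:
        if w.label == "Genus":
            if current:
                names.append(current)
            current = [w]
        elif w.label in NAME_PARTS and current:
            current.append(w)
        else:
            if current:
                names.append(current)
            current = []
    if current:
        names.append(current)
    return names


def literal(value):
    """Serialise a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotate(names, doc_id):
    """Mint one IRI per species name and per name element; return N-Triples lines."""
    triples = []
    for i, name in enumerate(names):
        name_iri = f"<{EX}{doc_id}/name/{i}>"
        triples.append(f"{name_iri} {RDF_TYPE} <{EX}ScientificName> .")
        for j, w in enumerate(name):
            part_iri = f"<{EX}{doc_id}/name/{i}/part/{j}>"
            # Class IRI derived from the step-1 label (hypothetical class names).
            triples.append(f"{part_iri} {RDF_TYPE} <{EX}{w.label.capitalize()}> .")
            triples.append(f"{part_iri} <{EX}wordImage> {literal(w.image_ref)} .")
            triples.append(f"{name_iri} <{EX}hasPart> {part_iri} .")
    return triples


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    words = [
        WordImage("page1/w0", "other"),
        WordImage("page1/w1", "Genus"),
        WordImage("page1/w2", "species"),
        WordImage("page1/w3", "author"),
        WordImage("page1/w4", "other"),
    ]
    for t in annotate(group_names(words), "doc1"):
        print(t)
```

Minting the IRIs deterministically from a document identifier and word position, rather than at random, keeps them stable across runs, so that a researcher can later link or disambiguate the same elements without the identifiers changing underneath them.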

Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I
Editors: Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, Djoerd Hiemstra
Place of Publication: Cham
Publisher: Springer
Pages: 667-680
Number of pages: 14
ISBN (Electronic): 978-3-030-15712-8
ISBN (Print): 978-3-030-15711-1
DOIs: https://doi.org/10.1007/978-3-030-15712-8_43
Publication status: Published - 7 Apr 2019
Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
Duration: 14 Apr 2019 to 18 Apr 2019
Conference number: 41
http://ecir2019.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11437 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 41st European Conference on Information Retrieval, ECIR 2019
Abbreviated title: ECIR 2019
Country: Germany
City: Cologne
Period: 14/04/19 to 18/04/19
Internet address: http://ecir2019.org/

Fingerprint

Semantic Annotation
Ontology
Semantics
Document Retrieval
Text
Names
Annotation
Genus
Resources

Keywords

  • Deep learning
  • Historical biodiversity research
  • Ontologies
  • Scientific names
  • Semantic annotation
  • Taxonomy

Cite this

Stork, L., Weber, A., van den Herik, J., Plaat, A., Verbeek, F., & Wolstencroft, K. (2019). Automated semantic annotation of species names in handwritten texts. In L. Azzopardi, B. Stein, N. Fuhr, P. Mayr, C. Hauff, & D. Hiemstra (Eds.), Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I (pp. 667-680). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11437 LNCS). Cham: Springer. https://doi.org/10.1007/978-3-030-15712-8_43
Stork, Lise ; Weber, Andreas ; van den Herik, Jaap ; Plaat, Aske ; Verbeek, Fons ; Wolstencroft, Katherine. / Automated semantic annotation of species names in handwritten texts. Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. editor / Leif Azzopardi ; Benno Stein ; Norbert Fuhr ; Philipp Mayr ; Claudia Hauff ; Djoerd Hiemstra. Cham : Springer, 2019. pp. 667-680 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{daa7c752e4e64d37983aaac970475788,
title = "Automated semantic annotation of species names in handwritten texts",
abstract = "In this paper, scientific species names in images of handwritten species observations are automatically recognised and annotated with semantic concepts, so that they can be used for document retrieval and faceted search. Until now, automated semantic annotation of such named entities had only been applied to printed or digital text. We employ a two-step approach. First, word images are classified to identify the elements of scientific species names (genus, species, and author) using (i) visual structural features, (ii) position, and (iii) context. Second, the identified species names are semantically annotated according to the NHC-Ontology, an ontology that describes species observations. Internationalised Resource Identifiers (IRIs) are assigned to the elements so that individual researchers can link and disambiguate them at a later stage. For the identification of scientific species names, we achieve an average F1 score of 0.86. Moreover, we discuss how our method will function in a semi-automated annotation process whose main objective is a fruitful dialogue between system and user.",
keywords = "Deep learning, Historical biodiversity research, Ontologies, Scientific names, Semantic annotation, Taxonomy",
author = "Lise Stork and Andreas Weber and {van den Herik}, Jaap and Aske Plaat and Fons Verbeek and Katherine Wolstencroft",
year = "2019",
month = "4",
day = "7",
doi = "10.1007/978-3-030-15712-8_43",
language = "English",
isbn = "978-3-030-15711-1",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "667--680",
editor = "Leif Azzopardi and Benno Stein and Norbert Fuhr and Philipp Mayr and Claudia Hauff and Djoerd Hiemstra",
booktitle = "Advances in Information Retrieval",

}

Stork, L, Weber, A, van den Herik, J, Plaat, A, Verbeek, F & Wolstencroft, K 2019, Automated semantic annotation of species names in handwritten texts. in L Azzopardi, B Stein, N Fuhr, P Mayr, C Hauff & D Hiemstra (eds), Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11437 LNCS, Springer, Cham, pp. 667-680, 41st European Conference on Information Retrieval, ECIR 2019, Cologne, Germany, 14/04/19. https://doi.org/10.1007/978-3-030-15712-8_43

Automated semantic annotation of species names in handwritten texts. / Stork, Lise; Weber, Andreas; van den Herik, Jaap; Plaat, Aske; Verbeek, Fons; Wolstencroft, Katherine.

Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. ed. / Leif Azzopardi; Benno Stein; Norbert Fuhr; Philipp Mayr; Claudia Hauff; Djoerd Hiemstra. Cham : Springer, 2019. p. 667-680 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11437 LNCS).


TY  - GEN
T1  - Automated semantic annotation of species names in handwritten texts
AU  - Stork, Lise
AU  - Weber, Andreas
AU  - van den Herik, Jaap
AU  - Plaat, Aske
AU  - Verbeek, Fons
AU  - Wolstencroft, Katherine
PY  - 2019/4/7
Y1  - 2019/4/7
N2  - In this paper, scientific species names in images of handwritten species observations are automatically recognised and annotated with semantic concepts, so that they can be used for document retrieval and faceted search. Until now, automated semantic annotation of such named entities had only been applied to printed or digital text. We employ a two-step approach. First, word images are classified to identify the elements of scientific species names (genus, species, and author) using (i) visual structural features, (ii) position, and (iii) context. Second, the identified species names are semantically annotated according to the NHC-Ontology, an ontology that describes species observations. Internationalised Resource Identifiers (IRIs) are assigned to the elements so that individual researchers can link and disambiguate them at a later stage. For the identification of scientific species names, we achieve an average F1 score of 0.86. Moreover, we discuss how our method will function in a semi-automated annotation process whose main objective is a fruitful dialogue between system and user.
AB  - In this paper, scientific species names in images of handwritten species observations are automatically recognised and annotated with semantic concepts, so that they can be used for document retrieval and faceted search. Until now, automated semantic annotation of such named entities had only been applied to printed or digital text. We employ a two-step approach. First, word images are classified to identify the elements of scientific species names (genus, species, and author) using (i) visual structural features, (ii) position, and (iii) context. Second, the identified species names are semantically annotated according to the NHC-Ontology, an ontology that describes species observations. Internationalised Resource Identifiers (IRIs) are assigned to the elements so that individual researchers can link and disambiguate them at a later stage. For the identification of scientific species names, we achieve an average F1 score of 0.86. Moreover, we discuss how our method will function in a semi-automated annotation process whose main objective is a fruitful dialogue between system and user.
KW  - Deep learning
KW  - Historical biodiversity research
KW  - Ontologies
KW  - Scientific names
KW  - Semantic annotation
KW  - Taxonomy
UR  - http://www.scopus.com/inward/record.url?scp=85064860066&partnerID=8YFLogxK
UR  - https://rdcu.be/bFzrr
U2  - 10.1007/978-3-030-15712-8_43
DO  - 10.1007/978-3-030-15712-8_43
M3  - Conference contribution
SN  - 978-3-030-15711-1
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 667
EP  - 680
BT  - Advances in Information Retrieval
A2  - Azzopardi, Leif
A2  - Stein, Benno
A2  - Fuhr, Norbert
A2  - Mayr, Philipp
A2  - Hauff, Claudia
A2  - Hiemstra, Djoerd
PB  - Springer
CY  - Cham
ER  - 

Stork L, Weber A, van den Herik J, Plaat A, Verbeek F, Wolstencroft K. Automated semantic annotation of species names in handwritten texts. In Azzopardi L, Stein B, Fuhr N, Mayr P, Hauff C, Hiemstra D, editors, Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. Cham: Springer. 2019. p. 667-680. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-15712-8_43