Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text

Maurice van Keulen, Mena Badieh Habib

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

Abstract

Social media content represents a large portion of all textual content appearing on the Internet. These streams of user-generated content (UGC) present both an opportunity and a challenge for media analysts: to analyze huge amounts of new data and use them to infer and reason with new information. A major challenge of natural language is its ambiguity and vagueness. Automatic approaches typically rely on the grammatical structure of sentences to resolve ambiguity. However, when we move to the informal language widely used in social media, the language becomes more ambiguous and thus more challenging for automatic understanding. Information Extraction (IE) is the research field that enables the use of unstructured text in a structured way. Named Entity Extraction (NEE) is a subtask of IE that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations, or locations, regardless of their type. Named Entity Disambiguation (NED) is the task of determining which person, place, event, etc. a mention refers to. The goal of this paper is to provide an overview of some approaches that mimic the human way of recognizing and disambiguating named entities, especially for domains that lack formal sentence structure. The proposed methods open the door to more sophisticated applications based on users’ contributions on social media. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been proven valid across languages and domains and independent of the selected extraction and disambiguation techniques. It is also shown to be robust against the informality of the language used. We have discovered a reinforcement effect and exploited it in a technique that improves extraction quality by feeding back disambiguation results. We present a method of handling the uncertainty involved in extraction to improve the disambiguation results.
Original language: Undefined
Title of host publication: Uncertainty Reasoning for the Semantic Web III
Place of Publication: Berlin
Publisher: Springer
Pages: 309-328
Number of pages: 20
ISBN (Print): 978-3-319-13412-3
DOIs: https://doi.org/10.1007/978-3-319-13413-0_16
Publication status: Published - Nov 2014

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Number: 8816
Volume: 8816
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Named entity recognition
  • Named entity linking
  • Named entity extraction
  • Named entity disambiguation
  • Informal text
  • Uncertainty handling
  • EWI-25421
  • IR-93592
  • METIS-309724

Cite this

van Keulen, M., & Habib, M. B. (2014). Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text. In Uncertainty Reasoning for the Semantic Web III (pp. 309-328). (Lecture Notes in Computer Science; Vol. 8816, No. 8816). Berlin: Springer. https://doi.org/10.1007/978-3-319-13413-0_16