A Support Vector Machine Approach to Dutch Part-of-Speech Tagging

Abstract

Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This not only solves the space problems caused by the amount of data, but also improves the tagging time. The resulting tagger achieves an accuracy of 97.54%, which is quite good, and its speed is reasonably good.
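
The decomposition into common, uncommon and unknown words mentioned in the abstract can be pictured with a minimal Python sketch. It only illustrates the routing idea, under assumptions not taken from the paper: the frequency threshold `COMMON_MIN_COUNT`, the helper names `partition_vocabulary`, `route` and `split_training_data`, and the toy corpus with its tags are all hypothetical. The per-partition Support Vector Machines themselves are not shown.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: the abstract does not say how "common" is defined.
COMMON_MIN_COUNT = 3


def partition_vocabulary(tagged_corpus, common_min_count=COMMON_MIN_COUNT):
    """Split the training vocabulary into common and uncommon words by frequency."""
    counts = Counter(word for word, _tag in tagged_corpus)
    common = {w for w, c in counts.items() if c >= common_min_count}
    uncommon = set(counts) - common
    return common, uncommon


def route(word, common, uncommon):
    """Name the partition (and hence the sub-tagger) that should handle a word."""
    if word in common:
        return "common"
    if word in uncommon:
        return "uncommon"
    return "unknown"  # never seen during training


def split_training_data(tagged_corpus, common, uncommon):
    """Group (word, tag) pairs per partition so each part can be trained separately."""
    parts = defaultdict(list)
    for word, tag in tagged_corpus:
        parts[route(word, common, uncommon)].append((word, tag))
    return dict(parts)


# Toy data for illustration only; words and tags are not from the paper's corpus.
corpus = [("de", "LID"), ("de", "LID"), ("de", "LID"), ("kat", "N"), ("loopt", "WW")]
common, uncommon = partition_vocabulary(corpus)
parts = split_training_data(corpus, common, uncommon)
```

Each partition holds only part of the data, so a classifier trained on it is smaller than one trained on the whole corpus. At tagging time a word is sent to a single sub-tagger, which fits the space and speed benefits the abstract reports.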
Original language: Undefined
Title of host publication: Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007
Editors: M.R. Berthold, J. Shawe-Taylor, N. Lavrac
Place of Publication: London
Publisher: Springer Verlag
Pages: 274-283
Number of pages: 10
ISBN (Print): 978-3-540-74824-3
DOIs: 10.1007/978-3-540-74825-0_25
State: Published - Sep 2007

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Number: LNCS4549
Volume: 4723
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • EWI-11050
  • IR-61912
  • METIS-241907
  • HMI-CI: Computational Intelligence

Cite this

Poel, M., Stegeman, L., & op den Akker, H. J. A. (2007). A Support Vector Machine Approach to Dutch Part-of-Speech Tagging. In M. R. Berthold, J. Shawe-Taylor, & N. Lavrac (Eds.), Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007 (pp. 274-283). [10.1007/978-3-540-74825-0_25] (Lecture Notes in Computer Science; Vol. 4723, No. LNCS4549). London: Springer Verlag. DOI: 10.1007/978-3-540-74825-0_25

Poel, Mannes; Stegeman, L.; op den Akker, Hendrikus J.A. / A Support Vector Machine Approach to Dutch Part-of-Speech Tagging.

Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007. ed. / M.R. Berthold; J. Shawe-Taylor; N. Lavrac. London : Springer Verlag, 2007. p. 274-283 10.1007/978-3-540-74825-0_25 (Lecture Notes in Computer Science; Vol. 4723, No. LNCS4549).

Research output: Scientific - peer-review, Conference contribution

@inproceedings{3dd4e183d4a04de2a9be02f988dc326a,
title = "A Support Vector Machine Approach to Dutch Part-of-Speech Tagging",
abstract = "Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This does not only solve the space problems caused by the amount of data, it also improves the tagging time. The performance of the resulting tagger in terms of accuracy is 97.54\%, which is quite good, where the speed of the tagger is reasonably good.",
keywords = "EWI-11050, IR-61912, METIS-241907, HMI-CI: Computational Intelligence",
author = "Mannes Poel and L. Stegeman and {op den Akker}, {Hendrikus J.A.}",
year = "2007",
month = sep,
doi = "10.1007/978-3-540-74825-0_25",
isbn = "978-3-540-74824-3",
series = "Lecture Notes in Computer Science",
volume = "4723",
publisher = "Springer Verlag",
address = "London",
number = "LNCS4549",
pages = "274--283",
editor = "M.R. Berthold and J. Shawe-Taylor and N. Lavrac",
booktitle = "Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007",
}

Poel, M, Stegeman, L & op den Akker, HJA 2007, A Support Vector Machine Approach to Dutch Part-of-Speech Tagging. in MR Berthold, J Shawe-Taylor & N Lavrac (eds), Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007., 10.1007/978-3-540-74825-0_25, Lecture Notes in Computer Science, no. LNCS4549, vol. 4723, Springer Verlag, London, pp. 274-283. DOI: 10.1007/978-3-540-74825-0_25

TY  - CHAP
T1  - A Support Vector Machine Approach to Dutch Part-of-Speech Tagging
AU  - Poel, Mannes
AU  - Stegeman, L.
AU  - op den Akker, Hendrikus J.A.
PY  - 2007/9
Y1  - 2007/9
N2  - Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This does not only solve the space problems caused by the amount of data, it also improves the tagging time. The performance of the resulting tagger in terms of accuracy is 97.54%, which is quite good, where the speed of the tagger is reasonably good.
AB  - Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This does not only solve the space problems caused by the amount of data, it also improves the tagging time. The performance of the resulting tagger in terms of accuracy is 97.54%, which is quite good, where the speed of the tagger is reasonably good.
KW  - EWI-11050
KW  - IR-61912
KW  - METIS-241907
KW  - HMI-CI: Computational Intelligence
U2  - 10.1007/978-3-540-74825-0_25
DO  - 10.1007/978-3-540-74825-0_25
M3  - Conference contribution
SN  - 978-3-540-74824-3
T3  - Lecture Notes in Computer Science
SP  - 274
EP  - 283
BT  - Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007
PB  - Springer Verlag
ER  - 

Poel M, Stegeman L, op den Akker HJA. A Support Vector Machine Approach to Dutch Part-of-Speech Tagging. In Berthold MR, Shawe-Taylor J, Lavrac N, editors, Advances in Intelligent Data Analysis VII. Proceedings of the 7th International Symposium on Intelligent Data Analysis, IDA 2007. London: Springer Verlag. 2007. p. 274-283. 10.1007/978-3-540-74825-0_25. (Lecture Notes in Computer Science; LNCS4549). Available from, DOI: 10.1007/978-3-540-74825-0_25