A Neural Network Based Dutch Part of Speech Tagger

Mannes Poel, Egwin Boschman, Rieks op den Akker

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    4 Citations (Scopus)
    645 Downloads (Pure)

    Abstract

    In this paper a Neural Network is designed for Part-of-Speech Tagging of Dutch text. Our approach uses the Corpus Gesproken Nederlands (CGN), which consists of almost 9 million transcribed words of spoken Dutch, divided into 15 different categories. The outcome of the design is a Neural Network with an input window of size 8 (4 words back and 3 words ahead) and a hidden layer of 370 neurons. The words ahead are coded based on the relative frequency of the tags in the training set for the word. Special attention is paid to unknown words (words not in the training set), for which such a relative frequency cannot be determined. Based on a 10-fold cross-validation, an approximation of the relative frequency of tags for unknown words is determined. The overall performance of the Neural Network is 97.35%, with 97.88% on known words and 41.67% on unknown words. This is comparable to state-of-the-art performance reported in the literature. The special coding of unknown words resulted in an increase of almost 13% for the tagging of unknown words.
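    The relative-frequency coding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy tags, the strided fold split, and the rule that a held-out word counts as "unknown" when it does not occur in the other folds are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tag_relative_frequencies(tagged_words):
    """Map each word to the relative frequency of its tags in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


def unknown_word_distribution(tagged_words, k=10):
    """Approximate a tag distribution for unseen words with a k-fold split.

    Assumption (not from the paper): a word in the held-out fold that never
    occurs in the remaining k-1 folds is treated as unknown, and the tags of
    all such words are pooled over the k folds.
    """
    unknown_tags = Counter()
    for i in range(k):
        held_out = tagged_words[i::k]  # strided folds, chosen for brevity
        seen = {w for j, (w, _) in enumerate(tagged_words) if j % k != i}
        unknown_tags.update(tag for w, tag in held_out if w not in seen)
    total = sum(unknown_tags.values())
    return {tag: n / total for tag, n in unknown_tags.items()} if total else {}


def encode(word, known, unknown, tagset):
    """Return one input vector: tag frequencies for the word, or the fallback."""
    dist = known.get(word, unknown)
    return [dist.get(tag, 0.0) for tag in tagset]


# Hypothetical toy data; the tag names are placeholders, not the CGN tagset.
train = [("de", "LID"), ("fiets", "N"), ("fiets", "WW"),
         ("fiets", "N"), ("de", "LID"), ("loopt", "WW")]
known = tag_relative_frequencies(train)
unknown = unknown_word_distribution(train, k=2)
vector = encode("onbekend", known, unknown, ["LID", "N", "WW"])
```

    In this sketch a word absent from the training data receives the pooled fallback distribution instead of an all-zero vector, which is the role the abstract assigns to the approximation obtained from cross-validation.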
    Original language: English
    Title of host publication: BNAIC 2008
    Subtitle of host publication: Proceedings of BNAIC 2008, the twentieth Belgian-Dutch Artificial Intelligence Conference, Enschede/Bad Boekelo, October 30-31, 2008
    Editors: Anton Nijholt, Maja Pantic, Mannes Poel, Hendri Hondorp
    Place of Publication: Enschede
    Publisher: Twente University Press (TUP)
    Pages: 217-224
    Number of pages: 8
    Publication status: Published - 2008
    Event: 20th Benelux Conference on Artificial Intelligence, BNAIC 2008 - Boekelo, Netherlands
    Duration: 30 Oct 2008 to 31 Oct 2008
    Conference number: 20

    Publication series

    Name: BNAIC: proceedings of the ... Belgium/Netherlands Artificial Intelligence Conference
    Publisher: Twente University Press
    Number: 20
    ISSN (Print): 1568-7805

    Conference

    Conference: 20th Benelux Conference on Artificial Intelligence, BNAIC 2008
    Abbreviated title: BNAIC
    Country/Territory: Netherlands
    City: Boekelo
    Period: 30/10/08 to 31/10/08

    Keywords

    • IR-65237
    • METIS-255028
    • EWI-14662
