Designing HMM-based part-of-speech tagger for Lithuanian language

G. Pajarskaite, V. Griciute, G. Raskinis, Jan Kuper

    Research output: Contribution to journal › Article › Academic › peer-review

    2 Citations (Scopus)

    Abstract

    This paper describes a preliminary experiment in designing a Hidden Markov Model (HMM)-based part-of-speech tagger for the Lithuanian language. Part-of-speech tagging is the problem of assigning to each word of a text the proper tag in its context of appearance. It is accomplished in two basic steps: morphological analysis and disambiguation. In this paper, we focus on the problem of disambiguation, i.e., on the problem of choosing the correct tag for each word in the context of a set of possible tags. We constructed a stochastic disambiguation algorithm, based on supervised learning techniques, to learn the hidden Markov model's parameters from hand-annotated corpora. The Viterbi algorithm is used to assign the most probable tag to each word in the text.
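The approach outlined in the abstract (supervised estimation of HMM parameters from a hand-annotated corpus, followed by Viterbi decoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the tag names, the add-one smoothing, and the function names `train_hmm` and `viterbi` are all assumptions made for illustration, and the morphological analysis step is omitted entirely.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Estimate bigram HMM parameters by counting over a tagged corpus.

    Add-one smoothing is an assumption of this sketch, not taken from the paper.
    """
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tag_set, vocab = set(), set()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_set.add(tag)
            vocab.add(word)
            prev = tag
    tags = sorted(tag_set)

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + 1) / (total + len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        # One extra slot in the denominator reserves mass for unseen words.
        return math.log((emit[tag][word] + 1) / (total + len(vocab) + 1))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    # best[i][t]: highest log-probability of a tag sequence ending in t at i.
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at position i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: best[-1][q] + log_trans(q, t))
            scores[t] = best[-1][p] + log_trans(p, t) + log_emit(t, word)
            pointers[t] = p
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy hand-annotated corpus (hypothetical data, English for readability).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
tags, log_trans, log_emit = train_hmm(corpus)
predicted = viterbi(["the", "cat", "barks"], tags, log_trans, log_emit)
print(predicted)
```

Working in log-probabilities avoids numerical underflow on long sentences. A tagger for a morphologically rich language such as Lithuanian would presumably restrict the candidate tags for each word to those proposed by the morphological analyser, which this sketch does not do.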
    Original language: Undefined
    Pages (from-to): 231-242
    Number of pages: 12
    Journal: Informatica
    Volume: 15
    Issue number: 2
    Publication status: Published - 2004

    Keywords

    • IR-63349
    • EWI-6619
