Part-of-Speech Tagging for Northern Kurdish

Peshmerge Morad, Sina Ahmadi, Lorenzo Gatti

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

In the growing domain of natural language processing, low-resourced languages like Northern Kurdish remain largely unexplored due to the lack of resources needed to be part of this growth. In particular, the tasks of part-of-speech tagging and tokenization for Northern Kurdish are still insufficiently addressed. In this study, we aim to bridge this gap by evaluating a range of statistical, neural, and fine-tuned models specifically tailored for Northern Kurdish. We leverage limited but valuable datasets, including the Universal Dependencies Kurmanji treebank and a novel manually annotated and tokenized gold-standard dataset consisting of 136 sentences (2,937 tokens). We evaluate several POS tagging models and report that the fine-tuned transformer-based model outperforms the others, achieving an accuracy of 0.87 and a macro-averaged F1 score of 0.77. Data and models are publicly available under an open license at https://github.com/peshmerge/northern-kurdish-pos-tagging.
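The abstract reports two token-level metrics, accuracy and macro-averaged F1. As a minimal sketch of how the two differ, the Python snippet below computes both on a toy tag sequence. The function names and the example tags are illustrative assumptions and are not taken from the paper or its repository; the sketch also assumes that macro-F1 averages over every tag seen in either the gold or the predicted sequence, which is one common convention.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over all tags in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p was wrong here
            fn[g] += 1  # gold tag g was missed here
    scores = []
    for tag in set(gold) | set(pred):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores.append(2 * tp[tag] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (not from the paper's datasets).
gold = ["NOUN", "VERB", "NOUN", "ADP", "NOUN", "PUNCT"]
pred = ["NOUN", "VERB", "ADJ", "ADP", "NOUN", "PUNCT"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.2f} macro-F1={f1:.2f}")
```

On this toy input a single tagging error costs one token in six for accuracy, while macro-F1 drops further because the spurious ADJ tag contributes an F1 of zero with the same weight as every other tag. This is one reason a tagger's macro-averaged F1 can sit well below its accuracy when rare tags are handled poorly.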

Original language: English
Title of host publication: Joint Workshop on Multiword Expressions and Universal Dependencies, MWE-UD 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors: Archna Bhatia, Gosse Bouma, A. Seza Dogruoz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademacher
Publisher: European Language Resources Association (ELRA)
Pages: 70-80
Number of pages: 11
ISBN (Electronic): 9782493814203
Publication status: Published - 2024
Event: 2024 Joint Workshop on Multiword Expressions and Universal Dependencies, MWE-UD 2024 - Torino, Italy
Duration: 25 May 2024

Workshop

Workshop: 2024 Joint Workshop on Multiword Expressions and Universal Dependencies, MWE-UD 2024
Abbreviated title: MWE-UD 2024
Country/Territory: Italy
City: Torino
Period: 25/05/24

Keywords

  • low-resource NLP
  • morphosyntactic analysis
  • Northern Kurdish
  • Part-of-Speech tagging
