MucLex: A German Lexicon for Surface Realisation

Kira Klimt, Daniel Braun, Daniela Schneider, Florian Matthes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Language resources for languages other than English are often scarce. Rule-based surface realisers need elaborate lexica in order to be able to generate correct language, especially in languages like German, which include many irregular word forms. In this paper, we present MucLex, a German lexicon for the Natural Language Generation task of surface realisation, based on the crowd-sourced online lexicon Wiktionary. MucLex contains more than 100,000 lemmata and more than 670,000 different word forms in a well-structured XML file and is available under the Creative Commons BY-SA 3.0 license.
Original language: English
Title of host publication: Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, 11-16 May 2020
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Pages: 4653-4657
Number of pages: 5
Publication status: Published - 2020
Externally published: Yes
Event: 12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France
Duration: 11 May 2020 to 16 May 2020

Conference

Conference: 12th International Conference on Language Resources and Evaluation, LREC 2020
Country: France
City: Marseille
Period: 11/05/20 to 16/05/20
