MucLex: A German Lexicon for Surface Realisation

Kira Klimt, Daniel Braun, Daniela Schneider, Florian Matthes

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Language resources for languages other than English are often scarce. Rule-based surface realisers need elaborate lexica in order to be able to generate correct language, especially in languages like German, which include many irregular word forms. In this paper, we present MucLex, a German lexicon for the Natural Language Generation task of surface realisation, based on the crowd-sourced online lexicon Wiktionary. MucLex contains more than 100,000 lemmata and more than 670,000 different word forms in a well-structured XML file and is available under the Creative Commons BY-SA 3.0 license.
Original language: English
Title of host publication: Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, 11-16 May 2020
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Pages: 4653-4657
Number of pages: 5
Publication status: Published - 2020
Externally published: Yes
Event: 12th Language Resources and Evaluation Conference, LREC 2020 - Palais du Pharo, Marseille, France
Duration: 11 May 2020 to 16 May 2020
Conference number: 12
https://lrec2020.lrec-conf.org/en/

Conference

Conference: 12th Language Resources and Evaluation Conference, LREC 2020
Abbreviated title: LREC
Country/Territory: France
City: Marseille
Period: 11/05/20 to 16/05/20
