Abstract
Language resources for languages other than English are often scarce. Rule-based surface realisers need elaborate lexica to generate correct language, especially for languages such as German, which has many irregular word forms. In this paper, we present MucLex, a German lexicon for the Natural Language Generation task of surface realisation, based on the crowd-sourced online lexicon Wiktionary. MucLex contains more than 100,000 lemmata and more than 670,000 distinct word forms in a well-structured XML file and is available under the Creative Commons BY-SA 3.0 license.
Original language | English |
---|---|
Title of host publication | Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020), Marseille, 11-16 May 2020 |
Place of Publication | Paris |
Publisher | European Language Resources Association (ELRA) |
Pages | 4653-4657 |
Number of pages | 5 |
Publication status | Published - 2020 |
Externally published | Yes |
Event | 12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France. Duration: 11 May 2020 → 16 May 2020 |
Conference
Conference | 12th International Conference on Language Resources and Evaluation, LREC 2020 |
---|---|
Country/Territory | France |
City | Marseille |
Period | 11 May 2020 → 16 May 2020 |