Abstract
Multilingual speakers switch between languages in online and spoken communication. Analyses of large-scale multilingual data require automatic language identification at the word level. In our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. We then incorporate context to improve performance. We achieve an accuracy of 98%. Besides word-level accuracy, we use two new metrics to evaluate this task.
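
The two-step idea in the abstract (score each word on its own, then let neighbouring words influence the decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed character-trigram scorer, the neighbour-averaging rule, the `weight` parameter, the function names, and the tiny English/Dutch word lists are all assumptions made for illustration, and the dictionary lookup mentioned in the abstract is left out.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded at both boundaries."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed character n-gram frequencies for one language (toy model)."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def score(self, word):
        """Average log-probability of the word's n-grams under this model."""
        grams = char_ngrams(word, self.n)
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def word_scores(tokens, models):
    """Score every token under every language model."""
    return [{lang: m.score(tok) for lang, m in models.items()} for tok in tokens]


def tag_words(tokens, models):
    """Step 1: label each word independently with its best-scoring language."""
    return [max(s, key=s.get) for s in word_scores(tokens, models)]


def tag_with_context(tokens, models, weight=1.0):
    """Step 2 (assumed rule): add the mean score of adjacent words, scaled by weight."""
    scores = word_scores(tokens, models)
    tags = []
    for i, own in enumerate(scores):
        neighbours = [scores[j] for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        combined = {}
        for lang in models:
            ctx = sum(nb[lang] for nb in neighbours) / len(neighbours) if neighbours else 0.0
            combined[lang] = own[lang] + weight * ctx
        tags.append(max(combined, key=combined.get))
    return tags


# Hypothetical toy training data; a real system would train on large corpora.
models = {
    "en": CharNgramModel(["the", "house", "is", "very", "nice", "today",
                          "and", "weather", "good", "with"]),
    "nl": CharNgramModel(["het", "huis", "erg", "mooi", "vandaag", "en",
                          "weer", "goed", "met", "niet"]),
}
tokens = ["het", "huis", "is", "mooi"]
print(tag_words(tokens, models))
print(tag_with_context(tokens, models))
```

On this toy data the word-level pass labels `is` as English because it only appears in the English list, while the context pass relabels it to match its Dutch neighbours. That is the kind of correction context is meant to provide, although the averaging rule here is only one simple way to do it.
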
Original language | Undefined |
---|---|
Title of host publication | Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing |
Place of Publication | Michigan, USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 857-862 |
Number of pages | 6 |
ISBN (Print) | 978-1-937284-97-8 |
Publication status | Published - 18 Oct 2013 |
Event | Conference on Empirical Methods in Natural Language Processing 2013, Grand Hyatt Seattle, Seattle, United States. Duration: 18 Oct 2013 → 21 Oct 2013. http://mirror.aclweb.org/emnlp2013/ |
Publication series
Name | |
---|---|
Publisher | Association for Computational Linguistics |
Conference
Conference | Conference on Empirical Methods in Natural Language Processing 2013 |
---|---|
Abbreviated title | EMNLP 2013 |
Country/Territory | United States |
City | Seattle |
Period | 18/10/13 → 21/10/13 |
Internet address | http://mirror.aclweb.org/emnlp2013/ |
Keywords
- Language identification
- Multilingual
- METIS-302564
- EWI-24092
- IR-88555
- Social Media