Abstract
Many people are multilingual and they may draw from multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. Then, words are identified that are composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
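The two steps named in the abstract (segment a word into subunits, then check whether those subunits are associated with different languages) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how words are segmented or how subunits are linked to a language. The sketch therefore assumes a greedy longest-match segmenter and a small hand-written lexicon. The lexicon entries, the labels `nl` and `li`, and the function names `segment` and `is_intra_word_switch` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: subunit -> language label.
# "nl" stands for Dutch, "li" for Limburgish; the entries are illustrative
# assumptions, not data taken from the paper.
SUBUNIT_LANG: Dict[str, str] = {
    "cijfer": "nl",
    "huis": "nl",
    "je": "nl",
    "kes": "li",
    "hoes": "li",
}


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Greedy longest-match segmentation into known subunits.

    Returns None when the word cannot be fully covered. This is an assumed
    stand-in for whatever segmentation method the paper actually uses.
    """
    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                units.append(word[i:j])
                i = j
                break
        else:
            return None  # no known subunit starts at position i
    return units


def is_intra_word_switch(word: str, lexicon: Dict[str, str]) -> bool:
    """True if the word's subunits carry more than one language label."""
    units = segment(word, lexicon)
    if units is None:
        return False
    return len({lexicon[u] for u in units}) > 1


if __name__ == "__main__":
    for w in ["cijferkes", "huisje", "hoeskes", "xyz"]:
        print(w, segment(w, SUBUNIT_LANG), is_intra_word_switch(w, SUBUNIT_LANG))
```

Greedy matching keeps the sketch short, but it can miss a valid segmentation that a backtracking or statistical segmenter would find. A realistic system would also need to handle subunits that occur in both languages.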
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology |
| Place of Publication | Stroudsburg, PA, USA |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 82-86 |
| Number of pages | 5 |
| Publication status | Published - 11 Aug 2016 |
| Event | 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology 2016, Berlin, Germany. Duration: 11 Aug 2016 → 11 Aug 2016. Conference number: 14 |
Workshop
| Workshop | 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology 2016 |
|---|---|
| Country/Territory | Germany |
| City | Berlin |
| Period | 11/08/16 → 11/08/16 |
Keywords
- CR-I.2.7
- Social Media
- code-switching
- Computational Linguistics
Fingerprint
Dive into the research topics of 'Automatic Detection of Intra-Word Code-Switching'. Together they form a unique fingerprint.