Automatic Detection of Intra-Word Code-Switching

Dong-Phuong Nguyen, Leonie Cornips

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    11 Citations (Scopus)
    139 Downloads (Pure)

    Abstract

    Many people are multilingual and may draw on multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. We then identify words that are composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
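
    The two steps described in the abstract (segment a word into subunits, then check whether the subunits point to different languages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon `SUBUNIT_LANGS`, the placeholder language labels `lang_a` and `lang_b`, the fewest-pieces dictionary segmentation, and the function names `segment` and `is_intra_word_switch`. It is not the segmentation or language-identification method used in the paper.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical toy lexicon: subunit -> languages it is associated with.
# The strings and labels are placeholders, not data from the paper.
SUBUNIT_LANGS: Dict[str, FrozenSet[str]] = {
    "ab": frozenset({"lang_a"}),
    "cd": frozenset({"lang_b"}),
    "ef": frozenset({"lang_a", "lang_b"}),
    "gh": frozenset({"lang_a"}),
}


def segment(word: str) -> Optional[List[str]]:
    """Split `word` into known subunits using the fewest pieces.

    Returns None when the word cannot be covered by the lexicon.
    """
    n = len(word)
    # best[i] holds the shortest known segmentation of word[:i], if any.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            prefix = best[start]
            if prefix is None or piece not in SUBUNIT_LANGS:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


def is_intra_word_switch(word: str) -> bool:
    """Flag a word when no single language is shared by all its subunits."""
    units = segment(word)
    if not units:
        # Unsegmentable or empty words are not flagged.
        return False
    shared = frozenset.intersection(*(SUBUNIT_LANGS[u] for u in units))
    return not shared
```

    With this toy lexicon, `is_intra_word_switch("abcd")` is true because `ab` is associated only with `lang_a` and `cd` only with `lang_b`, while `is_intra_word_switch("abef")` is false because both subunits are compatible with `lang_a`.
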
    Original language: Undefined
    Title of host publication: Proceedings of the 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
    Place of Publication: Stroudsburg, PA, USA
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 82-86
    Number of pages: 5
    Publication status: Published - 11 Aug 2016
    Event: 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology - Berlin, Germany
    Duration: 11 Aug 2016 → 11 Aug 2016

    Publication series

    Name
    Publisher: Association for Computational Linguistics

    Workshop

    Workshop: 14th Annual SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
    Period: 11/08/16 → 11/08/16
    Other: 11 August 2016

    Keywords

    • CR-I.2.7
    • Social Media
    • code-switching
    • METIS-320915
    • IR-102942
    • Computational Linguistics
    • EWI-27518
