Word level language identification in online multilingual communication

Dong-Phuong Nguyen, A. Seza Dogruoz

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


    Abstract

    Multilingual speakers switch between languages in online and spoken communication. Analyses of large-scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Second, we incorporate context to improve performance. We achieve an accuracy of 98%. Besides word-level accuracy, we use two new metrics to evaluate this task.
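
The two steps named in the abstract (tag each word on its own, then use context) can be illustrated with a minimal sketch. This is not the authors' implementation: the character-bigram scorer, the add-one smoothing, the neighbour-agreement rule, the names `CharNgramModel`, `tag_words` and `smooth_with_context`, and the toy training strings are all assumptions made for illustration. Dictionary lookup is left out.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed distribution over character n-grams (hypothetical helper)."""

    def __init__(self, words, n=2):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def avg_log_prob(self, word):
        """Average smoothed log-probability of the word's n-grams."""
        grams = char_ngrams(word, self.n)
        log_prob = sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in grams
        )
        return log_prob / len(grams)


def tag_words(tokens, models):
    """Step 1: label each token with the language whose model scores it highest."""
    return [
        max(models, key=lambda lang: models[lang].avg_log_prob(tok))
        for tok in tokens
    ]


def smooth_with_context(tags):
    """Step 2 (assumed rule): relabel a token when both neighbours agree
    on a language that differs from the token's own label."""
    out = list(tags)
    for i in range(1, len(tags) - 1):
        if tags[i - 1] == tags[i + 1] != tags[i]:
            out[i] = tags[i - 1]
    return out


# Toy usage: two artificial "languages" with disjoint alphabets.
models = {"a": CharNgramModel(["aaaa"]), "b": CharNgramModel(["bbbb"])}
word_tags = tag_words(["aaa", "bbb", "aa"], models)
context_tags = smooth_with_context(word_tags)
```

A real system would train the models on sizeable monolingual word lists and would likely replace the neighbour rule with a learned sequence model; the sketch only shows where word-level scores and context enter.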
    Original language: Undefined
    Title of host publication: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
    Place of Publication: Michigan, USA
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 857-862
    Number of pages: 6
    ISBN (Print): 978-1-937284-97-8
    Publication status: Published - 18 Oct 2013
    Event: Conference on Empirical Methods in Natural Language Processing 2013 - Grand Hyatt Seattle, Seattle, United States
    Duration: 18 Oct 2013 to 21 Oct 2013
    http://mirror.aclweb.org/emnlp2013/

    Publication series

    Publisher: Association for Computational Linguistics

    Conference

    Conference: Conference on Empirical Methods in Natural Language Processing 2013
    Abbreviated title: EMNLP 2013
    Country/Territory: United States
    City: Seattle
    Period: 18/10/13 to 21/10/13

    Keywords

    • Language identification
    • multilingual
    • METIS-302564
    • EWI-24092
    • IR-88555
    • Social Media
