Word level language identification in online multilingual communication

Dong-Phuong Nguyen, A. Seza Dogruoz

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    83 Citations (Scopus)
    28 Downloads (Pure)


    Multilingual speakers switch between languages in online and spoken communication. Analyses of large-scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Second, we incorporate context to improve performance. We achieve an accuracy of 98%. Besides word-level accuracy, we use two new metrics to evaluate this task.
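
As an illustration of the first step described in the abstract (tagging individual words with dictionaries and language models), here is a minimal sketch. It is not the authors' implementation. The language pair (English and Dutch), the toy word lists, the character-bigram model with add-one smoothing, and the names `CharNgramModel` and `tag_words` are all assumptions made for the example. The context step is not shown.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Toy character n-gram model with add-one smoothing (an assumption)."""

    def __init__(self, words, n=2):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, word):
        # Sum of smoothed log-probabilities of the word's n-grams.
        return sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in char_ngrams(word, self.n)
        )


def tag_words(tokens, models, dictionaries):
    """Tag each token: unambiguous dictionary hit first, else best-scoring model."""
    tags = []
    for tok in tokens:
        hits = [lang for lang, vocab in dictionaries.items() if tok.lower() in vocab]
        if len(hits) == 1:
            tags.append(hits[0])
        else:
            tags.append(max(models, key=lambda lang: models[lang].log_prob(tok)))
    return tags


# Hypothetical toy data; a real system would use large corpora and word lists.
dictionaries = {"en": {"the", "house"}, "nl": {"het", "huis"}}
models = {
    "en": CharNgramModel(["the", "this", "that", "there", "house", "with"]),
    "nl": CharNgramModel(["het", "huis", "deze", "een", "niet", "voor"]),
}
tags = tag_words(["the", "huis", "thing", "niets"], models, dictionaries)
print(tags)
```

In this sketch, "the" and "huis" are resolved by the dictionaries, while "thing" and "niets" fall back to the character-bigram models.
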
    Original language: Undefined
    Title of host publication: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
    Place of Publication: Michigan, USA
    Publisher: Association for Computational Linguistics (ACL)
    Number of pages: 6
    ISBN (Print): 978-1-937284-97-8
    Publication status: Published - 18 Oct 2013
    Event: Conference on Empirical Methods in Natural Language Processing 2013 - Grand Hyatt Seattle, Seattle, United States
    Duration: 18 Oct 2013 to 21 Oct 2013

    Publication series

    Publisher: Association for Computational Linguistics


    Conference: Conference on Empirical Methods in Natural Language Processing 2013
    Abbreviated title: EMNLP 2013
    Country/Territory: United States


    • Language identification
    • multilingual
    • METIS-302564
    • EWI-24092
    • IR-88555
    • Social Media
