Explaining Topical Distances Using Word Embeddings

Nils Witt, Christin Seifert, Michael Granitzer

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

    3 Citations (Scopus)

    Abstract

    Word and document embeddings have gained a lot of attention recently because they tend to work well in text mining tasks. Yet they elude human intuition. In this paper we attempt to explain the arithmetic difference between two document embeddings by a series of word embeddings. We present an algorithm that iteratively picks words from a vocabulary that close the topical gap between the documents. Moreover, we present the Econstor16 corpus that was used for the experiments. Although not all of the words found are good matches, the algorithm is able to find sets of words that seem reasonable to a human who reads both documents. Remarkably, some of the well-explaining words are mentioned in neither document.
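
The iterative word selection described in the abstract can be pictured with a minimal sketch. The abstract does not state the selection rule, so everything here is an assumption made for illustration: the greedy criterion (pick the word whose vector leaves the smallest remaining difference), the function names `norm` and `explain_gap`, the `max_words` parameter, and the toy two-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def explain_gap(doc_a, doc_b, word_vecs, max_words=5):
    """Greedily pick words whose vectors shrink the gap doc_b - doc_a.

    Assumption: this greedy residual rule is only an illustration;
    the selection criterion used in the paper may differ.
    """
    residual = [b - a for a, b in zip(doc_a, doc_b)]
    chosen = []
    for _ in range(max_words):
        best_word, best_rest = None, None
        for word, vec in word_vecs.items():
            # Difference that would remain after subtracting this word vector.
            rest = [r - w for r, w in zip(residual, vec)]
            if best_rest is None or norm(rest) < norm(best_rest):
                best_word, best_rest = word, rest
        # Stop when no word brings the residual closer to zero.
        if best_rest is None or norm(best_rest) >= norm(residual):
            break
        chosen.append(best_word)
        residual = best_rest
    return chosen, norm(residual)


# Toy vocabulary with made-up 2-d vectors (hypothetical data).
toy_vecs = {"tax": [1.0, 0.0], "trade": [0.0, 1.0], "labor": [-1.0, 0.0]}
words, gap = explain_gap([0.0, 0.0], [1.0, 1.0], toy_vecs)
print(words, gap)
```

In this toy setting the picked words need not occur in either document, since candidates come from the whole vocabulary; that is consistent with the abstract's closing remark, though the sketch itself makes no claim about the paper's results.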
    Original language: English
    Title of host publication: 2016 27th International Workshop on Database and Expert Systems Applications (DEXA)
    ISBN (Electronic): 978-1-5090-3635-6
    Publication status: Published - 1 Sep 2016
    Event: 27th International Conference on Database and Expert Systems Applications, DEXA 2016 - Instituto Superior de Engenharia do Porto, Porto, Portugal
    Duration: 5 Sep 2016 to 8 Sep 2016
    Conference number: 27
    http://www.dexa.org/previous/dexa2016/dexa2016.html

    Conference

    Conference: 27th International Conference on Database and Expert Systems Applications, DEXA 2016
    Abbreviated title: DEXA
    Country: Portugal
    City: Porto
    Period: 5/09/16 to 8/09/16
