Word and document embeddings have gained considerable attention recently because they tend to perform well in text mining tasks. Yet they elude human intuition. In this paper, we attempt to explain the arithmetic difference between two document embeddings by a series of word embeddings. We present an algorithm that iteratively picks words from a vocabulary so as to close the topical gap between the documents. We also present the Econstor16 corpus, which was used for the experiments. Although not every word found is a good match, the algorithm is able to find sets of words that a human who reads both documents finds reasonable. Remarkably, some of the words that explain the difference well are mentioned in neither document.
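The abstract does not spell out the selection rule, so the following is only a minimal sketch of one plausible reading: a greedy residual search in the spirit of matching pursuit. It assumes that word and document vectors live in the same embedding space, that the "gap" is the vector `doc_b - doc_a`, and that each step adds the not-yet-chosen word whose vector most reduces the Euclidean norm of the remaining gap. The function name `explain_difference`, the `max_words` limit, the no-repeat rule and the toy vocabulary are all hypothetical illustrations, not the paper's method.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sub(a: Vector, b: Vector) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def norm(a: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in a))


def explain_difference(
    doc_a: Vector,
    doc_b: Vector,
    vocab: Dict[str, Vector],
    max_words: int = 5,
) -> Tuple[List[str], Vector]:
    """Hypothetical greedy explanation of doc_b - doc_a by word vectors.

    At each step, pick the unused word whose vector, subtracted from the
    current residual gap, leaves the smallest remaining gap. Stop when no
    word shrinks the gap or max_words words have been chosen.
    """
    residual = sub(doc_b, doc_a)
    chosen: List[str] = []
    for _ in range(max_words):
        best_word, best_norm = None, norm(residual)
        for word, vec in vocab.items():
            if word in chosen:  # assumption: each word is used at most once
                continue
            candidate = norm(sub(residual, vec))
            if candidate < best_norm:
                best_word, best_norm = word, candidate
        if best_word is None:
            break  # no remaining word closes the gap any further
        chosen.append(best_word)
        residual = sub(residual, vocab[best_word])
    return chosen, residual


if __name__ == "__main__":
    # Toy 2-D vocabulary; real embeddings would have hundreds of dimensions.
    vocab = {"trade": [1.0, 0.0], "finance": [0.0, 1.0], "football": [-1.0, 0.0]}
    words, gap = explain_difference([0.0, 0.0], [1.0, 2.0], vocab)
    print(words, gap)
```

Because the candidate set is the whole vocabulary rather than the words of either document, a search of this kind can return words that appear in neither document, which is consistent with the observation above.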
| Field | Value |
|---|---|
| Title of host publication | 2016 27th International Workshop on Database and Expert Systems Applications (DEXA) |
| Publication status | Published - 1 Sep 2016 |
| Event | 27th International Conference on Database and Expert Systems Applications, DEXA 2016 - Instituto Superior de Engenharia do Porto, Porto, Portugal |
| Duration | 5 Sep 2016 → 8 Sep 2016 |
| Conference number | 27 |