Abstract
Word and document embeddings have gained a lot of attention recently because they tend to work well in text mining tasks. Yet they elude human intuition. In this paper we attempt to explain the arithmetic difference between two document embeddings by a series of word embeddings. We present an algorithm that iteratively picks words from a vocabulary to close the topical gap between the documents. Moreover, we present the Econstor16 corpus that was used for the experiments. Although not all of the words found are good matches, the algorithm is able to find sets of words that seem reasonable to a human who reads both documents. Remarkably, some of the well-explaining words are mentioned in neither document.
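
The idea of iteratively picking words to close the gap between two document embeddings can be illustrated with a minimal greedy sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes that words and documents share one vector space, that each step picks the unused word closest to the remaining difference, and that the search stops once no word shrinks the gap. The function name `explain_difference`, the toy vocabulary, and all vectors are hypothetical.

```python
import numpy as np


def explain_difference(doc_a, doc_b, vocab, word_vecs, k=3):
    """Greedily pick up to k words whose vectors sum toward doc_a - doc_b.

    Illustrative assumption only: the selection rule and stopping
    criterion are not taken from the paper.
    """
    residual = np.asarray(doc_a, dtype=float) - np.asarray(doc_b, dtype=float)
    word_vecs = np.asarray(word_vecs, dtype=float)
    available = np.ones(len(vocab), dtype=bool)
    chosen = []
    for _ in range(min(k, len(vocab))):
        # Size of the gap that would remain after subtracting each word vector.
        remaining = np.linalg.norm(residual - word_vecs, axis=1)
        remaining[~available] = np.inf  # each word is used at most once
        best = int(np.argmin(remaining))
        # Stop when even the best word does not shrink the gap.
        if remaining[best] >= np.linalg.norm(residual):
            break
        chosen.append(vocab[best])
        available[best] = False
        residual = residual - word_vecs[best]
    return chosen, residual


# Hypothetical 2-dimensional toy embeddings, for illustration only.
vocab = ["tax", "trade", "labor"]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
doc_a = [1.5, 1.0]
doc_b = [0.5, 0.2]

chosen, residual = explain_difference(doc_a, doc_b, vocab, word_vecs)
print(chosen, residual)
```

In this toy setup the returned words are those whose vectors together approximate the difference between the two documents, and the leftover `residual` indicates how much of the gap remains unexplained.
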
Original language | English |
---|---|
Title of host publication | 2016 27th International Workshop on Database and Expert Systems Applications (DEXA) |
Place of Publication | Piscataway, NJ |
Publisher | IEEE |
Pages | 212-217 |
ISBN (Electronic) | 978-1-5090-3635-6 |
ISBN (Print) | 978-1-5090-3636-3 |
Publication status | Published - 1 Sept 2016 |
Externally published | Yes |
Event | 27th International Conference on Database and Expert Systems Applications, DEXA 2016 - Instituto Superior de Engenharia do Porto, Porto, Portugal; Duration: 5 Sept 2016 → 8 Sept 2016; Conference number: 27; http://www.dexa.org/previous/dexa2016/dexa2016.html |
Conference
Conference | 27th International Conference on Database and Expert Systems Applications, DEXA 2016 |
---|---|
Abbreviated title | DEXA |
Country/Territory | Portugal |
City | Porto |
Period | 5/09/16 → 8/09/16 |
Keywords
- n/a OA procedure