Fast and discriminative semantic embedding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

The embedding of words and documents in compact, semantically meaningful vector spaces is a crucial part of modern information systems. Deep learning models are powerful, but their hyperparameter selection is often complex and they are expensive to train. Pre-trained models are available, but embeddings trained on general corpora are not necessarily well suited to domain-specific tasks. We propose a novel embedding method which extends random projection by weighting raw term embeddings and projecting them orthogonally to an average language vector, thus improving the discriminating power of the resulting term embeddings, and which builds more meaningful document embeddings by assigning appropriate weights to individual terms. We describe how updating the term embeddings online as we process the training data results in a method that is extremely efficient in terms of both computational and memory requirements. Our experiments show results that are highly competitive with various state-of-the-art embedding methods on different tasks, including the standard STS benchmark and a subject prediction task, at a fraction of the computational cost.
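The two geometric ideas named in the abstract, projecting term vectors orthogonally to an average vector and building a document vector as a weighted combination of term vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the random Gaussian vectors stand in for raw term embeddings, the unweighted mean stands in for the average language vector, and the names `project_out`, `mean_vector`, `document_embedding` and the weight values are all hypothetical choices made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(vec, direction):
    """Return the component of vec that is orthogonal to direction."""
    denom = dot(direction, direction)
    if denom == 0.0:
        return list(vec)
    scale = dot(vec, direction) / denom
    return [x - scale * d for x, d in zip(vec, direction)]


def mean_vector(vectors):
    """Unweighted mean of a list of vectors (an assumption of this sketch)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def document_embedding(tokens, term_vectors, term_weights):
    """Weighted average of the vectors of the known tokens in a document."""
    dim = len(next(iter(term_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in term_vectors:
            w = term_weights.get(tok, 1.0)
            total = [t + w * x for t, x in zip(total, term_vectors[tok])]
            weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


# Toy data: random vectors stand in for raw term embeddings (assumption).
rng = random.Random(0)
vocab = ["semantic", "embedding", "the", "of"]
raw = {t: [rng.gauss(0.0, 1.0) for _ in range(8)] for t in vocab}

# Remove the direction shared by all terms from every term vector.
average = mean_vector(list(raw.values()))
term_vectors = {t: project_out(v, average) for t, v in raw.items()}

# Hypothetical weights that down-weight very common terms.
weights = {"semantic": 2.0, "embedding": 2.0, "the": 0.1, "of": 0.1}
doc = document_embedding(["the", "semantic", "embedding"], term_vectors, weights)
```

After the projection, every term vector has a zero inner product with `average` (up to floating-point error), so the direction shared by all terms no longer contributes to similarity scores between terms or documents.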

Original language: English
Title of host publication: IWCS 2019 - Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Editors: Simon Dobnik, Stergios Chatzikyriakidis, Vera Demberg
Publisher: Association for Computational Linguistics (ACL)
Pages: 235-246
Number of pages: 12
ISBN (Electronic): 978-1-950737-19-2
Publication status: Published - 2019
Event: 13th International Conference on Computational Semantics, IWCS 2019 - Gothenburg, Sweden
Duration: 23 May 2019 to 27 May 2019
Conference number: 13

Conference

Conference: 13th International Conference on Computational Semantics, IWCS 2019
Abbreviated title: IWCS 2019
Country/Territory: Sweden
City: Gothenburg
Period: 23/05/19 to 27/05/19
