A novel textual data augmentation method for identifying comparative text from user-generated content

Research output: Contribution to conference › Abstract › Academic

Abstract

Mining user-generated content on e-commerce platforms and social media offers a timelier and more objective source of competitive intelligence than other information access channels. Identifying comparative text within large volumes of non-comparative text is an important but challenging task. On the one hand, existing methods are time-consuming and do not generalize across domains. On the other hand, the datasets for this task generally suffer from severe class imbalance. To address these problems, we propose a framework that uses advanced deep learning methods to learn features automatically, together with a novel textual data augmentation method named TA3S to deal with the data imbalance. Specifically, TA3S simultaneously considers the syntactic structure and the semantic information of comparative text samples. Moreover, to support the implementation of TA3S, we develop a novel method based on word embeddings and a label propagation algorithm to distinguish synonymous from antonymous substitute words. Experiments on two real-world datasets demonstrate the feasibility and effectiveness of our framework and show that it outperforms state-of-the-art methods in identifying comparative text from user-generated content.
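The abstract does not spell out how TA3S or its label propagation step work. As a rough illustration only, the sketch below shows a generic version of the two ideas it names: label propagation over a word-embedding similarity graph to sort candidate substitute words into synonyms and antonyms of a target word, followed by plain synonym substitution to oversample the minority (comparative) class. The toy vectors, the seed words, the k-nearest-neighbour graph, and the names `label_propagation` and `augment` are hypothetical assumptions. This is not the authors' TA3S method, which according to the abstract also takes syntactic structure into account.

```python
import math

# Hypothetical toy 3-d "embeddings" for candidate substitutes of the target word
# "better"; a real system would load pretrained word vectors instead.
EMBEDDINGS = {
    "nicer":  [0.85, 0.15, 0.35],
    "finer":  [0.80, 0.20, 0.25],
    "worse":  [0.20, 0.90, 0.30],
    "weaker": [0.25, 0.85, 0.35],
    "poorer": [0.30, 0.80, 0.25],
}
# Hypothetical seed labels: +1 = synonym of "better", -1 = antonym.
SEEDS = {"nicer": 1, "worse": -1}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_propagation(words, seeds, k=2, iters=50):
    """Spread seed labels over a symmetric k-nearest-neighbour cosine graph."""
    weights = {w: {} for w in words}
    for w in words:
        ranked = sorted(
            ((cosine(EMBEDDINGS[w], EMBEDDINGS[o]), o) for o in words if o != w),
            reverse=True,
        )
        for sim, o in ranked[:k]:
            weights[w][o] = sim  # add the edge in both directions
            weights[o][w] = sim
    scores = {w: float(seeds.get(w, 0)) for w in words}
    for _ in range(iters):
        updated = {}
        for w in words:
            if w in seeds:  # seed labels stay clamped
                updated[w] = float(seeds[w])
            else:  # weighted average of the neighbours' current scores
                total = sum(weights[w].values())
                updated[w] = sum(s * scores[o] for o, s in weights[w].items()) / total if total else 0.0
        scores = updated
    return scores


def augment(sentences, target, synonyms):
    """Create extra minority-class samples by swapping `target` for each synonym."""
    out = []
    for sent in sentences:
        tokens = sent.split()
        if target in tokens:
            for syn in synonyms:
                out.append(" ".join(syn if t == target else t for t in tokens))
    return out


if __name__ == "__main__":
    scores = label_propagation(list(EMBEDDINGS), SEEDS)
    synonyms = sorted(w for w, s in scores.items() if s > 0)
    print(synonyms)  # ['finer', 'nicer']
    print(augment(["the camera is better than the old one"], "better", synonyms))
```

A positive propagated score marks a candidate as a likely synonym and a negative one as a likely antonym; only the synonyms are used here for substitution, so that each new sample keeps the meaning of its source sentence. Naive token replacement can still produce ungrammatical sentences (for example, "superior than"), which is one reason a method would also need to account for syntactic structure.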
Original language: English
Publication status: Published - 17 Jun 2022
Event: 32nd Meeting of Computational Linguistics in The Netherlands, CLIN 2022 - Willem II Stadium, Tilburg, Netherlands
Duration: 17 Jun 2022 - 17 Jun 2022
Conference number: 32
https://clin2022.uvt.nl/

Conference

Conference: 32nd Meeting of Computational Linguistics in The Netherlands, CLIN 2022
Abbreviated title: CLIN 2022
Country/Territory: Netherlands
City: Tilburg
Period: 17/06/22 - 17/06/22
Internet address: https://clin2022.uvt.nl/

