A novel textual data augmentation method for identifying comparative text from user-generated content

Na Wei, Songzheng Zhao, Jing Liu*, Shenghui Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

8 Citations (Scopus)
220 Downloads (Pure)

Abstract

Mining user-generated content on e-commerce platforms and social media is a timelier and more objective way to gain competitive intelligence than other information access channels. Identifying comparative text within large volumes of non-comparative text is an important but challenging task. On the one hand, existing methods are time-consuming and do not generalize across domains. On the other hand, the datasets for the task generally suffer from severe data imbalance. To address these problems, we propose a framework that adopts advanced deep learning methods to learn features automatically, together with a novel textual data augmentation method named TA3S to deal with the data imbalance issue. Specifically, the TA3S method simultaneously considers the syntactic structure and semantic information of comparative text samples. Moreover, to support the successful implementation of TA3S, we develop a novel method based on word embedding and a label propagation algorithm to distinguish between synonymous and antonymous substitute words. Experiments on two real-world datasets demonstrate the feasibility and effectiveness of our framework and show that it outperforms state-of-the-art methods in identifying comparative text from user-generated content.
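
The abstract mentions separating synonymous from antonymous substitute words by combining word embeddings with label propagation. The following is a minimal, generic sketch of that idea only; it is not the authors' TA3S implementation, whose details are not given here. The function names, the hand-made two-dimensional vectors, and the seed labels are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(vectors, seeds, iterations=50):
    """Spread seed polarities over an embedding-similarity graph.

    vectors: dict mapping word -> embedding (list of floats).
    seeds:   dict mapping word -> +1.0 (known synonym) or -1.0 (known antonym).
    Returns a dict mapping every word to a score in [-1, 1].
    """
    words = list(vectors)
    # Edge weights: cosine similarity clipped at zero, no self-loops.
    weights = {
        w: {o: max(cosine(vectors[w], vectors[o]), 0.0) for o in words if o != w}
        for w in words
    }
    scores = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(iterations):
        updated = {}
        for w in words:
            if w in seeds:
                updated[w] = seeds[w]  # seed labels stay clamped
                continue
            total = sum(weights[w].values())
            # Unlabeled word takes the weighted average of its neighbours' scores.
            updated[w] = (
                sum(wt * scores[o] for o, wt in weights[w].items()) / total
                if total
                else 0.0
            )
        scores = updated
    return scores


# Hypothetical toy vectors (NOT real embeddings) for candidate substitutes.
toy_vectors = {
    "inexpensive": [1.0, 0.1],
    "affordable": [0.9, 0.2],
    "pricier": [0.1, 1.0],
    "costlier": [0.2, 0.9],
}
# Assumed seed labels: one known synonym and one known antonym.
seeds = {"inexpensive": 1.0, "pricier": -1.0}

scores = propagate_labels(toy_vectors, seeds)
synonyms = [w for w, s in scores.items() if s > 0]
antonyms = [w for w, s in scores.items() if s < 0]
```

In this toy setup a positive score marks a candidate as a likely synonym and a negative score as a likely antonym. The vectors here were constructed so the two groups separate cleanly; real embeddings may place antonyms close together, so any practical use would need real seed lexicons and careful validation.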
Original language: English
Article number: 101143
Number of pages: 14
Journal: Electronic commerce research and applications
Volume: 53
Early online date: 24 Mar 2022
Publication status: Published - 1 May 2022

Keywords

  • Deep learning
  • Textual data augmentation
  • Substitute word generation
  • Comparative text identification
  • 22/2 OA procedure
