Care more about customers: unsupervised domain-independent aspect detection for sentiment analysis of customer reviews

Ayoub Bagheri, Mohamad Saraee, Franciska de Jong

Research output: Contribution to journal › Article › Academic › peer-review

68 Citations (Scopus)

Abstract

With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has recently become a hot research topic, but due to the variety and wide range of products and services being reviewed on the internet, supervised and domain-specific models are often not practical. As the number of reviews expands, it is essential to develop an efficient sentiment analysis model that is capable of extracting product aspects and determining the sentiments for these aspects. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. In the model, first, a generalized method is proposed to learn multi-word aspects, and then a set of heuristic rules is employed to take into account the influence of an opinion word on detecting the aspect. Second, a new metric based on mutual information and aspect frequency is proposed to score aspects with a new iterative bootstrapping algorithm. The presented bootstrapping algorithm works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews are presented to remove incorrect aspects. Finally, the model employs an approach which uses explicit aspects and opinion words to identify implicit aspects. Utilizing an extracted polarity lexicon, the approach maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. The model does not require any labeled training data and it can be easily applied to other languages or other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional unsupervised and supervised approaches.
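The final step of the abstract, mapping opinion words to explicit aspects with a co-occurrence metric, can be illustrated with a small sketch. The abstract does not define the metric, so this example assumes pointwise mutual information (PMI) over review-level co-occurrence; the function names `cooccurrence_map` and `implicit_aspects` and the toy reviews are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_map(reviews, aspects, opinion_words):
    """Map each opinion word to the explicit aspect with the highest PMI.

    Assumption: PMI stands in for the paper's unspecified co-occurrence metric.
    `reviews` is a list of token lists; `aspects` and `opinion_words` are sets.
    """
    n = len(reviews)
    aspect_count, opinion_count, pair_count = Counter(), Counter(), Counter()
    for tokens in reviews:
        present = set(tokens)
        seen_aspects = present & aspects
        seen_opinions = present & opinion_words
        aspect_count.update(seen_aspects)
        opinion_count.update(seen_opinions)
        # Count every (opinion word, aspect) pair that shares a review.
        pair_count.update(product(seen_opinions, seen_aspects))

    best = {}
    for (opinion, aspect), joint in pair_count.items():
        pmi = math.log((joint * n) / (opinion_count[opinion] * aspect_count[aspect]))
        if opinion not in best or pmi > best[opinion][1]:
            best[opinion] = (aspect, pmi)
    return {opinion: aspect for opinion, (aspect, _) in best.items()}


def implicit_aspects(tokens, mapping, aspects):
    """Return aspects implied by opinion words in a review with no explicit aspect."""
    if set(tokens) & aspects:
        return []  # an explicit aspect is already mentioned
    return sorted({mapping[t] for t in tokens if t in mapping})


# Toy data (hypothetical): four tokenized reviews.
reviews = [
    ["the", "price", "is", "expensive"],
    ["expensive", "price", "tag"],
    ["the", "screen", "is", "bright"],
    ["bright", "screen", "and", "expensive", "price"],
]
aspects = {"price", "screen"}
opinion_words = {"expensive", "bright"}

mapping = cooccurrence_map(reviews, aspects, opinion_words)
implied = implicit_aspects(["too", "expensive"], mapping, aspects)
```

With this toy data, "expensive" maps to "price" and "bright" to "screen", so a review such as "too expensive" that names no aspect is assigned the implicit aspect "price". A real system would need the aspect extraction, scoring, and pruning stages described above before this step.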
Original language: English
Pages (from-to): 201-213
Number of pages: 13
Journal: Knowledge-based systems
Volume: 52
Issue number: November
DOIs: 10.1016/j.knosys.2013.08.011
Publication status: Published - Nov 2013

Fingerprint

Internet
Sentiment analysis
Seed
World Wide Web
Bootstrapping
Sentiment
Mutual information
Product review
Pruning
User-generated content
Heuristics
Language
Movies

Keywords

  • EWI-23618
  • IR-87217
  • Sentiment Analysis
  • aspect detection

Cite this

@article{b7eb5601bf8b4a57bdbd5d153c20fd91,
title = "Care more about customers: unsupervised domain-independent aspect detection for sentiment analysis of customer reviews",
abstract = "With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has recently become a hot research topic, but due to the variety and wide range of products and services being reviewed on the internet, supervised and domain-specific models are often not practical. As the number of reviews expands, it is essential to develop an efficient sentiment analysis model that is capable of extracting product aspects and determining the sentiments for these aspects. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. In the model, first, a generalized method is proposed to learn multi-word aspects, and then a set of heuristic rules is employed to take into account the influence of an opinion word on detecting the aspect. Second, a new metric based on mutual information and aspect frequency is proposed to score aspects with a new iterative bootstrapping algorithm. The presented bootstrapping algorithm works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews are presented to remove incorrect aspects. Finally, the model employs an approach which uses explicit aspects and opinion words to identify implicit aspects. Utilizing an extracted polarity lexicon, the approach maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. The model does not require any labeled training data and it can be easily applied to other languages or other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional unsupervised and supervised approaches.",
keywords = "EWI-23618, IR-87217, Sentiment Analysis, aspect detection",
author = "Ayoub Bagheri and Mohamad Saraee and {de Jong}, Franciska",
year = "2013",
month = "11",
doi = "10.1016/j.knosys.2013.08.011",
language = "English",
volume = "52",
pages = "201--213",
journal = "Knowledge-based systems",
issn = "0950-7051",
publisher = "Elsevier",
number = "November",
}

Care more about customers: unsupervised domain-independent aspect detection for sentiment analysis of customer reviews. / Bagheri, Ayoub; Saraee, Mohamad; de Jong, Franciska.

In: Knowledge-based systems, Vol. 52, No. November, 11.2013, p. 201-213.

TY  - JOUR
T1  - Care more about customers
T2  - unsupervised domain-independent aspect detection for sentiment analysis of customer reviews
AU  - Bagheri, Ayoub
AU  - Saraee, Mohamad
AU  - de Jong, Franciska
PY  - 2013/11
Y1  - 2013/11
AB  - With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has recently become a hot research topic, but due to the variety and wide range of products and services being reviewed on the internet, supervised and domain-specific models are often not practical. As the number of reviews expands, it is essential to develop an efficient sentiment analysis model that is capable of extracting product aspects and determining the sentiments for these aspects. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. In the model, first, a generalized method is proposed to learn multi-word aspects, and then a set of heuristic rules is employed to take into account the influence of an opinion word on detecting the aspect. Second, a new metric based on mutual information and aspect frequency is proposed to score aspects with a new iterative bootstrapping algorithm. The presented bootstrapping algorithm works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews are presented to remove incorrect aspects. Finally, the model employs an approach which uses explicit aspects and opinion words to identify implicit aspects. Utilizing an extracted polarity lexicon, the approach maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. The model does not require any labeled training data and it can be easily applied to other languages or other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional unsupervised and supervised approaches.
KW  - EWI-23618
KW  - IR-87217
KW  - Sentiment Analysis
KW  - aspect detection
U2  - 10.1016/j.knosys.2013.08.011
DO  - 10.1016/j.knosys.2013.08.011
M3  - Article
VL  - 52
SP  - 201
EP  - 213
JO  - Knowledge-based systems
JF  - Knowledge-based systems
SN  - 0950-7051
IS  - November
ER  -