Luhn Revisited: Significant Words Language Models

M. Dehghani, H. Azarbonyad, J. Kamps, Djoerd Hiemstra, M. Marx

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

17 Citations (Scopus)
73 Downloads (Pure)

Abstract

Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but can also lead to loss of query focus and harm performance - in particular when the initial query retrieves only little relevant information - when overfitting to accidental features of the particular observed feedback documents. Inspired by the early work of Hans Peter Luhn, we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection as well as the weight of rare terms that are only explained by specific feedback documents, which eventually results in having only the significant terms left in the feedback model, establishing a set of 'significant words'. Our main contributions are the following. First, we present significant words language models as effective models capturing the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task, and observe improved performance over state-of-the-art methods. Third, we find that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that significant words language models more accurately capture relevance by excluding general terms and feedback-document-specific terms.
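The adjustment described in the abstract - down-weighting common terms already explained by the collection and rare terms explained by individual feedback documents - can be sketched as a three-component mixture estimated with EM. This is a minimal illustration of that idea, not the paper's exact estimator; the component weights, the iteration count, and the function name are assumptions:

```python
from collections import Counter

def significant_words_model(feedback_docs, collection_tf,
                            lambda_sw=0.5, lambda_c=0.3, iters=20):
    """Sketch of a 'significant words' feedback model (illustrative only).

    Each feedback document is treated as a mixture of three components:
    a shared significant-words model (the one we estimate), the fixed
    collection model (absorbs common terms), and a fixed document-specific
    model (absorbs rare, document-bound terms). EM re-estimates only the
    shared component, so terms well explained by the other two lose weight.
    The mixture weights lambda_* are assumed values, not from the paper.
    """
    lambda_d = 1.0 - lambda_sw - lambda_c
    total_c = sum(collection_tf.values())
    p_c = {t: c / total_c for t, c in collection_tf.items()}      # collection LM
    doc_tfs = [Counter(d) for d in feedback_docs]
    p_docs = [{t: c / sum(tf.values()) for t, c in tf.items()}    # per-doc LMs
              for tf in doc_tfs]
    vocab = set().union(*doc_tfs)
    # initialize the significant-words model uniformly over the feedback vocabulary
    p_sw = {t: 1.0 / len(vocab) for t in vocab}
    for _ in range(iters):
        expected = Counter()
        for tf, p_d in zip(doc_tfs, p_docs):
            for t, c in tf.items():
                num = lambda_sw * p_sw[t]
                den = num + lambda_c * p_c.get(t, 1e-12) + lambda_d * p_d[t]
                expected[t] += c * num / den      # E-step: responsibility of sw model
        z = sum(expected.values())
        p_sw = {t: expected[t] / z for t in vocab}  # M-step: renormalize
    return p_sw
```

On toy data, a term shared across feedback documents but rare in the collection ends up with high probability, while stopword-like terms and single-document terms are pushed down, which is the behavior the abstract attributes to the approach.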
Original language: Undefined
Title of host publication: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM 2016)
Place of publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 1301-1310
Number of pages: 10
ISBN (Print): 978-1-4503-4073-1
DOIs: 10.1145/2983323.2983814
Publication status: Published - Oct 2016
Event: 25th ACM International on Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States
Duration: 24 Oct 2016 – 28 Oct 2016
Conference number: 25

Conference

Conference: 25th ACM International on Conference on Information and Knowledge Management, CIKM 2016
Abbreviated title: CIKM
Country: United States
City: Indianapolis
Period: 24/10/16 – 28/10/16

Keywords

  • EWI-27808

Cite this

Dehghani, M., Azarbonyad, H., Kamps, J., Hiemstra, D., & Marx, M. (2016). Luhn Revisited: Significant Words Language Models. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM 2016) (pp. 1301-1310). New York: Association for Computing Machinery (ACM). https://doi.org/10.1145/2983323.2983814
@inproceedings{53636b1d23244b7d85a1c8d7db0b561e,
title = "Luhn Revisited: Significant Words Language Models",
abstract = "Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but can also lead to loss of query focus and harm performance - in particular when the initial query retrieves only little relevant information - when overfitting to accidental features of the particular observed feedback documents. Inspired by the early work of Hans Peter Luhn, we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection as well as the weight of rare terms that are only explained by specific feedback documents, which eventually results in having only the significant terms left in the feedback model, establishing a set of 'significant words'. Our main contributions are the following. First, we present significant words language models as effective models capturing the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task, and observe improved performance over state-of-the-art methods. Third, we find that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that significant words language models more accurately capture relevance by excluding general terms and feedback-document-specific terms.",
keywords = "EWI-27808",
author = "M. Dehghani and H. Azarbonyad and J. Kamps and Djoerd Hiemstra and M. Marx",
year = "2016",
month = oct,
doi = "10.1145/2983323.2983814",
language = "Undefined",
isbn = "978-1-4503-4073-1",
pages = "1301--1310",
booktitle = "Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM 2016)",
publisher = "Association for Computing Machinery (ACM)",
address = "New York",
}


TY - GEN

T1 - Luhn Revisited: Significant Words Language Models

AU - Dehghani, M.

AU - Azarbonyad, H.

AU - Kamps, J.

AU - Hiemstra, Djoerd

AU - Marx, M.

PY - 2016/10

Y1 - 2016/10

N2 - Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but can also lead to loss of query focus and harm performance - in particular when the initial query retrieves only little relevant information - when overfitting to accidental features of the particular observed feedback documents. Inspired by the early work of Hans Peter Luhn, we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection as well as the weight of rare terms that are only explained by specific feedback documents, which eventually results in having only the significant terms left in the feedback model, establishing a set of 'significant words'. Our main contributions are the following. First, we present significant words language models as effective models capturing the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task, and observe improved performance over state-of-the-art methods. Third, we find that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that significant words language models more accurately capture relevance by excluding general terms and feedback-document-specific terms.

AB - Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but can also lead to loss of query focus and harm performance - in particular when the initial query retrieves only little relevant information - when overfitting to accidental features of the particular observed feedback documents. Inspired by the early work of Hans Peter Luhn, we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection as well as the weight of rare terms that are only explained by specific feedback documents, which eventually results in having only the significant terms left in the feedback model, establishing a set of 'significant words'. Our main contributions are the following. First, we present significant words language models as effective models capturing the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task, and observe improved performance over state-of-the-art methods. Third, we find that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that significant words language models more accurately capture relevance by excluding general terms and feedback-document-specific terms.

KW - EWI-27808

U2 - 10.1145/2983323.2983814

DO - 10.1145/2983323.2983814

M3 - Conference contribution

SN - 978-1-4503-4073-1

SP - 1301

EP - 1310

BT - Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM 2016)

PB - Association for Computing Machinery (ACM)

CY - New York

ER -
