Identifying unclear questions in community question answering websites

Jan Trienes*, Krisztian Balog

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review



Thousands of complex natural language questions are submitted to community question answering websites every day, making these sites one of the most important information sources today. However, submitted questions are often unclear and cannot be answered until expert community members ask clarification questions. This study is the first to investigate the complex task of classifying a question as clear or unclear, i.e., whether it requires further clarification. We construct a novel dataset and propose a classification approach based on the notion of similar questions. This approach is compared to state-of-the-art text classification baselines. Our main finding is that the similar questions approach is a viable alternative that can serve as a stepping stone towards the development of supportive user interfaces for question formulation.

Original language: English
Title of host publication: Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings
Editors: Benno Stein, Philipp Mayr, Leif Azzopardi, Djoerd Hiemstra, Norbert Fuhr, Claudia Hauff
Number of pages: 14
ISBN (Print): 9783030157111
Publication status: Published - 7 Apr 2019
Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
Duration: 14 Apr 2019 to 18 Apr 2019
Conference number: 41

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11437 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 41st European Conference on Information Retrieval, ECIR 2019
Abbreviated title: ECIR 2019