Towards a Feature-Rich Data Set for Personalized Access to Long-Tail Content

Christin Seifert, Jörg Schlötterer, Michael Granitzer

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Personalized data access has become one of the core challenges for intelligent information access, especially for non-mainstream long-tail content such as that found in digital libraries. One of the main reasons that personalization remains a difficult task is the lack of standardized test corpora. In this paper we provide a comprehensive analysis of feature requirements for personalization, together with a data collection tool for generating user models and collecting data for personalizing search and optimizing recommender systems in the long tail. Based on the feature analysis, we provide a feature-rich, publicly available data set covering web content consumption and creation tasks. Our data set contains user models for eight users, including performed tasks, relevant topics for each task, relevance ratings, and relations between focus text and search queries. Altogether, the data set consists of 217 tasks, 4562 queries and over 15,000 ratings. On this data we perform automatic query prediction from web page content, achieving an accuracy of 89% using term identity, capitalization and part-of-speech tags as features. The results of the feature analysis can serve as a guideline for feature collection for long-tail content personalization, and the provided data set as a gold standard for learning and evaluating user models as well as for optimizing recommender or search engines for long-tail domains.
Original language: English
Title of host publication: SAC'15
Subtitle of host publication: Proceedings of the 30th Annual ACM Symposium on Applied Computing
Place of Publication: New York, NY, USA
Publisher: ACM Press
ISBN (Print): 978-1-4503-3196-8
Publication status: Published - 1 Apr 2015
Externally published: Yes
Event: 30th Annual ACM Symposium on Applied Computing, SAC 2015 - Salamanca, Spain
Duration: 13 Apr 2015 to 17 Apr 2015
Conference number: 30


Conference: 30th Annual ACM Symposium on Applied Computing, SAC 2015
Abbreviated title: SAC

