Distillation vs. Sampling for Efficient Training of Learning to Rank Models

Pooya Khandel, Andrew Yates, Ana Lucia Varbanescu, Maarten De Rijke, Andy Pimentel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

In real-world search settings, learning to rank (LtR) models are trained and tuned repeatedly on large amounts of data, consuming significant time and computing resources and raising efficiency and sustainability concerns. One way to address these concerns is to reduce the size of training datasets. Dataset sampling and dataset distillation are two classes of methods introduced to enable a significant reduction in dataset size while achieving performance comparable to training with the complete data. In this work, we perform a comparative analysis of dataset distillation and sampling methods in the context of LtR. We evaluate gradient matching and distribution matching dataset distillation approaches, which have been shown to be effective in computer vision, and show how these algorithms can be adjusted for the LtR task. Our empirical analysis, using three LtR datasets, indicates that, in contrast to previous studies in computer vision, the selected distillation methods do not outperform random sampling. Our code and experimental settings are released alongside the paper.
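
For readers unfamiliar with the distillation methods named in the abstract, the sketch below illustrates one way a gradient-matching distillation step might be adapted from image classification to LtR: a small set of learnable synthetic query-document feature matrices is optimized so that the ranker's gradients on the synthetic data align with its gradients on real data. This is a minimal illustration under stated assumptions, not the code released with the paper; the ListNet-style `listwise_loss`, the layer-wise cosine gradient distance, the MLP ranker, and all tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def listwise_loss(scores, labels):
    """ListNet-style loss: cross-entropy between the softmax of the
    relevance labels and the softmax of the predicted scores (assumed loss)."""
    return -(F.softmax(labels, dim=-1) * F.log_softmax(scores, dim=-1)).sum()

def gradient_match_distance(grads_real, grads_syn):
    """Layer-wise cosine distance between two sets of gradients, as used in
    gradient-matching dataset condensation."""
    dist = 0.0
    for g_r, g_s in zip(grads_real, grads_syn):
        dist = dist + (1.0 - F.cosine_similarity(g_r.flatten(), g_s.flatten(), dim=0))
    return dist

def distillation_step(ranker, real_feats, real_labels, syn_feats, syn_labels, syn_opt):
    """One outer-loop update of the synthetic set: compute ranking-loss gradients
    on real and synthetic data, then move the synthetic document features so
    that the two gradient sets align."""
    params = [p for p in ranker.parameters() if p.requires_grad]

    # Gradients of the ranking loss on a real query (used as fixed targets).
    real_scores = ranker(real_feats).squeeze(-1)
    grads_real = torch.autograd.grad(listwise_loss(real_scores, real_labels), params)
    grads_real = [g.detach() for g in grads_real]

    # Gradients on the synthetic query; keep the graph so the matching
    # distance can be backpropagated into syn_feats.
    syn_scores = ranker(syn_feats).squeeze(-1)
    grads_syn = torch.autograd.grad(
        listwise_loss(syn_scores, syn_labels), params, create_graph=True)

    loss = gradient_match_distance(grads_real, grads_syn)
    syn_opt.zero_grad()
    loss.backward()
    syn_opt.step()
    return loss.item()

# Example usage (all shapes are illustrative): a small MLP ranker over
# 136-dimensional features, one real query with 50 documents, and a
# learnable synthetic "query" of 10 documents.
if __name__ == "__main__":
    ranker = torch.nn.Sequential(
        torch.nn.Linear(136, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
    real_feats, real_labels = torch.randn(50, 136), torch.randint(0, 5, (50,)).float()
    syn_feats = torch.randn(10, 136, requires_grad=True)
    syn_labels = torch.randint(0, 5, (10,)).float()
    syn_opt = torch.optim.SGD([syn_feats], lr=0.1)
    print(distillation_step(ranker, real_feats, real_labels,
                            syn_feats, syn_labels, syn_opt))
```

In a full pipeline this outer update would be interleaved with updates to the ranker's parameters and repeated over many sampled queries; distribution matching would instead minimize a distance between feature embeddings of real and synthetic data, avoiding the second-order gradient computation.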

Original language: English
Title of host publication: ICTIR 2024 - Proceedings of the 2024 ACM SIGIR International Conference on the Theory of Information Retrieval
Publisher: Association for Computing Machinery
Pages: 51-60
Number of pages: 10
ISBN (Electronic): 9798400706813
DOIs
Publication status: Published - 5 Aug 2024
Event: 10th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2024 - Washington, United States
Duration: 13 Jul 2024 - 13 Jul 2024
Conference number: 10

Conference

Conference: 10th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2024
Abbreviated title: ICTIR 2024
Country/Territory: United States
City: Washington
Period: 13/07/24 - 13/07/24
Other: Co-located with ACM SIGIR 2024

Keywords

  • dataset distillation
  • learning-to-rank
  • sampling
