Like most information retrieval methods, learning-to-rank methods are evaluated on benchmark datasets, such as the many datasets released by Microsoft and those released by Yahoo and Yandex. Many learning-to-rank datasets provide feature-vector representations of the documents to be ranked rather than the documents themselves. As a result, when methods are evaluated on the same dataset, differences in ranking performance can be attributed to the ranking algorithm rather than to the features used. This opens up a unique opportunity for cross-benchmark comparison of learning-to-rank methods.
In this talk, I propose a way to compare learning-to-rank methods based on a sparse set of evaluation results on many benchmark datasets, where not every method has been evaluated on every dataset. The comparison methodology consists of two components: (1) the Normalized Winning Number, a measure that gives insight into the ranking accuracy of a learning-to-rank method, and (2) the Ideal Winning Number, which gives insight into the degree of certainty about that ranking accuracy.
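The abstract does not spell out the formulas, so the following is only a minimal sketch under assumed definitions: for one evaluation measure at a time (e.g. NDCG@10), the Winning Number (WN) of a method counts the pairwise comparisons it strictly wins against other methods on datasets where both have a reported result, the Ideal Winning Number (IWN) counts all such possible comparisons, and the Normalized Winning Number is NWN = WN / IWN. Higher scores are assumed to be better and ties are assumed not to count as wins. The function name `winning_numbers` and the toy scores are hypothetical.

```python
from typing import Dict, Tuple

# Sparse results: results[method][dataset] = score on ONE evaluation measure
# (assumed higher-is-better). A missing key means "not evaluated on that dataset".
Results = Dict[str, Dict[str, float]]


def winning_numbers(results: Results) -> Dict[str, Tuple[int, int, float]]:
    """Return {method: (WN, IWN, NWN)} under the assumed definitions."""
    out = {}
    for method, scores in results.items():
        wn = 0   # comparisons this method strictly wins
        iwn = 0  # comparisons possible: both methods have a result on the dataset
        for other, other_scores in results.items():
            if other == method:
                continue
            for dataset, score in scores.items():
                if dataset in other_scores:
                    iwn += 1
                    if score > other_scores[dataset]:  # ties are not wins
                        wn += 1
        nwn = wn / iwn if iwn else float("nan")  # undefined without any overlap
        out[method] = (wn, iwn, nwn)
    return out


# Hypothetical toy data: three methods, three datasets, sparse coverage.
toy = {
    "A": {"d1": 0.50, "d2": 0.40},
    "B": {"d1": 0.45, "d2": 0.42, "d3": 0.30},
    "C": {"d3": 0.35},
}

for m, (wn, iwn, n) in sorted(winning_numbers(toy).items()):
    print(f"{m}: WN={wn} IWN={iwn} NWN={n:.2f}")
# A: WN=1 IWN=2 NWN=0.50
# B: WN=1 IWN=3 NWN=0.33
# C: WN=1 IWN=1 NWN=1.00
```

In the toy data, method C reaches the highest NWN from a single comparison (IWN = 1). This is the situation the IWN is meant to expose: a high NWN backed by few comparisons carries little certainty.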
I collected evaluation results of 87 learning-to-rank methods on 20 well-known benchmark datasets. I report on the best-performing methods by Normalized Winning Number and Ideal Winning Number, and suggest which methods need more research to make the analysis more robust. Finally, I test the robustness of these results by comparing them with the results obtained when one of the datasets is excluded from the analysis.
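The robustness check can be sketched as a leave-one-dataset-out loop. This is an illustration under assumptions, not the author's procedure: it reuses the assumed NWN definition (wins divided by possible pairwise comparisons, ties not counted as wins), drops each dataset in turn, and recomputes the NWN ranking so it can be compared with the full-data ranking. The helper names and toy scores are hypothetical; methods left without any overlapping result are simply omitted from that ranking.

```python
from typing import Dict, List

# results[method][dataset] = score on one evaluation measure (higher assumed better)
Results = Dict[str, Dict[str, float]]


def nwn(results: Results) -> Dict[str, float]:
    """Assumed Normalized Winning Number: wins / possible pairwise comparisons."""
    out = {}
    for method, scores in results.items():
        wins = comps = 0
        for other, other_scores in results.items():
            if other == method:
                continue
            for dataset, score in scores.items():
                if dataset in other_scores:
                    comps += 1
                    wins += int(score > other_scores[dataset])
        if comps:  # skip methods with no comparable results
            out[method] = wins / comps
    return out


def ranking(scores: Dict[str, float]) -> List[str]:
    # Highest NWN first; ties broken by method name so the order is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


def leave_one_dataset_out(results: Results) -> Dict[str, List[str]]:
    """Map each dataset to the NWN ranking computed without that dataset."""
    datasets = sorted({d for scores in results.values() for d in scores})
    out = {}
    for held_out in datasets:
        reduced = {
            m: {d: s for d, s in scores.items() if d != held_out}
            for m, scores in results.items()
        }
        out[held_out] = ranking(nwn(reduced))
    return out


# Hypothetical toy data with sparse coverage.
toy = {
    "A": {"d1": 0.50, "d2": 0.40},
    "B": {"d1": 0.45, "d2": 0.42, "d3": 0.30},
    "C": {"d3": 0.35},
}

print("all datasets:", ranking(nwn(toy)))  # all datasets: ['C', 'A', 'B']
for dataset, order in leave_one_dataset_out(toy).items():
    print(f"without {dataset}: {order}")
# without d1: ['C', 'B', 'A']
# without d2: ['A', 'C', 'B']
# without d3: ['A', 'B']
```

With sparse coverage, removing a single dataset can reorder methods or remove one from the ranking entirely, which is why such a check is informative. A rank correlation between the full and reduced rankings is one possible way to summarise the differences.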
|Period||1 Oct 2017|
|Event title||1st International Workshop on LEARning Next gEneration Rankers, LEARNER 2017|
|Degree of Recognition||International|
- International Workshop