Terms and Conditions in online shops are arguably among the most important (or at least the most widely used) forms of consumer contracts. At the same time, they are probably among the least read documents. Their automated analysis is therefore of great interest, not just for research, but also from a consumer protection perspective. To process large amounts of Terms and Conditions automatically and to build the corpora needed to train data-driven systems, we need a means of identifying Terms and Conditions pages automatically. In this paper, we present and evaluate four different approaches to the automatic detection of Terms and Conditions pages in German and English online shops. We treat the task as a binary document classification problem over web pages and report an approach that, by analysing only the URL of the page, achieves precision, recall, and F1-score above 0.9 for German and close to 0.9 for English.
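To make the URL-based framing concrete, the following is a minimal sketch of a binary classifier over page URLs, together with the three reported metrics. It is not the method evaluated in the paper: the keyword list `TC_HINTS` and both function names are hypothetical, chosen only for illustration, and the abstract does not state which URL features the authors actually use.

```python
from urllib.parse import urlparse

# Hypothetical hints for German and English T&C pages (an assumption,
# not the feature set from the paper).
TC_HINTS = ("agb", "terms", "conditions", "geschaeftsbedingungen")


def looks_like_tc_url(url: str) -> bool:
    """Return True if the URL path or query string contains a T&C hint."""
    parsed = urlparse(url.lower())
    haystack = parsed.path + "?" + parsed.query
    return any(hint in haystack for hint in TC_HINTS)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (T&C) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A plain substring match like this will produce false positives (any path that happens to contain "agb", for instance), which is one reason a trained classifier over URL tokens would likely be preferable in practice.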
|Title of host publication||Proceedings of the 16th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST 2020, Budapest, Hungary, November 3-5, 2020|
|Editors||Massimo Marchiori, Francisco Dominguez Mayo, Joaquim Filipe|
|Place of Publication||Setúbal, Portugal|
|Number of pages||5|
|Publication status||Published - 16 Nov 2020|