Automatic Detection of Terms and Conditions in German and English Online Shops

Daniel Braun, Florian Matthes

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

2 Citations (Scopus)

Abstract

Terms and Conditions in online shops are arguably among the most important (or at least the most widely used) forms of consumer contracts. At the same time, they are probably among the least read documents. Thus, their automated analysis is of great interest, not just for research, but also from a consumer protection perspective. To be able to automatically process large amounts of Terms and Conditions and build the corpora which are necessary to train data-driven systems, we need means to identify Terms and Conditions automatically. In this paper, we present and evaluate four different approaches to the automatic detection of Terms and Conditions pages in German and English online shops. We treat the problem as a binary document classification problem for web-pages and report an approach which achieves precision, recall, and F1-score above 0.9 in German and close to 0.9 in English, by analysing the URL of the page.
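As a rough illustration of what treating detection as binary classification over page URLs could look like, the sketch below labels a URL as a Terms and Conditions page when its path or query contains one of a few keywords, then computes precision, recall, and F1-score against hand-assigned labels. This is not the method evaluated in the paper, which the abstract does not describe in detail. The keyword list, the function names, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword list (assumption, not taken from the paper).
# "agb" abbreviates the German "Allgemeine Geschäftsbedingungen".
KEYWORDS = ("agb", "terms", "conditions", "geschaeftsbedingungen")


def looks_like_terms_url(url: str) -> bool:
    """Naive binary classifier: True if the URL path or query contains a keyword."""
    parsed = urlparse(url.lower())
    target = parsed.path + "?" + parsed.query
    return any(keyword in target for keyword in KEYWORDS)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1-score for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example URLs with hand-assigned labels (True = Terms and Conditions page).
labelled_urls = [
    ("https://shop.example.com/agb", True),
    ("https://shop.example.com/terms-and-conditions", True),
    ("https://shop.example.com/products/shoes", False),
    ("https://shop.example.com/legal", True),  # missed by the keyword heuristic
]

y_true = [label for _, label in labelled_urls]
y_pred = [looks_like_terms_url(url) for url, _ in labelled_urls]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A substring heuristic like this also matches unrelated URLs that happen to contain a keyword, so a trained classifier over URL features would normally be preferable. The sketch is only meant to make the evaluation setup concrete.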
Original language: English
Title of host publication: Proceedings of the 16th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST 2020, Budapest, Hungary, November 3-5, 2020
Editors: Massimo Marchiori, Francisco Dominguez Mayo, Joaquim Filipe
Place of Publication: Setúbal, Portugal
Publisher: SCITEPRESS
Pages: 233-237
Number of pages: 5
ISBN (Print): 978-989-758-478-7
Publication status: Published - 16 Nov 2020
Externally published: Yes
