Clause Topic Classification in German and English Standard Form Contracts

Daniel Braun, Florian Matthes

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review



So-called standard form contracts, i.e. contracts that are drafted unilaterally by one party, such as the terms and conditions of online shops or the terms of service of social networks, are cornerstones of our modern economy. Their processing is therefore of significant practical value. Often, the sheer size of these contracts allows the drafting party to hide unfavourable terms from the other party. In this paper, we compare different approaches for automatically classifying the topics of clauses in standard form contracts, based on a dataset of more than 6,000 clauses from more than 170 contracts, which we collected from German and English online shops and annotated using a taxonomy of clause topics that we developed together with legal experts. In our comparison of seven approaches, ranging from simple keyword matching to transformer language models, BERT performed best, with an F1-score of up to 0.91. However, much simpler and computationally cheaper models such as logistic regression achieved similarly good results of up to 0.87.
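To make the simplest end of that comparison concrete, the following is a minimal sketch of a keyword-matching clause classifier together with a per-topic F1 computation. It is not the authors' implementation: the topic names, the keyword lists, the `classify_clause` and `f1_score` helpers, and the single-label setup are all assumptions made for illustration, and the taxonomy developed in the paper is not reproduced here.

```python
from collections import Counter

# Hypothetical topics and keywords, chosen only for illustration.
# They are NOT the taxonomy or keyword lists used in the paper.
TOPIC_KEYWORDS = {
    "payment": {"payment", "invoice", "price"},
    "delivery": {"delivery", "shipping", "dispatch"},
    "withdrawal": {"withdrawal", "return", "refund"},
}


def classify_clause(text, default="other"):
    """Return the topic whose keywords occur most often in the clause.

    Assumes one topic per clause; falls back to `default` when no
    keyword matches at all.
    """
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    counts = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        counts[topic] = sum(1 for t in tokens if t in keywords)
    topic, hits = counts.most_common(1)[0]
    return topic if hits > 0 else default


def f1_score(gold, predicted, label):
    """Compute the F1-score for a single topic label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling `classify_clause` on each clause of an annotated contract and passing the gold and predicted labels to `f1_score` gives a per-topic score of the kind reported above. A learned model such as logistic regression would replace the hand-written keyword sets with weights estimated from the annotated clauses.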
Original language: English
Title of host publication: Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Editors: Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
Place of Publication: Dublin, Ireland
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 11
ISBN (Electronic): 978-1-955917-35-3
Publication status: Published - 1 May 2022
Event: The 5th Workshop on e-Commerce and NLP, ECNLP 2022 - Dublin, Ireland
Duration: 26 May 2022 to 26 May 2022

Workshop: The 5th Workshop on e-Commerce and NLP, ECNLP 2022
Abbreviated title: ECNLP 2022
