Automated Semantic Analysis, Legal Assessment, and Summarization of Standard Form Contracts

Research output: Thesis (PhD Thesis) - Research external, graduation external

Consumers are confronted with standard form contracts on a daily basis, for example, when shopping online, registering for online platforms, or opening bank accounts. With expected revenue of more than 343 billion euros in 2020, e-commerce is an increasingly important branch of the European economy. Accepting standard form contracts is often a prerequisite for accessing products or services, and consumers frequently do so without reading, let alone understanding, them. Consumer protection organizations can advise and represent consumers in such situations of power imbalance. However, with increasing demand, limited budgets, and ever more complex regulations, they struggle to provide the necessary support.

This thesis investigates techniques for the automated semantic analysis, legal assessment, and summarization of standard form contracts in German and English, which can be used to support consumers and those who protect them. We focus on Terms and Conditions from the fast-growing market of European e-commerce, but also show that the developed techniques can in part be applied to other types of standard form contracts.

We elicited requirements from consumers and consumer advocates to understand their needs, identified the most relevant clause topics, and analyzed the processes in consumer protection organizations concerning the handling of standard form contracts. Based on these insights, a pipeline for the automated semantic analysis, legal assessment, and summarization of standard form contracts was developed. The components of this pipeline can automatically identify and extract standard form contracts from the internet and hierarchically structure them into their individual clauses. Clause topics can be automatically identified, and relevant information can be extracted. Clauses can then be legally assessed, either using a knowledge base we constructed or through binary classification by a transformer model. This information is then used to create summaries that are tailored to the needs of the different user groups. For each step of the pipeline, different approaches were developed and compared, from classical rule-based systems to deep learning techniques. Each approach was evaluated on German and English corpora containing more than 10,000 clauses, which were annotated as part of this thesis. The developed pipeline was prototypically implemented as part of a web-based tool to support consumer advocates in analyzing and assessing standard form contracts. The implementation was evaluated with experts from two German consumer protection organizations with questionnaires and task-based evaluations.
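To make the pipeline stages concrete, the following is a minimal rule-based sketch in Python. It is not the implementation from the thesis: every name in it (`Clause`, `TOPIC_KEYWORDS`, `run_pipeline`, and the helper functions) is hypothetical, the keyword lists are invented for illustration, and the single assessment rule (flagging a withdrawal period shorter than the 14 days granted for distance contracts under EU consumer law) is a deliberately simplified stand-in for a real knowledge base or transformer classifier.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Clause:
    """One clause of a contract plus the results of each pipeline stage."""
    text: str
    topic: str = "other"
    data: dict = field(default_factory=dict)
    flagged: bool = False


# Hypothetical keyword lists; a real system would use a trained classifier.
TOPIC_KEYWORDS = {
    "withdrawal": ("withdraw",),
    "delivery": ("delivery", "shipping"),
}


def split_clauses(contract: str) -> list[Clause]:
    """Segment the contract into clauses, assuming blank lines separate them."""
    parts = re.split(r"\n\s*\n", contract.strip())
    return [Clause(text=p.strip()) for p in parts if p.strip()]


def classify_topic(clause: Clause) -> None:
    """Assign the first topic whose keywords occur in the clause text."""
    lowered = clause.text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            clause.topic = topic
            return


def extract_data(clause: Clause) -> None:
    """Extract one data point: a period stated as '<number> days'."""
    match = re.search(r"(\d+)\s+days", clause.text)
    if match:
        clause.data["period_days"] = int(match.group(1))


def assess(clause: Clause) -> None:
    """Toy rule: flag withdrawal periods shorter than 14 days."""
    days = clause.data.get("period_days")
    clause.flagged = (
        clause.topic == "withdrawal" and days is not None and days < 14
    )


def summarize(clauses: list[Clause]) -> list[str]:
    """Produce one summary line per clause, marking flagged clauses."""
    lines = []
    for clause in clauses:
        marker = " [potentially void]" if clause.flagged else ""
        lines.append(f"{clause.topic}: {clause.data}{marker}")
    return lines


def run_pipeline(contract: str) -> list[Clause]:
    """Run segmentation, topic classification, extraction, and assessment."""
    clauses = split_clauses(contract)
    for clause in clauses:
        classify_topic(clause)
        extract_data(clause)
        assess(clause)
    return clauses
```

Calling `summarize(run_pipeline(text))` on a contract text yields one line per clause. Keeping each stage as a separate function mirrors the idea of comparing alternative approaches per step, since any single stage could be swapped for a learned model without touching the others.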

The results of the evaluation show that our system can identify over 50 different types of clauses, which cover more than 90% of the clauses typically occurring in Terms and Conditions from online shops, with an accuracy of 0.80 to 0.84. The system can also automatically extract 21 relevant data points from these clauses with a precision of 0.91 and a recall of 0.86. On a corpus of more than 200 German clauses, the system was also able to assess the legality of clauses with an accuracy of 0.90. The expert evaluation showed that the system is able to support consumer advocates in their daily work by reducing the time they need to analyze and assess clauses in standard form contracts.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Technische Universität München
Supervisors/Advisors
  • Matthes, Florian, Supervisor, External person
  • Schäfer, Burkhard, Supervisor, External person
Award date: 25 May 2021
Publication status: Published - 2021
