Abstract
Gender bias in natural language is pervasive but easily overlooked. Current research mostly focuses on using statistical methods to uncover patterns of gender bias in textual corpora. In order to study gender bias in a more controlled manner, we propose to build a parallel corpus in which the gender and other characteristics of the characters in the same story are switched to their opposite alternatives. In this paper, we present a two-step fiction rewriting model to automatically construct such a parallel corpus at scale. In the first step, we paraphrase the original text, i.e., the same storyline is expressed differently, in order to ensure linguistic diversity in the corpus. In the second step, we replace the gender of the characters with their opposites and modify their characteristics by using either synonyms or antonyms. We evaluate our fiction rewriting model by checking the readability of the rewritten texts and measuring readers’ acceptance in a user study. Results show that rewriting with antonyms and synonyms barely changes the original readability level, and that human readers perceive synonymously rewritten texts as mostly reasonable. Antonymously rewritten texts were perceived as less reasonable in the user study, and a post-hoc evaluation indicates that this might be mostly due to grammar and spelling issues introduced by the rewriting. Hence, our proposed approach allows the automated generation of a synonymous parallel corpus to study bias in a controlled way, but needs improvement for antonymously rewritten texts.
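The second rewriting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicons `GENDER_SWAP` and `TRAITS` and the function `rewrite` are hypothetical names with hand-picked toy entries, and the paraphrasing step is not covered. The sketch assumes a simple word-by-word dictionary lookup, which a real system would likely replace with part-of-speech tagging and a lexical resource.

```python
import re

# Hypothetical toy lexicon for gender swapping (an assumption for illustration;
# the abstract does not say which resources the authors use).
GENDER_SWAP = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",  # ambiguous: "her" may also map to "him"; needs POS tagging in practice
    "man": "woman", "woman": "man",
    "boy": "girl", "girl": "boy",
    "king": "queen", "queen": "king",
}

# Hypothetical trait lexicon: word -> (synonym, antonym).
TRAITS = {
    "brave": ("courageous", "cowardly"),
    "weak": ("frail", "strong"),
}

WORD = re.compile(r"[A-Za-z]+")


def _match_case(source: str, target: str) -> str:
    """Copy the capitalization pattern of `source` onto `target`."""
    if len(source) > 1 and source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def rewrite(text: str, mode: str = "synonym") -> str:
    """Swap gendered words and replace trait words by synonym or antonym."""
    if mode not in ("synonym", "antonym"):
        raise ValueError("mode must be 'synonym' or 'antonym'")
    index = 0 if mode == "synonym" else 1

    def replace(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        if key in GENDER_SWAP:
            return _match_case(word, GENDER_SWAP[key])
        if key in TRAITS:
            return _match_case(word, TRAITS[key][index])
        return word  # leave all other words unchanged

    return WORD.sub(replace, text)


print(rewrite("The brave king lost his crown.", mode="synonym"))
print(rewrite("The brave king lost his crown.", mode="antonym"))
```

Naive word-level substitution of this kind can produce ungrammatical output (for example with ambiguous pronouns such as "her"), which is in line with the abstract's observation that antonymous rewriting may introduce grammar and spelling issues.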
Original language | English |
---|---|
Title of host publication | Text, Speech, and Dialogue |
Subtitle of host publication | 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6-9, 2021 Proceedings |
Editors | Kamil Ekštein, František Pártl, Miloslav Konopík |
Place of Publication | Springer, Cham |
Publisher | Springer |
Pages | 73-85 |
Number of pages | 13 |
ISBN (Electronic) | 978-3-030-83527-9 |
ISBN (Print) | 978-3-030-83526-2 |
Publication status | Published - 30 Aug 2021 |
Event | 24th International Conference on Text, Speech, & Dialogue, TSD 2021 - Olomouc, Czech Republic; Duration: 6 Sept 2021 → 9 Sept 2021; Conference number: 24 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 12848 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 24th International Conference on Text, Speech, & Dialogue, TSD 2021 |
---|---|
Abbreviated title | TSD |
Country/Territory | Czech Republic |
City | Olomouc |
Period | 6/09/21 → 9/09/21 |
Keywords
- 2024 OA procedure