A CLARIN transcription portal for interview data

Christoph Draxler, Henk van den Heuvel, Arjan van Hessen, Silvia Calamai, Louise Corti, Stefania Scagliola

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

In this paper, we present a first version of a transcription portal for audio files based on automatic speech recognition (ASR) in various languages. The portal is implemented in the CLARIN resources research network and is intended for use by non-technical scholars. We explain the background and interdisciplinary nature of interview data, the perks and quirks of using ASR to transcribe audio in a research context, the dos and don'ts for optimal use of the portal, and the future developments foreseen. The portal is being promoted in a range of workshops, but a number of challenges remain to be met. These challenges concern privacy issues, ASR quality, and cost, amongst others.

Original language: English
Title of host publication: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Pages: 3353-3359
Number of pages: 7
ISBN (Electronic): 9791095546344
Publication status: Published - May 2020
Event: 12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France
Duration: 11 May 2020 to 16 May 2020
Conference number: 12

Conference

Conference: 12th International Conference on Language Resources and Evaluation, LREC 2020
Abbreviated title: LREC 2020
Country/Territory: France
City: Marseille
Period: 11/05/20 to 16/05/20

Keywords

  • Automatic speech recognition
  • Digital humanities
  • Interviews
  • Research infrastructure
  • Social sciences
