Who spoke when? Audio-based speaker location estimation for diarization

M. Dadvar

Research output: Book/Report > Book > Academic

Abstract

Speaker diarization is the process of detecting active speakers and grouping the speech segments uttered by the same speaker. It has two main applications. Automatic speech recognition systems use the speaker-homogeneous clusters to adapt their acoustic models to individual speakers and thereby improve recognition performance. Speaker indexing and rich transcription systems use the diarization output as one of the pieces of information extracted from a recording, which allows automatic indexing and further processing. In this study a speaker diarization application is developed, using multiparty binaural speech recordings, to track speaker activity based on interaural time difference (ITD) cues. For a given speech signal frame, these cues are computed using gammatone filtering and cross-correlation. Their values are used to determine which speaker in the recording produced the speech fragment under consideration. This study was supervised by Dr. Jon Barker and defended to fulfill the requirements for the degree of Master in Advanced Computer Science, University of Sheffield, United Kingdom, 20
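The cross-correlation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name estimate_itd, its parameters, and the 1 ms search window are assumptions made for illustration, and the gammatone filterbank stage is omitted, so the correlation here runs on the full-band signal rather than per frequency channel.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate the ITD (in seconds) of one binaural frame.

    Hypothetical helper: full-band cross-correlation only, no gammatone
    filtering. A positive value means the right channel lags the left.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape or left.ndim != 1:
        raise ValueError("left and right must be 1-D frames of equal length")

    n = len(left)
    # Only search lags that are physically plausible for a human head
    # (the 1 ms default is an assumption, not a value from the source).
    max_lag = int(round(max_itd_s * fs))

    # xcorr[i] is the correlation at lag i - (n - 1), where the lag is the
    # number of samples by which `right` is delayed relative to `left`.
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(n - 1), n)

    keep = np.abs(lags) <= max_lag
    best_lag = lags[keep][np.argmax(xcorr[keep])]
    return best_lag / fs
```

In a diarization setting along the lines the abstract describes, the per-frame ITD values would then be grouped so that frames with similar ITDs are attributed to the same speaker position; that grouping step is not shown here.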
Original language: Undefined
Place of Publication: UK
Publisher: LAP LAMBERT Academic Publishing
Number of pages: 54
ISBN (Print): 978-3-8443-8628-8
Publication status: Published - 1 Jul 2011

Publication series

Name
Publisher: LAP Lambert Academic Publishing

Keywords

  • IR-77743
  • METIS-277720
  • EWI-20342

Cite this

Dadvar, M. (2011). Who spoke when? Audio-based speaker location estimation for diarization. UK: LAP LAMBERT Academic Publishing.