Voice Privacy in Speech Systems: A Comparative Study of Pitch Shifting and StarGAN-VC

  • Mehmet Arif Taşlı
  • Eren Akyürek
  • Funda Yıldırım*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Voice recognition systems facilitate naturalistic human-computer interaction. However, spoken input may inherently expose sensitive acoustic features that can threaten user privacy. In particular, raw spoken language data can reveal paralinguistic information such as emotional state, health condition, and speaker identity, which poses a significant privacy risk when the speaker’s voice is recognizable, especially within identifiable communities or groups. This study aims to investigate the preservation of acoustic privacy by evaluating two voice transformation techniques: traditional pitch shifting and the StarGAN-VC deep generative model [3], in terms of their effectiveness in obfuscating speaker identity while preserving lexical intelligibility. We measure their performance along two dimensions: lexical accuracy, assessed via an automatic speech recognition (ASR) application programming interface (API), and speaker identifiability, evaluated through subjective human listener studies. Our results show that although both methods degrade ASR performance, StarGAN-VC offers significantly greater privacy protection among individuals within the same social circle, by reducing speaker recognizability with minimal impact on lexical intelligibility. These findings highlight deep generative voice conversion models as viable tools for privacy-preserving solutions in voice-enabled technologies.
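
As an illustration of the lexical-accuracy dimension, the sketch below computes word error rate (WER) between a reference transcript and an ASR transcript of the transformed audio. This is an assumption for illustration only: the abstract does not state which metric or ASR API the authors used, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative metric only; not confirmed as the measure used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: original prompt vs. ASR output after voice transformation.
print(word_error_rate("turn on the lights", "turn off the light"))
```

A lower value means the transformed speech was transcribed more faithfully; comparing the value for pitch-shifted audio against the value for converted audio would give one possible view of the intelligibility trade-off described above.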

Original language: English
Title of host publication: Sensor-Based Activity Recognition and Artificial Intelligence
Subtitle of host publication: 10th International Workshop, iWOAR 2025, Enschede, The Netherlands, September 18–19, 2025, Proceedings
Editors: Özlem Durmaz Incel, Jingwen Qin, Gerald Bieber, Arjan Kuijper
Place of Publication: Cham
Publisher: Springer
Pages: 422-429
Number of pages: 8
ISBN (Electronic): 978-3-032-13312-0
ISBN (Print): 978-3-032-13311-3
DOIs
Publication status: Published - 2 Jan 2026
Event: 10th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence, iWOAR 2025 - University of Twente, Enschede, Netherlands
Duration: 18 Sept 2025 to 19 Sept 2025
Conference number: 10
https://iwoar.org/2025/index.html
https://iwoar.org/2025/cfp.html

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 16292
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Workshop

Workshop: 10th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence, iWOAR 2025
Abbreviated title: iWOAR 2025
Country/Territory: Netherlands
City: Enschede
Period: 18/09/25 to 19/09/25
Internet address

Keywords

  • 2026 OA procedure
  • Pitch shifting
  • Speaker identity obfuscation
  • Voice conversion
  • Voice privacy
  • Automatic speech recognition
