Abstract
Voice recognition systems enable natural human-computer interaction. However, spoken input inherently exposes acoustic features that can threaten user privacy. In particular, raw speech can reveal paralinguistic information such as emotional state, health condition, and speaker identity. This poses a significant privacy risk when the speaker’s voice is recognizable, especially within identifiable communities or groups. This study investigates acoustic privacy preservation by evaluating two voice transformation techniques, traditional pitch shifting and the StarGAN-VC deep generative model [3], in terms of how effectively they obfuscate speaker identity while preserving lexical intelligibility. We measure their performance along two dimensions: lexical accuracy, assessed via an automatic speech recognition (ASR) application programming interface (API), and speaker identifiability, evaluated through subjective human listener studies. Our results show that although both methods degrade ASR performance, StarGAN-VC offers significantly greater privacy protection among individuals within the same social circle, reducing speaker recognizability with minimal impact on lexical intelligibility. These findings highlight deep generative voice conversion models as viable tools for privacy-preserving solutions in voice-enabled technologies.
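The abstract does not say which metric is used to score lexical accuracy. As an illustration only, the sketch below assumes word error rate (WER), a common ASR metric: the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is used here only as a stand-in for "lexical accuracy";
    the abstract does not name the metric the study actually used.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: the same utterance before and after voice conversion.
original_wer = word_error_rate("turn on the kitchen lights",
                               "turn on the kitchen lights")
converted_wer = word_error_rate("turn on the kitchen lights",
                                "turn on a kitchen light")
```

Under this assumption, a transformation that preserves lexical intelligibility would keep the WER of the converted audio close to that of the original recording.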
| Original language | English |
|---|---|
| Title of host publication | Sensor-Based Activity Recognition and Artificial Intelligence |
| Subtitle of host publication | 10th International Workshop, iWOAR 2025, Enschede, The Netherlands, September 18–19, 2025, Proceedings |
| Editors | Özlem Durmaz Incel, Jingwen Qin, Gerald Bieber, Arjan Kuijper |
| Place of Publication | Cham |
| Publisher | Springer |
| Pages | 422-429 |
| Number of pages | 8 |
| ISBN (Electronic) | 978-3-032-13312-0 |
| ISBN (Print) | 978-3-032-13311-3 |
| DOIs | |
| Publication status | Published - 2 Jan 2026 |
| Event | 10th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence, iWOAR 2025 - University of Twente, Enschede, Netherlands; Duration: 18 Sept 2025 → 19 Sept 2025; Conference number: 10; https://iwoar.org/2025/index.html; https://iwoar.org/2025/cfp.html |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Publisher | Springer |
| Volume | 16292 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Workshop
| Workshop | 10th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence, iWOAR 2025 |
|---|---|
| Abbreviated title | iWOAR 2025 |
| Country/Territory | Netherlands |
| City | Enschede |
| Period | 18/09/25 → 19/09/25 |
| Internet address | |
Keywords
- 2026 OA procedure
- Pitch shifting
- Speaker identity obfuscation
- Voice conversion
- Voice privacy
- Automatic speech recognition
Title
Voice Privacy in Speech Systems: A Comparative Study of Pitch Shifting and StarGAN-VC