Abstract
Podcasts are a growing spoken medium, listened to for various reasons in various situations (e.g., for entertainment or educational purposes, on the train or at home), and consist of various types of audio such as (unstructured) speech, music, and other sounds. Traditionally, search and recommendation of spoken content focus on topical content derived from text transcriptions, ignoring paralinguistic aspects of spoken language. Instead, we propose to model these paralinguistic aspects, such as speaking style, in podcasts. This addresses both the heterogeneity of audio types in podcasts and the needs of users, and enables enriched access to this medium. In this paper, we take a first step towards this goal and explore audio-based stylistic variation in podcasts by 1) investigating which facets of stylistic variation are salient and of interest to listeners, and 2) gathering further insights into the kinds of stylistic variation that are present in podcasts and currently feasible to model with open-source audio tools. We find that much of the stylistic variation mentioned by users relates to speaking style and music, and we show, using open-source tools, how audio-based stylistic aspects vary across episodes, shows, and genres.
Original language | English |
---|---|
Pages | 2343-2347 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 2022 |
Event | 23rd INTERSPEECH 2022 - Songdo ConvensiA, Incheon, Korea, Republic of |
Duration | 18 Sept 2022 → 22 Sept 2022 |
Conference number | 23 |
Internet address | https://www.interspeech2022.org/ |
Conference
Conference | 23rd INTERSPEECH 2022 |
---|---|
Country/Territory | Korea, Republic of |
City | Incheon |
Period | 18/09/22 → 22/09/22 |
Internet address | https://www.interspeech2022.org/ |