Abstract
One problem that impedes high-quality speech synthesis is the occurrence of audible discontinuities at segment boundaries. Formant jumps across concatenation points suggest that the problem is caused by spectral differences between the concatenated segments. The problem is most apparent in vowels and semi-vowels. We propose to reduce the number of audible discontinuities by adding context-sensitive diphones to the database. The number of additional diphones is limited by clustering contexts that have similar spectral effects on the neighbouring vowels, using the Kullback-Leibler distance. A listening experiment showed that the percentage of perceived discontinuities decreased significantly.
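The abstract does not say how the spectral effect of a context is modelled or which clustering procedure is used. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that each context's effect on a neighbouring vowel is summarised as a diagonal-covariance Gaussian over some spectral feature vector. It takes the "Kullback-Leibler distance" to be the symmetrised KL divergence and merges contexts greedily (agglomerative clustering) until the closest pair is farther apart than a threshold. All names (`DiagGaussian`, `kl_distance`, `cluster_contexts`, `threshold`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DiagGaussian:
    # Hypothetical model of one context's spectral effect on a vowel:
    # per-dimension mean and variance of some spectral feature vector.
    mean: List[float]
    var: List[float]


def kl_divergence(p: DiagGaussian, q: DiagGaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def kl_distance(p: DiagGaussian, q: DiagGaussian) -> float:
    """Symmetrised KL divergence, used here as the clustering distance (an assumption)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def merge(p: DiagGaussian, wp: int, q: DiagGaussian, wq: int) -> DiagGaussian:
    """Moment-matched Gaussian for the union of two clusters, weighted by counts."""
    w = wp + wq
    mean = [(wp * a + wq * b) / w for a, b in zip(p.mean, q.mean)]
    var = [
        (wp * (va + (a - m) ** 2) + wq * (vb + (b - m) ** 2)) / w
        for a, va, b, vb, m in zip(p.mean, p.var, q.mean, q.var, mean)
    ]
    return DiagGaussian(mean, var)


def cluster_contexts(
    models: Dict[str, DiagGaussian], counts: Dict[str, int], threshold: float
) -> List[List[str]]:
    """Greedily merge the closest pair of context clusters until the
    smallest KL distance exceeds `threshold`; return the member lists."""
    clusters: Dict[str, Tuple[List[str], DiagGaussian, int]] = {
        name: ([name], models[name], counts[name]) for name in models
    }
    while len(clusters) > 1:
        names = sorted(clusters)  # sorted keys give deterministic tie-breaking
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                d = kl_distance(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters have sufficiently different spectral effects
        members_a, ga, na = clusters.pop(a)
        members_b, gb, nb = clusters.pop(b)
        clusters[a] = (members_a + members_b, merge(ga, na, gb, nb), na + nb)
    return sorted(sorted(members) for members, _, _ in clusters.values())


if __name__ == "__main__":
    # Invented toy models for three preceding consonants (not data from the paper).
    models = {
        "m": DiagGaussian([0.0, 0.0], [1.0, 1.0]),
        "n": DiagGaussian([0.1, 0.0], [1.0, 1.0]),
        "r": DiagGaussian([3.0, 2.0], [1.0, 1.0]),
    }
    counts = {"m": 1, "n": 1, "r": 1}
    print(cluster_contexts(models, counts, threshold=1.0))  # [['m', 'n'], ['r']]
```

With these toy values, the contexts "m" and "n" have nearly identical effects and share one cluster, so they would need only one set of context-sensitive diphones. The contexts "r" and "m"/"n" stay separate. The threshold controls the trade-off between database size and spectral match at the concatenation points.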
Original language | English |
---|---|
Title of host publication | 6th International Conference on Spoken Language Processing, ICSLP 2000 |
Place of Publication | Beijing |
Publisher | China Military Friendship Pub. |
Pages | 474-477 |
Volume | 3 |
ISBN (Print) | 7801501144 |
Publication status | Published - 1 Jan 2000 |
Externally published | Yes |
Event | 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China |
Duration | 16 Oct 2000 → 20 Oct 2000 |
Conference number | 6 |
Conference
Conference | 6th International Conference on Spoken Language Processing, ICSLP 2000 |
---|---|
Abbreviated title | ICSLP |
Country/Territory | China |
City | Beijing |
Period | 16 Oct 2000 → 20 Oct 2000 |