-
1
-
-
0027698436
-
Speech synthesis in telecommunications
-
Nov.
-
S.E. Levinson, J.P. Olive, J.S. Tschirgi, “Speech synthesis in telecommunications”, IEEE Communications Magazine, vol. 31, issue 11, pp. 46–53, Nov. 1993.
-
(1993)
IEEE Communications Magazine
, vol.31
, Issue.11
, pp. 46-53
-
-
Levinson, S.E.1
Olive, J.P.2
Tschirgi, J.S.3
-
2
-
-
0003522447
-
Speech Communication: Human and Machine
-
2nd ed., IEEE Press
-
D. O'Shaughnessy, Speech Communication: Human and Machine, 2nd ed., IEEE Press, 2000.
-
(2000)
-
-
O'Shaughnessy, D.1
-
3
-
-
28844488500
-
Speech coding methods, standards, and applications
-
J.D. Gibson, “Speech coding methods, standards, and applications”, IEEE Circuits and Systems Magazine, vol. 5, issue 4, pp. 30–49, 2005.
-
(2005)
IEEE Circuits and Systems Magazine
, vol.5
, Issue.4
, pp. 30-49
-
-
Gibson, J.D.1
-
4
-
-
0023407575
-
Review of text-to-speech conversion for English
-
D. Klatt, “Review of text-to-speech conversion for English”, JASA, vol. 82, pp. 737–793, 1987.
-
(1987)
JASA
, vol.82
, pp. 737-793
-
-
Klatt, D.1
-
5
-
-
0003418124
-
Acoustic Theory of Speech Production
-
Mouton, ’s-Gravenhage, The Netherlands
-
G. Fant, Acoustic Theory of Speech Production, Mouton, ’s-Gravenhage, The Netherlands, 1960.
-
(1960)
-
-
Fant, G.1
-
6
-
-
0016940126
-
A model of articulatory dynamics and control
-
C. Coker, “A model of articulatory dynamics and control”, Proc. IEEE, vol. 64, pp. 452–460, 1976.
-
(1976)
Proc. IEEE
, vol.64
, pp. 452-460
-
-
Coker, C.1
-
7
-
-
21244495453
-
From Text to Speech: A Concatenative Approach
-
Kluwer
-
T. Dutoit, From Text to Speech: A Concatenative Approach, Kluwer, 1997.
-
(1997)
-
-
Dutoit, T.1
-
8
-
-
85008012847
-
Domain adaptation methods in the IBM Trainable Speech Synthesis System
-
V. Fischer, J. Botella Ordinas, S. Kunzmann, “Domain adaptation methods in the IBM Trainable Speech Synthesis System”, ICSLP, paper WeB1403p, 2004.
-
(2004)
ICSLP, paper WeB1403p
-
-
Fischer, V.1
Ordinas, J.B.2
Kunzmann, S.3
-
9
-
-
33745216013
-
Small Footprint Concatenative Text-to-Speech Synthesis System using Complex Spectral Envelope Modeling
-
D. Chazan, R. Hoory, Z. Kons, A. Sagi, S. Shechtman and A. Sorin, “Small Footprint Concatenative Text-to-Speech Synthesis System using Complex Spectral Envelope Modeling”, ICSLP, 2005, pp. 2569–2572.
-
(2005)
ICSLP
, pp. 2569-2572
-
-
Chazan, D.1
Hoory, R.2
Kons, Z.3
Sagi, A.4
Shechtman, S.5
Sorin, A.6
-
10
-
-
0141479960
-
A multilingual TTS system with less than 1 mbyte footprint for embedded applications
-
R. Hoffmann, O. Jokisch, D. Hirschfeld, G. Strecha, H. Kruschke, U. Kordon, U. Koloska, “A multilingual TTS system with less than 1 mbyte footprint for embedded applications”, ICASSP, vol. I, pp. 532–535, 2003.
-
(2003)
ICASSP
, vol.1
, pp. 532-535
-
-
Hoffmann, R.1
Jokisch, O.2
Hirschfeld, D.3
Strecha, G.4
Kruschke, H.5
Kordon, U.6
Koloska, U.7
-
11
-
-
0025543906
-
Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones
-
E. Moulines, F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, vol. 9, pp. 453–467, 1990.
-
(1990)
Speech Communication
, vol.9
, pp. 453-467
-
-
Moulines, E.1
Charpentier, F.2
-
12
-
-
0036497601
-
A comparison of spectral smoothing methods for segment concatenation based speech synthesis
-
Mar.
-
D.T. Chappell and J.H.L. Hansen, “A comparison of spectral smoothing methods for segment concatenation based speech synthesis”, Speech Communication, vol. 36, issues 3–4, pp. 343–373, Mar. 2002.
-
(2002)
Speech Communication
, vol.36
, Issue.3-4
, pp. 343-373
-
-
Chappell, D.T.1
Hansen, J.H.L.2
-
13
-
-
0003559057
-
Computational Analysis of Present-Day American English
-
Brown Univ.: Providence, RI
-
H. Kucera, W. Francis, Computational Analysis of Present-Day American English, Brown Univ.: Providence, RI, 1967.
-
(1967)
-
-
Kucera, H.1
Francis, W.2
-
14
-
-
0035127353
-
Reducing audible spectral discontinuities
-
E. Klabbers, R. Veldhuis, “Reducing audible spectral discontinuities”, TSAP, vol. 9, no. 1, pp. 39–51, 2001.
-
(2001)
TSAP
, vol.9
, Issue.1
, pp. 39-51
-
-
Klabbers, E.1
Veldhuis, R.2
-
15
-
-
33745213765
-
Articulatory synthesis using corpus-based estimation of line spectrum pairs
-
O. Engwall, “Articulatory synthesis using corpus-based estimation of line spectrum pairs”, Interspeech, 2005, pp. 1909–1912.
-
(2005)
Interspeech
, pp. 1909-1912
-
-
Engwall, O.1
-
16
-
-
0015008817
-
Effect of glottal pulse shape on the quality of natural vowels
-
A. Rosenberg, “Effect of glottal pulse shape on the quality of natural vowels”, JASA, vol. 49, pp. 583–590, 1971.
-
(1971)
JASA
, vol.49
, pp. 583-590
-
-
Rosenberg, A.1
-
17
-
-
0018986665
-
Software for a cascade/parallel formant synthesizer
-
D. Klatt, “Software for a cascade/parallel formant synthesizer”, JASA, vol. 67, pp. 971–995, 1980.
-
(1980)
JASA
, vol.67
, pp. 971-995
-
-
Klatt, D.1
-
18
-
-
0020905802
-
Formant synthesizers—cascade or parallel?
-
J. Holmes, “Formant synthesizers—cascade or parallel?”, Speech Comm., vol. 2, pp. 251–273, 1983.
-
(1983)
Speech Comm.
, vol.2
, pp. 251-273
-
-
Holmes, J.1
-
19
-
-
34047254509
-
Quality-enhanced voice morphing using maximum likelihood transformations
-
Jul.
-
H. Ye, S. Young, “Quality-enhanced voice morphing using maximum likelihood transformations”, TSAP, vol. 14, no. 4, pp. 1301–1312, Jul. 2006.
-
(2006)
TSAP
, vol.14
, Issue.4
, pp. 1301-1312
-
-
Ye, H.1
Young, S.2
-
20
-
-
0035279124
-
Removing linear phase mismatches in concatenative speech synthesis
-
Y. Stylianou, “Removing linear phase mismatches in concatenative speech synthesis”, TSAP, vol. 9, no. 3, pp. 232–239, 2001.
-
(2001)
TSAP
, vol.9
, Issue.3
, pp. 232-239
-
-
Stylianou, Y.1
-
21
-
-
0043095309
-
Perceptual phase quantization of speech
-
Jul.
-
Doh-Suk Kim, “Perceptual phase quantization of speech”, IEEE Tr. Speech and Audio Processing, vol. 11, issue 4, pp. 355–364, Jul. 2003.
-
(2003)
IEEE Tr. Speech and Audio Processing
, vol.11
, Issue.4
, pp. 355-364
-
-
Kim, D.-S.1
-
22
-
-
85009159765
-
A speech model of acoustic inventories based on asynchronous interpolation
-
A.B. Kain and J.P.H. Van Santen, “A speech model of acoustic inventories based on asynchronous interpolation”, Eurospeech, 2003, pp. 329–332.
-
(2003)
Eurospeech
, pp. 329-332
-
-
Kain, A.B.1
Van Santen, J.P.H.2
-
23
-
-
85008012860
-
A Global, Boundary-Centric Framework for Unit Selection Text-to-Speech Synthesis
-
J.R. Bellegarda, “A Global, Boundary-Centric Framework for Unit Selection Text-to-Speech Synthesis”, TSAP, 2006.
-
(2006)
TSAP
-
-
Bellegarda, J.R.1
-
24
-
-
0037850986
-
Phonetic alignment: speech synthesis-based vs. Viterbi-based
-
Jun.
-
F. Malfrère, O. Deroo, T. Dutoit and C. Ris, “Phonetic alignment: speech synthesis-based vs. Viterbi-based”, Speech Communication, vol. 40, issue 4, pp. 503–515, Jun. 2003.
-
(2003)
Speech Communication
, vol.40
, Issue.4
, pp. 503-515
-
-
Malfrère, F.1
Deroo, O.2
Dutoit, T.3
Ris, C.4
-
25
-
-
0038141325
-
Topics in decision tree based speech synthesis
-
R.E. Donovan, “Topics in decision tree based speech synthesis”, Computer Speech & Language, vol. 17, issue 1, pp. 43–67, 2003.
-
(2003)
Computer Speech & Language
, vol.17
, Issue.1
, pp. 43-67
-
-
Donovan, R.E.1
-
26
-
-
34047258869
-
Subjective Evaluation of Join Cost and Smoothing Methods for Unit Selection Speech Synthesis
-
to be published
-
J. Vepa, S. King, “Subjective Evaluation of Join Cost and Smoothing Methods for Unit Selection Speech Synthesis”, 2006 TSAP, vol. 14, to be published.
-
2006 TSAP
, vol.14
-
-
Vepa, J.1
King, S.2
-
27
-
-
28644443377
-
An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis
-
Jan.
-
T. Toda, H. Kawai, M. Tsuzaki and K. Shikano, “An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis”, Speech Communication, vol. 48, issue 1, pp. 45–56, Jan. 2006.
-
(2006)
Speech Communication
, vol.48
, Issue.1
, pp. 45-56
-
-
Toda, T.1
Kawai, H.2
Tsuzaki, M.3
Shikano, K.4
-
28
-
-
9644270575
-
Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale
-
Jan.
-
Mah. Viswanathan and Mad. Viswanathan, “Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale”, Computer Speech & Language, vol. 19, issue 1, pp. 55–83, Jan. 2005.
-
(2005)
Computer Speech & Language
, vol.19
, Issue.1
, pp. 55-83
-
-
Viswanathan, M.1
Viswanathan, M.2
-
29
-
-
33745188476
-
Toward Multiple-Language TTS: Experiments in English and Mandarin
-
R. Fernandez, W. Zhang, E. Eide, R. Bakis, W. Hamza, Y. Liu, M. Picheny, J.F. Pitrelli, Y. Qing, Z.W. Shuang, L.Q. Shen, “Toward Multiple-Language TTS: Experiments in English and Mandarin”, Interspeech, pp. 1473–1476, 2005.
-
(2005)
Interspeech
, pp. 1473-1476
-
-
Fernandez, R.1
Zhang, W.2
Eide, E.3
Bakis, R.4
Hamza, W.5
Liu, Y.6
Picheny, M.7
Pitrelli, J.F.8
Qing, Y.9
Shuang, Z.W.10
Shen, L.Q.11
-
30
-
-
0017269304
-
Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics
-
H.S. Elovitz, R. Johnson, A. McHugh, J.E. Shore, “Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics”, vol. 24, pp. 446–459, 1976.
-
(1976)
, vol.24
, pp. 446-459
-
-
Elovitz, H.S.1
Johnson, R.2
McHugh, A.3
Shore, J.E.4
-
31
-
-
84966441141
-
Proper name pronunciations for speech technology applications
-
M.F. Spiegel, “Proper name pronunciations for speech technology applications”, IEEE Workshop on Speech Synthesis, 2002, pp. 175–178.
-
(2002)
IEEE Workshop on Speech Synthesis
, pp. 175-178
-
-
Spiegel, M.F.1
-
32
-
-
33745215117
-
Comparative Objective and Subjective Evaluation of Three Data-Driven Techniques for Proper Name Pronunciation
-
T. Soonklang, R.I. Damper, Y. Marchand, “Comparative Objective and Subjective Evaluation of Three Data-Driven Techniques for Proper Name Pronunciation”, ICSLP 2005, pp. 1905–1908.
-
(2005)
ICSLP
, pp. 1905-1908
-
-
Soonklang, T.1
Damper, R.I.2
Marchand, Y.3
-
33
-
-
0016939081
-
Synthesis of speech from unrestricted text
-
Apr.
-
J. Allen, “Synthesis of speech from unrestricted text”, Proc. of the IEEE, vol. 64, issue 4, Apr. 1976, pp. 433–442.
-
(1976)
Proc. of the IEEE
, vol.64
, Issue.4
, pp. 433-442
-
-
Allen, J.1
-
34
-
-
21844466234
-
Synthesis of prosody using multi-level unit sequences
-
J. van Santen, A. Kain, E. Klabbers and T. Mishra, “Synthesis of prosody using multi-level unit sequences”, Speech Communication, vol. 46, issues 3–4, pp. 365–375, 2005.
-
(2005)
Speech Communication
, vol.46
, Issue.3-4
, pp. 365-375
-
-
van Santen, J.1
Kain, A.2
Klabbers, E.3
Mishra, T.4
-
35
-
-
0019632208
-
Synthesizing intonation
-
J. Pierrehumbert, “Synthesizing intonation”, JASA, vol. 70, pp. 985–995, 1981.
-
(1981)
JASA
, vol.70
, pp. 985-995
-
-
Pierrehumbert, J.1
-
36
-
-
33745183138
-
Influence of Syntax on Prosodic Boundary Prediction
-
Sep.
-
T. Ingulfsen, T. Burrows and S. Buchholz, “Influence of Syntax on Prosodic Boundary Prediction”, Interspeech, Sep. 2005, pp. 1817–1820.
-
(2005)
Interspeech
, pp. 1817-1820
-
-
Ingulfsen, T.1
Burrows, T.2
Buchholz, S.3
-
37
-
-
0141702290
-
Recent improvements to the IBM trainable speech synthesis system
-
E. Eide, A. Aaron, R. Bakis, P. Cohen, R. Donovan, W. Hamza, T. Mathes, M. Picheny, M. Polkosky, M. Smith, M. Viswanathan, “Recent improvements to the IBM trainable speech synthesis system”, ICASSP, vol. I, 2003, pp. 708–711.
-
(2003)
ICASSP
, vol.1
, pp. 708-711
-
-
Eide, E.1
Aaron, A.2
Bakis, R.3
Cohen, P.4
Donovan, R.5
Hamza, W.6
Mathes, T.7
Picheny, M.8
Polkosky, M.9
Smith, M.10
Viswanathan, M.11
-
38
-
-
33745205363
-
A probabilistic approach to unit selection for corpus-based speech synthesis
-
S. Sakai, H. Shu, “A probabilistic approach to unit selection for corpus-based speech synthesis”, Interspeech, 2005, pp. 81–84.
-
(2005)
Interspeech
, pp. 81-84
-
-
Sakai, S.1
Shu, H.2
-
39
-
-
85119213703
-
ToBI: A standard scheme for labeling prosody
-
K. Silverman, M. Beckman, J. Pierrehumbert, M. Ostendorf, C. Wightman, P. Price, and J. Hirschberg, “ToBI: A standard scheme for labeling prosody”, in Proc. 2nd Int. Conf. Spoken Language Processing (ICSLP), 1992, pp. 867–879.
-
(1992)
Proc. 2nd Int. Conf. Spoken Language Processing (ICSLP)
, pp. 867-879
-
-
Silverman, K.1
Beckman, M.2
Pierrehumbert, J.3
Ostendorf, M.4
Wightman, C.5
Price, P.6
Hirschberg, J.7
-
40
-
-
34047268342
-
Conversational speech synthesis and the need for some laughter
-
Jul.
-
N. Campbell, “Conversational speech synthesis and the need for some laughter”, TSAP, vol. 14, no. 4, pp. 1171–1178, Jul. 2006.
-
(2006)
TSAP
, vol.14
, Issue.4
, pp. 1171-1178
-
-
Campbell, N.1
-
41
-
-
53149096930
-
The IBM expressive text-to-speech synthesis system for American English
-
Jul.
-
J.F. Pitrelli, R. Bakis, E.M. Eide, R. Fernandez, W. Hamza, M.A. Picheny, “The IBM expressive text-to-speech synthesis system for American English”, TSAP, vol. 14, no. 4, pp. 1099–1108, Jul. 2006.
-
(2006)
TSAP
, vol.14
, Issue.4
, pp. 1099-1108
-
-
Pitrelli, J.F.1
Bakis, R.2
Eide, E.M.3
Fernandez, R.4
Hamza, W.5
Picheny, M.A.6
-
42
-
-
0037380318
-
A corpus-based speech synthesis system with emotion
-
A. Iida, N. Campbell, F. Higuchi, M. Yasumura, “A corpus-based speech synthesis system with emotion”, Speech Comm., vol. 40, pp. 161–187, 2003.
-
(2003)
Speech Comm.
, vol.40
, pp. 161-187
-
-
Iida, A.1
Campbell, N.2
Higuchi, F.3
Yasumura, M.4
-
43
-
-
24144469759
-
Data-driven multimodal synthesis
-
R. Carlson and B. Granstrom, “Data-driven multimodal synthesis”, vol. 47, pp. 182–193.
-
, vol.47
, pp. 182-193
-
-
Carlson, R.1
Granstrom, B.2
-
44
-
-
10844288683
-
Online experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference
-
Apr.
-
C. Stevens, N. Lees, J. Vonwiller and D. Burnham, “Online experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference”, Computer Speech & Language, vol. 19, issue 2, pp. 129–146, Apr. 2005.
-
(2005)
Computer Speech & Language
, vol.19
, Issue.2
, pp. 129-146
-
-
Stevens, C.1
Lees, N.2
Vonwiller, J.3
Burnham, D.4
-
45
-
-
33745216749
-
The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets
-
A.W. Black and K. Tokuda, “The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets”, Interspeech, 2005, pp. 77–80.
-
(2005)
Interspeech
, pp. 77-80
-
-
Black, A.W.1
Tokuda, K.2
-
46
-
-
33751057590
-
The ATR multilingual speech-to-speech translation system
-
Mar.
-
S. Nakamura, K. Markov, H. Nakaiwa, G. Kikui, H. Kawai, T. Jitsuhiro, J.-S. Zhang, H. Yamamoto, E. Sumita, S. Yamamoto, “The ATR multilingual speech-to-speech translation system”, TSAP, vol. 14, no. 2, pp. 365–376, Mar. 2006.
-
(2006)
TSAP
, vol.14
, Issue.2
, pp. 365-376
-
-
Nakamura, S.1
Markov, K.2
Nakaiwa, H.3
Kikui, G.4
Kawai, H.5
Jitsuhiro, T.6
Zhang, J.-S.7
Yamamoto, H.8
Sumita, E.9
Yamamoto, S.10