Volume 7, Issue 3, 2007, Pages 6-23

Modern methods of speech synthesis
[No Author Info available]

Author keywords

[No Author keywords available]

EID: 53149153855     PISSN: 1531636X     EISSN: None     Source Type: Journal    
DOI: 10.1109/MCAS.2007.904177     Document Type: Article
Times cited : (13)

References (46)
  • 2
    • 0003522447
    • Speech Communication: Human and Machine
    • 2nd ed., IEEE Press
    • D. O'Shaughnessy, Speech Communication: Human and Machine, 2nd ed., IEEE Press, 2000.
    • (2000)
    • O'Shaughnessy, D.1
  • 3
    • 28844488500
    • Speech coding methods, standards, and applications
    • J.D. Gibson, “Speech coding methods, standards, and applications”, IEEE Circuits and Systems Magazine, vol. 5, issue 4, pp. 30–49, 2005.
    • (2005) IEEE Circuits and Systems Magazine , vol.5 , Issue.4 , pp. 30-49
    • Gibson, J.D.1
  • 4
    • 0023407575
    • Review of text-to-speech conversion for English
    • D. Klatt, “Review of text-to-speech conversion for English”, JASA, vol. 82, pp. 737–793, 1987.
    • (1987) JASA , vol.82 , pp. 737-793
    • Klatt, D.1
  • 5
    • 0003418124
    • Acoustic Theory of Speech Production
    • Mouton, 's-Gravenhage, The Netherlands
    • G. Fant, Acoustic Theory of Speech Production, Mouton, 's-Gravenhage, The Netherlands, 1960.
    • (1960)
    • Fant, G.1
  • 6
    • 0016940126
    • A model of articulatory dynamics and control
    • C. Coker, “A model of articulatory dynamics and control”, Proc. IEEE, vol. 64, pp. 452–460, 1976.
    • (1976) Proc. IEEE , vol.64 , pp. 452-460
    • Coker, C.1
  • 7
    • 21244495453
    • From Text to Speech: A Concatenative Approach
    • Kluwer
    • T. Dutoit, From Text to Speech: A Concatenative Approach, Kluwer, 1997.
    • (1997)
    • Dutoit, T.1
  • 8
    • 85008012847
    • Domain adaptation methods in the IBM Trainable Speech Synthesis System
    • V. Fischer, J. Botella Ordinas, S. Kunzmann, “Domain adaptation methods in the IBM Trainable Speech Synthesis System”, ICSLP, paper WeB1403p, 2004.
    • (2004) ICSLP, paper WeB1403p
    • Fischer, V.1    Ordinas, J.B.2    Kunzmann, S.3
  • 9
    • 33745216013
    • Small Footprint Concatenative Text-to-Speech Synthesis System using Complex Spectral Envelope Modeling
    • D. Chazan, R. Hoory, Z. Kons, A. Sagi, S. Shechtman and A. Sorin, “Small Footprint Concatenative Text-to-Speech Synthesis System using Complex Spectral Envelope Modeling”, ICSLP, 2005, pp. 2569–2572.
    • (2005) ICSLP , pp. 2569-2572
    • Chazan, D.1    Hoory, R.2    Kons, Z.3    Sagi, A.4    Shechtman, S.5    Sorin, A.6
  • 10
    • 0141479960
    • A multilingual TTS system with less than 1 mbyte footprint for embedded applications
    • R. Hoffmann, O. Jokisch, D. Hirschfeld, G. Strecha, H. Kruschke, U. Kordon, U. Koloska, “A multilingual TTS system with less than 1 mbyte footprint for embedded applications”, ICASSP, vol. I, pp. 532–535, 2003.
    • (2003) ICASSP , vol.1 , pp. 532-535
    • Hoffmann, R.1    Jokisch, O.2    Hirschfeld, D.3    Strecha, G.4    Kruschke, H.5    Kordon, U.6    Koloska, U.7
  • 11
    • 0025543906
    • Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines, F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Communication, vol. 9, pp. 453–467, 1990.
    • (1990) Speech Communication , vol.9 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 12
    • 0036497601
    • A comparison of spectral smoothing methods for segment concatenation based speech synthesis
    • Mar.
    • D.T. Chappell and J.H.L. Hansen, “A comparison of spectral smoothing methods for segment concatenation based speech synthesis”, Speech Communication, vol. 36, issues 3–4, pp. 343–373, Mar. 2002.
    • (2002) Speech Communication , vol.36 , Issue.3-4 , pp. 343-373
    • Chappell, D.T.1    Hansen, J.H.L.2
  • 13
    • 0003559057
    • Computational Analysis of Present-Day American English
    • Brown Univ.: Providence, RI
    • H. Kucera, W. Francis, Computational Analysis of Present-Day American English, Brown Univ.: Providence, RI, 1967.
    • (1967)
    • Kucera, H.1    Francis, W.2
  • 14
    • 0035127353
    • Reducing audible spectral discontinuities
    • E. Klabbers, R. Veldhuis, “Reducing audible spectral discontinuities”, TSAP, vol. 9, no. 1, pp. 39–51, 2000.
    • (2000) TSAP , vol.9 , Issue.1 , pp. 39-51
    • Klabbers, E.1    Veldhuis, R.2
  • 15
    • 33745213765
    • Articulatory synthesis using corpus-based estimation of line spectrum pairs
    • O. Engwall, “Articulatory synthesis using corpus-based estimation of line spectrum pairs”, Interspeech, 2005, pp. 1909–1912.
    • (2005) Interspeech , pp. 1909-1912
    • Engwall, O.1
  • 16
    • 0015008817
    • Effect of glottal pulse shape on the quality of natural vowels
    • A. Rosenberg, “Effect of glottal pulse shape on the quality of natural vowels”, JASA, vol. 49, pp. 583–590, 1971.
    • (1971) JASA , vol.49 , pp. 583-590
    • Rosenberg, A.1
  • 17
    • 0018986665
    • Software for a cascade/parallel formant synthesizer
    • D. Klatt, “Software for a cascade/parallel formant synthesizer”, JASA, vol. 67, pp. 971–995, 1980.
    • (1980) JASA , vol.67 , pp. 971-995
    • Klatt, D.1
  • 18
    • 0020905802
    • Formant synthesizers—cascade or parallel?
    • J. Holmes, “Formant synthesizers—cascade or parallel?”, Speech Comm., volume 2, pp. 251–273, 1983.
    • (1983) Speech Comm. , vol.2 , pp. 251-273
    • Holmes, J.1
  • 19
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • Jul.
    • H. Ye, S. Young, “Quality-enhanced voice morphing using maximum likelihood transformations”, TSAP, vol. 14, no. 4, pp. 1301–1312, Jul. 2006.
    • (2006) TSAP , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 20
    • 0035279124
    • Removing linear phase mismatches in concatenative speech synthesis
    • Y. Stylianou, “Removing linear phase mismatches in concatenative speech synthesis”, TSAP vol. 9, no. 3, pp. 232–239, 2001.
    • (2001) TSAP , vol.9 , Issue.3 , pp. 232-239
    • Stylianou, Y.1
  • 21
    • 0043095309
    • Perceptual phase quantization of speech
    • Jul.
    • Doh-Suk Kim, “Perceptual phase quantization of speech”, IEEE Tr. Speech and Audio Processing, vol. 11, issue 4, pp. 355–364, Jul. 2003.
    • (2003) IEEE Tr. Speech and Audio Processing , vol.11 , Issue.4 , pp. 355-364
    • Kim, D.-S.1
  • 22
    • 85009159765
    • A speech model of acoustic inventories based on asynchronous interpolation
    • A.B. Kain and J.P.H. Van Santen, “A speech model of acoustic inventories based on asynchronous interpolation”, Eurospeech, 2003, pp. 329–332.
    • (2003) Eurospeech , pp. 329-332
    • Kain, A.B.1    Van Santen, J.P.H.2
  • 23
    • 85008012860
    • A Global, Boundary-Centric Framework for Unit Selection Text-to-Speech Synthesis
    • J.R. Bellegarda, A Global, Boundary-Centric Framework for Unit Selection Text-to-Speech Synthesis, TSAP, 2006.
    • (2006) TSAP
    • Bellegarda, J.R.1
  • 24
    • 0037850986
    • Phonetic alignment: speech synthesis-based vs. Viterbi-based
    • Jun.
    • F. Malfrère, O. Deroo, T. Dutoit and C. Ris, “Phonetic alignment: speech synthesis-based vs. Viterbi-based”, Speech Communication, vol. 40, issue 4, pp. 503–515, Jun. 2003.
    • (2003) Speech Communication , vol.40 , Issue.4 , pp. 503-515
    • Malfrère, F.1    Deroo, O.2    Dutoit, T.3    Ris, C.4
  • 25
    • 0038141325
    • Topics in decision tree based speech synthesis
    • R.E. Donovan, “Topics in decision tree based speech synthesis”, Computer Speech & Language, vol. 17, issue 1, pp. 43–67, 2003.
    • (2003) Computer Speech & Language , vol.17 , Issue.1 , pp. 43-67
    • Donovan, R.E.1
  • 26
    • 34047258869
    • Subjective Evaluation of Join Cost and Smoothing Methods for Unit Selection Speech Synthesis
    • to be published
    • J. Vepa, S. King, “Subjective Evaluation of Join Cost and Smoothing Methods for Unit Selection Speech Synthesis”, 2006 TSAP, vol. 14, to be published.
    • 2006 TSAP , vol.14
    • Vepa, J.1    King, S.2
  • 27
    • 28644443377
    • An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis
    • Jan.
    • T. Toda, H. Kawai, M. Tsuzaki and K. Shikano, “An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis”, Speech Communication, vol. 48, issue 1, pp. 45–56, Jan. 2006.
    • (2006) Speech Communication , vol.48 , Issue.1 , pp. 45-56
    • Toda, T.1    Kawai, H.2    Tsuzaki, M.3    Shikano, K.4
  • 28
    • 9644270575
    • Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale
    • Jan.
    • Mah. Viswanathan and Mad. Viswanathan, “Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale”, Computer Speech & Language, vol. 19, issue 1, pp. 55–83, Jan. 2005.
    • (2005) Computer Speech & Language , vol.19 , Issue.1 , pp. 55-83
    • Viswanathan, M.1    Viswanathan, M.2
  • 30
    • 0017269304
    • Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics
    • H.S. Elovitz, R. Johnson, A. McHugh, J.E. Shore, “Letter-to-Sound Rules for Automatic Translation of English Text to Phonetics”, vol. 24, pp. 446–459, 1976.
    • (1976) , vol.24 , pp. 446-459
    • Elovitz, H.S.1    Johnson, R.2    McHugh, A.3    Shore, J.E.4
  • 31
    • 84966441141
    • Proper name pronunciations for speech technology applications
    • M.F. Spiegel, “Proper name pronunciations for speech technology applications”, IEEE Workshop on Speech Synthesis, 2002, pp. 175–178.
    • (2002) IEEE Workshop on Speech Synthesis , pp. 175-178
    • Spiegel, M.F.1
  • 32
    • 33745215117
    • Comparative Objective and Subjective Evaluation of Three Data-Driven Techniques for Proper Name Pronunciation
    • T. Soonklang, R.I. Damper, Y. Marchand, “Comparative Objective and Subjective Evaluation of Three Data-Driven Techniques for Proper Name Pronunciation”, ICSLP 2005, pp. 1905–1908.
    • (2005) ICSLP , pp. 1905-1908
    • Soonklang, T.1    Damper, R.I.2    Marchand, Y.3
  • 33
    • 0016939081
    • Synthesis of speech from unrestricted text
    • Apr.
    • J. Allen, “Synthesis of speech from unrestricted text”, Proc. of the IEEE, vol. 64, issue 4, Apr. 1976, pp. 433–442.
    • (1976) Proc. of the IEEE , vol.64 , Issue.4 , pp. 433-442
    • Allen, J.1
  • 34
    • 21844466234
    • Synthesis of prosody using multi-level unit sequences
    • J. van Santen, A. Kain, E. Klabbers and T. Mishra, “Synthesis of prosody using multi-level unit sequences”, Speech Communication, vol. 46, issues 3–4, pp. 365–375, 2005.
    • (2005) Speech Communication , vol.46 , Issue.3-4 , pp. 365-375
    • van Santen, J.1    Kain, A.2    Klabbers, E.3    Mishra, T.4
  • 35
    • 0019632208
    • Synthesizing intonation
    • J. Pierrehumbert, “Synthesizing intonation”, JASA, vol. 70, pp. 985–995, 1981.
    • (1981) JASA , vol.70 , pp. 985-995
    • Pierrehumbert, J.1
  • 36
    • 33745183138
    • Influence of Syntax on Prosodic Boundary Prediction
    • Sep.
    • T. Ingulfsen, T. Burrows and S. Buchholz, “Influence of Syntax on Prosodic Boundary Prediction”, Interspeech, Sep. 2005, pp. 1817–1820.
    • (2005) Interspeech , pp. 1817-1820
    • Ingulfsen, T.1    Burrows, T.2    Buchholz, S.3
  • 38
    • 33745205363
    • A probabilistic approach to unit selection for corpus-based speech synthesis
    • S. Sakai, H. Shu, “A probabilistic approach to unit selection for corpus-based speech synthesis”, Interspeech, 2005, pp. 81–84.
    • (2005) Interspeech , pp. 81-84
    • Sakai, S.1    Shu, H.2
  • 40
    • 34047268342
    • Conversational speech synthesis and the need for some laughter
    • Jul.
    • N. Campbell, “Conversational speech synthesis and the need for some laughter”, TSAP, vol. 14, no. 4, pp. 1171–1178, Jul. 2006.
    • (2006) TSAP , vol.14 , Issue.4 , pp. 1171-1178
    • Campbell, N.1
  • 41
    • 53149096930
    • The IBM expressive text-to-speech synthesis system for American English
    • Jul.
    • J.F. Pitrelli, R. Bakis, E.M. Eide, R. Fernandez, W. Hamza, M.A. Picheny, “The IBM expressive text-to-speech synthesis system for American English”, TSAP, vol. 14, no. 4, pp. 1301–1312, Jul. 2006.
    • (2006) TSAP , vol.14 , Issue.4 , pp. 1301-1312
    • Pitrelli, J.F.1    Bakis, R.2    Eide, E.M.3    Fernandez, R.4    Hamza, W.5    Picheny, M.A.6
  • 42
    • 0037380318
    • A corpus-based speech synthesis system with emotion
    • A. Iida, N. Campbell, F. Higuchi, M. Yasumura, “A corpus-based speech synthesis system with emotion”, Speech Comm., vol. 40, pp. 161–187, 2003.
    • (2003) Speech Comm. , vol.40 , pp. 161-187
    • Iida, A.1    Campbell, N.2    Higuchi, F.3    Yasumura, M.4
  • 43
    • 24144469759
    • Data-driven multimodal synthesis
    • R. Carlson and B. Granstrom, “Data-driven multimodal synthesis”, vol. 47, pp. 182–193.
    • , vol.47 , pp. 182-193
    • Carlson, R.1    Granstrom, B.2
  • 44
    • 10844288683
    • Online experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference
    • Apr.
    • C. Stevens, N. Lees, J. Vonwiller and D. Burnham, “Online experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference”, Computer Speech & Language, vol. 19, issue 2, pp. 129–146, Apr. 2005.
    • (2005) Computer Speech & Language , vol.19 , Issue.2 , pp. 129-146
    • Stevens, C.1    Lees, N.2    Vonwiller, J.3    Burnham, D.4
  • 45
    • 33745216749
    • The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets
    • A.W. Black and K. Tokuda, “The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets”, Interspeech, 2005, pp. 77–80.
    • (2005) Interspeech , pp. 77-80
    • Black, A.W.1    Tokuda, K.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.