



Volume 4, Issue 6, 2010, Pages 1046-1058

Measuring the gap between HMM-based ASR and TTS

Author keywords

Speech recognition; speech synthesis; unified models

Indexed keywords

ACOUSTIC FEATURES; AUTOMATIC SPEECH RECOGNITION; EUROPEAN PROJECT; SPEAKER ADAPTATION; SPEECH-TO-SPEECH TRANSLATION; STATISTICAL MODELING; STATISTICAL MODELS; SYSTEM DESIGN; TEXT TO SPEECH SYNTHESIS; TTS SYSTEMS; UNIFIED MODEL; UNIFIED MODELING;

EID: 77953728395     PISSN: 19324553     EISSN: None     Source Type: Journal    
DOI: 10.1109/JSTSP.2010.2079315     Document Type: Article
Times cited : (11)

References (75)
  • 1
    • 84966341178
    • The impact of speech recognition on speech synthesis
    • Santa Monica, CA, Sep.
    • M. Ostendorf and I. Bulyko, "The impact of speech recognition on speech synthesis," in Proc. IEEE Workshop Speech Synth., Santa Monica, CA, Sep. 2002, pp. 99-106.
    • (2002) Proc. IEEE Workshop Speech Synth. , pp. 99-106
    • Ostendorf, M.1    Bulyko, I.2
  • 2
    • 70349227947
    • The application of hidden Markov models in speech recognition
    • M. Gales and S. Young, "The application of hidden Markov models in speech recognition," Foundat. Trends Signal Process., vol. 1, no. 3, pp. 195-304, 2007.
    • (2007) Foundat. Trends Signal Process. , vol.1 , Issue.3 , pp. 195-304
    • Gales, M.1    Young, S.2
  • 4
    • 84867203039
    • Unsupervised adaptation for HMM-based speech synthesis
    • Sep.
    • S. King, K. Tokuda, H. Zen, and J. Yamagishi, "Unsupervised adaptation for HMM-based speech synthesis," in Proc. Interspeech'08, Sep. 2008, pp. 1869-1872.
    • (2008) Proc. Interspeech'08 , pp. 1869-1872
    • King, S.1    Tokuda, K.2    Zen, H.3    Yamagishi, J.4
  • 5
    • 70450185735
    • Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models
    • Brighton, U.K. Sep.
    • M. Gibson, "Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1791-1794.
    • (2009) Proc. Interspeech , pp. 1791-1794
    • Gibson, M.1
  • 6
    • 78049369783
    • A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis
    • H. Liang, J. Dines, and L. Saheer, "A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis," in Proc. ICASSP, Dallas, TX, 2010, pp. 4598-4601.
    • (2010) Proc. ICASSP, Dallas, TX , pp. 4598-4601
    • Liang, H.1    Dines, J.2    Saheer, L.3
  • 7
    • 0142192295
    • Conditional random fields: Probabilistic models for segmenting and labeling sequence data
    • Williamstown, MA
    • J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, Williamstown, MA, 2001, pp. 282-289.
    • (2001) Proc. ICML , pp. 282-289
    • Lafferty, J.1    McCallum, A.2    Pereira, F.3
  • 9
    • 0036296863
    • Minimum phone error and I-smoothing for improved discriminative training
    • D. Povey and P. C. Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc. ICASSP, Orlando, FL, 2002, pp. 105-108.
    • (2002) Proc. ICASSP, Orlando, FL , pp. 105-108
    • Povey, D.1    Woodland, P.C.2
  • 10
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Toulouse, France
    • Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, Toulouse, France, 2006, pp. 89-92.
    • (2006) Proc. ICASSP , pp. 89-92
    • Wu, Y.-J.1    Wang, R.-H.2
  • 11
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • Feb.
    • L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.R.1
  • 12
    • 38149010136
    • A hidden Markov model approach to speech synthesis
    • Paris, France
    • A. Falaschi, M. Giustiniani, and M. Verola, "A hidden Markov model approach to speech synthesis," in Proc. Eurospeech, Paris, France, 1989, pp. 187-190.
    • (1989) Proc. Eurospeech , pp. 187-190
    • Falaschi, A.1    Giustiniani, M.2    Verola, M.3
  • 13
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • Apr.
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 254-272, Apr. 1981.
    • (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 14
    • 74149089478
    • Hidden semi-Markov models
    • Feb.
    • S.-Z. Yu, "Hidden semi-Markov models," Artificial Intell., vol. 174, no. 2, pp. 215-243, Feb. 2009.
    • (2009) Artificial Intell. , vol.174 , Issue.2 , pp. 215-243
    • Yu, S.-Z.1
  • 15
    • 85009231267
    • Trajectory modeling based on HMMs with explicit relationship between static and dynamic features
    • Geneva, Switzerland
    • K. Tokuda, H. Zen, and T. Kitamura, "Trajectory modeling based on HMMs with explicit relationship between static and dynamic features," in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 865-868.
    • (2003) Proc. Eurospeech , pp. 865-868
    • Tokuda, K.1    Zen, H.2    Kitamura, T.3
  • 16
    • 0026854213
    • A generalised hidden Markov model with state-conditioned trend functions of time for the speech signal
    • Apr.
    • L. Deng, "A generalised hidden Markov model with state-conditioned trend functions of time for the speech signal," Signal Process., vol. 27, pp. 65-78, Apr. 1992.
    • (1992) Signal Process. , vol.27 , pp. 65-78
    • Deng, L.1
  • 18
    • 0023211846
    • Explicit time correlation in hidden Markov models for speech recognition
    • C. Wellekens, "Explicit time correlation in hidden Markov models for speech recognition," in Proc. ICASSP, Dallas, TX, 1987, vol. 12, pp. 384-386.
    • (1987) Proc. ICASSP, Dallas, TX , vol.12 , pp. 384-386
    • Wellekens, C.1
  • 19
    • 70450175584
    • Autoregressive HMMs for speech synthesis
    • Brighton, U.K.
    • M. Shannon and W. Byrne, "Autoregressive HMMs for speech synthesis," in Proc. Interspeech, Brighton, U.K., 2009.
    • (2009) Proc. Interspeech
    • Shannon, M.1    Byrne, W.2
  • 20
    • 0003911245
    • A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
    • L. Deng and J. Ma, "A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics," in Proc. Eurospeech, Budapest, Hungary, 1999, pp. 1499-1502.
    • (1999) Proc. Eurospeech, Budapest, Hungary , pp. 1499-1502
    • Deng, L.1    Ma, J.2
  • 21
    • 54349106040
    • Switching linear dynamical systems for noise robust speech recognition
    • Aug.
    • B. Mesot and D. Barber, "Switching linear dynamical systems for noise robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1850-1858, Aug. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.6 , pp. 1850-1858
    • Mesot, B.1    Barber, D.2
  • 22
    • 79959849719
    • Autoregressive clustering for HMM speech synthesis
    • Makuhari, Japan
    • M. Shannon and W. Byrne, "Autoregressive clustering for HMM speech synthesis," in Proc. Interspeech, Makuhari, Japan, 2010.
    • (2010) Proc. Interspeech
    • Shannon, M.1    Byrne, W.2
  • 23
    • 78649270883
    • Learning deep architectures for AI
    • Y. Bengio, "Learning deep architectures for AI," Univ. de Montréal, Montreal, QC, Canada, Tech. Rep. 1312, 2007.
    • (2007) Tech. Rep. , vol.1312
    • Bengio, Y.1
  • 24
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, pp. 1527-1554, 2006.
    • (2006) Neural Comput. , vol.18 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.3
  • 26
    • 78649277342
    • Decision trees do not generalize to new variations
    • Y. Bengio, O. Delalleau, and C. Simard, "Decision trees do not generalize to new variations," Univ. de Montréal, Montreal, QC, Canada, Tech. Rep. 1304, 2006.
    • (2006) Tech. Rep. , vol.1304
    • Bengio, Y.1    Delalleau, O.2    Simard, C.3
  • 27
    • 51449118125
    • Acoustic modeling with contextual additive structure for HMM-based speech recognition
    • Y. Nankaku, K. Nakamura, H. Zen, T. Toda, and K. Tokuda, "Acoustic modeling with contextual additive structure for HMM-based speech recognition," in Proc. ICASSP, Las Vegas, NV, 2008, pp. 4469-4472.
    • (2008) Proc. ICASSP, Las Vegas, NV , pp. 4469-4472
    • Nankaku, Y.1    Nakamura, K.2    Zen, H.3    Toda, T.4    Tokuda, K.5
  • 30
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 31
    • 78649273643
    • HMM-based approach to multilingual speech synthesis
    • S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall
    • K. Tokuda, H. Zen, and A. W. Black, "HMM-based approach to multilingual speech synthesis," in Text to Speech Synthesis: New Paradigms and Advances, S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall, 2004.
    • (2004) Text to Speech Synthesis: New Paradigms and Advances
    • Tokuda, K.1    Zen, H.2    Black, A.W.3
  • 33
    • 77249139677
    • An HMM-based Mandarin Chinese text-to-speech system
    • Dec.
    • Y. Qian, F. Soong, Y. Chen, and M. Chu, "An HMM-based Mandarin Chinese text-to-speech system," in Proc. ISCSLP'06, Dec. 2006, pp. 223-232.
    • (2006) Proc. ISCSLP'06 , pp. 223-232
    • Qian, Y.1    Soong, F.2    Chen, Y.3    Chu, M.4
  • 35
    • 0030672098
    • Hybrid HMM-ANN systems for training independent tasks: Experiments on phonebook and related improvements
    • Munich, Germany Apr.
    • S. Dupont, H. Bourlard, O. Deroo, V. Fontaine, and J.-M. Boite, "Hybrid HMM-ANN systems for training independent tasks: Experiments on phonebook and related improvements," in Proc. ICASSP, Munich, Germany, Apr. 1997, pp. 1767-1770.
    • (1997) Proc. ICASSP , pp. 1767-1770
    • Dupont, S.1    Bourlard, H.2    Deroo, O.3    Fontaine, V.4    Boite, J.-M.5
  • 36
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, 1990.
    • (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 37
    • 85131821539
    • Mel-generalized cepstral analysis - A unified approach to speech spectral estimation
    • Sep.
    • K. Koishida, G. Hirabayashi, K. Tokuda, and T. Kobayashi, "Mel-generalized cepstral analysis - A unified approach to speech spectral estimation," in Proc. ICSLP, Yokohama, Japan, Sep. 1994, vol. 3, pp. 1043-1046.
    • (1994) Proc. ICSLP, Yokohama, Japan , vol.3 , pp. 1043-1046
    • Koishida, K.1    Hirabayashi, G.2    Tokuda, K.3    Kobayashi, T.4
  • 38
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    Cheveigne, A.3
  • 39
    • 59849090295
    • Combining spectral representations for large vocabulary continuous speech recognition
    • Mar.
    • G. Garau and S. Renals, "Combining spectral representations for large vocabulary continuous speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 508-518, Mar. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.3 , pp. 508-518
    • Garau, G.1    Renals, S.2
  • 40
    • 33847129573
    • Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
    • Feb.
    • J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, Feb. 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
    • Yamagishi, J.1    Kobayashi, T.2
  • 41
    • 33645781551
    • Evaluation of a speech recognition/generation method based on HMM and STRAIGHT
    • T. Irino, Y. Minami, T. Nakatani, M. Tsuzaki, and H. Tagawa, "Evaluation of a speech recognition/generation method based on HMM and STRAIGHT," in Proc. ICSLP, Denver, CO, 2002, pp. 2545-2548.
    • (2002) Proc. ICSLP, Denver, CO , pp. 2545-2548
    • Irino, T.1    Minami, Y.2    Nakatani, T.3    Tsuzaki, M.4    Tagawa, H.5
  • 43
    • 85135145174
    • Acoustic modeling based on the MDL criterion for speech recognition
    • K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, Rhodes, Greece, 1997, vol. 1, pp. 99-102.
    • (1997) Proc. Eurospeech, Rhodes, Greece , vol.1 , pp. 99-102
    • Shinoda, K.1    Watanabe, T.2
  • 44
    • 33947674781
    • Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
    • Toulouse, France
    • K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, Toulouse, France, 2006, pp. 853-856.
    • (2006) Proc. ICASSP , pp. 853-856
    • Prahallad, K.1    Black, A.W.2    Mosur, R.3
  • 45
    • 0033906251
    • MDL-based context-dependent subword modeling for speech recognition
    • Mar.
    • K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (E), vol. 21, pp. 79-86, Mar. 2000.
    • (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
    • Shinoda, K.1    Watanabe, T.2
  • 46
    • 67650854725
    • Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
    • Jan.
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
    • Yamagishi, J.1    Kobayashi, T.2    Nakano, Y.3    Ogata, K.4    Isogai, J.5
  • 47
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
    • (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.1
  • 48
    • 0028419019
    • Maximum a posteriori estimation for multi-variate Gaussian mixture observations of Markov chains
    • Apr.
    • J. Gauvain and C. Lee, "Maximum a posteriori estimation for multi-variate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, pp. 291-298, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , pp. 291-298
    • Gauvain, J.1    Lee, C.2
  • 49
    • 0030189744
    • Speaker adaptation using combined transformation and Bayesian methods
    • Jul.
    • V. Digalakis and L. Neumeyer, "Speaker adaptation using combined transformation and Bayesian methods," IEEE Trans. Speech Audio Process., vol. 4, no. 4, pp. 294-300, Jul. 1996.
    • (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.4 , pp. 294-300
    • Digalakis, V.1    Neumeyer, L.2
  • 50
    • 0009623939
    • Flexible speaker adaptation using maximum likelihood linear regression
    • Morgan Kaufmann
    • C. Leggetter and P. Woodland, "Flexible speaker adaptation using maximum likelihood linear regression," in Proc. ARPA Spoken Lang. Technol. Workshop, 1995, pp. 104-109, Morgan Kaufmann.
    • (1995) Proc. ARPA Spoken Lang. Technol. Workshop , pp. 104-109
    • Leggetter, C.1    Woodland, P.2
  • 51
    • 0036461005
    • Structural maximum a posteriori linear regression for fast HMM adaptation
    • Jan.
    • O. Siohan, T. Myrvoll, and C.-H. Lee, "Structural maximum a posteriori linear regression for fast HMM adaptation," Comput. Speech Lang., vol. 16, no. 1, pp. 5-24, Jan. 2002.
    • (2002) Comput. Speech Lang. , vol.16 , Issue.1 , pp. 5-24
    • Siohan, O.1    Myrvoll, T.2    Lee, C.-H.3
  • 52
    • 34547496746
    • Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis
    • Sep.
    • Y. Nakano, M. Tachibana, J. Yamagishi, and T. Kobayashi, "Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis," in Proc. ICSLP'06, Sep. 2006, pp. 2286-2289.
    • (2006) Proc. ICSLP'06 , pp. 2286-2289
    • Nakano, Y.1    Tachibana, M.2    Yamagishi, J.3    Kobayashi, T.4
  • 54
    • 33947639066
    • Hidden semi-Markov model based speech recognition system using weighted finite-state transducer
    • Toulouse, France May
    • K. Oura, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, "Hidden semi-Markov model based speech recognition system using weighted finite-state transducer," in Proc. ICASSP'06, Toulouse, France, May 2006, pp. 33-36.
    • (2006) Proc. ICASSP'06 , pp. 33-36
    • Oura, K.1    Zen, H.2    Nankaku, Y.3    Lee, A.4    Tokuda, K.5
  • 55
    • 70450169407
    • Speech recognition with speech synthesis models by marginalising over decision tree leaves
    • Brighton, U.K. Sep.
    • J. Dines, L. Saheer, and H. Liang, "Speech recognition with speech synthesis models by marginalising over decision tree leaves," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1395-1398.
    • (2009) Proc. Interspeech , pp. 1395-1398
    • Dines, J.1    Saheer, L.2    Liang, H.3
  • 56
    • 67650819492
    • The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
    • Sep.
    • J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, "The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge," in Proc. Blizzard Challenge Workshop, Sep. 2008.
    • (2008) Proc. Blizzard Challenge Workshop
    • Yamagishi, J.1    Zen, H.2    Wu, Y.-J.3    Toda, T.4    Tokuda, K.5
  • 57
    • 44449177634
    • A hidden semi-Markov model-based speech synthesis system
    • May
    • H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, May 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
    • Zen, H.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 58
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • Istanbul, Turkey
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP'00, Istanbul, Turkey, 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP'00 , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 59
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • May
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, May 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 60
    • 70450201930
    • DARPA February 1992 pilot corpus CSR "dry run" benchmark test results
    • Harriman, NY Feb.
    • D. Pallet, "DARPA February 1992 pilot corpus CSR "dry run" benchmark test results," in Proc. Workshop Speech and Natural Language, Harriman, NY, Feb. 1992, pp. 382-386.
    • (1992) Proc. Workshop Speech and Natural Language , pp. 382-386
    • Pallet, D.1
  • 63
    • 4544386225
    • Bootstrap estimates for confidence intervals in ASR performance evaluation
    • Montreal, QC, Canada May
    • M. Bisani and H. Ney, "Bootstrap estimates for confidence intervals in ASR performance evaluation," in Proc. ICASSP'04, Montreal, QC, Canada, May 2004, vol. 1, pp. 409-412.
    • (2004) Proc. ICASSP'04 , vol.1 , pp. 409-412
    • Bisani, M.1    Ney, H.2
  • 64
  • 65
    • 0019146354
    • Correlation analysis of subjective and objective measures for speech quality
    • T. P. Barnwell, III, "Correlation analysis of subjective and objective measures for speech quality," in Proc. ICASSP'80, 1980, pp. 706-709.
    • (1980) Proc. ICASSP'80 , pp. 706-709
    • Barnwell III, T.P.1
  • 66
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • Aug.
    • S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
    • (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis, S.1    Mermelstein, P.2
  • 67
    • 84984366853
    • Speech analysis-synthesis system and quality of synthesized speech using mel-cepstrum
    • (in Japanese)
    • T. Kitamura, S. Imai, C. Furuichi, and T. Kobayashi, "Speech analysis-synthesis system and quality of synthesized speech using mel-cepstrum," (in Japanese) Electron. Commun. Japan (Part I: Commun.), vol. 69, no. 10, pp. 47-54, 1986.
    • (1986) Electron. Commun. Japan (Part I: Commun.) , vol.69 , Issue.10 , pp. 47-54
    • Kitamura, T.1    Imai, S.2    Furuichi, C.3    Kobayashi, T.4
  • 68
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • San Francisco, CA
    • T. Fukada, K. Tokuda, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP'92, San Francisco, CA, 1992, pp. 137-140.
    • (1992) Proc. ICASSP'92 , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Imai, S.3
  • 69
    • 0027247004
    • Mel-cepstral distance measure for objective speech quality assessment
    • May
    • R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., May 1993, vol. 1, pp. 125-128.
    • (1993) Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process. , vol.1 , pp. 125-128
    • Kubichek, R.1
  • 70
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • Nov.
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 72
    • 34250618146
    • [Online]. Available
    • The CMU Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
    • The CMU Pronouncing Dictionary
  • 73
    • 85030493378
    • Synthesis of regional English using a keyword lexicon
    • Sep.
    • S. Fitt and S. Isard, "Synthesis of regional English using a keyword lexicon," in Proc. Eurospeech, Sep. 1999, vol. 2, pp. 823-826.
    • (1999) Proc. Eurospeech , vol.2 , pp. 823-826
    • Fitt, S.1    Isard, S.2
  • 74
    • 0028515984
    • Experimental evaluation of features for robust speaker identification
    • Oct.
    • D. A. Reynolds, "Experimental evaluation of features for robust speaker identification," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 639-643, Oct. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 639-643
    • Reynolds, D.A.1
  • 75
    • 70450183638
    • Measuring the gap between HMM-based ASR and TTS
    • Brighton, U.K. Sep.
    • J. Dines, J. Yamagishi, and S. King, "Measuring the gap between HMM-based ASR and TTS," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1391-1394.
    • (2009) Proc. Interspeech , pp. 1391-1394
    • Dines, J.1    Yamagishi, J.2    King, S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.