SCOPUS Information Retrieval Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 4, Issue 6, 2010, Pages 1046-1058

Measuring the gap between HMM-based ASR and TTS

(3) Dines, John a Yamagishi, Junichi b King, Simon b

a IDIAP RESEARCH INSTITUTE (Switzerland)

b UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Speech recognition; speech synthesis; unified models

Indexed keywords

ACOUSTIC FEATURES; AUTOMATIC SPEECH RECOGNITION; EUROPEAN PROJECT; SPEAKER ADAPTATION; SPEECH-TO-SPEECH TRANSLATION; STATISTICAL MODELING; STATISTICAL MODELS; SYSTEM DESIGN; TEXT TO SPEECH SYNTHESIS; TTS SYSTEMS; UNIFIED MODEL; UNIFIED MODELING;

CHARACTER RECOGNITION; HIDDEN MARKOV MODELS; SPEECH SYNTHESIS; TELEPHONE SYSTEMS;

SPEECH RECOGNITION;

EID: 77953728395 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2010.2079315 Document Type: Article

Times cited : (11)

References (75)

1
- 84966341178
- The impact of speech recognition on speech synthesis
- Santa Monica, CA, Sep.
- M. Ostendorf and I. Bulyko, "The impact of speech recognition on speech synthesis," in Proc. IEEE Workshop Speech Synth., Santa Monica, CA, Sep. 2002, pp. 99-106.
- (2002) Proc. IEEE Workshop Speech Synth. , pp. 99-106
- Ostendorf, M.¹ Bulyko, I.²

2
- 70349227947
- The application of hidden Markov models in speech recognition
- M. Gales and S. Young, "The application of hidden Markov models in speech recognition," Foundat. Trends Signal Process., vol. 1, no. 3, pp. 195-304, 2007.
- (2007) Foundat. Trends Signal Process. , vol.1 , Issue.3 , pp. 195-304
- Gales, M.¹ Young, S.²

3
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., 2009, 10.1016/j.specom.2009.04.004.
- (2009) Speech Commun. , doi: 10.1016/j.specom.2009.04.004
- Zen, H.¹ Tokuda, K.² Black, A.W.³

4
- 84867203039
- Unsupervised adaptation for HMM-based speech synthesis
- Sep.
- S. King, K. Tokuda, H. Zen, and J. Yamagishi, "Unsupervised adaptation for HMM-based speech synthesis," in Proc. Interspeech'08, Sep. 2008, pp. 1869-1872.
- (2008) Proc. Interspeech'08 , pp. 1869-1872
- King, S.¹ Tokuda, K.² Zen, H.³ Yamagishi, J.⁴

5
- 70450185735
- Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models
- Brighton, U.K. Sep.
- M. Gibson, "Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1791-1794.
- (2009) Proc. Interspeech , pp. 1791-1794
- Gibson, M.¹

6
- 78049369783
- A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis
- H. Liang, J. Dines, and L. Saheer, "A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis," in Proc. ICASSP, Dallas, TX, 2010, pp. 4598-4601.
- (2010) Proc. ICASSP, Dallas, TX , pp. 4598-4601
- Liang, H.¹ Dines, J.² Saheer, L.³

7
- 0142192295
- Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Williamstown, MA
- J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, Williamstown, MA, 2001, pp. 282-289.
- (2001) Proc. ICML , pp. 282-289
- Lafferty, J.¹ McCallum, A.² Pereira, F.³

8
- 0003572996
- Ph.D. dissertation. Carnegie Mellon Univ., Pittsburgh, PA
- P. Brown, "The Acoustic-Modeling Problem in Automatic Speech Recognition," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1987.
- (1987) The Acoustic-Modeling Problem in Automatic Speech Recognition
- Brown, P.¹

9
- 0036296863
- Minimum phone error and I-smoothing for improved discriminative training
- D. Povey and P. C. Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc. ICASSP, Orlando, FL, 2002, pp. 105-108.
- (2002) Proc. ICASSP, Orlando, FL , pp. 105-108
- Povey, D.¹ Woodland, P.C.²

10
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Toulouse, France
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, Toulouse, France, 2006, pp. 89-92.
- (2006) Proc. ICASSP , pp. 89-92
- Wu, Y.-J.¹ Wang, R.-H.²

11
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- Feb.
- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
- (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.R.¹

12
- 38149010136
- A hidden Markov model approach to speech synthesis
- Paris, France
- A. Falaschi, M. Giustiniani, and M. Verola, "A hidden Markov model approach to speech synthesis," in Proc. Eurospeech, Paris, France, 1989, pp. 187-190.
- (1989) Proc. Eurospeech , pp. 187-190
- Falaschi, A.¹ Giustiniani, M.² Verola, M.³

13
- 0019555090
- Cepstral analysis technique for automatic speaker verification
- Apr.
- S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 254-272, Apr. 1981.
- (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.2 , pp. 254-272
- Furui, S.¹

14
- 74149089478
- Hidden semi-Markov models
- Feb.
- S.-Z. Yu, "Hidden semi-Markov models," Artificial Intell., vol. 174, no. 2, pp. 215-243, Feb. 2009.
- (2009) Artificial Intell. , vol.174 , Issue.2 , pp. 215-243
- Yu, S.-Z.¹

15
- 85009231267
- Trajectory modeling based on HMMs with explicit relationship between static and dynamic features
- Geneva, Switzerland
- K. Tokuda, H. Zen, and T. Kitamura, "Trajectory modeling based on HMMs with explicit relationship between static and dynamic features," in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 865-868.
- (2003) Proc. Eurospeech , pp. 865-868
- Tokuda, K.¹ Zen, H.² Kitamura, T.³

16
- 0026854213
- A generalised hidden Markov model with state-conditioned trend functions of time for the speech signal
- Apr.
- L. Deng, "A generalised hidden Markov model with state-conditioned trend functions of time for the speech signal," Signal Process., vol. 27, pp. 65-78, Apr. 1992.
- (1992) Signal Process. , vol.27 , pp. 65-78
- Deng, L.¹

17
- 0034854701
- Trainable speech synthesis with trended hidden Markov models
- J. Dines, S. Sridharan, and M. Moody, "Trainable speech synthesis with trended hidden Markov models," in Proc. ICASSP, Salt Lake City, UT, 2001, pp. 833-836.
- (2001) Proc. ICASSP, Salt Lake City, UT , pp. 833-836
- Dines, J.¹ Sridharan, S.² Moody, M.³

18
- 0023211846
- Explicit time correlation in hidden Markov models for speech recognition
- C. Wellekens, "Explicit time correlation in hidden Markov models for speech recognition," in Proc. ICASSP, Dallas, TX, 1987, vol. 12, pp. 384-386.
- (1987) Proc. ICASSP, Dallas, TX , vol.12 , pp. 384-386
- Wellekens, C.¹

19
- 70450175584
- Autoregressive HMMs for speech synthesis
- Brighton, U.K.
- M. Shannon and W. Byrne, "Autoregressive HMMs for speech synthesis," in Proc. Interspeech, Brighton, U.K., 2009.
- (2009) Proc. Interspeech
- Shannon, M.¹ Byrne, W.²

20
- 0003911245
- A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
- L. Deng and J. Ma, "A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics," in Proc. Eurospeech, Budapest, Hungary, 1999, pp. 1499-1502.
- (1999) Proc. Eurospeech, Budapest, Hungary , pp. 1499-1502
- Deng, L.¹ Ma, J.²

21
- 54349106040
- Switching linear dynamical systems for noise robust speech recognition
- Aug.
- B. Mesot and D. Barber, "Switching linear dynamical systems for noise robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1850-1858, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.6 , pp. 1850-1858
- Mesot, B.¹ Barber, D.²

22
- 79959849719
- Autoregressive clustering for HMM speech synthesis
- Makuhari, Japan
- M. Shannon and W. Byrne, "Autoregressive clustering for HMM speech synthesis," in Proc. Interspeech, Makuhari, Japan, 2010.
- (2010) Proc. Interspeech
- Shannon, M.¹ Byrne, W.²

23
- 78649270883
- Learning deep architectures for AI
- Univ. de Montréal, Montreal, QC, Canada
- Y. Bengio, "Learning deep architectures for AI," Univ. de Montréal, Montreal, QC, Canada, Tech. Rep. 1312, 2007.
- (2007) Tech. Rep. , vol.1312
- Bengio, Y.¹

24
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, pp. 1527-1554, 2006.
- (2006) Neural Comput. , vol.18 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.³

25
- 78649297301
- Deep belief networks for phone recognition
- Whistler, Canada
- A.-R. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," in Proc. NIPS Workshop Deep Learn. Speech Recogn. Rel. Applicat., Whistler, Canada, 2009.
- (2009) Proc. NIPS Workshop Deep Learn. Speech Recogn. Rel. Applicat.
- Mohamed, A.-R.¹ Dahl, G.² Hinton, G.³

26
- 78649277342
- Decision trees do not generalize to new variations
- Univ. de Montréal, Montreal, QC, Canada
- Y. Bengio, O. Delalleau, and C. Simard, "Decision trees do not generalize to new variations," Univ. de Montréal, Montreal, QC, Canada, Tech. Rep. 1304, 2006.
- (2006) Tech. Rep. , vol.1304
- Bengio, Y.¹ Delalleau, O.² Simard, C.³

27
- 51449118125
- Acoustic modeling with contextual additive structure for HMM-based speech recognition
- Y. Nankaku, K. Nakamura, H. Zen, T. Toda, and K. Tokuda, "Acoustic modeling with contextual additive structure for HMM-based speech recognition," in Proc. ICASSP, Las Vegas, NV, 2008, pp. 4469-4472.
- (2008) Proc. ICASSP, Las Vegas, NV , pp. 4469-4472
- Nankaku, Y.¹ Nakamura, K.² Zen, H.³ Toda, T.⁴ Tokuda, K.⁵

28
- 0003822743
- Cambridge, U.K.: Cambridge Univ. Eng. Dept. Dec.
- S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book, 3rd ed. Cambridge, U.K.: Cambridge Univ. Eng. Dept., Dec. 2006.
- (2006) The HTK Book, 3rd Ed
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

29
- 79952258981
- [Online] Available
- K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. Black, and T. Nose, The HMM-Based Speech Synthesis System (HTS). [Online]. Available: http://hts.sp.nitech.ac.jp/
- The HMM-Based Speech Synthesis System (HTS)
- Tokuda, K.¹ Zen, H.² Yamagishi, J.³ Masuko, T.⁴ Sako, S.⁵ Black, A.⁶ Nose, T.⁷

30
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

31
- 78649273643
- HMM-based approach to multilingual speech synthesis
- S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall
- K. Tokuda, H. Zen, and A. W. Black, "HMM-based approach to multilingual speech synthesis," in Text to Speech Synthesis: New Paradigms and Advances, S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall, 2004.
- (2004) Text to Speech Synthesis: New Paradigms and Advances
- Tokuda, K.¹ Zen, H.² Black, A.W.³

32
- 85008006694
- A robust speaker-adaptive HMM-based text-to-speech synthesis
- Aug.
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "A robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1208-1230, Aug. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

33
- 77249139677
- An HMM-based Mandarin Chinese text-to-speech system
- Dec.
- Y. Qian, F. Soong, Y. Chen, and M. Chu, "An HMM-based Mandarin Chinese text-to-speech system," in Proc. ISCSLP'06, Dec. 2006, pp. 223-232.
- (2006) Proc. ISCSLP'06 , pp. 223-232
- Qian, Y.¹ Soong, F.² Chen, Y.³ Chu, M.⁴

34
- 19944397218
- Ph.D. dissertation. Cambridge Univ., Cambridge, U.K.
- T. Hain, "Hidden model sequence models for automatic speech recognition," Ph.D. dissertation, Cambridge Univ., Cambridge, U.K., 2001.
- (2001) Hidden Model Sequence Models for Automatic Speech Recognition
- Hain, T.¹

35
- 0030672098
- Hybrid HMM-ANN systems for training independent tasks: Experiments on phonebook and related improvements
- Munich, Germany Apr.
- S. Dupont, H. Bourlard, O. Deroo, V. Fontaine, and J.-M. Boite, "Hybrid HMM-ANN systems for training independent tasks: Experiments on phonebook and related improvements," in Proc. ICASSP, Munich, Germany, Apr. 1997, pp. 1767-1770.
- (1997) Proc. ICASSP , pp. 1767-1770
- Dupont, S.¹ Bourlard, H.² Deroo, O.³ Fontaine, V.⁴ Boite, J.-M.⁵

36
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

37
- 85131821539
- Mel-generalized cepstral analysis - A unified approach to speech spectral estimation
- Sep.
- K. Koishida, G. Hirabayashi, K. Tokuda, and T. Kobayashi, "Mel-generalized cepstral analysis - A unified approach to speech spectral estimation," in Proc. ICSLP, Yokohama, Japan, Sep. 1994, vol. 3, pp. 1043-1046.
- (1994) Proc. ICSLP, Yokohama, Japan , vol.3 , pp. 1043-1046
- Koishida, K.¹ Hirabayashi, G.² Tokuda, K.³ Kobayashi, T.⁴

38
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.³

39
- 59849090295
- Combining spectral representations for large vocabulary continuous speech recognition
- Mar.
- G. Garau and S. Renals, "Combining spectral representations for large vocabulary continuous speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 508-518, Mar. 2008.
- (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.3 , pp. 508-518
- Garau, G.¹ Renals, S.²

40
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- Feb.
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. Syst, vol. E90-D, no. 2, pp. 533-543, Feb. 2007.
- (2007) IEICE Trans. Inf. Syst , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

41
- 33645781551
- Evaluation of a speech recognition/generation method based on HMM and STRAIGHT
- T. Irino, Y. Minami, T. Nakatani, M. Tsuzaki, and H. Tagawa, "Evaluation of a speech recognition/generation method based on HMM and STRAIGHT," in Proc. ICSLP, Denver, CO, 2002, pp. 2545-2548.
- (2002) Proc. ICSLP, Denver, CO , pp. 2545-2548
- Irino, T.¹ Minami, Y.² Nakatani, T.³ Tsuzaki, M.⁴ Tagawa, H.⁵

42
- 0003805597
- Ph.D. dissertation Queens College, Univ. of Cambridge, Cambridge, U.K.
- J. J. Odell, "The use of context in large vocabulary continuous speech recognition," Ph.D. dissertation, Queens College, Univ. of Cambridge, Cambridge, U.K., 1995.
- (1995) The Use of Context in Large Vocabulary Continuous Speech Recognition
- Odell, J.J.¹

43
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, Rhodes, Greece, 1997, vol. 1, pp. 99-102.
- (1997) Proc. Eurospeech, Rhodes, Greece , vol.1 , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

44
- 33947674781
- Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
- Toulouse, France
- K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, Toulouse, France, 2006, pp. 853-856.
- (2006) Proc. ICASSP , pp. 853-856
- Prahallad, K.¹ Black, A.W.² Mosur, R.³

45
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- Japan (E) Mar.
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (E), vol. 21, pp. 79-86, Mar. 2000.
- (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

46
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- Jan.
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

47
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
- (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
- Gales, M.¹

48
- 0028419019
- Maximum a posteriori estimation for multi-variate Gaussian mixture observations of Markov chains
- Apr.
- J. Gauvain and C. Lee, "Maximum a posteriori estimation for multi-variate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, pp. 291-298, Apr. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , pp. 291-298
- Gauvain, J.¹ Lee, C.²

49
- 0030189744
- Speaker adaptation using combined transformation and Bayesian methods
- Jul.
- V. Digalakis and L. Neumeyer, "Speaker adaptation using combined transformation and Bayesian methods," IEEE Trans. Speech Audio Process., vol. 4, no. 4, pp. 294-300, Jul. 1996.
- (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.4 , pp. 294-300
- Digalakis, V.¹ Neumeyer, L.²

50
- 0009623939
- Flexible speaker adaptation using maximum likelihood linear regression
- Morgan Kaufmann
- C. Leggetter and P. Woodland, "Flexible speaker adaptation using maximum likelihood linear regression," in Proc. ARPA Spoken Lang. Technol. Workshop, 1995, pp. 104-109, Morgan Kaufmann.
- (1995) Proc. ARPA Spoken Lang. Technol. Workshop , pp. 104-109
- Leggetter, C.¹ Woodland, P.²

51
- 0036461005
- Structural maximum a posteriori linear regression for fast HMM adaptation
- January
- O. Siohan, T. Myrvoll, and C.-H. Lee, "Structural maximum a posteriori linear regression for fast HMM adaptation," Computer, Speech and Language, vol. 16, no. 1, pp. 5-24, January 2002.
- (2002) Computer, Speech and Language , vol.16 , Issue.1 , pp. 5-24
- Siohan, O.¹ Myrvoll, T.² Lee, C.-H.³

52
- 34547496746
- Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis
- Sep.
- Y. Nakano, M. Tachibana, J. Yamagishi, and T. Kobayashi, "Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis," in Proc. ICSLP'06, Sep. 2006, pp. 2286-2289.
- (2006) Proc. ICSLP'06 , pp. 2286-2289
- Nakano, Y.¹ Tachibana, M.² Yamagishi, J.³ Kobayashi, T.⁴

53
- 0030362995
- A compact model for speaker-adaptive training
- Oct.
- T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, "A compact model for speaker-adaptive training," in Proc. ICSLP'96, Oct. 1996, pp. 1137-1140.
- (1996) Proc. ICSLP'96 , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

54
- 33947639066
- Hidden semi-Markov model based speech recognition system using weighted finite-state transducer
- Toulouse, France May
- K. Oura, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, "Hidden semi-Markov model based speech recognition system using weighted finite-state transducer," in Proc. ICASSP'06, Toulouse, France, May 2006, pp. 33-36.
- (2006) Proc. ICASSP'06 , pp. 33-36
- Oura, K.¹ Zen, H.² Nankaku, Y.³ Lee, A.⁴ Tokuda, K.⁵

55
- 70450169407
- Speech recognition with speech synthesis models by marginalising over decision tree leaves
- Brighton, U.K. Sep.
- J. Dines, L. Saheer, and H. Liang, "Speech recognition with speech synthesis models by marginalising over decision tree leaves," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1395-1398.
- (2009) Proc. Interspeech , pp. 1395-1398
- Dines, J.¹ Saheer, L.² Liang, H.³

56
- 67650819492
- The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
- Sep.
- J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, "The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge," in Proc. Blizzard Challenge Workshop, Sep. 2008.
- (2008) Proc. Blizzard Challenge Workshop
- Yamagishi, J.¹ Zen, H.² Wu, Y.-J.³ Toda, T.⁴ Tokuda, K.⁵

57
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- May
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

58
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Istanbul, Turkey
- K. Tokuda, T. K. T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP'00, Istanbul, Turkey, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP'00 , pp. 1315-1318
- Tokuda, K.¹ Masuko, T.K.T.² Kobayashi, T.³ Kitamura, T.⁴

59
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

60
- 70450201930
- DARPA February 1992 pilot corpus CSR "dry run" benchmark test results
- Harriman, NY Feb.
- D. Pallet, "DARPA February 1992 pilot corpus CSR "dry run" benchmark test results," in Proc. Workshop Speech and Natural Language, Harriman, NY, Feb. 1992, pp. 382-386.
- (1992) Proc. Workshop Speech and Natural Language , pp. 382-386
- Pallet, D.¹

61
- 70450161300
- Thousands of voices for HMM-based speech synthesis
- Brighton, U.K. Sep.
- J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, R. Hu, K. Oura, K. Tokuda, R. Karhila, and M. Kurimo, "Thousands of voices for HMM-based speech synthesis," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 420-423.
- (2009) Proc. Interspeech , pp. 420-423
- Yamagishi, J.¹ Usabaev, B.² King, S.³ Watts, O.⁴ Dines, J.⁵ Tian, J.⁶ Hu, R.⁷ Oura, K.⁸ Tokuda, K.⁹ Karhila, R.¹⁰ Kurimo, M.¹¹

62
- 77953708096
- Thousands of voices for HMM-based speech synthesis: Analysis and application of TTS systems built on various ASR corpora
- Jul.
- J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, R. Hu, K. Oura, K. Tokuda, R. Karhila, and M. Kurimo, "Thousands of voices for HMM-based speech synthesis: Analysis and application of TTS systems built on various ASR corpora," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 984-1004, Jul. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.5 , pp. 984-1004
- Yamagishi, J.¹ Usabaev, B.² King, S.³ Watts, O.⁴ Dines, J.⁵ Tian, J.⁶ Hu, R.⁷ Oura, K.⁸ Tokuda, K.⁹ Karhila, R.¹⁰ Kurimo, M.¹¹

63
- 4544386225
- Bootstrap estimates for confidence intervals in ASR performance evaluation
- Montreal, QC, Canada May
- M. Bisani and H. Ney, "Bootstrap estimates for confidence intervals in ASR performance evaluation," in Proc. ICASSP'04, Montreal, QC, Canada, May 2004, vol. 1, pp. 409-412.
- (2004) Proc. ICASSP'04 , vol.1 , pp. 409-412
- Bisani, M.¹ Ney, H.²

64
- 0017097474
- Distance measures for speech processing
- Oct.
- A. Gray, Jr. and J. Markel, "Distance measures for speech processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 5, pp. 380-391, Oct. 1976.
- (1976) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-24 , Issue.5 , pp. 380-391
- Gray, A., Jr.¹ Markel, J.²

65
- 0019146354
- Correlation analysis of subjective and objective measures for speech quality
- T. P. Barnwell, III, "Correlation analysis of subjective and objective measures for speech quality," in Proc. ICASSP'80, 1980, pp. 706-709.
- (1980) Proc. ICASSP'80 , pp. 706-709
- Barnwell, T.P., III¹

66
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Aug.
- S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-28 , Issue.4 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

67
- 84984366853
- Speech analysis-synthesis system and quality of synthesized speech using mel-cepstrum
- (in Japanese) Electron. Commun. Japan (Part I: Commun.)
- T. Kitamura, S. Imai, C. Furuichi, and T. Kobayashi, "Speech analysis-synthesis system and quality of synthesized speech using mel-cepstrum," (in Japanese) Electron. Commun. Japan (Part I: Commun.), vol. 69, no. 10, pp. 47-54, 1986.
- (1986) Electron. Commun. Japan (Part I: Commun.) , vol.69 , Issue.10 , pp. 47-54
- Kitamura, T.¹ Imai, S.² Furuichi, C.³ Kobayashi, T.⁴

68
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- San Francisco, CA
- T. Fukada, K. Tokuda, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP'92, San Francisco, CA, 1992, pp. 137-140.
- (1992) Proc. ICASSP'92 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Imai, S.³

69
- 0027247004
- Mel-cepstral distance measure for objective speech quality assessment
- May
- R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., May 1993, vol. 1, pp. 125-128.
- (1993) Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process. , vol.1 , pp. 125-128
- Kubichek, R.¹

70
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- Nov.
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

71
- 67650790758
- The Blizzard Challenge 2008
- Brisbane, Australia Sep.
- V. Karaiskos, S. King, R. A. J. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proc. Blizzard Challenge Workshop, Brisbane, Australia, Sep. 2008.
- (2008) Proc. Blizzard Challenge Workshop
- Karaiskos, V.¹ King, S.² Clark, R.A.J.³ Mayo, C.⁴

72
- 34250618146
- [Online]. Available
- The CMU Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
- The CMU Pronouncing Dictionary

73
- 85030493378
- Synthesis of regional English using a keyword lexicon
- Sep.
- S. Fitt and S. Isard, "Synthesis of regional English using a keyword lexicon," in Proc. Eurospeech, Sep. 1999, vol. 2, pp. 823-826.
- (1999) Proc. Eurospeech , vol.2 , pp. 823-826
- Fitt, S.¹ Isard, S.²

74
- 0028515984
- Experimental evaluation of features for robust speaker identification
- Oct.
- D. A. Reynolds, "Experimental evaluation of features for robust speaker identification," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 639-643, Oct. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 639-643
- Reynolds, D.A.¹

75
- 70450183638
- Measuring the gap between HMM-based ASR and TTS
- Brighton, U.K. Sep.
- J. Dines, J. Yamagishi, and S. King, "Measuring the gap between HMM-based ASR and TTS," in Proc. Interspeech, Brighton, U.K., Sep. 2009, pp. 1391-1394.
- (2009) Proc. Interspeech , pp. 1391-1394
- Dines, J.¹ Yamagishi, J.² King, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.