SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume , Issue , 2013, Pages 7962-7966

Statistical parametric speech synthesis using deep neural networks

(3) Zen, Heiga a Senior, Andrew a Schuster, Mike a

a GOOGLE INC (United States)

Author keywords

Deep neural network; Hidden Markov model; Statistical parametric speech synthesis

Indexed keywords

CONTEXT DEPENDENCY; CONTEXT DEPENDENT; CONVENTIONAL APPROACH; DEEP NEURAL NETWORKS; HIDDEN MARKOV MODELS (HMMS); HMM-BASED SYSTEMS; PROBABILITY DENSITIES; STATISTICAL PARAMETRIC SPEECH SYNTHESIS;

DECISION TREES; HIDDEN MARKOV MODELS; PROBABILITY; SIGNAL PROCESSING; SPEECH SYNTHESIS;

NEURAL NETWORKS;

EID: 84890490547 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2013.6639215 Document Type: Conference Paper

Times cited : (838)

References (40)

1
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

2
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996, pp. 373-376.
- (1996) Proc. ICASSP , pp. 373-376
- Hunt, A.¹ Black, A.²

3
- 0034842740
- Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," in Proc. ICASSP, 2001, pp. 805-808.
- (2001) Proc. ICASSP , pp. 805-808
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

4
- 85135145847
- Speaker interpolation in HMM-based speech synthesis system
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Speaker interpolation in HMM-based speech synthesis system," in Proc. Eurospeech, 1997, pp. 2523-2526.
- (1997) Proc. Eurospeech , pp. 2523-2526
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

5
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," in Proc. ICSLP, 2002, pp. 1269-1272.
- (2002) Proc. ICSLP , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

6
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

7
- 33846462839
- Miniaturization of HMM-based speech synthesis
- (in Japanese)
- Y. Morioka, S. Kataoka, H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura, "Miniaturization of HMM-based speech synthesis," in Proc. Autumn Meeting of ASJ, 2004, pp. 325-326, (in Japanese).
- (2004) Proc. Autumn Meeting of ASJ , pp. 325-326
- Morioka, Y.¹ Kataoka, S.² Zen, H.³ Nankaku, Y.⁴ Tokuda, K.⁵ Kitamura, T.⁶

8
- 33846935000
- HMM-based Korean speech synthesis system for hand-held devices
- S.-J. Kim, J.-J. Kim, and M.-S. Hahn, "HMM-based Korean speech synthesis system for hand-held devices," IEEE Trans. Consum. Electron., vol. 52, no. 4, pp. 1384-1390, 2006.
- (2006) IEEE Trans. Consum. Electron. , vol.52 , Issue.4 , pp. 1384-1390
- Kim, S.-J.¹ Kim, J.-J.² Hahn, M.-S.³

9
- 79959839868
- Quantized HMMs for low footprint text-to-speech synthesis
- A. Gutkin, X. Gonzalvo, S. Breuer, and P. Taylor, "Quantized HMMs for low footprint text-to-speech synthesis," in Proc. Interspeech, 2010, pp. 837-840.
- (2010) Proc. Interspeech , pp. 837-840
- Gutkin, A.¹ Gonzalvo, X.² Breuer, S.³ Taylor, P.⁴

10
- 85008006694
- Robust speaker-adaptive HMM-based text-to-speech synthesis
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 17, no. 6, pp. 1208-1230, 2009.
- (2009) IEEE Trans. Audio Speech Lang. Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

11
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

12
- 84966348891
- An HMM-based speech synthesis system applied to English
- K. Tokuda, H. Zen, and A. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE Speech Synthesis Workshop, 2002, CD-ROM Proceeding.
- Proc. IEEE Speech Synthesis Workshop, 2002, CD-ROM Proceeding
- Tokuda, K.¹ Zen, H.² Black, A.³

13
- 0003805597
- Ph.D. thesis, Cambridge University
- J. Odell, The use of context in large vocabulary speech recognition, Ph.D. thesis, Cambridge University, 1995.
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

14
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, 1997, pp. 99-102.
- (1997) Proc. Eurospeech , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

15
- 0032658258
- Decision tree state tying based on penalized Bayesian information criterion
- W. Chou and W. Reichl, "Decision tree state tying based on penalized Bayesian information criterion," in Proc. ICASSP, 1999, vol. 1, pp. 345-348.
- (1999) Proc. ICASSP , vol.1 , pp. 345-348
- Chou, W.¹ Reichl, W.²

16
- 33947650089
- HMM state clustering based on efficient cross-validation
- T. Shinozaki, "HMM state clustering based on efficient cross-validation," in Proc. ICASSP, 2006, pp. 1157-1160.
- (2006) Proc. ICASSP , pp. 1157-1160
- Shinozaki, T.¹

17
- 80051615235
- Decision tree-based context clustering based on cross validation and hierarchical priors
- H. Zen and M.J.F. Gales, "Decision tree-based context clustering based on cross validation and hierarchical priors," in Proc. ICASSP, 2011, pp. 4560-4563.
- (2011) Proc. ICASSP , pp. 4560-4563
- Zen, H.¹ Gales, M.J.F.²

18
- 34249043508
- Anytime learning of decision trees
- S. Esmeir and S. Markovitch, "Anytime learning of decision trees," J. Mach. Learn. Res., vol. 8, pp. 891-933, 2007.
- (2007) J. Mach. Learn. Res. , vol.8 , pp. 891-933
- Esmeir, S.¹ Markovitch, S.²

19
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, 2011.
- (2011) Speech Commun. , vol.53 , Issue.6 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

20
- 69349090197
- Learning deep architectures for AI
- Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
- (2009) Foundations and Trends in Machine Learning , vol.2 , Issue.1 , pp. 1-127
- Bengio, Y.¹

21
- 84877760312
- Large scale distributed deep networks
- J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks," in Proc. NIPS, 2012.
- (2012) Proc. NIPS
- Dean, J.¹ Corrado, G.² Monga, R.³ Chen, K.⁴ Devin, M.⁵ Le, Q.⁶ Mao, M.⁷ Ranzato, M.⁸ Senior, A.⁹ Tucker, P.¹⁰ Yang, K.¹¹ Ng, A.¹²

22
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Process. Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Process. Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

23
- 84878403872
- Deep architectures for articulatory inversion
- B. Uria, I. Murray, S. Renals, and K. Richmond, "Deep architectures for articulatory inversion," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Uria, B.¹ Murray, I.² Renals, S.³ Richmond, K.⁴

24
- 78049293342
- Speech synthesis with neural networks
- O. Karaali, G. Corrigan, and I. Gerson, "Speech synthesis with neural networks," in Proc. World Congress on Neural Networks, 1996, pp. 45-50.
- (1996) Proc. World Congress on Neural Networks , pp. 45-50
- Karaali, O.¹ Corrigan, G.² Gerson, I.³

25
- 84867200235
- Generating natural F0 trajectory with additive trees
- Y. Qian, H. Liang, and F. Soong, "Generating natural F0 trajectory with additive trees," in Proc. Interspeech, 2008, pp. 2126-2129.
- (2008) Proc. Interspeech , pp. 2126-2129
- Qian, Y.¹ Liang, H.² Soong, F.³

26
- 51449118125
- Acoustic modeling with contextual additive structure for HMM-based speech recognition
- Y. Nankaku, K. Nakamura, H. Zen, and K. Tokuda, "Acoustic modeling with contextual additive structure for HMM-based speech recognition," in Proc. ICASSP, 2008, pp. 4469-4472.
- (2008) Proc. ICASSP , pp. 4469-4472
- Nankaku, Y.¹ Nakamura, K.² Zen, H.³ Tokuda, K.⁴

27
- 70450153447
- Master thesis, Nagoya Institute of Technology, (in Japanese)
- K. Saino, A clustering technique for factor analysis-based eigenvoice models, Master thesis, Nagoya Institute of Technology, 2008, (in Japanese).
- (2008) A Clustering Technique for Factor Analysis-based Eigenvoice Models
- Saino, K.¹

28
- 85008525798
- Product of experts for statistical parametric speech synthesis
- H. Zen, M. Gales, Y. Nankaku, and K. Tokuda, "Product of experts for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 20, no. 3, pp. 794-805, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Process. , vol.20 , Issue.3 , pp. 794-805
- Zen, H.¹ Gales, M.² Nankaku, Y.³ Tokuda, K.⁴

29
- 78049376926
- Word-level emphasis modelling in HMM-based speech synthesis
- K. Yu, F. Mairesse, and S. Young, "Word-level emphasis modelling in HMM-based speech synthesis," in Proc. ICASSP, 2010, pp. 4238-4241.
- (2010) Proc. ICASSP , pp. 4238-4241
- Yu, K.¹ Mairesse, F.² Young, S.³

30
- 85032782045
- Deep learning and its applications to signal and information processing
- D. Yu and L. Deng, "Deep learning and its applications to signal and information processing," IEEE Signal Process. Magazine, vol. 28, no. 1, pp. 145-154, 2011.
- (2011) IEEE Signal Process. Magazine , vol.28 , Issue.1 , pp. 145-154
- Yu, D.¹ Deng, L.²

31
- 0022667694
- Speaker independent isolated word recognition using dynamic features of speech spectrum
- S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust. Speech Signal Process., vol. 34, pp. 52-59, 1986.
- (1986) IEEE Trans. Acoust. Speech Signal Process. , vol.34 , pp. 52-59
- Furui, S.¹

32
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

33
- 33846405723
- Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

34
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP, 1992, pp. 137-140.
- (1992) Proc. ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

35
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

36
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002.
- (2002) IEICE Trans. Inf. Syst. , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

37
- 85008023596
- Continuous F0 modelling for HMM based statistical parametric speech synthesis
- K. Yu and S. Young, "Continuous F0 modelling for HMM based statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 5, pp. 1071-1079, 2011.
- (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

38
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

39
- 78049361102
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004.
- (2004) IEICE Trans. Inf. Syst. , vol.J87-D-II , Issue.8 , pp. 1563-1571
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

40
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.