SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume , Issue , 2014, Pages 3844-3848

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis

(2) Zen, Heiga ᵃ; Senior, Andrew ᵃ

ᵃ GOOGLE INC (United States)

Author keywords

deep neural networks; hidden Markov models; mixture density networks; Statistical parametric speech synthesis

Indexed keywords

HIDDEN MARKOV MODELS; MIXTURES; PROBABILITY DENSITY FUNCTION; SIGNAL PROCESSING;

ACOUSTIC FEATURES; DEEP NEURAL NETWORKS; MIXTURE DENSITY; OBJECTIVE AND SUBJECTIVE EVALUATIONS; OBJECTIVE FUNCTIONS; PREDICTION ACCURACY; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SYNTHESIZED SPEECH;

SPEECH SYNTHESIS;

EID: 84905262874 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2014.6854321 Document Type: Conference Paper

Times cited : (215)

References (35)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

2
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996
- (1996) Proc. ICASSP
- Hunt, A.¹ Black, A.²

4
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
- (2007) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

5
- 84872190545
- Autoregressive models for statistical parametric speech synthesis
- M. Shannon, H. Zen, and W. Byrne, "Autoregressive models for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 3, pp. 587-597, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.3 , pp. 587-597
- Shannon, M.¹ Zen, H.² Byrne, W.³

6
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, pp. 89-92.
- (2006) Proc. ICASSP , pp. 89-92
- Wu, Y.-J.¹ Wang, R.-H.²

7
- 85008039410
- Improved prosody generation by maximizing joint probability of state and longer units
- Y. Qian, Z.-Z. Wu, B.-Y. Gao, and F. Soong, "Improved prosody generation by maximizing joint probability of state and longer units," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 6, pp. 1702-1710, 2011.
- (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.6 , pp. 1702-1710
- Qian, Y.¹ Wu, Z.-Z.² Gao, B.-Y.³ Soong, F.⁴

8
- 85008525798
- Product of experts for statistical parametric speech synthesis
- H. Zen, M. Gales, Y. Nankaku, and K. Tokuda, "Product of experts for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 20, no. 3, pp. 794-805, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Process. , vol.20 , Issue.3 , pp. 794-805
- Zen, H.¹ Gales, M.² Nankaku, Y.³ Tokuda, K.⁴

9
- 84901817264
- Statistical parametric speech synthesis based on Gaussian process regression
- T. Koriyama, T. Nose, and T. Kobayashi, "Statistical parametric speech synthesis based on Gaussian process regression," IEEE Journal of Selected Topics in Signal Process., 2013.
- (2013) IEEE Journal of Selected Topics in Signal Process.
- Koriyama, T.¹ Nose, T.² Kobayashi, T.³

10
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- S.-Y. Kang, X.-J. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. ICASSP, 2013, pp. 8012-8016.
- (2013) Proc. ICASSP , pp. 8012-8016
- Kang, S.-Y.¹ Qian, X.-J.² Meng, H.³

11
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 10, pp. 2129-2139, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

12
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Process. Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Process. Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

13
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

14
- 84890522099
- F0 contour prediction with a deep belief network-Gaussian process hybrid model
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "F0 contour prediction with a deep belief network-Gaussian process hybrid model," in Proc. ICASSP, 2013, pp. 6885-6889.
- (2013) Proc. ICASSP , pp. 6885-6889
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

15
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- H. Lu, S. King, and O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," in Proc. ISCA SSW8, 2013, pp. 281-285.
- (2013) Proc. ISCA SSW8 , pp. 281-285
- Lu, H.¹ King, S.² Watts, O.³

16
- 84966348891
- An HMM-based speech synthesis system applied to English
- K. Tokuda, H. Zen, and A. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE Speech Synthesis Workshop, 2002, CD-ROM Proceeding.
- (2002) Proc. IEEE Speech Synthesis Workshop , CD-ROM Proceeding
- Tokuda, K.¹ Zen, H.² Black, A.³

17
- 84994214710
- Deep learning in speech synthesis
- H. Zen, "Deep learning in speech synthesis," in Keynote speech given at ISCA SSW8, 2013, http://research.google.com/pubs/archive/41539.pdf.
- (2013) Keynote Speech Given at ISCA SSW8
- Zen, H.¹

18
- 0004113976
- Mixture density networks
- C. Bishop, "Mixture density networks," Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University, 1994.
- (1994) Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University
- Bishop, C.¹

19
- 0008471243
- On supervised learning from sequential data with applications for speech recognition
- M. Schuster, On supervised learning from sequential data with applications for speech recognition, Ph.D. thesis, Nara Institute of Science and Technology, 1999.
- (1999) Ph.D. thesis, Nara Institute of Science and Technology
- Schuster, M.¹

20
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

21
- 70450172128
- Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems
- K. Oura, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, "Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems," in Proc. Interspeech, 2009, pp. 1759-1762.
- (2009) Proc. Interspeech , pp. 1759-1762
- Oura, K.¹ Zen, H.² Nankaku, Y.³ Lee, A.⁴ Tokuda, K.⁵

22
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

23
- 38549178971
- Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion
- K. Richmond, "Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion," in Advances in Nonlinear Speech Processing, pp. 263-272. Springer, 2007.
- (2007) Advances in Nonlinear Speech Processing , Springer , pp. 263-272
- Richmond, K.¹

24
- 84878403872
- Deep architectures for articulatory inversion
- B. Uria, I. Murray, S. Renals, and K. Richmond, "Deep architectures for articulatory inversion," in Proc. Interspeech, 2012, pp. 867-870.
- (2012) Proc. Interspeech , pp. 867-870
- Uria, B.¹ Murray, I.² Renals, S.³ Richmond, K.⁴

25
- 33846405723
- Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

26
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP, 1992, pp. 137-140.
- (1992) Proc. ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

27
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

28
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002.
- (2002) IEICE Trans. Inf. Syst. , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

29
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, 1997, pp. 99-102.
- (1997) Proc. Eurospeech , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

30
- 85008023596
- Continuous F0 modelling for HMM based statistical parametric speech synthesis
- K. Yu and S. Young, "Continuous F0 modelling for HMM based statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 5, pp. 1071-1079, 2011.
- (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

31
- 84887388950
- An empirical study of learning rates in deep neural networks for speech recognition
- A. Senior, G. Heigold, M. Ranzato, and K. Yang, "An empirical study of learning rates in deep neural networks for speech recognition," in Proc. ICASSP, 2013, pp. 6724-6728.
- (2013) Proc. ICASSP , pp. 6724-6728
- Senior, A.¹ Heigold, G.² Ranzato, M.³ Yang, K.⁴

32
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," The Journal of Machine Learning Research, pp. 2121-2159, 2011.
- (2011) The Journal of Machine Learning Research , pp. 2121-2159
- Duchi, J.¹ Hazan, E.² Singer, Y.³

33
- 84890471125
- On rectified linear units for speech processing
- M. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.-V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. Hinton, "On rectified linear units for speech processing," in Proc. ICASSP, 2013, pp. 3517-3521.
- (2013) Proc. ICASSP , pp. 3517-3521
- Zeiler, M.¹ Ranzato, M.² Monga, R.³ Mao, M.⁴ Yang, K.⁵ Le, Q.-V.⁶ Nguyen, P.⁷ Senior, A.⁸ Vanhoucke, V.⁹ Dean, J.¹⁰ Hinton, G.¹¹

34
- 78049361102
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004.
- (2004) IEICE Trans. Inf. Syst. , vol.J87-D-II , Issue.8 , pp. 1563-1571
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

35
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.