SCOPUS Information Search Platform

Volume 32, Issue 3, 2015, Pages 35-52

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

(8) Ling, Zhen Hua a; Kang, Shi Yin b; Zen, Heiga c; Senior, Andrew d; Schuster, Mike e; Qian, Xiao Jun f; Meng, Helen g; Deng, Li h

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b CHINESE UNIVERSITY OF HONG KONG (Hong Kong)

c NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

d UNIVERSITY OF CAMBRIDGE (United Kingdom)

e UNIVERSITY OF DUISBURG ESSEN (Germany)

f FUDAN UNIVERSITY (China)

g Department of Systems Engineering and Engineering Management ^* (China)

h Deep Learning Technology Center (China)

Author keywords

[No Author keywords available]

Indexed keywords

COMPLEX NETWORKS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; SPEECH; TRELLIS CODES;

ACOUSTIC FEATURES; AUTOMATIC SPEECH RECOGNITION; DEEP NEURAL NETWORKS; GAUSSIAN MIXTURE MODEL (GMMS); HIDDEN MARKOV MODELS (HMMS); HIERARCHICAL PROCESS; NON-LINEAR RELATIONSHIPS; PARAMETRIC APPROACH;

SPEECH RECOGNITION;

EID: 85032750981 PISSN: 10535888 EISSN: None Source Type: Journal
DOI: 10.1109/MSP.2014.2359987 Document Type: Review

Times cited : (238)

References (86)

1
- 84876687945
- Speech synthesis based on hidden Markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, 2013.
- (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

2
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

3
- 33846405723
- Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

4
- 67650851754
- USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop, 2006.
- Proc. Blizzard Challenge Workshop, 2006
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

5
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

6
- 85009139544
- Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- Proc. Eurospeech, 1999 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

7
- 0033708106
- Speech parameter-generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter-generation algorithms for HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2000, vol. 3, pp. 1315-1318.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2000 , vol.3 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

8
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2006.
- (2006) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

9
- 85008525798
- Product of experts for statistical parametric speech synthesis
- H. Zen, M. Gales, Y. Nankaku, and K. Tokuda, "Product of experts for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 3, pp. 794-805, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.3 , pp. 794-805
- Zen, H.¹ Gales, M.² Nankaku, Y.³ Tokuda, K.⁴

10
- 84897902941
- Statistical parametric speech synthesis based on Gaussian process regression
- T. Koriyama, T. Nose, and T. Kobayashi, "Statistical parametric speech synthesis based on Gaussian process regression," IEEE J. Select. Topics Signal Processing, vol. 8, no. 2, pp. 173-183, 2014.
- (2014) IEEE J. Select. Topics Signal Processing , vol.8 , Issue.2 , pp. 173-183
- Koriyama, T.¹ Nose, T.² Kobayashi, T.³

11
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2006, pp. 89-92.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2006 , pp. 89-92
- Wu, Y.-J.¹ Wang, R.-H.²

12
- 84867214032
- Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
- Y.-J. Wu and K. Tokuda, "Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis," in Proc. Interspeech, 2008, pp. 577-580.
- Proc. Interspeech, 2008 , pp. 577-580
- Wu, Y.-J.¹ Tokuda, K.²

13
- 38549096029
- A speech parameter-generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter-generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

14
- 77953715694
- Statistical text-to-speech synthesis based on segment-wise representation with a norm constraint
- T. Tiomkin, D. Malah, and S. Shechtman, "Statistical text-to-speech synthesis based on segment-wise representation with a norm constraint," IEEE Trans. Audio Speech Lang. Processing, vol. 18, no. 5, pp. 1077-1082, 2010.
- (2010) IEEE Trans. Audio Speech Lang. Processing , vol.18 , Issue.5 , pp. 1077-1082
- Tiomkin, T.¹ Malah, D.² Shechtman, S.³

15
- 84901793334
- Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis
- Z.-H. Ling and L.-R. Dai, "Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 5, pp. 1492-1502, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.5 , pp. 1492-1502
- Ling, Z.-H.¹ Dai, L.-R.²

16
- 33745805403
- A fast learning algorithm for deep belief nets
- G. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computat., vol. 18, no. 7, pp. 1527-1554, 2006.
- (2006) Neural Computat. , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.¹ Osindero, S.² Teh, Y.-W.³

17
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.¹ Salakhutdinov, R.²

18
- 78651276374
- Ph.D. thesis, Univ. of Toronto
- R. Salakhutdinov, "Learning deep generative models," Ph.D. thesis, Univ. of Toronto, 2009.
- (2009) Learning Deep Generative Models
- Salakhutdinov, R.¹

19
- 0000329993
- Information processing in dynamical systems: Foundations of harmony theory
- D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press ch. 6
- P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, vol. 1, ch. 6, pp. 194-281.
- (1986) Parallel Distributed Processing , vol.1 , pp. 194-281
- Smolensky, P.¹

20
- 79959842828
- Binary coding of speech spectrograms using a deep auto-encoder
- L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, "Binary coding of speech spectrograms using a deep auto-encoder," in Proc. Interspeech, 2010, pp. 1692-1695.
- Proc. Interspeech, 2010 , pp. 1692-1695
- Deng, L.¹ Seltzer, M.² Yu, D.³ Acero, A.⁴ Mohamed, A.⁵ Hinton, G.⁶

21
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, pp. 3371-3408, Dec. 2010.
- (2010) J. Mach. Learn. Res. , vol.11 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.⁵

22
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Mag., vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Mag. , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

23
- 84973365185
- Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 7825-7829.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013 , pp. 7825-7829
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

24
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Processing, vol. 21, no. 10, pp. 2129-2139, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Processing , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

25
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- S.-Y. Kang, X.-J. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 8012-8016.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013 , pp. 8012-8016
- Kang, S.-Y.¹ Qian, X.-J.² Meng, H.³

26
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 7962-7966.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013 , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

27
- 84906225084
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
- Proc. Interspeech, 2013 , pp. 3052-3056
- Chen, L.-H.¹ Ling, Z.-H.² Song, Y.³ Dai, L.-R.⁴

28
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Proc. Interspeech, 2013, pp. 369-372.
- Proc. Interspeech, 2013 , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

29
- 84889579519
- Conditional restricted Boltzmann machine for voice conversion
- Z.-Z. Wu, E.S. Chng, and H.-Z. Li, "Conditional restricted Boltzmann machine for voice conversion," in Proc. ChinaSIP, 2013, pp. 104-108.
- Proc. ChinaSIP, 2013 , pp. 104-108
- Wu, Z.-Z.¹ Chng, E.S.² Li, H.-Z.³

30
- 84906262433
- Speech enhancement based on deep denoising autoencoder
- X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," in Proc. Interspeech, 2013, pp. 436-440.
- Proc. Interspeech, 2013 , pp. 436-440
- Lu, X.¹ Tsao, Y.² Matsuda, S.³ Hori, C.⁴

31
- 84906279378
- Speech Enhancement with Weighted Denoising Autoencoder
- B.-Y. Xia and C.-C. Bao, "Speech enhancement with weighted denoising autoencoder," in Proc. Interspeech, 2013, pp. 3444-3448.
- Proc. Interspeech, 2013 , pp. 3444-3448
- Xia, B.-Y.¹ Bao, C.-C.²

32
- 84889257121
- An experimental study on speech enhancement based on deep neural networks
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Lett., vol. 21, no. 1, pp. 65-68, 2014.
- (2014) IEEE Signal Processing Lett. , vol.21 , Issue.1 , pp. 65-68
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

33
- 84890522099
- F0 contour prediction with a deep belief network-Gaussian process hybrid model
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "F0 contour prediction with a deep belief network-Gaussian process hybrid model," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6885-6889.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2013 , pp. 6885-6889
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

34
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- H. Lu, S. King, and O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," in Proc. ISCA SSW8, 2013, pp. 261-265.
- Proc. ISCA SSW8, 2013 , pp. 261-265
- Lu, H.¹ King, S.² Watts, O.³

35
- 84910030421
- Statistical parametric speech synthesis using weighted multi-distribution deep belief network
- S.-Y. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network," in Proc. Interspeech, 2014, pp. 1959-1963.
- Proc. Interspeech, 2014 , pp. 1959-1963
- Kang, S.-Y.¹ Meng, H.²

36
- 84905251808
- On the training aspects of deep neural networks (DNN) for parametric TTS synthesis
- Y. Qian, Y.-C. Fan, W.-P. Hu, and F. K. Soong, "On the training aspects of deep neural networks (DNN) for parametric TTS synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 3857-3861.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014 , pp. 3857-3861
- Qian, Y.¹ Fan, Y.-C.² Hu, W.-P.³ Soong, F.K.⁴

37
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 3872-3876.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014 , pp. 3872-3876
- Zen, H.¹ Senior, A.²

38
- 84905252390
- Voice conversion in time-invariant speaker-independent space
- T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion in time-invariant speaker-independent space," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 7939-7943.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2014 , pp. 7939-7943
- Nakashika, T.¹ Takiguchi, T.² Ariki, Y.³

39
- 84921735339
- Voice conversion using deep neural networks with layer-wise generative training
- L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, no. 12, pp. 1859-1872, 2014.
- (2014) IEEE/ACM Trans. Audio Speech Lang. Processing , vol.22 , Issue.12 , pp. 1859-1872
- Chen, L.-H.¹ Ling, Z.-H.² Liu, L.-J.³ Dai, L.-R.⁴

40
- 85032764981
- Dynamic noise aware training for speech enhancement based on deep neural networks
- to be published
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "Dynamic noise aware training for speech enhancement based on deep neural networks," in Proc. Interspeech (to be published).
- Proc. Interspeech
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

41
- 0028996993
- Speech parameter-generation from HMM using dynamic features
- K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter-generation from HMM using dynamic features," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1995, pp. 660-663.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1995 , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

42
- 85008006694
- Robust speaker-adaptive HMM-based text-to-speech synthesis
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio Speech Lang. Processing, vol. 17, no. 6, pp. 1208-1230, 2009.
- (2009) IEEE Trans. Audio Speech Lang. Processing , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

43
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

44
- 24144497811
- Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis
- J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, "Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 503-509, 2005.
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 503-509
- Yamagishi, J.¹ Onishi, K.² Masuko, T.³ Kobayashi, T.⁴

45
- 29144475179
- Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing
- M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, "Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing," IEICE Trans. Inf. Syst., vol. E88-D, no. 11, pp. 2484-2491, 2005.
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.11 , pp. 2484-2491
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

46
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- T. Nose, J. Yamagishi, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Kobayashi, T.³

47
- 84862291337
- Vocal tract length normalization for statistical parametric speech synthesis
- L. Saheer, J. Dines, and P. N. Garner, "Vocal tract length normalization for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 7, pp. 2134-2148, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.7 , pp. 2134-2148
- Saheer, L.¹ Dines, J.² Garner, P.N.³

48
- 84859765673
- Statistical parametric speech synthesis based on speaker and language factorization
- H. Zen, N. Braunschweiler, S. Buchholz, M.J.F. Gales, K. Knill, S. Krstulovic, and J. Latorre, "Statistical parametric speech synthesis based on speaker and language factorization," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 6, pp. 1713-1724, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.6 , pp. 1713-1724
- Zen, H.¹ Braunschweiler, N.² Buchholz, S.³ Gales, M.J.F.⁴ Knill, K.⁵ Krstulovic, S.⁶ Latorre, J.⁷

49
- 84869440340
- Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression
- Z.-H. Ling, K. Richmond, and J. Yamagishi, "Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression," IEEE Trans. Audio Speech Lang. Processing, vol. 21, no. 1, pp. 207-219, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Processing , vol.21 , Issue.1 , pp. 207-219
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³

50
- 84966348891
- An HMM-based speech synthesis system applied to English
- K. Tokuda, H. Zen, and A. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE Speech Synthesis Workshop, 2002, CD-ROM Proc.
- Proc. IEEE Speech Synthesis Workshop, 2002, CD-ROM Proc.
- Tokuda, K.¹ Zen, H.² Black, A.³

51
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, 2011.
- (2011) Speech Commun. , vol.53 , Issue.6 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

52
- 0003805597
- Ph.D. thesis, Cambridge Univ.
- J. Odell, "The use of context in large vocabulary speech recognition," Ph.D. thesis, Cambridge Univ., 1995.
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

53
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

54
- 85093445139
- Duration modeling in HMM-based speech synthesis system
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Duration modeling in HMM-based speech synthesis system," in Proc. ICSLP, 1998, vol. 2, pp. 29-32.
- Proc. ICSLP, 1998 , vol.2 , pp. 29-32
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

55
- 85004448479
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," J. Acoust. Soc. Jpn. (E), vol. 11, no. 2, pp. 71-76, 1990.
- (1990) J. Acoust. Soc. Jpn. (E) , vol.11 , Issue.2 , pp. 71-76
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

56
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
- (1998) IEEE Trans. Speech Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

57
- 77953727123
- Voice conversion based on weighted frequency warping
- D. Erro, A. Moreno, and A. Bonafonte, "Voice conversion based on weighted frequency warping," IEEE Trans. Audio Speech Lang. Processing, vol. 18, no. 5, pp. 922-931, 2010.
- (2010) IEEE Trans. Audio Speech Lang. Processing , vol.18 , Issue.5 , pp. 922-931
- Erro, D.¹ Moreno, A.² Bonafonte, A.³

58
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad, "Spectral mapping using artificial neural networks for voice conversion," IEEE Trans. Audio Speech Lang. Processing, vol. 18, no. 5, pp. 954-964, 2010.
- (2010) IEEE Trans. Audio Speech Lang. Processing , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahallad, K.⁴

59
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- E. Helander, H. Silen, T. Virtanen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 3, pp. 806-817, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silen, H.² Virtanen, T.³ Gabbouj, M.⁴

60
- 84859768504
- Statistical voice conversion based on noisy channel model
- D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Statistical voice conversion based on noisy channel model," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 6, pp. 1784-1794, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.6 , pp. 1784-1794
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

61
- 0033692729
- Narrowband to wideband conversion of speech using GMM based transformation
- K. Park and H. Kim, "Narrowband to wideband conversion of speech using GMM based transformation," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2000, pp. 1843-1846.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2000 , pp. 1843-1846
- Park, K.¹ Kim, H.²

62
- 58149308063
- A spectral conversion approach to single-channel speech enhancement
- A. Mouchtaris, J. Van der Spiegel, P. Mueller, and P. Tsakalides, "A spectral conversion approach to single-channel speech enhancement," IEEE Trans. Audio Speech Lang. Processing, vol. 15, no. 4, pp. 1180-1193, 2007.
- (2007) IEEE Trans. Audio Speech Lang. Processing , vol.15 , Issue.4 , pp. 1180-1193
- Mouchtaris, A.¹ Van Der Spiegel, J.² Mueller, P.³ Tsakalides, P.⁴

63
- 84865698185
- Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
- T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 9, pp. 2505-2517, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.9 , pp. 2505-2517
- Toda, T.¹ Nakagiri, M.² Shikano, K.³

64
- 38649140222
- Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
- T. Toda, A. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, pp. 215-227, 2008.
- (2008) Speech Commun. , vol.50 , pp. 215-227
- Toda, T.¹ Black, A.² Tokuda, K.³

65
- 84994214710
- Deep learning in speech synthesis
- H. Zen. (2013). Deep learning in speech synthesis. Keynote speech given at ISCA SSW8. [Online]. Available: http://research.google.com/pubs/archive/41539.pdf
- (2013) Keynote Speech Given at ISCA SSW8. [Online]
- Zen, H.¹

66
- 4243117872
- New York: Marcel Dekker
- L. Deng and D. O'Shaughnessy, Speech Processing: A Dynamic and Optimization-Oriented Approach. New York: Marcel Dekker, 2003.
- (2003) Speech Processing: A Dynamic and Optimization-Oriented Approach
- Deng, L.¹ O'Shaughnessy, D.²

67
- 0036165806
- An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition
- J. Sun and L. Deng, "An overlapping-feature based phonological model incorporating linguistic constraints: Applications to speech recognition," J. Acoust. Soc. Am., vol. 111, pp. 1086-1101, 2002.
- (2002) J. Acoust. Soc. Am. , vol.111 , pp. 1086-1101
- Sun, J.¹ Deng, L.²

68
- 0031198059
- Production models as a structural basis for automatic speech recognition
- L. Deng, G. Ramsay, and D. Sun, "Production models as a structural basis for automatic speech recognition," Speech Commun., vol. 33, nos. 2-3, pp. 93-111, Aug. 1997.
- (1997) Speech Commun. , vol.33 , Issue.2-3 , pp. 93-111
- Deng, L.¹ Ramsay, G.² Sun, D.³

69
- 33744966595
- Switching dynamic system models for speech articulation and acoustics
- New York: Springer-Verlag
- L. Deng, "Switching dynamic system models for speech articulation and acoustics," in Mathematical Foundations of Speech and Language Processing. New York: Springer-Verlag, 2003, pp. 115-134.
- (2003) Mathematical Foundations of Speech and Language Processing , pp. 115-134
- Deng, L.¹

70
- 84055163920
- Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition
- D. Yu, L. Deng, and G. Dahl, "Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition," in Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
- Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010
- Yu, D.¹ Deng, L.² Dahl, G.³

71
- 80051616844
- Large vocabulary continuous speech recognition with context-dependent DBN-HMMs
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Large vocabulary continuous speech recognition with context-dependent DBN-HMMs," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4688-4691.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2011 , pp. 4688-4691
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

72
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

73
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

74
- 84886829539
- Optimization techniques to improve training speed of deep neural networks for large speech tasks
- T.N. Sainath, B. Kingsbury, H. Soltau, and B. Ramabhadran, "Optimization techniques to improve training speed of deep neural networks for large speech tasks," IEEE Trans. Audio Speech Lang. Processing, vol. 21, no. 11, pp. 2267-2276, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Processing , vol.21 , Issue.11 , pp. 2267-2276
- Sainath, T.N.¹ Kingsbury, B.² Soltau, H.³ Ramabhadran, B.⁴

75
- 84872300403
- Deep belief networks based voice activity detection
- X.-L. Zhang and J. Wu, "Deep belief networks based voice activity detection," IEEE Trans. Audio Speech Lang. Processing, vol. 21, no. 4, pp. 697-710, 2013.
- (2013) IEEE Trans. Audio Speech Lang. Processing , vol.21 , Issue.4 , pp. 697-710
- Zhang, X.-L.¹ Wu, J.²

76
- 84874282835
- A deep neural network for acoustic-articulatory speech inversion
- B. Uria, S. Renals, and K. Richmond, "A deep neural network for acoustic-articulatory speech inversion," in Proc. NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- Proc. NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning, 2011
- Uria, B.¹ Renals, S.² Richmond, K.³

77
- 0013344078
- Training products of experts by minimizing contrastive divergence
- G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computat., vol. 14, no. 8, pp. 1711-1800, 2002.
- (2002) Neural Computat. , vol.14 , Issue.8 , pp. 1711-1800
- Hinton, G.¹

78
- 0033350721
- Products of experts
- G. Hinton, "Products of experts," in Proc. 9th Int. Conf. Artificial Neural Networks, 1999, pp. 1-6.
- Proc. 9th Int. Conf. Artificial Neural Networks, 1999 , pp. 1-6
- Hinton, G.¹

79
- 84864026688
- Modeling human motion using binary latent variables
- G. Taylor, G. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," in Proc. Advances in Neural Information Processing Systems, 2007, pp. 1345-1352.
- Proc. Advances in Neural Information Processing Systems, 2007 , pp. 1345-1352
- Taylor, G.¹ Hinton, G.² Roweis, S.³

80
- 44049116681
- Connectionist learning of belief networks
- R. Neal, "Connectionist learning of belief networks," Artificial Intell., vol. 56, no. 1, pp. 71-113, 1992.
- (1992) Artificial Intell. , vol.56 , Issue.1 , pp. 71-113
- Neal, R.¹

81
- 0022471098
- Learning representations by backpropagating errors
- D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by backpropagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
- (1986) Nature , vol.323 , Issue.6088 , pp. 533-536
- Rumelhart, D.¹ Hinton, G.² Williams, R.³

82
- 0041914606
- Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
- S. Kremer and J. Kolen, Eds. Piscataway, NJ: IEEE Press
- S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, S. Kremer and J. Kolen, Eds. Piscataway, NJ: IEEE Press, 2001, pp. 237-244.
- (2001) A Field Guide to Dynamical Recurrent Neural Networks , pp. 237-244
- Hochreiter, S.¹ Bengio, Y.² Frasconi, P.³ Schmidhuber, J.⁴

83
- 84864073449
- Greedy layer-wise training of deep networks
- Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Proc. Advances in Neural Information Processing Systems, 2007, pp. 153-160.
- Proc. Advances in Neural Information Processing Systems, 2007 , pp. 153-160
- Bengio, Y.¹ Lamblin, P.² Popovici, D.³ Larochelle, H.⁴

84
- 0032673049
- Restructuring speech representations using pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigne, "Restructuring speech representations using pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, nos. 3-4, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigne, A.³

85
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1992, vol. 1, pp. 137-140.
- Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1992 , vol.1 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

86
- 33646796046
- Ph.D. thesis, Nagoya Inst. of Tech.
- T. Yoshimura, "Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based text-to-speech systems," Ph.D. thesis, Nagoya Inst. of Tech., 2002.
- (2002) Simultaneous Modeling of Phonetic and Prosodic Parameters, and Characteristic Conversion for HMM-Based Text-to-Speech Systems
- Yoshimura, T.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.