SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2015-August, 2015, Pages 4215-4219

Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis

(2) Tokuda, Keiichi a,b; Zen, Heiga a

a GOOGLE (United Kingdom)

b NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

Author keywords

adaptive cepstral analysis; neural network; Statistical parametric speech synthesis

Indexed keywords

DECISION TREES; EXTRACTION; FEATURE EXTRACTION; NEURAL NETWORKS; SPEECH COMMUNICATION; SPEECH SYNTHESIS; TREES (MATHEMATICS);

ACOUSTIC FEATURE EXTRACTION; ACOUSTIC MODEL; ACOUSTIC MODEL TRAININGS; CEPSTRAL ANALYSIS; CONVENTIONAL APPROACH; SPEECH WAVEFORMS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; TREE STRUCTURES;

AUDIO SIGNAL PROCESSING;

EID: 84946077883 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178765 Document Type: Conference Paper

Times cited: 40

References (23)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

2
- 0003323711
- Unbiased estimation of log spectrum
- S. Imai and C. Furuichi, "Unbiased estimation of log spectrum," in Proc. EURASIP, 1988, pp. 203-206
- (1988) Proc. EURASIP , pp. 203-206
- Imai, S.¹ Furuichi, C.²

3
- 0001810975
- Line spectrum representation of linear predictor coefficients of speech signals
- F. Itakura, "Line spectrum representation of linear predictor coefficients of speech signals," The Journal of the Acoustical Society of America, vol. 57, no. S1, p. S35, 1975
- (1975) The Journal of the Acoustical Society of America , vol.57 , Issue.S1 , p. S35
- Itakura, F.¹

4
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA, 2001, pp. 13-15
- (2001) Proc. MAVEBA , pp. 13-15
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

5
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

7
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, pp. 89-92
- (2006) Proc. ICASSP , pp. 89-92
- Wu, Y.¹ Wang, R.-H.²

8
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007
- (2007) Comput. Speech Lang , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

9
- 0009553788
- A statistical method for estimation of speech spectral density and formant frequencies
- F. Itakura and S. Saito, "A statistical method for estimation of speech spectral density and formant frequencies," IEICE Trans. Fundamentals (Japanese Edition), vol. J53-A, no. 1, pp. 35-42, 1970
- (1970) IEICE Trans. Fundamentals (Japanese Edition) , vol.53-A , Issue.1 , pp. 35-42
- Itakura, F.¹ Saito, S.²

10
- 0029406853
- Adaptive cepstral analysis of speech
- K. Tokuda, T. Kobayashi, and S. Imai, "Adaptive cepstral analysis of speech," IEEE Trans. Speech Audio Process., vol. 3, no. 6, pp. 481-489, 1995
- (1995) IEEE Trans. Speech Audio Process , vol.3 , Issue.6 , pp. 481-489
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

11
- 84867214032
- Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
- Y.-J. Wu and K. Tokuda, "Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis," in Proc. Interspeech, 2008, pp. 577-580
- (2008) Proc. Interspeech , pp. 577-580
- Wu, Y.¹ Tokuda, K.²

12
- 51449096059
- Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM
- T. Toda and K. Tokuda, "Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM," in Proc. ICASSP, 2008, pp. 3925-3928
- (2008) Proc. ICASSP , pp. 3925-3928
- Toda, T.¹ Tokuda, K.²

13
- 84865797109
- Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters
- R. Maia, H. Zen, and M. Gales, "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters," in Proc. ISCA SSW7, 2010, pp. 88-93
- (2010) Proc. ISCA SSW7 , pp. 88-93
- Maia, R.¹ Zen, H.² Gales, M.³

14
- 84901760651
- Integration of spectral feature extraction and modeling for HMM-based speech synthesis
- K. Nakamura, K. Hashimoto, Y. Nankaku, and K. Tokuda, "Integration of spectral feature extraction and modeling for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 97, no. 6, pp. 1438-1448, 2014
- (2014) IEICE Trans Inf Syst , vol.97 , Issue.6 , pp. 1438-1448
- Nakamura, K.¹ Hashimoto, K.² Nankaku, Y.³ Tokuda, K.⁴

15
- 0003805597
- The use of context in large vocabulary speech recognition (Ph.D. thesis, Cambridge University)
- J. Odell, The use of context in large vocabulary speech recognition, Ph.D. thesis, Cambridge University, 1995
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

16
- 84994214710
- Deep learning in speech synthesis
- H. Zen, "Deep learning in speech synthesis," keynote speech given at ISCA SSW8, 2013, http://research.google.com/pubs/archive/41539.pdf
- (2013) Keynote Speech Given at ISCA SSW8
- Zen, H.¹

17
- 0004108066
- Parameter estimation and hypothesis testing in spectral analysis of stationary time series (Springer-Verlag)
- K. Dzhaparidze, Parameter estimation and hypothesis testing in spectral analysis of stationary time series, Springer-Verlag, 1986
- (1986) Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series
- Dzhaparidze, K.¹

18
- 0003513556
- Discrete-Time Signal Processing (Prentice Hall)
- A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 1989
- (1989) Discrete-Time Signal Processing
- Oppenheim, A.V.¹ Schafer, R.W.²

19
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015 (accepted)
- (2015) Proc. ICASSP
- Zen, H.¹ Sak, H.²

20
- 78049361102
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004
- (2004) IEICE Trans. Inf Syst , vol.87-D-II , Issue.8 , pp. 1563-1571
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

21
- 84877760312
- Large scale distributed deep networks
- J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks," in Proc. NIPS, 2012
- (2012) Proc. NIPS
- Dean, J.¹ Corrado, G.² Monga, R.³ Chen, K.⁴ Devin, M.⁵ Le, Q.⁶ Mao, M.⁷ Ranzato, M.⁸ Senior, A.⁹ Tucker, P.¹⁰ Yang, K.¹¹ Ng, A.¹²

22
- 0001609567
- An efficient gradient-based algorithm for on-line training of recurrent network trajectories
- R. Williams and J. Peng, "An efficient gradient-based algorithm for on-line training of recurrent network trajectories," Neural Comput., vol. 2, no. 4, pp. 490-501, 1990
- (1990) Neural Comput , vol.2 , Issue.4 , pp. 490-501
- Williams, R.¹ Peng, J.²

23
- 84910046405
- Long short-term memory recurrent neural network architectures for large scale acoustic modeling
- H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. Interspeech, 2014
- (2014) Proc. Interspeech
- Sak, H.¹ Senior, A.² Beaufays, F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.