SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 5600-5604

Trajectory training considering global variance for speech synthesis based on neural networks

(4) Hashimoto, Kei (a); Oura, Keiichiro (a); Nankaku, Yoshihiko (a); Tokuda, Keiichi (a)

(a) Nagoya Institute of Technology (Japan)

Author keywords

global variance; neural network; Speech synthesis; statistical model; trajectory model

EID: 84973375140 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7472749 Document Type: Conference Paper

Times cited : (28)

References (21)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84876687945
- Speech synthesis based on hidden Markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, 2013
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

3
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proceedings of Eurospeech 1999, pp. 2347-2350, 1999
- (1999) Proceedings of Eurospeech 1999 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

4
- 84973384246
- Tree-based state tying for high accuracy acoustic modelling
- S. Young, J. J. Odell, and P. Woodland, "Tree-based state tying for high accuracy acoustic modelling," Proceedings of ARPA Workshop on Human Language Technology, pp. 307-312, 1994
- (1994) Proceedings of ARPA Workshop on Human Language Technology , pp. 307-312
- Young, S.¹ Odell, J.J.² Woodland, P.³

5
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," Proceedings of ICASSP 2000, pp. 936-939, 2000
- (2000) Proceedings of ICASSP 2000 , pp. 936-939
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

7
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," Proceedings of ICASSP 2013, pp. 7962-7966, 2013
- (2013) Proceedings of ICASSP 2013 , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

8
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- H. Lu, S. King, and O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," Proceedings of ISCA SSW8, pp. 281-285, 2013
- (2013) Proceedings of ISCA SSW8 , pp. 281-285
- Lu, H.¹ King, S.² Watts, O.³

9
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Y. Qian, Y. Fan, H. Wenping, and F. K. Soong, "On the training aspects of deep neural network (DNN) for parametric TTS synthesis," Proceedings of ICASSP 2014, pp. 3857-3861, 2014
- (2014) Proceedings of ICASSP 2014 , pp. 3857-3861
- Qian, Y.¹ Fan, Y.² Wenping, H.³ Soong, F.K.⁴

10
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Transactions on Information & Systems, vol. E90-D, no. 5, pp. 816-824, 2007
- (2007) IEICE Transactions on Information & Systems , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

11
- 84890495160
- Fast, low-artifact speech synthesis considering global variance
- M. Shannon and W. Byrne, "Fast, low-artifact speech synthesis considering global variance," Proceedings of ICASSP 2013, pp. 7869-7873, 2013
- (2013) Proceedings of ICASSP 2013 , pp. 7869-7873
- Shannon, M.¹ Byrne, W.²

12
- 67650826181
- Trajectory training considering global variance for HMM-based speech synthesis
- T. Toda and S. Young, "Trajectory training considering global variance for HMM-based speech synthesis," Proceedings of ICASSP 2009, pp. 4025-4028, 2009
- (2009) Proceedings of ICASSP 2009 , pp. 4025-4028
- Toda, T.¹ Young, S.²

13
- 84946074523
- The effect of neural networks in statistical parametric speech synthesis
- K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "The effect of neural networks in statistical parametric speech synthesis," Proceedings of ICASSP 2015, pp. 4455-4459, 2015
- (2015) Proceedings of ICASSP 2015 , pp. 4455-4459
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

14
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features," Computer Speech and Language, vol. 21, no. 1, pp. 153-173, 2007
- (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

15
- 84959135757
- Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
- Z. Wu and S. King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features," Proceedings of Interspeech 2015, pp. 309-313, 2015
- (2015) Proceedings of Interspeech 2015 , pp. 309-313
- Wu, Z.¹ King, S.²

16
- 84959172579
- Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis
- Y. Fan, Y. Qian, F. K. Soong, and L. He, "Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis," Proceedings of Interspeech 2015, pp. 864-868, 2015
- (2015) Proceedings of Interspeech 2015 , pp. 864-868
- Fan, Y.¹ Qian, Y.² Soong, F.K.³ He, L.⁴

17
- 84910087395
- Sequence error (SE) minimization training of neural network for voice conversion
- F. L. Xie, Y. Qian, Y. Fan, F. K. Soong, and H. Li, "Sequence error (SE) minimization training of neural network for voice conversion," Proceedings of Interspeech 2014, pp. 2283-2287, 2014
- (2014) Proceedings of Interspeech 2014 , pp. 2283-2287
- Xie, F.L.¹ Qian, Y.² Fan, Y.³ Soong, F.K.⁴ Li, H.⁵

18
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Communication, vol. 9, pp. 357-363, 1990
- (1990) Speech Communication , vol.9 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

19
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999
- (1999) Speech Communication , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.³

20
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," Proceedings of Eurospeech 1997, pp. 99-102, 1997
- (1997) Proceedings of Eurospeech 1997 , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

21
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," Proceedings of ICASSP 2014, pp. 3872-3876, 2014
- (2014) Proceedings of ICASSP 2014 , pp. 3872-3876
- Zen, H.¹ Senior, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.