SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume , Issue , 2017, Pages 4900-4904

Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis

(3) Saito, Yukiᵃ Takamichi, Shinnosukeᵃ Saruwatari, Hiroshiᵃ

Author keywords

anti spoofing verification; DNN based speech synthesis; generative adversarial training; multitask learning; training algorithm

Indexed keywords

EID: 85023772724 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2017.7953088 Document Type: Conference Paper

Times cited : (32)

References (31)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Communication, Vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

2
- 84876687945
- Speech synthesis based on hidden Markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, Vol. 101, no. 5, pp. 1234-1252, 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

3
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Z. H. Ling, S. Y. Kang, H. Zen, A. Senior, M. Schuster, X. J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, Vol. 32, no. 3, pp. 35-52, 2015.
- (2015) IEEE Signal Processing Magazine , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.H.¹ Kang, S.Y.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.J.⁶ Meng, H.⁷ Deng, L.⁸

4
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Toulouse, France, May
- Y. J. Wu and R. H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, Toulouse, France, May 2006, pp. 89-92.
- (2006) Proc. ICASSP , pp. 89-92
- Wu, Y.J.¹ Wang, R.H.²

5
- 84978086501
- Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training
- Z. Wu and S. King, "Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 7, pp. 1255-1265, 2016.
- (2016) IEEE Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.7 , pp. 1255-1265
- Wu, Z.¹ King, S.²

6
- 84994361374
- The voice conversion challenge 2016
- California, U.S.A., Sep.
- T. Toda, L. H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, and J. Yamagishi, "The Voice Conversion Challenge 2016," in Proc. INTERSPEECH, California, U.S.A., Sep. 2016, pp. 1632-1636.
- (2016) Proc. INTERSPEECH , pp. 1632-1636
- Toda, T.¹ Chen, L.H.² Saito, D.³ Villavicencio, F.⁴ Wester, M.⁵ Wu, Z.⁶ Yamagishi, J.⁷

7
- 84994234512
- Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis
- California, U.S.A., Sep.
- Y. Ijima, T. Asami, and H. Mizuno, "Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis," in Proc. INTERSPEECH, California, U.S.A., Sep. 2016, pp. 337-341.
- (2016) Proc. INTERSPEECH , pp. 337-341
- Ijima, Y.¹ Asami, T.² Mizuno, H.³

8
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

9
- 84878387899
- Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
- Portland, U.S.A., Sep.
- Y. Ohtani, M. Tamura, M. Morita, T. Kagoshima, and M. Akamine, "Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP," in Proc. INTERSPEECH, Portland, U.S.A., Sep. 2012.
- (2012) Proc. INTERSPEECH
- Ohtani, Y.¹ Tamura, M.² Morita, M.³ Kagoshima, T.⁴ Akamine, M.⁵

10
- 84962834006
- Postfilters to modify the modulation spectrum for statistical parametric speech synthesis
- S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, and S. Nakamura, "Postfilters to modify the modulation spectrum for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 4, pp. 755-767, 2016.
- (2016) IEEE Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.4 , pp. 755-767
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Neubig, G.⁴ Sakti, S.⁵ Nakamura, S.⁶

11
- 84946033919
- Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion
- Brisbane, Australia, Apr.
- S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, "Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion," in Proc. ICASSP, Brisbane, Australia, Apr. 2015, pp. 4859-4863.
- (2015) Proc. ICASSP , pp. 4859-4863
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Nakamura, S.⁴

12
- 84973375140
- Trajectory training considering global variance for speech synthesis based on neural networks
- Shanghai, China, Mar.
- K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks," in Proc. ICASSP, Shanghai, China, Mar. 2016, pp. 5600-5604.
- (2016) Proc. ICASSP , pp. 5600-5604
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

13
- 84910088495
- Analysis of spectral enhancement using global variance in HMM-based speech synthesis
- MAX Atria, Singapore, May
- T. Nose and A. Ito, "Analysis of spectral enhancement using global variance in HMM-based speech synthesis," in Proc. INTERSPEECH, MAX Atria, Singapore, May 2014, pp. 2917-2921.
- (2014) Proc. INTERSPEECH , pp. 2917-2921
- Nose, T.¹ Ito, A.²

14
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Vancouver, Canada, May
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, Vancouver, Canada, May 2013, pp. 7962-7966.
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

15
- 84962901047
- Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance
- Z. Wu, P. L. D. Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, and J. Yamagishi, "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 24, no. 4, pp. 768-783, 2016.
- (2016) IEEE Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.4 , pp. 768-783
- Wu, Z.¹ Leon, P.L.D.² Demiroglu, C.³ Khodabakhsh, A.⁴ King, S.⁵ Ling, Z.⁶ Saito, D.⁷ Stewart, B.⁸ Toda, T.⁹ Wester, M.¹⁰ Yamagishi, J.¹¹

16
- 84959178048
- Robust deep feature for spoofing detection - The SJTU system for ASVspoof 2015 challenge
- Dresden, Germany, Sep.
- N. Chen, Y. Qian, H. Dinkel, B. Chen, and K. Yu, "Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 Challenge," in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2097-2101.
- (2015) Proc. INTERSPEECH , pp. 2097-2101
- Chen, N.¹ Qian, Y.² Dinkel, H.³ Chen, B.⁴ Yu, K.⁵

17
- 84937849144
- Generative adversarial nets
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems 27, pp. 2672-2680, 2014.
- (2014) Advances in Neural Information Processing Systems 27 , pp. 2672-2680
- Goodfellow, I.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

18
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, Vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

19
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- Brisbane, Australia, Apr.
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, Brisbane, Australia, Apr. 2015, pp. 4470-4474.
- (2015) Proc. ICASSP , pp. 4470-4474
- Zen, H.¹ Sak, H.²

20
- 84959090360
- Multitask learning deep neural networks for speech feature denoising
- Dresden, Germany, Sep.
- B. Huang, D. Ke, H. Zheng, B. Xu, Y. Xu, and K. Su, "Multitask learning deep neural networks for speech feature denoising," in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2464-2468.
- (2015) Proc. INTERSPEECH , pp. 2464-2468
- Huang, B.¹ Ke, D.² Zheng, H.³ Xu, B.⁴ Xu, Y.⁵ Su, K.⁶

21
- 84998636515
- Generative adversarial text-to-image synthesis
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text-to-image synthesis," in Proc. ICML, 2016, pp. 1060-1069.
- (2016) Proc. ICML , pp. 1060-1069
- Reed, S.¹ Akata, Z.² Yan, X.³ Logeswaran, L.⁴ Schiele, B.⁵ Lee, H.⁶

22
- 84984985889
- "Why should I trust you?": Explaining the predictions of any classifier
- San Francisco, U.S.A., Aug.
- M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proc. KDD, San Francisco, U.S.A., Aug. 2016, pp. 1135-1144.
- (2016) Proc. KDD , pp. 1135-1144
- Ribeiro, M.T.¹ Singh, S.² Guestrin, C.³

23
- 83755163018
- D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti, "Detecting novel associations in large data sets," Science, Vol. 334, no. 6062, pp. 1518-1524, 2011.
- (2011) Science , vol.334 , Issue.6062 , pp. 1518-1524
- Reshef, D.N.¹ Reshef, Y.A.² Finucane, H.K.³ Grossman, S.R.⁴ McVean, G.⁵ Turnbaugh, P.J.⁶ Lander, E.S.⁷ Mitzenmacher, M.⁸ Sabeti, P.C.⁹

24
- 0142210563
- no. TR-I-0166M
- M. Abe, Y. Sagisaka, T. Umeda, and H. Kuwabara, "ATR technical report," no. TR-I-0166M, 1990.
- (1990) ATR Technical Report
- Abe, M.¹ Sagisaka, Y.² Umeda, T.³ Kuwabara, H.⁴

25
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Firenze, Italy, Sep.
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA 2001, Firenze, Italy, Sep. 2001, pp. 1-6.
- (2001) MAVEBA 2001 , pp. 1-6
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

26
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Pittsburgh, U.S.A., Sep.
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. INTERSPEECH, Pittsburgh, U.S.A., Sep. 2006, pp. 2266-2269.
- (2006) Proc. INTERSPEECH , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

27
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. D. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, Vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.D.³

28
- 84994252904
- The NAIST text-to-speech system for the blizzard challenge 2015
- Berlin, Germany, Sep.
- S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, and S. Nakamura, "The NAIST text-to-speech system for the Blizzard Challenge 2015," in Proc. Blizzard Challenge workshop, Berlin, Germany, Sep. 2015.
- (2015) Proc. Blizzard Challenge Workshop
- Takamichi, S.¹ Kobayashi, K.² Tanaka, K.³ Toda, T.⁴ Nakamura, S.⁵

29
- 84862294866
- Deep sparse rectifier neural networks
- Lauderdale, U.S.A., Apr.
- X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. AISTATS, Lauderdale, U.S.A., Apr. 2011, pp. 315-323.
- (2011) Proc. AISTATS , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

30
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, Vol. 12, pp. 2121-2159, 2011.
- (2011) Journal of Machine Learning Research , vol.12 , pp. 2121-2159
- Duchi, J.¹ Hazan, E.² Singer, Y.³

31
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, Vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.